Posted on May 22, 2020
I've recently been working on an Apache Kafka/Confluent data pipeline to analyse event streams. I chose Google Cloud BigQuery for the analysis because it seemed easy to set up and extremely powerful. Before I could get up and running, though, I needed to backfill all my existing data. I also decided to load it into a time-partitioned table to improve query performance and reduce costs, since queries that filter on the partitioning column only scan the relevant partitions.