Enable BigQuery Partitioning
Enable partitioning to lower your merge latency and reduce the number of bytes scanned.
Steps to turn on partitioning
For this example, consider this table in Postgres.
- First, pause your Artie pipeline.
- Recreate the table in BigQuery with the right partitioning strategy. BigQuery cannot add partitioning to an existing non-partitioned table, so the table has to be created again.
- Edit your pipeline and update the table settings for events.
- Click Save and Deploy.
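For the "recreate the table" step, here is a minimal sketch that generates the BigQuery DDL for a table partitioned on a TIMESTAMP column. It assumes daily partitioning by default. The helper name build_partitioned_table_ddl, the analytics dataset, and the events column list are all hypothetical placeholders for your own schema. Only the PARTITION BY TIMESTAMP_TRUNC(column, DAY | HOUR | MONTH | YEAR) clause is BigQuery's own syntax.

```python
# Hypothetical helper: builds BigQuery DDL for a table partitioned on a TIMESTAMP column.
GRANULARITIES = {"HOUR", "DAY", "MONTH", "YEAR"}


def build_partitioned_table_ddl(dataset, table, columns, partition_column, granularity="DAY"):
    """Return a CREATE TABLE statement that partitions `table` on `partition_column`.

    `columns` maps column names to BigQuery types, in the order they should appear.
    """
    if granularity not in GRANULARITIES:
        raise ValueError(f"granularity must be one of {sorted(GRANULARITIES)}")
    if columns.get(partition_column) != "TIMESTAMP":
        raise ValueError(f"{partition_column!r} must be a TIMESTAMP column")
    column_sql = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns.items())
    return (
        f"CREATE TABLE `{dataset}.{table}` (\n  {column_sql}\n)\n"
        f"PARTITION BY TIMESTAMP_TRUNC({partition_column}, {granularity});"
    )


# Hypothetical schema for the events table; replace with your real columns.
events_columns = {
    "id": "INT64",
    "user_id": "INT64",
    "event_type": "STRING",
    "created_at": "TIMESTAMP",
}
print(build_partitioned_table_ddl("analytics", "events", events_columns, "created_at"))
```

You could run the printed statement in the BigQuery console, or with bq query --use_legacy_sql=false, before you edit the pipeline's table settings. Daily granularity is a common starting point. Hourly partitions only pay off when each hour holds a large amount of data.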
What is a partitioned table?
A partitioned table is divided into segments, called partitions, that make it easier to manage and query your data. Splitting a large table into smaller partitions improves query performance and controls costs, because a query can read only the partitions it needs instead of the whole table. You partition a table by specifying a partition column, which is used to segment the table. Ingestion-time partitioning is the exception: it segments rows by when they were loaded rather than by a column.
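To make "segments" concrete, the following is a toy Python model of daily time partitioning. Each row is assigned to the partition named by the UTC date of its partition column, using IDs such as 20240115, and rows whose partition column is NULL go to the __NULL__ partition. BigQuery does this bucketing internally. The rows, the created_at column, and the function names here are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def daily_partition_id(ts):
    """Return a YYYYMMDD partition ID for a timezone-aware timestamp, evaluated in UTC."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


def group_into_partitions(rows, partition_column):
    """Group row dicts by the daily partition their partition column falls into."""
    partitions = defaultdict(list)
    for row in rows:
        value = row[partition_column]
        key = "__NULL__" if value is None else daily_partition_id(value)
        partitions[key].append(row)
    return dict(partitions)


# Hypothetical rows; row 3 is 20:00 on Jan 15 at UTC-8, which is Jan 16 in UTC.
rows = [
    {"id": 1, "created_at": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime(2024, 1, 15, 23, 59, tzinfo=timezone.utc)},
    {"id": 3, "created_at": datetime(2024, 1, 15, 20, 0, tzinfo=timezone(timedelta(hours=-8)))},
    {"id": 4, "created_at": None},
]
partitions = group_into_partitions(rows, "created_at")
for partition_id, part_rows in sorted(partitions.items()):
    print(partition_id, [row["id"] for row in part_rows])
# 20240115 [1, 2]
# 20240116 [3]
# __NULL__ [4]
```

Row 3 shows why the time zone matters. A TIMESTAMP is bucketed by its UTC date, so an evening event on the US West Coast can land in the next day's partition.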
Why should we use table partitions?
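- Improve query performance by scanning only the relevant partitions instead of the whole table.
- Stay within quotas: when an operation would exceed a standard table quota, scoping it to specific partitions lets you use the higher partitioned-table quotas.
- Gain partition-level management features, such as writing to or deleting specific partitions within a table.
- Reduce the number of bytes processed, which lowers your BigQuery bill.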
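The following is a back-of-the-envelope Python model of how partition pruning reduces bytes processed. It assumes a hypothetical table with 30 daily partitions of 2 GiB each, and a query whose filter on the partition column keeps only two days. The model ignores other factors, such as which columns the query reads, so treat the numbers as illustrative only.

```python
from datetime import date, timedelta

GIB = 1024 ** 3


def bytes_scanned(partition_sizes, start, end):
    """Bytes read when a filter on the partition column keeps only days in [start, end]."""
    return sum(size for day, size in partition_sizes.items() if start <= day <= end)


# Hypothetical table: 30 daily partitions (Jan 1 to Jan 30) of 2 GiB each.
first_day = date(2024, 1, 1)
partition_sizes = {first_day + timedelta(days=i): 2 * GIB for i in range(30)}

full_scan = sum(partition_sizes.values())  # no partition filter: every partition is read
pruned = bytes_scanned(partition_sizes, date(2024, 1, 29), date(2024, 1, 30))
print(f"full scan: {full_scan / GIB:.0f} GiB, pruned: {pruned / GIB:.0f} GiB")
# full scan: 60 GiB, pruned: 4 GiB
```

The saving only materializes when the query filters on the partition column. A filter on some other column still has to read every partition.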
What are the different kinds of partitioning strategies?
Partitioning type | Description | Example |
---|---|---|
Time partitioning | Partitions on a TIMESTAMP, DATETIME, or DATE column (for example, one partition per day). | Column: timestamp |
Integer range | Partitions on an INTEGER column, using ranges defined by a start value, an end value, and an interval. | A customer_id column with 100 values, split into ranges such as 0-9, 10-19, and so on. |
Ingestion time | Partitions rows by the time BigQuery ingested them, not by a column in the data. | N/A |
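Integer-range partitioning is the least obvious of the three strategies, so here is a minimal Python sketch of how values map to partitions. It assumes the customer_id example uses start 0, end 100 (exclusive), and interval 10. In BigQuery DDL that setup would be declared as PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10)). The function integer_range_partition is hypothetical and only models the bucketing BigQuery performs: values outside the range go to __UNPARTITIONED__, and NULLs go to __NULL__.

```python
def integer_range_partition(value, start, end, interval):
    """Return the partition label for `value` under integer-range partitioning.

    Partitions cover [start, end) in steps of `interval`, and each one is labeled
    here by its lower bound. Out-of-range values go to __UNPARTITIONED__ and
    NULLs (None) go to __NULL__.
    """
    if interval <= 0 or end <= start:
        raise ValueError("expected start < end and a positive interval")
    if value is None:
        return "__NULL__"
    if not start <= value < end:
        return "__UNPARTITIONED__"
    # Round down to the start of the interval that contains the value.
    return str(start + (value - start) // interval * interval)


# customer_id values 0-99 split into ten partitions of width 10,
# mirroring RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10)).
for customer_id in (0, 42, 99, 100, None):
    print(customer_id, "->", integer_range_partition(customer_id, 0, 100, 10))
```

Choose the end value with future growth in mind. Once customer_id passes the end value, new rows pile up in the single __UNPARTITIONED__ partition and lose the benefits of pruning.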