Prerequisites
- An existing Artie pipeline writing to a BigQuery destination
- Permissions to run DDL statements (
CREATE TABLE,DROP TABLE,ALTER TABLE) in your BigQuery dataset
Enable partitioning
For this example, consider a source table in Postgres:Pause your Artie pipeline
In the Artie dashboard, navigate to your pipeline and click Pause. This prevents new data from being merged while you restructure the table.
Choose a partition column
Pick a
TIMESTAMP or DATE column that your queries frequently filter on. Good candidates are columns like created_at, updated_at, or event_date.For most CDC workloads, daily time partitioning on a created_at-style column provides the best balance between partition pruning and manageability.Recreate the table with partitioning in BigQuery
Run the following SQL in the BigQuery console to create a partitioned copy, drop the original, and rename the copy:After verifying the data in
events_copy looks correct:Update table settings in Artie
In the Artie dashboard, go to your pipeline > Edit > Tables > find the
events table > Table settings.Under the BigQuery partitioning setting, specify the partition column (e.g. created_at). This tells Artie to include the partition column in merge predicates, which enables BigQuery to prune partitions during merges.See Advanced settings for more details on table-level configuration.Background
What is a partitioned table?
What is a partitioned table?
A partitioned table is divided into segments called partitions, making it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the number of bytes read by a query. You partition tables by specifying a partition column which is used to segment the table.
Why use table partitions?
Why use table partitions?
- Faster merges — Artie’s merge operations scan only the relevant partitions instead of the entire table.
- Lower costs — Fewer bytes scanned means lower BigQuery compute charges, especially under on-demand pricing.
- Higher quotas — Partitioned tables have higher DML quotas than unpartitioned tables.
- Partition-level management — You can write to, delete, or expire individual partitions without affecting the rest of the table.
What are the different partitioning strategies?
What are the different partitioning strategies?
| Partitioning type | Description | Artie recommendation |
|---|---|---|
| Time partitioning | Partitions based on a TIMESTAMP, DATE, or DATETIME column. Supports hourly, daily, monthly, and yearly granularity. | Recommended. Works well with CDC workloads where data naturally has a time dimension. |
| Integer range | Partitions based on value ranges of an integer column (e.g. customer_id 0–9 in one partition, 10–19 in the next). | Useful when your data has no natural time column but has a well-distributed integer key. |
| Ingestion time | Partitions based on when rows were inserted into BigQuery. | Not recommended. Artie’s merge operations update existing rows, which can create duplicates across partitions since the ingestion timestamp changes on each merge. |