Partitioning your BigQuery tables reduces the amount of data scanned during Artie’s merge operations, which lowers merge latency and can significantly reduce your BigQuery bill. This guide walks through converting an existing unpartitioned table to a partitioned one while keeping your Artie pipeline in sync.Documentation Index
Fetch the complete documentation index at: https://artie.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
Prerequisites
- An existing Artie pipeline writing to a BigQuery destination
- Permissions to run DDL statements (
CREATE TABLE,DROP TABLE,ALTER TABLE) in your BigQuery dataset
Enable partitioning
For this example, consider a source table in Postgres:Pause your Artie pipeline
In the Artie dashboard, navigate to your pipeline and click Pause. This prevents new data from being merged while you restructure the table.
Choose a partition column
Pick a
TIMESTAMP or DATE column that your queries frequently filter on. Good candidates are columns like created_at, updated_at, or event_date.For most CDC workloads, daily time partitioning on a created_at-style column provides the best balance between partition pruning and manageability.Recreate the table with partitioning in BigQuery
Run the following SQL in the BigQuery console to create a partitioned copy, drop the original, and rename the copy:After verifying the data in
events_copy looks correct:Update table settings in Artie
In the Artie dashboard, go to your pipeline > Edit > Tables > find the
events table > Table settings.Under the BigQuery partitioning setting, specify the partition column (e.g. created_at). This tells Artie to include the partition column in merge predicates, which enables BigQuery to prune partitions during merges.See Advanced settings for more details on table-level configuration.Background
What is a partitioned table?
What is a partitioned table?
A partitioned table is divided into segments called partitions, making it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the number of bytes read by a query. You partition tables by specifying a partition column which is used to segment the table.
Why use table partitions?
Why use table partitions?
- Faster merges — Artie’s merge operations scan only the relevant partitions instead of the entire table.
- Lower costs — Fewer bytes scanned means lower BigQuery compute charges, especially under on-demand pricing.
- Higher quotas — Partitioned tables have higher DML quotas than unpartitioned tables.
- Partition-level management — You can write to, delete, or expire individual partitions without affecting the rest of the table.
What are the different partitioning strategies?
What are the different partitioning strategies?
| Partitioning type | Description | Artie recommendation |
|---|---|---|
| Time partitioning | Partitions based on a TIMESTAMP, DATE, or DATETIME column. Supports hourly, daily, monthly, and yearly granularity. | Recommended. Works well with CDC workloads where data naturally has a time dimension. |
| Integer range | Partitions based on value ranges of an integer column (e.g. customer_id 0–9 in one partition, 10–19 in the next). | Useful when your data has no natural time column but has a well-distributed integer key. |
| Ingestion time | Partitions based on when rows were inserted into BigQuery. | Not recommended. Artie’s merge operations update existing rows, which can create duplicates across partitions since the ingestion timestamp changes on each merge. |
Related
- BigQuery destination setup — Configure BigQuery as a destination in Artie, including queue priority settings.
- Capacity pricing — Understand BigQuery’s pricing models and avoid unexpected costs.