Skip to main content
Partitioning your BigQuery tables reduces the amount of data scanned during Artie’s merge operations, which lowers merge latency and can significantly reduce your BigQuery bill. This guide walks through converting an existing unpartitioned table to a partitioned one while keeping your Artie pipeline in sync.

Prerequisites

  • An existing Artie pipeline writing to a BigQuery destination
  • Permissions to run DDL statements (CREATE TABLE, DROP TABLE, ALTER TABLE) in your BigQuery dataset

Enable partitioning

For this example, consider a source table in Postgres:
CREATE TABLE events (
    event_id SERIAL PRIMARY KEY,
    event_name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
1

Pause your Artie pipeline

In the Artie dashboard, navigate to your pipeline and click Pause. This prevents new data from being merged while you restructure the table.
2

Choose a partition column

Pick a TIMESTAMP or DATE column that your queries frequently filter on. Good candidates are columns like created_at, updated_at, or event_date.For most CDC workloads, daily time partitioning on a created_at-style column provides the best balance between partition pruning and manageability.
3

Recreate the table with partitioning in BigQuery

Run the following SQL in the BigQuery console to create a partitioned copy, drop the original, and rename the copy:
This operation drops the original table. Verify that the partitioned copy contains all expected data before running the DROP TABLE statement.
CREATE TABLE artie.events_copy
  PARTITION BY DATE(created_at)
  AS SELECT * FROM artie.events;
After verifying the data in events_copy looks correct:
DROP TABLE artie.events;
ALTER TABLE artie.events_copy RENAME TO events;
4

Update table settings in Artie

In the Artie dashboard, go to your pipeline > Edit > Tables > find the events table > Table settings.Under the BigQuery partitioning setting, specify the partition column (e.g. created_at). This tells Artie to include the partition column in merge predicates, which enables BigQuery to prune partitions during merges.See Advanced settings for more details on table-level configuration.
5

Deploy and verify

Click Save and then Deploy to resume your pipeline.To confirm partitioning is active, run the following query in BigQuery:
SELECT table_name, partition_id, total_rows
FROM `artie.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'events'
ORDER BY partition_id;
You should see multiple rows, one per partition, instead of a single NULL partition.

Background

A partitioned table is divided into segments called partitions, making it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the number of bytes read by a query. You partition tables by specifying a partition column which is used to segment the table.
  • Faster merges — Artie’s merge operations scan only the relevant partitions instead of the entire table.
  • Lower costs — Fewer bytes scanned means lower BigQuery compute charges, especially under on-demand pricing.
  • Higher quotas — Partitioned tables have higher DML quotas than unpartitioned tables.
  • Partition-level management — You can write to, delete, or expire individual partitions without affecting the rest of the table.
Partitioning typeDescriptionArtie recommendation
Time partitioningPartitions based on a TIMESTAMP, DATE, or DATETIME column. Supports hourly, daily, monthly, and yearly granularity.Recommended. Works well with CDC workloads where data naturally has a time dimension.
Integer rangePartitions based on value ranges of an integer column (e.g. customer_id 0–9 in one partition, 10–19 in the next).Useful when your data has no natural time column but has a well-distributed integer key.
Ingestion timePartitions based on when rows were inserted into BigQuery.Not recommended. Artie’s merge operations update existing rows, which can create duplicates across partitions since the ingestion timestamp changes on each merge.