How it works
Flushing is a critical part of Artie’s data pipeline that determines when and how data gets written to your destination.

1. Data buffering
- Artie’s reading process will read changes from your source database and publish them to Kafka
- Artie’s writing process will read messages from Kafka and write them to your destination
- Messages are temporarily stored in memory and deduplicated based on primary key(s) or unique index
- Multiple changes to the same record are merged to reduce write volume, as sketched below
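To make the buffering and deduplication step concrete, here is a minimal sketch in Python. The class and field names are illustrative only and are not Artie’s actual implementation; it assumes the latest change for a primary key supersedes earlier ones.

```python
# Minimal sketch of in-memory buffering with primary-key deduplication.
# Names are illustrative; this is not Artie's actual implementation.

class Buffer:
    def __init__(self):
        self.rows = {}        # primary key -> latest merged change
        self.total_bytes = 0  # approximate payload size after deduplication

    def add(self, primary_key, change, size_bytes):
        if primary_key in self.rows:
            # Multiple changes to the same record are merged,
            # so only the latest state is written downstream.
            old_size = self.rows[primary_key][1]
            self.total_bytes -= old_size
        self.rows[primary_key] = (change, size_bytes)
        self.total_bytes += size_bytes

    def message_count(self):
        # Count of deduplicated messages (unique primary keys).
        return len(self.rows)


buf = Buffer()
buf.add(42, {"op": "insert", "name": "alice"}, 120)
buf.add(42, {"op": "update", "name": "alicia"}, 130)  # merged with the insert
print(buf.message_count(), buf.total_bytes)  # 1 130
```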
2. Flush trigger evaluation
- Artie continuously monitors three flush conditions
- When any condition is met, a flush is triggered
- Reading from Kafka pauses during the flush operation
3. Data loading
- Buffered data is written to your destination in an optimized batch
- After completion, Artie will commit the offset and resume reading from Kafka
- The cycle repeats for continuous data flow; the full loop is sketched below
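The three steps above form a loop: read and buffer, pause to flush, commit the Kafka offset, then resume. The sketch below shows that control flow in simplified form; consumer, write_batch_to_destination, and should_flush are stand-ins assumed for illustration, not Artie’s actual APIs.

```python
# Simplified control flow of the buffer -> flush -> commit cycle.
# consumer, write_batch_to_destination, and should_flush are stand-ins
# for illustration only; they are not Artie's actual APIs.
import time


def run_pipeline(consumer, write_batch_to_destination, should_flush):
    buffer = {}                    # primary key -> latest change
    started_at = time.monotonic()  # used for the time-based flush rule

    while True:
        message = consumer.poll(timeout=1.0)
        if message is not None:
            # Deduplicate on primary key: the latest change wins.
            buffer[message["primary_key"]] = message

        if should_flush(buffer, started_at):
            # Reading from Kafka pauses while the flush runs.
            write_batch_to_destination(list(buffer.values()))
            consumer.commit()      # only commit after the write succeeds
            buffer.clear()
            started_at = time.monotonic()
```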
Conditions
Artie evaluates three conditions to determine when to flush data. Any one of these conditions will trigger a flush:

Time elapsed
Maximum time in seconds — Ensures data freshness even during low-volume periods
Message count
Number of deduplicated messages — Based on unique primary key(s) or a unique index
Byte size
Total bytes of deduplicated data — Actual payload size after deduplication
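Checking these three conditions amounts to a simple any-of test. The sketch below illustrates that logic with made-up threshold names; consult Artie’s configuration reference for the actual setting names.

```python
# Illustrative any-of check over the three flush conditions.
# The threshold names are made up for this sketch.
import time
from dataclasses import dataclass


@dataclass
class FlushRules:
    max_seconds: float    # time elapsed
    max_messages: int     # deduplicated message count
    max_bytes: int        # deduplicated payload size


def should_flush(rules, started_at, message_count, total_bytes):
    return (
        time.monotonic() - started_at >= rules.max_seconds
        or message_count >= rules.max_messages
        or total_bytes >= rules.max_bytes
    )


rules = FlushRules(max_seconds=60, max_messages=5_000, max_bytes=50 * 1024**2)
print(should_flush(rules, time.monotonic(), message_count=5_200, total_bytes=12 * 1024**2))  # True
```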
Setting optimal rules
The right flush configuration depends on your destination type, data volume, and latency requirements.

OLTP destinations
For transactional databases like PostgreSQL, MySQL, or SQL Server:

Recommended approach
Smaller, frequent flushes work well because:
- Row-based storage handles individual record operations efficiently
- Native UPSERT/MERGE operations minimize overhead

Example configuration:
- Messages: 1,000-5,000 records
- Bytes: 10-50 MB
- Time: 30-60 seconds
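As a rough sanity check on these values, you can estimate which rule would fire first for a given workload. The throughput and row-size figures below are assumptions for illustration only; deduplication will typically lower the effective counts.

```python
# Back-of-the-envelope check of which OLTP flush rule fires first.
# The 500 rows/sec throughput and 1 KB average row size are assumed
# purely for illustration; substitute your own numbers.
rows_per_second = 500
avg_row_bytes = 1_000

max_messages = 5_000
max_bytes = 50 * 1024**2
max_seconds = 60

seconds_to_message_limit = max_messages / rows_per_second              # 10 s
seconds_to_byte_limit = max_bytes / (rows_per_second * avg_row_bytes)  # ~105 s

first = min(seconds_to_message_limit, seconds_to_byte_limit, max_seconds)
print(f"First rule fires after ~{first:.0f}s")  # the message-count rule, at ~10s
```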
OLAP destinations
For analytical databases like Snowflake, Databricks, BigQuery, or Redshift:
Setting the flush rules too low can hinder throughput and cause latency spikes:
- Fixed overhead costs: Each flush has connection/metadata overhead that dominates processing time with small batches
- Inefficient resource usage: OLAP systems are designed for large parallel operations, not frequent micro-operations
- Storage and query degradation: Many small files hurt compression, increase metadata lookups, and trigger excessive compaction
- Recommendation: For OLAP destinations, set higher row/byte limits and rely on time-based triggers
Recommended approach
Larger, less frequent flushes are optimal because:
- Columnar storage benefits from batch processing
- Reduced metadata overhead and better compression
- More efficient query performance with fewer small files
Example configuration:
- Messages: 25,000-500,000 records
- Bytes: 50-500 MB
- Time: 3-15 minutes
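The same back-of-the-envelope check with OLAP-sized limits shows why the time rule usually ends up being the trigger, in line with the recommendation above. Again, the throughput and row-size figures are assumed for illustration.

```python
# With OLAP-sized limits, the time rule is usually what fires first.
# The 500 rows/sec and 1 KB/row figures are assumptions for illustration.
rows_per_second = 500
avg_row_bytes = 1_000

max_messages = 250_000
max_bytes = 250 * 1024**2
max_seconds = 5 * 60  # 5 minutes

seconds_to_message_limit = max_messages / rows_per_second              # 500 s
seconds_to_byte_limit = max_bytes / (rows_per_second * avg_row_bytes)  # ~524 s

first = min(seconds_to_message_limit, seconds_to_byte_limit, max_seconds)
print(f"First rule fires after ~{first:.0f}s")  # the time rule, at 300s
```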
Best practices
Start conservative
Begin with smaller flush values and increase based on observed performance and destination capabilities.
Validate through flush metrics
As you experiment and fine-tune the flush rules, you can see which rule triggered each flush as the reason shown in the “Flush Count” graph in the analytics portal.
Monitor and adjust
Track flush frequency, batch sizes, and end-to-end latency to optimize over time.
Consider your SLA
Your time threshold should align with your data freshness requirements and business SLAs.
Advanced
See flush reason in the analytics portal
