How it works

Flushing is a critical part of Artie’s data pipeline that determines when and how data gets written to your destination.

1. Data buffering

  • Artie’s reading process reads changes from your source database and publishes them to Kafka
  • Artie’s writing process reads messages from Kafka and writes them to your destination
  • Messages are temporarily stored in memory and deduplicated based on primary key(s) or a unique index (see the sketch below)
  • Multiple changes to the same record are merged to reduce write volume
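
As an illustration of this buffering and merging step, here is a minimal sketch in Go; the Event and Buffer types and their fields are assumptions made for the example, not Artie’s internal implementation.

```go
package main

import "fmt"

// Event is a hypothetical change event read from Kafka (not Artie's actual type).
type Event struct {
	PrimaryKey string // the record's primary key (or unique index value), serialized
	Payload    []byte // the serialized row state after the change
}

// Buffer deduplicates events in memory: later changes to the same primary key
// replace earlier ones, so each record is written to the destination only once.
type Buffer struct {
	events map[string]Event
	bytes  int
}

func NewBuffer() *Buffer { return &Buffer{events: make(map[string]Event)} }

// Add merges an incoming change into the buffer.
func (b *Buffer) Add(e Event) {
	if old, ok := b.events[e.PrimaryKey]; ok {
		b.bytes -= len(old.Payload) // an earlier change to this record is superseded
	}
	b.events[e.PrimaryKey] = e
	b.bytes += len(e.Payload)
}

// Len is the deduplicated message count; Bytes is the deduplicated payload size.
func (b *Buffer) Len() int   { return len(b.events) }
func (b *Buffer) Bytes() int { return b.bytes }

// Drain returns the buffered events and resets the buffer for the next batch.
func (b *Buffer) Drain() []Event {
	out := make([]Event, 0, len(b.events))
	for _, e := range b.events {
		out = append(out, e)
	}
	b.events = make(map[string]Event)
	b.bytes = 0
	return out
}

func main() {
	buf := NewBuffer()
	buf.Add(Event{PrimaryKey: "42", Payload: []byte(`{"status":"pending"}`)})
	buf.Add(Event{PrimaryKey: "42", Payload: []byte(`{"status":"shipped"}`)})
	fmt.Println(buf.Len(), buf.Bytes()) // 1 record: both changes merged into one write
}
```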

2. Flush trigger evaluation

  • Artie continuously monitors three flush conditions (see Conditions below)
  • When any one condition is met, a flush is triggered, as sketched after this list
  • Reading from Kafka pauses during the flush operation
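
A minimal sketch of that evaluation, assuming illustrative threshold names (MaxInterval, MaxMessages, and MaxBytes are stand-ins for the example, not Artie’s configuration keys):

```go
package main

import (
	"fmt"
	"time"
)

// FlushRules holds the three flush thresholds (illustrative field names).
type FlushRules struct {
	MaxInterval time.Duration // time elapsed since the last flush
	MaxMessages int           // deduplicated message count
	MaxBytes    int           // total deduplicated payload bytes
}

// ShouldFlush reports whether a flush is due and which rule fired;
// any single condition is enough to trigger it.
func (r FlushRules) ShouldFlush(sinceLastFlush time.Duration, messages, bytes int) (bool, string) {
	switch {
	case sinceLastFlush >= r.MaxInterval:
		return true, "time elapsed"
	case messages >= r.MaxMessages:
		return true, "message count"
	case bytes >= r.MaxBytes:
		return true, "byte size"
	default:
		return false, ""
	}
}

func main() {
	rules := FlushRules{MaxInterval: 30 * time.Second, MaxMessages: 50_000, MaxBytes: 25 << 20}
	ok, reason := rules.ShouldFlush(5*time.Second, 60_000, 4<<20)
	fmt.Println(ok, reason) // prints "true message count": the count threshold fired
}
```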

3. Data loading

  • Buffered data is written to your destination in an optimized batch
  • After the batch completes, Artie commits the Kafka offset and resumes reading
  • The cycle repeats for continuous data flow, as sketched below
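
Putting the three steps together, the cycle could look roughly like the following sketch. It reuses the Event, Buffer, and FlushRules types from the sketches above and the standard time package; Consumer and Destination are hypothetical placeholders, not Artie’s actual API.

```go
// Hypothetical interfaces standing in for the Kafka consumer and the
// destination writer; these are illustrations only.
type Consumer interface {
	Poll() (Event, error) // read the next change event from Kafka
	Pause()               // stop fetching while a flush is in progress
	Resume()              // continue fetching once the offset is committed
	CommitOffsets() error // mark the flushed messages as processed
}

type Destination interface {
	WriteBatch(events []Event) error // load the deduplicated batch
}

// flushLoop sketches the read → buffer → flush → commit cycle.
func flushLoop(c Consumer, d Destination, rules FlushRules, buf *Buffer) error {
	lastFlush := time.Now()
	for {
		e, err := c.Poll()
		if err != nil {
			return err
		}
		buf.Add(e) // deduplicate and merge in memory

		if ok, _ := rules.ShouldFlush(time.Since(lastFlush), buf.Len(), buf.Bytes()); !ok {
			continue
		}

		c.Pause() // reading from Kafka pauses during the flush
		if err := d.WriteBatch(buf.Drain()); err != nil {
			return err
		}
		if err := c.CommitOffsets(); err != nil { // commit only after the batch lands
			return err
		}
		c.Resume() // resume reading; the cycle repeats
		lastFlush = time.Now()
	}
}
```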

Conditions

Artie evaluates three conditions to determine when to flush data. Any one of these conditions will trigger a flush:

Time elapsed

Maximum time in seconds — Ensures data freshness even during low-volume periods

Message count

Number of deduplicated messages — Based on unique primary keys or a unique index

Byte size

Total bytes of deduplicated data — Actual payload size after deduplication (see the example below)
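
As a rough illustration of how these conditions interact, here is a fragment that extends the ShouldFlush sketch from above; the threshold values are arbitrary, not recommendations or Artie defaults.

```go
// Arbitrary example thresholds; any one of them triggers a flush.
rules := FlushRules{MaxInterval: 30 * time.Second, MaxMessages: 50_000, MaxBytes: 25 << 20}

// Quiet period: few rows arrive, so the time rule guarantees freshness.
fmt.Println(rules.ShouldFlush(30*time.Second, 120, 64<<10)) // true time elapsed

// Busy period: the count (or byte) rule caps the batch long before 30s pass.
fmt.Println(rules.ShouldFlush(3*time.Second, 50_000, 12<<20)) // true message count
```

In other words, the time rule keeps data fresh during low-volume periods, while the message count and byte size rules bound batch size during bursts.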

Setting optimal rules

The right flush configuration depends on your destination type, data volume, and latency requirements.

Best practices

Start conservative

Begin with smaller flush values and increase based on observed performance and destination capabilities.

Validate through flush metrics

As you experiment and fine-tune the flush rules, the “Flush Count” graph in the analytics portal shows which rule triggered each flush as the flush reason.

Monitor and adjust

Track flush frequency, batch sizes, and end-to-end latency to optimize over time.

Consider your SLA

The time threshold should align with your data freshness requirements and business SLAs.

Advanced