Changelog
Moving high-volume data is hard. Artie replicates operational data into warehouses and lakes — reliably, without the heavy engineering most pipelines require. On this page, we break down the latest features, what they solve, and why they matter.
June 2, 2025
Iceberg Support Using S3 Tables
This launch adds something big: support for Apache Iceberg using S3 Tables.
Artie customers can now:
- Stream high-volume datasets into Iceberg-backed tables stored on S3
- Use S3 Tables’ fully managed catalog, compaction, and snapshot management
- Query efficiently with Spark SQL (via EMR + Apache Livy) without wrestling with cluster glue
- Get up to 3x faster query performance thanks to automatic background compaction
Why is Iceberg a big deal? Because it solves what’s frustrating and limiting about traditional S3-based data lakes. Hive tables are rigid and brittle, with no snapshotting or time travel. Delta Lake is powerful but tied to the Databricks ecosystem. Plain S3 file storage? No metadata layer, no transactions, no query optimizations.
Instead, Iceberg gives you a fully open, cloud-native table format with smooth schema evolution, hidden partitions, snapshot isolation, and time-travel queries – all with broad engine support (Spark, Trino, Flink, Presto, Hive).
We’re excited about this because it means Artie customers can confidently move massive data volumes without needing to hand-build the plumbing – Iceberg and S3 Tables handle schema changes, partitioning, compaction, and snapshot management behind the scenes, so the system scales cleanly without brittle, custom workflows.
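To make this concrete, here's a minimal PySpark sketch of querying an Iceberg table that a pipeline has landed in S3 Tables, including a time-travel read. The catalog name, table bucket ARN, namespace, and table names are placeholders, and the catalog settings follow AWS's S3 Tables catalog integration for Iceberg (EMR setups often wire these in for you), so treat it as an illustration rather than a copy-paste config.

```python
from pyspark.sql import SparkSession

# Minimal sketch: assumes the Iceberg runtime and AWS's S3 Tables catalog jars
# are already on the classpath (EMR typically handles this). Catalog name,
# bucket ARN, namespace, and table names below are placeholders.
spark = (
    SparkSession.builder
    .appName("query-artie-iceberg")
    .config("spark.sql.catalog.s3tables", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.s3tables.catalog-impl",
            "software.amazon.s3tables.iceberg.S3TablesCatalog")
    .config("spark.sql.catalog.s3tables.warehouse",
            "arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket")
    .getOrCreate()
)

# Ordinary query against a replicated table.
spark.sql("SELECT count(*) FROM s3tables.analytics.orders").show()

# Iceberg time travel: read the same table as of an earlier point in time.
spark.sql(
    "SELECT * FROM s3tables.analytics.orders "
    "TIMESTAMP AS OF '2025-06-01 00:00:00' LIMIT 10"
).show()
```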
📚 Want to set up Iceberg-backed pipelines? Docs to get started: https://artie.com/docs/destinations/iceberg/s3tables
May 14, 2025
S3 Iceberg destination (Beta)
S3 Iceberg is now available in beta! This new destination uses AWS’s recently released S3 Tables support, allowing you to replicate directly into Apache Iceberg tables backed by S3. It’s a big unlock for teams building modern lakehouse architectures on open standards.
Column Inclusion Rules
You can now define an explicit allowlist of columns to replicate, which is ideal when tables contain PII or other sensitive data. This expands our column-level controls alongside column exclusion and hashing. Only the fields you specify get replicated; everything else stays out.
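Conceptually, an inclusion rule acts as a filter on each change event before anything is written downstream. The sketch below is purely illustrative (it is not Artie's implementation or config format), and the table and column names are hypothetical.

```python
# Illustrative only: an allowlist filter applied to a change event's payload.
# Table and column names are hypothetical; Artie's actual rules are configured in the UI.
ALLOWED_COLUMNS = {"customers": {"id", "created_at", "plan", "country"}}

def filter_row(table: str, row: dict) -> dict:
    """Keep only explicitly allowlisted columns; drop everything else (e.g. PII)."""
    allowed = ALLOWED_COLUMNS.get(table, set())
    return {col: val for col, val in row.items() if col in allowed}

event = {"id": 42, "email": "pii@example.com", "plan": "pro", "country": "US"}
print(filter_row("customers", event))
# {'id': 42, 'plan': 'pro', 'country': 'US'}
```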
Autopilot for New Tables
Stop manually hunting for new tables in your source DB. Autopilot finds and syncs them for you - zero config required. Turn it on via:
Deployment → Destination Settings → Advanced Settings → “Auto-replicate new tables”
Data Quality: Rows Affected Checks
To further enhance the data integrity built into our pipeline, we’ve added another guardrail: verifying the number of rows affected during each database operation.
For example, during merge steps (such as in Snowflake), we confirm ROWS_LOADED from COPY commands and validate the totals of inserted, updated, and deleted rows. It's another way we catch issues early and ensure replication integrity.
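As a rough sketch of what this kind of guardrail looks like (not Artie's actual code), the check below compares the row counts a warehouse reports for a merge against the number of rows the pipeline expected to apply, and fails loudly on any mismatch.

```python
# Illustrative guardrail: verify the warehouse applied as many rows as we staged.
# Function and field names are hypothetical, not Artie's internal API.
class RowCountMismatch(Exception):
    pass

def verify_merge(expected: int, inserted: int, updated: int, deleted: int) -> None:
    """Raise if the merge's reported row counts don't add up to what we staged."""
    applied = inserted + updated + deleted
    if applied != expected:
        raise RowCountMismatch(
            f"expected {expected} rows affected, warehouse reported {applied} "
            f"(inserted={inserted}, updated={updated}, deleted={deleted})"
        )

# e.g. after a merge, using the counts the statement (or COPY's ROWS_LOADED) returns:
verify_merge(expected=1_000, inserted=800, updated=200, deleted=0)  # passes silently
```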
Read Once, Write Many
We recently launched the ability to read once and write to multiple destinations. This means you no longer need multiple replication slots on your source database.
For example, by reading data just once from your Postgres instance and simultaneously replicating it to Snowflake and Redshift, you reduce database overhead and simplify replication architecture.
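A minimal sketch of the idea, with hypothetical names: one reader consumes the change stream exactly once, and each batch is handed to every destination writer, so the source only pays for a single replication slot.

```python
# Illustrative fan-out: one read from the source, N writes to destinations.
# The types and methods here are hypothetical, not Artie's internals.
from typing import Iterable, Protocol

class DestinationWriter(Protocol):
    def write(self, batch: list[dict]) -> None: ...

def replicate(change_stream: Iterable[list[dict]], writers: list[DestinationWriter]) -> None:
    """Consume each batch from the source once, then write it to every destination."""
    for batch in change_stream:      # single read: one replication slot on Postgres
        for writer in writers:       # fan out to Snowflake, Redshift, ...
            writer.write(batch)
```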
Multi-Data Plane Support
Artie now supports hosting pipelines across multiple data planes, whether you’re on our cloud or using your own (BYOC) infrastructure.
For example, run one pipeline from Postgres to Snowflake in AWS US-East-1 and another from MySQL to Snowflake in AWS US-West-2.
Oracle Fan-in
With our Oracle Fan-in feature, you can now easily replicate data from thousands of Oracle sources - without painful manual setups or infrastructure overload. Fan-in reduces your Kafka topic sprawl, lowers infrastructure costs, and simplifies real-world, complex data replication.
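As a loose illustration of the fan-in idea (not Artie's implementation), many sources can share a single topic as long as each event is keyed by its origin; the names and record shape below are hypothetical.

```python
# Illustrative fan-in: changes from many Oracle sources share one Kafka topic,
# keyed by (source, table) so consumers can still partition and route them.
import json

SHARED_TOPIC = "oracle.cdc.events"  # hypothetical topic name

def to_record(source_id: str, table: str, change: dict) -> tuple[str, bytes, bytes]:
    """Build a (topic, key, value) record for a change coming from one of many sources."""
    key = f"{source_id}:{table}".encode()
    value = json.dumps(change).encode()
    return SHARED_TOPIC, key, value

# Thousands of sources map onto one topic instead of thousands of per-source topics.
print(to_record("oracle-eu-042", "ORDERS", {"op": "u", "id": 7}))
```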