September 9, 2025
New Data Type Support for Postgres
Postgres is powerful because of its flexibility, but until now, many advanced data types weren’t supported in replication tools. That meant teams using features like multiranges or custom composite types had to rework schemas or maintain brittle workarounds — slowing them down. With this update, Artie removes that limitation: teams building production-critical systems on Postgres can now replicate complex data types with the same reliability as standard ones. Artie now supports: New Postgres Data Types. We’ve added support for:
- TSTZMULTIRANGE (multirange): Added in Postgres 14, this lets you store multiple non-overlapping time intervals in one column — perfect for tracking availability windows without conflicts.
- Custom enums: Define your own set of valid string values, like a controlled list of statuses or product sizes, and replicate them reliably.
- Custom composite (tuple) types: Create structured types that combine multiple fields into one, such as storing an address (city, state, street) in a single column.
Why this matters:
- Supports advanced Postgres use cases without schema redesign
- Eliminates brittle workarounds when using complex data types
- Keeps replication consistent across real-world customer schemas
- Opens the door for more application-specific modeling in Postgres
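If you’re curious what these types look like on the source side, here’s a minimal DDL sketch, assuming a psycopg2 connection; the table, type, and column names are illustrative and not anything Artie requires.

```python
# Illustrative DDL for the newly supported types; names and the DSN are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=app user=app_user")  # placeholder DSN
with conn, conn.cursor() as cur:
    # Custom enum: a controlled list of valid string values.
    cur.execute("CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'delivered')")

    # Custom composite (tuple) type: several fields stored as one column.
    cur.execute("CREATE TYPE address AS (street text, city text, state text)")

    # TSTZMULTIRANGE (Postgres 14+): multiple non-overlapping time intervals in one column.
    cur.execute("""
        CREATE TABLE bookings (
            id           bigint PRIMARY KEY,
            status       order_status,
            ship_to      address,
            availability tstzmultirange
        )
    """)
conn.close()
```

Columns like these now replicate through Artie the same way standard types do.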
September 4, 2025
Static Columns
Sometimes, teams need more than just the raw data replicated into their warehouse. They also need a way to enrich it with business context — like tags, labels, or metadata that help downstream teams stay organized. Until now, this meant managing that metadata separately, which could lead to extra steps and inconsistencies. Artie now supports: Static Columns. You can now add static columns to an existing pipeline. These columns will automatically appear in the destination table alongside your replicated data. For example, a team might use static columns to tag records with region (EU or US), environment (prod or staging), or a unique system identifier. Instead of tracking this separately, the metadata now flows directly with your replicated data — no extra plumbing required. Why this matters:
- Keep your downstream data organized with consistent metadata
- Add business context (like region or environment) directly into destination tables
- Reduce manual tagging and cleanup steps after replication
- Enable richer analytics and easier filtering across datasets
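As a rough illustration of the downstream payoff, here’s a hedged sketch of filtering on static columns in the destination; the table name (orders), column names (region, environment), and the Snowflake connection details are assumptions for the example, not Artie configuration.

```python
# Static columns arrive alongside the replicated data, so business context is
# a plain WHERE clause -- no separate metadata table to join against.
import snowflake.connector  # assumes a Snowflake destination for this example

conn = snowflake.connector.connect(
    account="my_account", user="analyst", password="...",  # placeholders
    warehouse="ANALYTICS_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("SELECT count(*) FROM orders WHERE region = 'EU' AND environment = 'prod'")
print(cur.fetchone()[0])
conn.close()
```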
September 2, 2025
Source Metadata Columns
When you’re consolidating data from multiple sources, it’s not always enough to just move the rows — you also need to know where they came from. Without that context, compliance checks get harder, debugging slows down, and downstream apps lose critical signals. Artie makes it simple to retain that lineage. Artie now supports: Source Metadata Columns. You can now enable source metadata columns in Advanced Settings. When turned on, Artie appends an extra column to replicated tables containing details like transaction ID, log sequence number, schema, table, and database name. For example:
- A fintech consolidating dozens of sharded MySQL databases into one Snowflake table can track exactly which shard each row came from.
- A healthcare company can capture source database and table information for HIPAA audit logs.
- A payments team can tie replicated rows back to original transaction IDs for fraud analysis.
Why this matters:
- Traceability: See exactly where each row originated, even across shards.
- Auditability: Support compliance and security workflows with source-side context.
- Debugging: Isolate discrepancies quickly by filtering on schema/table metadata.
- Flexibility: Build custom fraud detection, routing, or monitoring logic using metadata.
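Here’s a hedged sketch of using that metadata to isolate one shard while debugging; the metadata column name (_artie_source_metadata) and its field names are assumptions for illustration, so check your destination table for the exact shape Artie writes.

```python
# Filter consolidated rows back down to a single source shard; the metadata
# column name and field names below are assumed, not documented values.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="analyst", password="...",  # placeholders
    warehouse="ANALYTICS_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("""
    SELECT *
    FROM payments
    WHERE _artie_source_metadata:"database"::string = 'shard_042'
    ORDER BY _artie_source_metadata:"transaction_id"::string
    LIMIT 100
""")
rows = cur.fetchall()
conn.close()
```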
August 28, 2025
Unifying Tables Across Schemas
For teams managing sharded or micro-sharded databases, downstream complexity multiplies fast. Each shard or schema produces its own copy of every table. That means instead of M tables, you end up with N × M tables in your warehouse (where N = number of shards/schemas, M = number of tables). Analysts are stuck stitching them back together, engineers write endless union queries, and operations teams lose the clean, consolidated view they need. Artie now supports: Unifying Tables Across Schemas. You can now unify tables across schemas directly in replication. Instead of landing one table per schema, Artie automatically merges them into a single, consolidated destination table. Take an e-commerce platform sharding customers across 50+ schemas: instead of 50 separate users tables, you now get one unified users table downstream. Or a payments company splitting transactions across micro-shards: all those rows flow neatly into one transactions table in Snowflake. With Artie’s fan-in option, the number of downstream tables is simplified back to M, and schema evolution is handled automatically. Why this matters:
- Simplified data model: query one table instead of wrangling dozens, with schema evolution managed for you.
- No duplication: eliminate manual unions or stitching scripts in the warehouse.
- Consistent structure: unified naming across shards improves data quality and usability.
- Effortless scaling: add new shards upstream, and they automatically merge downstream.
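To make the N × M point concrete, here’s a small sketch contrasting the stitched-together query pattern that fan-in removes with the single query you get instead; schema and table names are illustrative.

```python
# Before fan-in: one users table per shard schema, stitched together by hand.
shards = [f"shard_{i:03d}" for i in range(50)]  # N = 50 illustrative schemas
before = "\nUNION ALL\n".join(f"SELECT * FROM {s}.users" for s in shards)

# After fan-in: Artie merges the shards into one destination table, so the
# same question is a single query against a single table.
after = "SELECT * FROM analytics.users"

print(f"{len(shards)} per-schema tables unioned before; 1 unified table after")
```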
August 21, 2025
Microsoft Teams Notifications
Artie has long supported notifications via email and Slack. That worked fine, but for teams who live in Microsoft Teams, having alerts show up directly in their workspace is a big quality-of-life improvement. Artie now supports: Microsoft Teams Notifications. You can now receive pipeline alerts and notifications directly in Microsoft Teams. Whether it’s replication status, schema changes, or operational alerts, everything flows into the same workspace your team already uses. This builds on our existing notification support (email and Slack), giving you another way to stay connected to your pipelines. Docs 👉 How to enable Teams notifications
August 19, 2025
Large JSON Support for Redshift
JSON payloads are everywhere — but when they get large, most tools fall short. Many platforms land JSON into VARCHAR(MAX), which caps out at ~65k characters. For teams working with rich event logs, nested API responses, or anything more complex, that limit means truncated payloads and lost data. Artie now supports: Large JSON Support for Redshift. When Artie encounters large JSON payloads, we now land them as the SUPER data type in Amazon Redshift. SUPER supports documents up to 16MB in size, preserving the full payload without truncation. That means you can capture, query, and transform large, complex JSON documents in Redshift without compromise. Why this matters:
- Preserve entire JSON payloads instead of losing data to truncation
- Unlock the full power of Redshift’s semi-structured query capabilities with SUPER
- Handle large event logs, nested API responses, and other big JSON columns with ease
- Avoid manual workarounds or post-processing to recover lost information
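For a sense of what this unlocks, here’s a hedged sketch of querying a SUPER column with Redshift’s PartiQL-style navigation; the table (events), payload column, and field paths are assumptions for the example.

```python
# Navigate into a large JSON document landed as SUPER; Redshift speaks the
# Postgres wire protocol, so psycopg2 works for the connection. Field names
# are assumed to be lowercase (Redshift's default path handling).
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",  # placeholder endpoint
    port=5439, dbname="analytics", user="analyst", password="...",
)
with conn.cursor() as cur:
    cur.execute("""
        SELECT e.payload.request.path, e.payload.response.status
        FROM events e
        WHERE e.payload.response.status::int = 500
        LIMIT 20
    """)
    for path, status in cur.fetchall():
        print(path, status)
conn.close()
```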
August 14, 2025
Specifying Snowflake Roles
Some Snowflake service accounts are like Swiss Army knives — they have multiple roles, each with its own permissions and environment. Until now, Artie simply authenticated with the service account’s default role. That worked for straightforward setups, but for teams running multiple environments (like staging, pre-prod, and prod) from a single service account, it meant juggling credentials or sticking to a one-size-fits-all role. Artie now supports: Specifying Snowflake Roles. You can now tell Artie exactly which Snowflake role to use when authenticating with a service account.
Why this matters:
- Simplifies credential management — no more creating and rotating multiple service accounts for different environments
- Keeps environments isolated — staging stays staging, prod stays prod, even with the same account
- Supports better security practices — roles can be scoped to the exact permissions needed
- Reduces operational overhead — fewer accounts to configure, monitor, and maintain
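To show the underlying Snowflake behavior (one service account, a role chosen at connect time), here’s a minimal sketch using the Snowflake Python connector; in Artie the role is a pipeline setting, and the account, user, and role names below are placeholders.

```python
import snowflake.connector

# One service account, explicitly scoped to a staging role for this session.
conn = snowflake.connector.connect(
    account="my_account",
    user="ARTIE_SVC",
    password="...",                 # placeholder credential
    role="ARTIE_STAGING_ROLE",      # a prod pipeline would pass a prod role instead
    warehouse="STAGING_WH",
)
cur = conn.cursor()
cur.execute("SELECT current_role()")
print(cur.fetchone()[0])  # -> ARTIE_STAGING_ROLE
conn.close()
```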
August 12, 2025
Sub-Second Pipeline Deployment
When you’re rolling out dozens (or hundreds) of pipelines at once, every second counts. The old 3–5 second deploy time per pipeline worked fine for smaller updates — but for large-scale rollouts, those seconds piled up fast, sometimes even hitting Terraform’s execution timeouts. That meant splitting deployments into batches, manually tracking progress, and adding friction to what should’ve been a quick rollout. Artie now supports: Sub-Second Pipeline Deployment. Pipeline deployments now complete in under 0.5 seconds each. Whether you’re launching 10 pipelines or 100+, they’ll deploy in a fraction of the time — keeping Terraform applies well within limits and eliminating the need for batching or manual retries. Example: One customer managing 150+ pipelines can now spin up 40 new pipelines in under 20 seconds, with their largest rollouts finishing in minutes. Why this matters:
- Keeps Terraform applies under execution limits — no more timeout failures
- Eliminates the need for splitting deployments into smaller batches
- Cuts large-scale rollout time to seconds/minutes
- Frees engineers from manual progress tracking and retries
August 7, 2025
Flush Metrics Now Available in Analytics
For teams replicating high-volume data into destinations like Snowflake, BigQuery, or Postgres, setting the right flush rules is key to balancing freshness, cost, and performance. But without visibility, tuning those rules can feel like guesswork. Artie now supports: Flush Metrics in the Analytics dashboard. Flush rules let you control when data gets written from Artie’s streaming buffer to your destination — based on time intervals, row counts, or byte thresholds. With Flush Metrics, you now get a clear view into how those rules are performing. Take a healthtech team syncing MySQL to BigQuery: they’ve configured a 60-second or 1MB flush rule. Now, they can see exactly how often data flushes, what triggered it, and how long it took — helping them optimize for cost and latency without guessing. Why this matters:
- Tune pipelines with data — not guesswork
- Validate whether your flush rules are hitting SLAs
- Optimize for warehouse costs by spotting over-frequent writes
- Troubleshoot delays and fine-tune performance in seconds
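As a conceptual sketch of how time, row, and byte flush rules interact (whichever threshold is hit first triggers the flush), here’s a toy buffer in Python; the thresholds mirror the 60-second / 1MB example above, the row-count value is made up, and this is not Artie’s actual implementation.

```python
import time
from dataclasses import dataclass, field

@dataclass
class FlushBuffer:
    max_age_s: float = 60.0          # time-interval rule
    max_rows: int = 50_000           # row-count rule (illustrative value)
    max_bytes: int = 1_000_000       # byte-threshold rule (1MB)
    rows: list = field(default_factory=list)
    size: int = 0
    started: float = field(default_factory=time.monotonic)

    def add(self, row: bytes) -> str | None:
        self.rows.append(row)
        self.size += len(row)
        if self.size >= self.max_bytes:
            return self._flush("bytes")
        if len(self.rows) >= self.max_rows:
            return self._flush("rows")
        if time.monotonic() - self.started >= self.max_age_s:
            return self._flush("time")
        return None

    def _flush(self, trigger: str) -> str:
        # In a real pipeline this is where buffered rows get written to the
        # destination; here we just report which rule fired.
        self.rows.clear()
        self.size = 0
        self.started = time.monotonic()
        return trigger

buf = FlushBuffer()
print(buf.add(b"x" * 2_000_000))  # -> "bytes" (byte threshold fired first)
```

Flush Metrics surfaces exactly this kind of information for real pipelines: how often flushes happen, which rule triggered them, and how long they took.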
August 5, 2025
New Destination: Postgres
Not every use case belongs in a warehouse. Teams often need to move transactional data into Postgres to power real-time APIs, partner-facing systems, or operational dashboards — without the complexity of Snowflake or the fragility of DIY CDC scripts. Until now, Artie’s destinations focused on analytics platforms. But operational systems matter too — and we’re making sure you’re covered. Artie now supports: Postgres as a destination. You can now stream changes from your source databases directly into Postgres — just like any other Artie pipeline. It’s fully managed, fault-tolerant, and handles schema changes and backfills automatically. Some teams are already using it to power internal tools by syncing MySQL to Postgres — skipping the warehouse entirely. Others are using Postgres-to-Postgres replication to isolate production workloads or build live replicas for disaster recovery. Same reliability. New destination. Why this matters:
- Power real-time APIs and dashboards without a warehouse
- Eliminate fragile CDC scripts with a fully managed solution
- Sync across Postgres instances to isolate workloads or support disaster recovery
- Handle schema changes and backfills automatically — no maintenance required
July 29, 2025
Self-Serve DynamoDB Backfills
Backfills are a critical step in onboarding new pipelines — especially when you’re working with historical data in DynamoDB. Until now, Artie handled that part for you, kicking off a table export behind the scenes. That worked fine — unless something broke. If your AWS role didn’t have the right permissions or something else went sideways, users were left guessing. Now, that guesswork is gone. Introducing: Self-Serve DynamoDB Backfills. When setting up a DynamoDB pipeline, you’ll now see a guided flow in the UI that helps you kick off a backfill. You can export the table directly from your AWS account — or select an existing export if you’ve already started one manually. Why this matters:
- DynamoDB backfills are now fully transparent and user-controlled
- Errors (like missing permissions) are surfaced immediately for faster fixes
- Reuse recent exports — no need to start from scratch
- Smoother, more reliable onboarding for new DynamoDB pipelines
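The guided flow handles this for you, but if you’d rather start (or reuse) an export from your own account, here’s a hedged sketch of the underlying AWS calls via boto3; the table ARN and bucket are placeholders, and the table needs point-in-time recovery enabled for exports.

```python
import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")

# Reuse a recent export if one already exists for this table...
existing = ddb.list_exports(
    TableArn="arn:aws:dynamodb:us-east-1:123456789012:table/orders"  # placeholder ARN
)["ExportSummaries"]

# ...otherwise kick off a new point-in-time export to your own S3 bucket.
if not existing:
    resp = ddb.export_table_to_point_in_time(
        TableArn="arn:aws:dynamodb:us-east-1:123456789012:table/orders",
        S3Bucket="my-ddb-exports",          # placeholder bucket
        ExportFormat="DYNAMODB_JSON",
    )
    print(resp["ExportDescription"]["ExportStatus"])  # e.g. IN_PROGRESS
```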
July 22, 2025
Parallel Segmented Backfills for Postgres
CTID-based backfills are fast and efficient — especially for large, append-only Postgres tables. They scan directly by physical row location, often outperforming logical queries in stable datasets. But CTIDs come with tradeoffs: they’re slow to initialize for large tables, fragile in dynamic tables where rows update or move, and they can time out in environments with aggressive statement_timeout settings. For teams working with massive, constantly changing Postgres tables, these limitations can stall backfill progress or create reliability risks. Artie now supports: Parallel Segmented Backfills. They offer an alternative path: instead of relying on CTID, Artie slices tables into logical row segments based on integer primary keys — then parallelizes the work across those chunks. The result: similar performance to CTID backfills, but with stronger guarantees in dynamic environments. We recently helped a customer backfill 8 billion rows in an actively updated Postgres table. CTID-based scans kept timing out and drifting. With Parallel Segmented Backfills, we split the workload across logical row ranges and completed the job — no timeouts, no skipped rows, no guesswork. Why this matters:
- Resilient to updates and vacuuming — row movement doesn’t break backfills
- Offers CTID-level performance with better reliability under load
- Avoids statement_timeout failures in large or busy tables
- Makes backfill behavior predictable and tunable
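Here’s a simplified sketch of the segmented idea: split the integer primary-key space into ranges and backfill the ranges in parallel instead of scanning by CTID. Table and column names, the segment size, and the worker count are illustrative, not Artie’s internals.

```python
from concurrent.futures import ThreadPoolExecutor
import psycopg2

DSN = "dbname=app user=replicator"   # placeholder DSN
SEGMENT = 1_000_000                  # key-space width per segment (illustrative)

def copy_segment(lo: int, hi: int) -> int:
    conn = psycopg2.connect(DSN)
    with conn.cursor() as cur:
        # Bounded key-range scans stay predictable even while the table is
        # being updated and vacuumed, unlike CTID scans.
        cur.execute(
            "SELECT id, payload FROM big_table WHERE id >= %s AND id < %s",
            (lo, hi),
        )
        rows = cur.fetchall()
    conn.close()
    return len(rows)  # a real pipeline would write these rows downstream

conn = psycopg2.connect(DSN)
with conn.cursor() as cur:
    cur.execute("SELECT min(id), max(id) FROM big_table")
    lo, hi = cur.fetchone()
conn.close()

segments = [(s, min(s + SEGMENT, hi + 1)) for s in range(lo, hi + 1, SEGMENT)]
with ThreadPoolExecutor(max_workers=8) as pool:
    total = sum(pool.map(lambda seg: copy_segment(*seg), segments))
print(f"backfilled {total} rows across {len(segments)} segments")
```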
July 17, 2025
Improved DDL Support for Cell-Based Architectures
In environments with multiple isolated databases — like production, staging, and dev — schema drift is a persistent risk. Columns added in one cell might not appear in another unless there’s active data flowing through. That means teams looking at the “same” table in Snowflake could be seeing different structures, leading to confusion, bugs, and broken dashboards. This became especially painful for teams whose QA and dev environments receive little to no traffic. With our previous behavior, tables wouldn’t update unless a row changed — leaving environments out of sync. Artie now supports schema alignment across environments. We’ve introduced a new opt-in job that automatically checks and syncs table schemas across environments — even when there are no row changes. If a new column shows up in production, it’ll get added to dev and staging too, so all environments stay aligned. This feature ensures you get consistent schemas, no matter how much (or little) traffic a database gets. Why this matters:
- Guarantees column consistency across environments (prod, staging, dev)
- Eliminates silent schema drift in low-traffic databases
- Supports cell-based and single-tenant architectures out of the box
- Reduces debugging time and improves trust in test environments
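Conceptually, the opt-in job behaves like a schema diff plus targeted ALTERs, as in this rough sketch; connection strings and the table name are placeholders, and the real sync runs inside Artie rather than as user code.

```python
# Diff a table's columns between two environments via information_schema and
# report what the lower-traffic environment is missing.
import psycopg2

def columns(dsn: str, table: str) -> dict[str, str]:
    conn = psycopg2.connect(dsn)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT column_name, data_type FROM information_schema.columns "
            "WHERE table_name = %s",
            (table,),
        )
        cols = dict(cur.fetchall())
    conn.close()
    return cols

prod = columns("dbname=prod_cell", "users")        # placeholder DSNs
staging = columns("dbname=staging_cell", "users")
for name, dtype in prod.items():
    if name not in staging:
        # The sync job would add the column even though no rows changed.
        print(f"staging missing column: ALTER TABLE users ADD COLUMN {name} {dtype}")
```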
July 3, 2025
External Stage Support for Snowflake
Some teams need more control over where their data goes — and how it gets there. Maybe it’s for compliance. Maybe audit. Or maybe they just don’t want Snowflake touching their data until the very last step. By default, Artie loads delta files into Snowflake using internal staging before merging them into the target table. That worked fine for most workflows — but some teams need an extra layer of control over how data flows through their environment. Now: Artie supports external staging. You can configure Artie to write delta files to a Snowflake external stage — like your own S3 bucket — and we’ll read from there when applying changes to your target table. This gives organizations — like federal agencies using Snowflake Gov Cloud — the ability to use an external stage in their own environment, keeping data fully under their control for things like internal review, validation, or security scanning before deciding to merge into Snowflake. Same fully managed sync. Just with the files landing in your environment first. Why this matters:
✅ You keep full visibility into what’s being staged before it’s merged
✅ You can retain delta files for auditing or reprocessing — entirely on your terms
✅ You get tighter control over when data crosses trust boundaries
There’s no impact on performance. No extra cost. Just more flexibility, when you need it. Want to turn this on? Let us know — we’ll help you get set up.
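On the Snowflake side, setup looks roughly like creating an external stage that points at your own bucket, as in this hedged sketch; the stage name, bucket, and storage integration are placeholders, and Artie handles the writes and reads once the stage is wired up.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="admin", password="...")  # placeholders
cur = conn.cursor()
# An external stage backed by your own S3 bucket; delta files land here first.
cur.execute("""
    CREATE STAGE IF NOT EXISTS artie_delta_stage
      URL = 's3://my-company-artie-deltas/'
      STORAGE_INTEGRATION = my_s3_integration
""")
# Review, scan, or retain the staged files on your terms before the merge.
cur.execute("LIST @artie_delta_stage")
print(cur.fetchall()[:5])
conn.close()
```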
June 26, 2025
Column Control: Include, Exclude, Hash
Not every column needs to make it to your warehouse. Some fields are sensitive. Some are noisy. Some just don’t belong anywhere near analytics.
Now, you can decide exactly what gets replicated — and what doesn’t — with Artie’s expanded column-level controls. Here’s what’s now possible, per column:
✅ Inclusion — define an allowlist. Only replicate what you explicitly approve; otherwise, ignore
🚫 Exclusion — let most of the table through, but block the columns you don’t want downstream
🔐 Hashing — keep the structure, mask the value. Track fields like user IDs, without exposing data

Why this matters:
This isn’t just a cleanup job. It’s control over what leaves prod.
Inclusion
Sometimes, it’s not about removing sensitive fields — it’s about only sending the ones you trust. Inclusion rules flip the default: instead of replicating everything and hoping exclusions or hashes catch the risky stuff, you define exactly what gets through — and block the rest. What this means:
- Safer by default — no surprises when new columns show up
- Compliance-friendly — ideal for PII and financial data
- Cleaner data — only the fields analytics and ML teams actually need
Exclusion rules let you drop what doesn’t belong — without touching your schema. Use it when:
- You’re skipping internal metadata, debug fields, or legacy junk
- You want to trim without breaking things
- You’re migrating slowly and need a guardrail, not a wall
Hashing keeps sensitive fields in your pipeline without exposing what’s inside. Reach for hashing when:
- You want to track user behavior across systems without exposing identity
- You’re debugging and need to confirm values match across systems — without logging sensitive data
- You’re sharing a warehouse and want to prevent exposing raw PII to teams that don’t need it
- You only need to know whether or not a value has changed
Column-level rules are set at the source. This guide explains where they belong and why.
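To see why a hashed column is still useful, here’s a small sketch of the property that matters: a deterministic hash preserves equality while masking the raw value. The exact hash Artie applies isn’t specified here, so SHA-256 is just for illustration.

```python
import hashlib

def mask(value: str) -> str:
    # Deterministic, one-way masking of a sensitive value.
    return hashlib.sha256(value.encode()).hexdigest()

prod_user_id = "user_8675309"          # illustrative identifier
warehouse_value = mask(prod_user_id)

# Same input, same output: you can still count distinct users, join across
# tables, or confirm two systems saw the same value change...
assert mask("user_8675309") == warehouse_value
# ...but the raw identifier never leaves prod.
print(warehouse_value[:16], "...")
```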
June 19, 2025
CDC for Tables Without Primary Keys
Some tables are weird. No primary key (PK), maybe just a unique index or some composite hack someone added in 2017. Until now, those were off-limits for replication. You can now override PK requirements by specifying a unique index — including composite indexes. Artie will respect the exact column order to ensure optimal performance. Why index-based PK overrides matter: Not every table has a clean PK. Some use unique indexes or composite keys that aren’t formally declared as PKs. Until now, these tables were difficult (or impossible) to replicate. This change addresses one of the most common blockers for CDC at scale. What’s changed:
- PK override: Define row identity with a unique index
- Use composite keys — even if unofficial or unenforced
- Preserve the exact index column order – it affects how changes are captured and impacts query performance during replication (e.g., email, account_id, created_at)
This is especially useful when:
- Your table lacks a formal PK, but has a unique constraint or index
- You rely on composite keys to identify rows
- You’re dealing with legacy systems or data models that weren’t built with CDC in mind
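For reference, the kind of index that can now stand in for a PK looks like this; the table and columns mirror the example above, and the DSN is a placeholder.

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app_user")  # placeholder DSN
with conn, conn.cursor() as cur:
    # A legacy table with no declared PK, identified by a composite unique
    # index; the column order here is the order Artie will respect.
    cur.execute("""
        CREATE UNIQUE INDEX IF NOT EXISTS uniq_signup_identity
        ON signups (email, account_id, created_at)
    """)
conn.close()
```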
June 17, 2025
Backfill Tuning: Picking the Right Batch Size
You can now control how many rows Artie processes at a time during backfills. The default is now 25,000 rows per chunk (up from 5,000), but you can tune this based on performance vs. load tradeoffs. Why backfill batch size matters: Backfills aren’t one-size-fits-all. Some teams want speed. Others are sensitive to database load and tiptoeing around a production DB at 2am. Until now, everyone got the same batch size of 5,000 rows per chunk. Now you can tune backfills to match your style:
- The default is 25,000 rows — we benchmarked a range of sizes and 25,000 won out, so that’s the new default
- You have control — adjust the batch size to fit your environment

- Speed up backfills: Larger chunks = fewer queries, can improve throughput, but overly large chunks can backfire. It’s about finding balance.
- Reduce DB load: Smaller chunks = faster queries, lower impact on source
Read Once and Write to Multiple Destinations
You can now sync data from a single database to multiple destinations — all from the same connector. Why this matters:
- Reduce load on production databases by avoiding duplicate reads and minimizing replication slot overhead
- Ability to fan out to multiple tools — e.g., write to both Snowflake and Redshift
- Ability to support diverse use cases in parallel — analytics, ML, real-time alerting
This is especially useful for teams that:
- Operate across multiple data platforms
- Serve many internal teams with different tools
- Need to scale data infrastructure without increasing operational burden
June 2, 2025
Iceberg Support Using S3 Tables
This launch adds something big: support for Apache Iceberg using S3 Tables. Artie customers can now:
- Stream high-volume datasets into Iceberg-backed tables stored on S3
- Use S3 Tables’ fully managed catalog, compaction, and snapshot management
- Query efficiently with Spark SQL (via EMR + Apache Livy) without wrestling with cluster glue
- Get up to 3x faster query performance thanks to automatic background compaction
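For a rough idea of the query side, here’s a hedged PySpark sketch that reads an Iceberg table through AWS’s S3 Tables catalog; the catalog name, table-bucket ARN, namespace, and table are placeholders, the required Iceberg and S3 Tables catalog jars are assumed to be on the classpath, and on EMR the same SQL can be submitted through Apache Livy.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("artie-iceberg-read")
    # Register an Iceberg catalog backed by the AWS S3 Tables catalog implementation.
    .config("spark.sql.catalog.s3t", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.s3t.catalog-impl",
            "software.amazon.s3tables.iceberg.S3TablesCatalog")
    .config("spark.sql.catalog.s3t.warehouse",
            "arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket")  # placeholder ARN
    .getOrCreate()
)

# Compaction and snapshot management are handled by S3 Tables, so this is a
# plain Iceberg read over the tables Artie streams into.
spark.sql("SELECT count(*) FROM s3t.analytics.events").show()
```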

May 14, 2025
S3 Iceberg destination (Beta)
S3 Iceberg is now available in beta! This new destination uses AWS’s recently released S3 Tables support, allowing you to replicate directly into Apache Iceberg tables backed by S3. It’s a big unlock for teams building modern lakehouse architectures on open standards.
Column Inclusion Rules
You can now define an explicit allowlist of columns to replicate - ideal for PII or other sensitive data. This expands our column-level controls alongside column exclusion and hashing. Only the fields you specify get replicated. Everything else stays out.
Autopilot for New Tables
Stop manually hunting for new tables in your source DB. Autopilot finds and syncs them for you - zero config required. Turn it on via: Deployment → Destination Settings → Advanced Settings → “Auto-replicate new tables”