Postgres Replication Slot 101: How to Capture CDC Without Breaking Production

Ryan Choi

Updated on

July 31, 2025

Artie is a fully-managed CDC streaming platform that replicates database changes into warehouses and lakes without the complexity of DIY pipelines. It delivers production-grade reliability without ongoing maintenance.

TL;DR

Replication slots are essential for Change Data Capture (CDC) in Postgres. They act as bookmarks in the Write-Ahead Log (WAL), tracking exactly where a CDC client left off. But if left unchecked, they can wreak havoc in production by filling up storage, bringing down your database, or causing replication to fail. This blog covers what replication slots are, why they matter, how to use them safely, and what to watch out for in production.

What Is a Replication Slot?

A replication slot is a mechanism that tracks a replication client's position in Postgres' Write-Ahead Log (WAL). It doesn't live in the WAL — rather, it points to a specific location in the WAL, so Postgres knows how much of it to retain. This ensures that even if your CDC tool (like Artie) disconnects temporarily, it can resume from where it left off without losing any changes.

A replication client here refers to any tool or process that reads changes from Postgres via logical replication. This could be a CDC platform like Artie, Debezium, or AWS DMS, or even a custom-built application streaming data into Kafka or a warehouse.

Each slot tracks a specific Log Sequence Number (LSN) and can only be used by one client at a time. An LSN identifies a position in the WAL. It is effectively a byte offset into the WAL stream, not a timestamp. As changes are written to the WAL, the LSN advances. Replication clients use LSNs to track how far they've read, and replication slots store that position.
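
To make the byte-offset idea concrete, here is a small sketch. The two literal LSNs are made-up values chosen for illustration. pg_wal_lsn_diff returns the distance between two LSNs in bytes, and the last query shows the positions each slot stores.

```sql
-- Current write position of the WAL (run on the primary)
SELECT pg_current_wal_lsn();

-- Distance in bytes between two made-up LSNs:
-- 0x3000060 - 0x3000000 = 0x60 = 96 bytes
SELECT pg_wal_lsn_diff('0/3000060'::pg_lsn, '0/3000000'::pg_lsn) AS bytes_between;

-- Positions stored by each slot:
--   restart_lsn         = oldest WAL the slot still needs
--   confirmed_flush_lsn = how far a logical client has confirmed receipt
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
```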

Without a replication slot, there's no guarantee that your downstream system receives the full change history — if your client disconnects, Postgres might purge WAL files before you can catch up.

What Is WAL?

WAL is Postgres' way of ensuring data integrity. Before any changes hit the actual database files, they’re first written to WAL. This enables ACID compliance and lets you recover from crashes or replicate data elsewhere.

For logical replication, the client reads changes from WAL. But here’s the catch: Postgres won’t delete WAL files until all replication slots have advanced past them. That means a stalled or idle slot can cause WAL to pile up, potentially eating all your disk space. If you’re running Postgres on a managed service like RDS or Aurora, check out this deeper dive on WAL growth.
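
One way to see this retention in practice is to compare the WAL on disk with the slot that is furthest behind. This is a sketch, assuming you can call pg_ls_waldir(), which by default is limited to superusers and members of pg_monitor. Whether it is available on a managed service depends on the provider.

```sql
-- Number and total size of WAL files currently on disk
SELECT count(*) AS wal_files,
       pg_size_pretty(sum(size)) AS wal_on_disk
FROM pg_ls_waldir();

-- The slot holding back the oldest WAL
SELECT slot_name, active, restart_lsn
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL
ORDER BY restart_lsn
LIMIT 1;
```

If wal_on_disk keeps growing while the oldest restart_lsn stays put, that slot is the likely cause.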

Production Gotchas

Running replication slots in production? Here are four critical things to watch:

Never leave a slot idle: If nothing is actively reading from a slot, drop it. Postgres keeps WAL indefinitely on behalf of an unread slot, and that WAL eats storage until the disk fills.

Monitor replication slot size: The WAL a slot retains can balloon if changes are written faster than your client consumes them. Use this query to check:

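SELECT
  slot_name,
  wal_status,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
  active,
  restart_lsn
FROM pg_replication_slots;

Set a statement_timeout: Long-running transactions can keep WAL from being purged. WAL is retained as long as either:
1. a replication slot still references it, or
2. a transaction that was open when it was written has not finished yet, because a logical slot's restart_lsn cannot advance past the start of the oldest in-progress transaction.

A statement_timeout helps by cancelling any single statement that runs longer than the limit. It does not end sessions that sit idle inside an open transaction; idle_in_transaction_session_timeout covers that case. We recommend setting statement_timeout to 30s – 5min, depending on your workload. For most analytics and CDC use cases, 60 seconds is a safe default.

Set max_slot_wal_keep_size: This setting (Postgres 13+) caps how much WAL a replication slot can retain. The default, -1, means no limit. Once a slot falls further behind than the limit, Postgres removes the old WAL and invalidates the slot, so the client has to be re-synced. Think of it as an emergency kill switch that protects the database at the pipeline's expense. Set it high enough that normal lag never trips it. We've seen 25GB work well for smaller workloads. For larger databases (e.g., 1TB+), setting this to 50–100GB+ is safer, especially if you're ingesting high-volume event data or regularly see long-running transactions.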
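
Two of the checks above can be run as queries. This is a sketch rather than a complete monitoring setup: the 5-minute threshold is an assumption you should tune, and both queries are meant to run on the primary.

```sql
-- Slots with no consumer currently connected, largest WAL backlog first
SELECT
  slot_name,
  slot_type,
  wal_status,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE NOT active
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;

-- Transactions that have been open longer than 5 minutes (assumed threshold)
SELECT
  pid,
  usename,
  state,
  now() - xact_start AS xact_age,
  left(query, 80) AS query_preview
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
```

A wal_status of 'lost' in the first result means the slot has already been invalidated and can no longer be resumed.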

What’s New in Postgres 16 and 17

Postgres 16 (self-managed & RDS PostgreSQL): You can now create logical replication slots on standby replicas, which reduces load on the primary. Note: this is not supported on Aurora PostgreSQL, where you must still create replication slots on the writer node.
Postgres 17: Supports failover for logical replication slots. This builds on PG16's functionality: a slot marked for failover can be synchronized to a standby, so it persists when that standby is promoted. Aurora users already benefit from this, since logical replication slots are preserved during Aurora failovers.
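
As a rough sketch of what these features look like in SQL: the slot name artie_standby is hypothetical, and the standby-side configuration needed for slot synchronization in Postgres 17 (such as sync_replication_slots) is assumed to be in place and is not shown.

```sql
-- Postgres 16+: run on the standby.
-- pg_is_in_recovery() returns true when connected to a standby.
SELECT pg_is_in_recovery();
SELECT * FROM pg_create_logical_replication_slot('artie_standby', 'pgoutput');

-- Postgres 17+: run on the primary.
-- Mark the slot as a failover slot so it can be synchronized to standbys.
SELECT * FROM pg_create_logical_replication_slot('artie', 'pgoutput', failover => true);

-- Postgres 17+: check which slots are failover-enabled and, on a standby,
-- which ones were synchronized from the primary.
SELECT slot_name, failover, synced
FROM pg_replication_slots;
```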

Upgrading Postgres Safely

Minor upgrades: No changes needed. CDC will keep working.
Major upgrades: You may need to drop and recreate the slot. Artie’s guide covers how to upgrade without data loss. Bonus: schedule with us ahead of time and we’ll help!

Useful Queries

-- List existing replication slots
SELECT * FROM pg_replication_slots;

-- See size of replication slot (how much WAL it's retaining)
SELECT
  slot_name,
  wal_status,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
  active,
  restart_lsn
FROM pg_replication_slots;

-- Create a logical replication slot
SELECT * FROM pg_create_logical_replication_slot('artie', 'pgoutput');

-- Drop a replication slot (fails while a client is still connected to it)
SELECT pg_drop_replication_slot('artie');

-- Set statement_timeout for the current session (e.g., to 60 seconds)
SET statement_timeout = '60s';

-- Set max_slot_wal_keep_size (e.g., to 50GB), then reload the config to apply it
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();
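
A plain SET only lasts for the current session. One way to make the timeouts stick is to attach them to a role or database, sketched below. The names analytics_user and app_db are hypothetical, and the 5-minute idle limit is an assumption. On managed services such as RDS, where ALTER SYSTEM is not available, server-level settings like max_slot_wal_keep_size are changed through the provider's parameter configuration instead.

```sql
-- Apply to every new session opened by this (hypothetical) role
ALTER ROLE analytics_user SET statement_timeout = '60s';

-- Apply to every new session on this (hypothetical) database
ALTER DATABASE app_db SET idle_in_transaction_session_timeout = '5min';

-- Verify the values in effect and where they come from
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN (
  'statement_timeout',
  'idle_in_transaction_session_timeout',
  'max_slot_wal_keep_size'
);
```

Scoping the timeout to a role keeps it away from the replication client and from migrations that legitimately run longer.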

Why This Matters

Replication slots are powerful but dangerous if misunderstood. They’re the backbone of Postgres CDC — yet a stalled or bloated slot can take down your system. If you’re running logical replication in production, you need to:

Actively monitor slot activity and size
Set WAL management configs
Know when (and how) to safely recreate slots

Artie handles all of this for you. We monitor slot health, optimize WAL settings, and ensure your pipeline doesn’t silently break. Our goal? Let you focus on building, not babysitting replication.

Curious if your replication slot setup is healthy? Reach out to the Artie team. We’ll help you debug, optimize, and avoid weekend wake-up calls.
