A few months ago we were adding MotherDuck as a destination, and naturally our first implementation used DuckDB’s Go driver directly.
It worked perfectly on my machine.
Then we wired it into our production transfer service - and everything fell apart.
At Artie, we build real-time data replication systems that have to run reliably across a wide range of customer environments. That means we care a lot about boring things: predictable builds, clean cross-compilation, small containers, and deployment paths that don’t surprise us six months later. If a dependency makes those things harder, it’s usually a non-starter — even if the technology itself is great.
This post walks through what went wrong, why we decided to avoid CGO entirely, and how that led to ducktape, a small open-source service that wraps DuckDB’s Appender API behind HTTP/2 streams.
The appeal of DuckDB’s Appender API
DuckDB’s Appender API is great for ingestion-heavy workloads:
- It bypasses SQL parsing overhead
- It provides predictable performance for bulk inserts
- It’s a natural fit for streaming data into analytical tables
When targeting MotherDuck, using DuckDB directly felt like the most straightforward approach. The Go driver exposed the Appender API cleanly, and early tests showed excellent throughput.
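To make that concrete, here is a minimal sketch of the in-process approach, assuming the marcboeker/go-duckdb driver; the table name and columns are illustrative, and an in-memory database stands in for MotherDuck:

```go
package main

import (
	"context"
	"database/sql"
	"log"

	"github.com/marcboeker/go-duckdb"
)

func main() {
	ctx := context.Background()

	// One connector backs both the SQL handle and the appender connection,
	// so they share the same in-memory database.
	connector, err := duckdb.NewConnector("", nil)
	if err != nil {
		log.Fatal(err)
	}

	db := sql.OpenDB(connector)
	defer db.Close()
	if _, err := db.ExecContext(ctx, `CREATE TABLE events (id BIGINT, event VARCHAR)`); err != nil {
		log.Fatal(err)
	}

	// The Appender works against a dedicated driver-level connection.
	conn, err := connector.Connect(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	appender, err := duckdb.NewAppenderFromConn(conn, "", "events")
	if err != nil {
		log.Fatal(err)
	}
	defer appender.Close()

	// Rows bypass SQL parsing entirely and go straight into the table.
	for i := int64(0); i < 1000; i++ {
		if err := appender.AppendRow(i, "signup"); err != nil {
			log.Fatal(err)
		}
	}
	if err := appender.Flush(); err != nil {
		log.Fatal(err)
	}
}
```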
So far, so good.
Where things started breaking: CGO
The problem wasn’t DuckDB - it was CGO.
DuckDB’s Go driver requires CGO, which immediately conflicted with how we build and deploy services:
- We cross-compile to amd64 and arm64
- We rely on static binaries
- We want small, predictable Docker images
- We want CI builds that don’t depend on system-level C toolchains
Once CGO entered the picture:
- Cross-compilation started failing
- Static builds were no longer trivial
- Docker images had to include compilers and system libraries
We tried isolating the CGO-dependent code into a separate module, but that didn’t fully solve the problem. CGO still leaked into the build graph, caused CI failures, and forced us to restructure parts of our pipeline just to accommodate a single dependency.
At that point it became clear: we didn’t want CGO anywhere near our main service.
None of these issues are unsolvable in isolation. The problem is that they compound over time, and they pull build and deployment complexity into places where we really don’t want it — especially in a system that’s supposed to be invisible and reliable.
Reframing the problem
The core question became:
How do we keep using DuckDB’s Appender API without pulling CGO into our main codebase?
Instead of fighting the build system harder, we changed the architecture.
If CGO was unavoidable, we could contain it.
That led to a simple idea:
- Run DuckDB in a small standalone service
- Expose the Appender API over the network
- Keep the main system pure Go
Introducing ducktape
ducktape is a tiny service that wraps DuckDB’s Appender API behind HTTP/2 streams.
This is a pattern we use a lot at Artie: if a piece of complexity is unavoidable, we isolate it aggressively and keep the rest of the system simple.
The model is straightforward:
- Clients stream NDJSON over HTTP/2
- ducktape parses the stream and appends rows using DuckDB’s Appender
- DuckDB runs locally inside the ducktape process
- The client never touches CGO
From the perspective of the main service, this turns DuckDB ingestion into a normal streaming network call.
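Here's a rough sketch of what that call can look like from the client side. The endpoint path, query parameter, and row shape are illustrative assumptions, not ducktape's actual API; the point is that the main service needs nothing beyond the standard library and an HTTP/2 transport:

```go
package main

import (
	"crypto/tls"
	"encoding/json"
	"io"
	"log"
	"net"
	"net/http"

	"golang.org/x/net/http2"
)

// row is an illustrative payload shape; the real schema handling may differ.
type row struct {
	ID    int64  `json:"id"`
	Event string `json:"event"`
}

func main() {
	// Cleartext HTTP/2 (h2c) client, since a local sidecar typically has no TLS.
	client := &http.Client{
		Transport: &http2.Transport{
			AllowHTTP: true,
			DialTLS: func(network, addr string, _ *tls.Config) (net.Conn, error) {
				return net.Dial(network, addr)
			},
		},
	}

	// Stream NDJSON through a pipe: one JSON object per line, written to the
	// request body as it is produced.
	pr, pw := io.Pipe()
	go func() {
		enc := json.NewEncoder(pw) // Encode writes a trailing newline per value.
		for i := int64(0); i < 1000; i++ {
			if err := enc.Encode(row{ID: i, Event: "signup"}); err != nil {
				pw.CloseWithError(err)
				return
			}
		}
		pw.Close()
	}()

	// The URL and path are hypothetical; point this at wherever ducktape listens.
	req, err := http.NewRequest(http.MethodPost, "http://localhost:8080/append?table=events", pr)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/x-ndjson")

	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}
```

Because the request body is a pipe, rows hit the wire as they are generated, so memory stays flat regardless of batch size.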
The result:
- No CGO in the main binary
- Cross-compilation works again
- Static builds are preserved
- Docker images stay minimal
ducktape is intentionally small and narrowly scoped. It does one thing: efficiently append data into DuckDB over a streaming interface.
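Under the hood, the core of such a service is little more than a read-parse-append loop. The handler below is a simplified sketch of that shape, not ducktape's actual implementation; the table, row type, and h2c setup are assumptions carried over from the earlier examples:

```go
package main

import (
	"bufio"
	"context"
	"database/sql"
	"encoding/json"
	"log"
	"net/http"

	"github.com/marcboeker/go-duckdb"
	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

// row mirrors the illustrative client payload; a real service would map JSON
// keys to the target table's columns dynamically.
type row struct {
	ID    int64  `json:"id"`
	Event string `json:"event"`
}

// appendHandler reads NDJSON from the request body line by line and appends
// each row via the DuckDB Appender. A production service would manage one
// appender per stream/table and guard against concurrent use.
func appendHandler(appender *duckdb.Appender) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		scanner := bufio.NewScanner(r.Body)
		for scanner.Scan() {
			var rec row
			if err := json.Unmarshal(scanner.Bytes(), &rec); err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			// Values must be supplied in the table's column order.
			if err := appender.AppendRow(rec.ID, rec.Event); err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
		}
		if err := scanner.Err(); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if err := appender.Flush(); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	ctx := context.Background()

	// Same in-process setup as the earlier sketch: local DuckDB plus an Appender.
	connector, err := duckdb.NewConnector("", nil)
	if err != nil {
		log.Fatal(err)
	}
	db := sql.OpenDB(connector)
	if _, err := db.ExecContext(ctx, `CREATE TABLE events (id BIGINT, event VARCHAR)`); err != nil {
		log.Fatal(err)
	}
	conn, err := connector.Connect(ctx)
	if err != nil {
		log.Fatal(err)
	}
	appender, err := duckdb.NewAppenderFromConn(conn, "", "events")
	if err != nil {
		log.Fatal(err)
	}

	// h2c serves cleartext HTTP/2, matching the client sketch above.
	handler := h2c.NewHandler(appendHandler(appender), &http2.Server{})
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```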
Why HTTP/2 and NDJSON?
We optimized for simplicity and debuggability over theoretical optimality.
- HTTP/2 gives us:
  - Multiplexed streams
  - Flow control
  - Widely supported tooling
- NDJSON is:
  - Easy to generate
  - Easy to inspect
  - Good enough for row-based ingestion
We considered alternatives like gRPC, Arrow, or custom binary protocols, but for this use case they added complexity without clear wins.
Performance results
The obvious concern with this approach is overhead.
In benchmarks, the results were better than expected:
- In-process Appender: ~848 MiB/sec
- ducktape over HTTP/2: ~757 MiB/sec
That’s roughly 90% of native performance, even though the data crosses a process boundary and travels over HTTP/2.
For our use case, that tradeoff was more than acceptable given the operational simplicity we gained.
When this approach makes sense (and when it doesn’t)
This design isn’t universal advice. It makes sense if:
- You strongly care about pure-Go builds
- CGO complicates your deployment model
- You’re already comfortable running small sidecar-style services
- You want to keep your core system simple
It probably doesn’t make sense if:
- You control the full runtime environment
- CGO isn’t a problem for your build pipeline
- Absolute peak performance matters more than operational ergonomics
Key takeaways
- CGO is rarely “just a build flag.” Once it enters a Go codebase, it tends to leak into CI, cross-compilation, Docker images, and long-term maintenance.
- Isolating complexity beats fighting it. Instead of bending our build system around CGO, we moved CGO behind a service boundary and kept the core system boring.
- Process boundaries don’t have to kill performance. With streaming ingestion and careful batching, we were able to retain ~90% of native DuckDB Appender throughput.
- Operational simplicity is a feature. Predictable builds, small binaries, and clean deployment paths matter just as much as raw throughput in production systems.
This bias toward boring, predictable systems shows up everywhere in how we build Artie - from how we ingest database changes to how we deploy and operate our services. Our goal is to make data replication something teams don’t have to think about day-to-day, even as it scales and gets more complex under the hood.
What’s next
There’s still plenty of room to explore:
- Compression strategies for the stream
- Smarter batching heuristics
- Alternative encodings like Arrow
- Backpressure tuning for high-concurrency workloads
For now, ducktape does exactly what we needed - no more, no less.
Open source
ducktape is open source and MIT licensed:
👉 https://github.com/artie-labs/ducktape
If you’ve dealt with CGO isolation differently, or have ideas for pushing performance further, we’d love to hear them.




