Artie replicates operational data into warehouses and lakes - reliably, in real time, and without the engineering overhead most pipelines come with. In this post, we’re breaking down the real differences between Snowpipe and Snowpipe Streaming - what they do well, where they fall short, and how to pick the right one.
Note: This post is mostly technical and product-agnostic, but we’ll briefly explain how Artie fits into the picture at the end.
Getting Data Into Snowflake
If you're pushing data into Snowflake, you've probably heard of Snowpipe. Maybe Snowpipe Streaming too. They sound similar and even serve a similar purpose: continuously loading data into Snowflake. But they work very differently.
Knowing when to use one over the other can save you a lot of time, debugging effort, and compute credits. So let’s break down how each one works, where they shine, and where they fall short. And yeah, we’ll talk about what happens when you need UPSERTs, schema changes, or real-time guarantees.
Snowpipe
Snowpipe’s your go-to when your data shows up in files and you just want Snowflake to grab them. You drop a file (CSV, JSON, Parquet, etc.) into a cloud bucket like S3 or GCS or Azure Blob, and Snowflake picks it up automatically using COPY INTO behind the scenes.
It’s serverless and scales on its own, which is great. But you still have to wire up cloud events (like S3 + SNS/SQS) to tell Snowflake when files arrive.
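For a sense of what that wiring looks like on the Snowflake side, here’s a minimal sketch - the stage, table, and pipe names are made up:

```sql
-- Hypothetical names: an external stage over S3 and a landing table.
CREATE OR REPLACE PIPE raw_events_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw_events
  FROM @events_stage
  FILE_FORMAT = (TYPE = 'JSON');

-- SHOW PIPES exposes the notification_channel (an SQS ARN for S3 stages)
-- that your bucket's event notifications need to point at.
SHOW PIPES;
```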
And pricing? You’re billed for the compute used to load each file plus a per-file overhead charge. If you’re dropping thousands of tiny files, that gets expensive fast.
Pros
- No infra to manage
- Auto-scales and runs serverless
- Great for batch-style ingestion (e.g. daily files)
- Pretty simple architecture - just drop files in a stage and Snowpipe handles the load
Cons
- 1-5 minute latency from file drop to table load
- No UPSERT support - Snowpipe is an append-only operation and you’ll need to implement downstream processing for deduplication or updates
- Requires setting up cloud event notifications (e.g. S3 + SNS/SQS, GCS + Pub/Sub)
- Snowpipe is designed to handle files, not individual events
- No explicit ordering guarantees across files
- Snowpipe deduplicates at the file level by filename, so if you re-upload a file with the same name, it will be skipped. Remember to use unique filenames to ensure proper ingestion and avoid silent data loss
- Schema evolution is limited (no renames, no drops) but you can configure it to auto-add some columns
DML and DDL Handling
Best Practice for Using Snowpipe
Land raw data in a staging table first. Then handle dedupes and UPSERTs downstream using MERGE, Streams & Tasks, or Dynamic Tables.
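As a rough sketch (table and column names here are invented), the downstream MERGE can dedupe the staging table on a key and upsert into the modeled table:

```sql
-- Hypothetical tables: raw_orders is the Snowpipe landing table,
-- orders is the deduplicated, queryable table.
MERGE INTO orders AS t
USING (
    -- keep only the newest row per id from the landing table
    SELECT *
    FROM raw_orders
    QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1
) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.status = s.status, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT (id, status, updated_at) VALUES (s.id, s.status, s.updated_at);
```

In practice you’d schedule something like this with a Task or let a Dynamic Table maintain it, rather than running it by hand.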
If you want Snowflake to auto-add new columns, you’ll need to turn on ENABLE_SCHEMA_EVOLUTION = TRUE and use MATCH_BY_COLUMN_NAME. Otherwise, extra columns in your files will just get ignored.
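Continuing with the same made-up table and pipe from above, that configuration looks roughly like this:

```sql
-- Allow Snowflake to add new columns it finds in incoming files.
ALTER TABLE raw_events SET ENABLE_SCHEMA_EVOLUTION = TRUE;

-- MATCH_BY_COLUMN_NAME loads columns by name instead of by position.
CREATE OR REPLACE PIPE raw_events_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw_events
  FROM @events_stage
  FILE_FORMAT = (TYPE = 'PARQUET')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```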
Stick to JSON or Parquet if you can - Snowflake does a better job inferring types with those. And if you're loading files, batch them. Too many tiny files = extra overhead.
Format Quirks (CSV vs. JSON vs. Parquet)
CSV is the easiest to generate - but the hardest to work with. Every field is loaded as a string unless you manually cast it. Schema evolution is also more fragile with CSV. JSON and Parquet are better choices: they support richer type inference and cleaner schema evolution.
Pro tip: avoid mixed-type columns in JSON (e.g. 123 in one row, "abc" in another). Snowflake will treat the column as VARIANT, which makes downstream modeling harder.
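To make that concrete, here’s a hypothetical user_id column that landed as VARIANT because some rows sent a number and others a string - every downstream query ends up casting defensively:

```sql
-- user_id was inferred as VARIANT (123 in some rows, "abc" in others),
-- so each consumer has to cast it before using it.
SELECT
    TRY_TO_NUMBER(user_id::STRING) AS user_id
FROM raw_events;
```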
Snowpipe Streaming
Snowpipe Streaming skips the whole file-drop dance. Instead of waiting for files to land in S3, you push rows straight into Snowflake using their SDK or Kafka connector. It’s built for low-latency use cases.
Here’s the catch: you’re the one doing the pushing. You’ll need to read data from the source, then call the Snowflake Ingest API to insert it. Snowflake takes care of the backend infra, but you're on the hook for running the producer - and you're billed per row.
Pros
- Lower latency since there is no staging delay
- No need to manage cloud storage or files
- Exactly-once semantics (per stream channel)
- Ideal for Kafka, Flink, or custom producers
Cons
- You have to build and run a producer (or use a tool that does this)
- INSERT-only; no native support for UPSERTs or DELETEs - UPSERTs must be handled post-ingestion using Snowflake Streams & Tasks, Dynamic Tables, or scheduled MERGE statements (see the sketch after this list)
- Schema changes must be handled manually unless you're using Kafka with a schema registry
- Operational complexity: you own the producer, retries, logging, and monitoring
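As a sketch of that post-ingestion pattern (all names are illustrative), a Stream tracks new rows landing in the raw table and a scheduled Task merges them into the modeled table:

```sql
CREATE OR REPLACE STREAM raw_orders_stream ON TABLE raw_orders;

CREATE OR REPLACE TASK upsert_orders
  WAREHOUSE = transform_wh        -- hypothetical warehouse
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('RAW_ORDERS_STREAM')
AS
  MERGE INTO orders AS t
  USING (
      -- dedupe within the batch so the MERGE match is unambiguous
      SELECT *
      FROM raw_orders_stream
      QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1
  ) AS s
  ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET t.status = s.status, t.updated_at = s.updated_at
  WHEN NOT MATCHED THEN INSERT (id, status, updated_at) VALUES (s.id, s.status, s.updated_at);

ALTER TASK upsert_orders RESUME;
```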
Handling Failures in Snowpipe Streaming
If your producer crashes mid-ingestion, Snowflake won’t recover the data for you. Exactly-once delivery relies on your app tracking offsets and retrying intelligently. For Kafka, this usually means persisting the last committed offset and making ingestion idempotent - otherwise you risk data inconsistencies.
There’s no built-in retry mechanism in the Snowflake SDK. You’ll need to implement backoff logic and error handling yourself. It works - but only if you’re disciplined.
DML and DDL handling
Best Practice for Using Snowpipe Streaming
Start by streaming into a raw table with a flexible schema - think: mostly strings, all columns nullable. That way, you won’t break the pipeline when something unexpected shows up.
Then clean it up later. Do your transformations in a second step (classic ELT). It gives you breathing room and keeps ingestion stable.
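One way to do that second step (table and column names are made up here) is a Dynamic Table that Snowflake keeps refreshed on a lag you choose:

```sql
-- Cleaned, typed, deduplicated view over the raw streaming table.
CREATE OR REPLACE DYNAMIC TABLE orders_clean
  TARGET_LAG = '5 minutes'
  WAREHOUSE = transform_wh        -- hypothetical warehouse
AS
  SELECT
      TRY_TO_NUMBER(id)                 AS id,
      status,
      TRY_TO_TIMESTAMP_NTZ(updated_at)  AS updated_at
  FROM raw_orders
  QUALIFY ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY TRY_TO_TIMESTAMP_NTZ(updated_at) DESC
  ) = 1;
```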
TL;DR
Snowpipe is best suited for file-based batch loads where minimal setup is important and real-time latency isn’t a strict requirement. It works well when your data lands in cloud storage periodically, like hourly or daily.
Snowpipe Streaming offers lower latency and is designed for streaming data in near real-time. But with that speed comes complexity - you’ll need to manage a producer application, handle retries, ensure row ordering, and deal with schema changes manually.
Neither tool does UPSERTs. They're mostly INSERT-only. If you want to deduplicate, UPDATE, or DELETE rows, you'll need to roll your own logic downstream.
Same for schema changes. Snowpipe supports optional column adds (with config). Snowpipe Streaming won't evolve schemas unless you're using the Kafka connector + Schema Registry (and even then, only certain formats are supported).
Artie: Real-Time Pipelines, Minus the Headaches
Sometimes you don’t want to glue together 5 Snowflake features to make updates work. You just want your database changes - INSERT, UPDATE, DELETE - to land in Snowflake in the right order, in real time, without breaking when a column gets renamed.
We built Artie for the folks who are tired of babysitting pipelines. We help get data into Snowflake fast and with zero engineering overhead.
If you're tired of:
- Append-only limitations
- Hand-rolled MERGE logic
- Breakage from schema changes
...Artie might be what you're really looking for.
If you’re interested, here's how all three options stack up, side by side.