Snowpipe Streaming
In July 2023, Snowflake announced Snowpipe Streaming, its new continuous data loading offering. This streaming API writes rows of data directly to Snowflake tables, which lowers load latency compared with file-based loading. It is designed to scale with the volume of data and is aimed at near real-time data streams.
Snowpipe Streaming API
The Snowpipe Streaming API lets you stream rows of data with the Snowflake Ingest SDK from application code that you manage yourself (for example, a Java application). The SDK makes REST API calls to Snowflake to write data directly to Snowflake tables, unlike Snowpipe, which loads data from staged files.
A custom application built on this API is responsible for processing the data, handling errors, running continuously, and recovering from failures. In return, it can write to Snowflake in near real time without intermediate steps or tools. If your application already interacts with a messaging system, you can extend it to also write to Snowflake.
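To make the error-handling responsibility above more concrete, here is a minimal sketch of a retry wrapper with exponential backoff. It does not use the Snowflake Ingest SDK: `RowWriter` is a hypothetical interface standing in for whatever call pushes one row (the SDK's row insert also accepts an offset token, which is why one appears here), and `RetryingWriter`, the attempt limit, and the backoff values are assumptions chosen for illustration.

```java
import java.util.Map;

/** Hypothetical stand-in for the call that pushes one row to Snowflake. */
interface RowWriter {
    void write(Map<String, Object> row, String offsetToken);
}

/** Retries a failed write with exponential backoff, then gives up. */
final class RetryingWriter {
    private final RowWriter delegate;
    private final int maxAttempts;
    private final long initialBackoffMillis;

    RetryingWriter(RowWriter delegate, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
        this.initialBackoffMillis = initialBackoffMillis;
    }

    /** Writes the row and returns the number of attempts it took. */
    int write(Map<String, Object> row, String offsetToken) {
        long backoff = initialBackoffMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                delegate.write(row, offsetToken);
                return attempt;
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the final attempt
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", ie);
                }
                backoff *= 2; // double the wait before the next attempt
            }
        }
        throw new IllegalStateException("giving up after " + maxAttempts + " attempts", last);
    }
}

class RetryDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated writer that fails twice before succeeding.
        RowWriter flaky = (row, token) -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("transient failure");
            }
        };
        RetryingWriter writer = new RetryingWriter(flaky, 5, 10);
        int attempts = writer.write(Map.of("id", 1, "name", "example"), "offset-1");
        System.out.println("row written after " + attempts + " attempts");
    }
}
```

In a real application the lambda would be replaced by a call into the SDK, and a write that still fails after the last attempt would typically be logged or routed to a dead-letter destination rather than dropped.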
Snowflake Connector for Kafka
The Snowflake Connector for Kafka, integrated with Snowpipe Streaming, runs on Kafka Connect, a framework for connecting Kafka with external systems such as databases. The connector includes the Snowflake Ingest SDK and supports streaming rows from Apache Kafka topics directly into target tables.
This option lets you add a decoupled component (a Kafka Connect cluster, either fully managed or self-hosted) to your solution. It retrieves data from Kafka topics and pushes it to Snowflake or other targets without requiring changes to your existing code.
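As a rough illustration of what such a connector deployment involves, the sketch below assembles the connector properties in Java and renders them in `.properties` form. The property names are the ones I recall from the connector documentation and should be checked against your connector version; the connector name, topic, table, converters, and all angle-bracket values are placeholders, and `SnowflakeSinkConfig` is a hypothetical helper, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SnowflakeSinkConfig {
    /** Builds Kafka Connect properties for the Snowflake sink in Snowpipe Streaming mode. */
    static Map<String, String> build(String topic, String table) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("name", "snowflake-streaming-sink"); // hypothetical connector name
        p.put("connector.class", "com.snowflake.kafka.connector.SnowflakeSinkConnector");
        p.put("tasks.max", "1");
        p.put("topics", topic);
        // Switches the connector from file-based Snowpipe to Snowpipe Streaming.
        p.put("snowflake.ingestion.method", "SNOWPIPE_STREAMING");
        p.put("snowflake.topic2table.map", topic + ":" + table);
        // Placeholder connection settings; replace with real values.
        p.put("snowflake.url.name", "<account>.snowflakecomputing.com:443");
        p.put("snowflake.user.name", "<user>");
        p.put("snowflake.private.key", "<private-key>");
        p.put("snowflake.role.name", "<role>");
        p.put("snowflake.database.name", "<database>");
        p.put("snowflake.schema.name", "<schema>");
        // Assumed converters for schemaless JSON messages.
        p.put("key.converter", "org.apache.kafka.connect.storage.StringConverter");
        p.put("value.converter", "org.apache.kafka.connect.json.JsonConverter");
        p.put("value.converter.schemas.enable", "false");
        return p;
    }

    /** Renders the properties as key=value lines, one per line, in insertion order. */
    static String toProperties(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toProperties(build("orders", "ORDERS_RAW")));
    }
}
```

The printed output can serve as a starting point for a standalone Kafka Connect properties file; a distributed or managed cluster would take the same keys as JSON through its own configuration interface.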
Snowflake’s Continuous Data Loading Offerings
Snowflake now has three continuous data loading options:
- Snowpipe: Reads from staged files.
- Snowpipe Streaming: Pushes rows from client code through the Snowflake Ingest SDK.
- Snowflake Connector for Kafka: Uses Kafka Connect to push data from Kafka topics.
By adding Snowflake Dynamic Tables to the equation, you can keep downstream tables up to date without extra scheduled loading processes, because Snowflake refreshes them automatically.
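As a sketch of how a dynamic table might sit on top of a streamed landing table, the helper below formats a `CREATE DYNAMIC TABLE` statement as a Java string that could be submitted through JDBC or any SQL client. The table names `ORDERS_RAW` and `ORDERS_CLEAN`, the warehouse `TRANSFORM_WH`, the one-minute target lag, and the JSON fields are all hypothetical; `RECORD_CONTENT` assumes the landing table was created by the Kafka connector with its default layout.

```java
final class DynamicTableDdl {
    /**
     * Formats a CREATE DYNAMIC TABLE statement.
     * Identifiers are concatenated as-is, so pass only trusted values.
     */
    static String create(String name, String targetLag, String warehouse, String query) {
        return "CREATE OR REPLACE DYNAMIC TABLE " + name + "\n"
             + "  TARGET_LAG = '" + targetLag + "'\n"
             + "  WAREHOUSE = " + warehouse + "\n"
             + "  AS\n"
             + query;
    }

    public static void main(String[] args) {
        // Hypothetical query that flattens JSON landed by the connector.
        String query =
              "SELECT RECORD_CONTENT:id::NUMBER AS order_id,\n"
            + "       RECORD_CONTENT:amount::NUMBER(10,2) AS amount\n"
            + "FROM ORDERS_RAW";
        System.out.println(create("ORDERS_CLEAN", "1 minute", "TRANSFORM_WH", query));
    }
}
```

The target lag tells Snowflake how stale the dynamic table is allowed to become, so no separate scheduled job is needed to move data from the landing table into the cleaned one.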
Thanks for your time,
Pedro Duran