Google Pub/Sub

Stream OpenSnowcat event data into Google Pub/Sub.

A scalable, real-time pipeline for delivering schema-validated behavioral data into Google Cloud Pub/Sub.

The SnowcatCloud Loader publishes enriched events to Pub/Sub topics with low latency and high reliability. Events are validated, enriched, and streamed in TSV or flattened JSON format, ready for consumption by Dataflow, BigQuery, or any GCP-native service. Built-in retries, batching controls, and native observability make it a direct path for bringing Snowplow/OpenSnowcat data into the Google Cloud ecosystem.

No ingestion logic, no infrastructure to manage — just clean data streaming straight to your GCP services.

Features

  • Real-Time Delivery
    Stream enriched event data to Google Cloud Pub/Sub with low latency for near-instant processing.
  • Format Flexibility
    Supports TSV and enriched flattened JSON formats — compatible with Dataflow, BigQuery, and custom consumers.
  • Buffer & Flush Control
    Tune flush intervals and batch sizes to balance delivery speed and cost efficiency.
  • Reliable Delivery & Retry
    Built-in retry mechanisms ensure delivery even during transient GCP service disruptions.
  • Native Monitoring
    CloudWatch metrics and logs are available; integration with GCP observability tools depends on the deployment.
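
To make the buffer, flush, and retry features above more concrete, here is a minimal Python sketch of how size-based and age-based flushing can be combined with exponential backoff. It is not the loader's actual implementation: the class name `FlushBuffer`, its parameters (`max_records`, `max_age_seconds`, `max_attempts`, `base_delay`), and the choice of `ConnectionError` as the transient failure are all assumptions made for illustration.

```python
import time
from typing import Callable, List, Optional


class FlushBuffer:
    """Illustrative buffer: flushes on a size or age threshold, retrying on failure."""

    def __init__(
        self,
        publish: Callable[[List[bytes]], None],
        max_records: int = 500,        # hypothetical batch-size threshold
        max_age_seconds: float = 1.0,  # hypothetical flush interval
        max_attempts: int = 5,         # hypothetical retry budget
        base_delay: float = 0.1,       # first backoff delay, doubled on each retry
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._publish = publish
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._clock = clock
        self._sleep = sleep
        self._records: List[bytes] = []
        self._oldest: Optional[float] = None  # arrival time of the oldest buffered record

    def add(self, record: bytes) -> None:
        """Buffer one record and flush if either threshold has been reached."""
        if not self._records:
            self._oldest = self._clock()
        self._records.append(record)
        # The age check only runs when a record arrives; a production
        # implementation would also need a background timer.
        if len(self._records) >= self._max_records or self._is_stale():
            self.flush()

    def _is_stale(self) -> bool:
        return self._oldest is not None and self._clock() - self._oldest >= self._max_age

    def flush(self) -> None:
        """Publish the buffered batch, backing off exponentially between attempts."""
        if not self._records:
            return
        batch = list(self._records)
        for attempt in range(self._max_attempts):
            try:
                self._publish(batch)
            except ConnectionError:
                if attempt == self._max_attempts - 1:
                    raise  # records stay buffered so the caller can retry later
                self._sleep(self._base_delay * 2 ** attempt)
            else:
                # Clear the buffer only after a successful publish.
                self._records = []
                self._oldest = None
                return
```

Larger values for `max_records` and `max_age_seconds` mean fewer, bigger publish calls (lower cost, higher latency), while smaller values favor delivery speed.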

Loader Configuration

Define how and where your data is published using a simple configuration block:

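output {
  service: "pubsub",                               // publish to Google Cloud Pub/Sub
  project_id: "project-id",                        // GCP project ID
  topic: "enriched-tsv",                           // Pub/Sub topic to publish to
  credentials_path: "/path/gcp-credentials.json",  // path to the GCP credentials file
  format: "TSV"                                    // or "JSON"
}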
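
On the consuming side, the `format` setting determines how each message payload has to be decoded. The Python sketch below shows one way a custom consumer might turn a payload into a dictionary for either format. It rests on several assumptions: the function name `decode_event` is hypothetical, the column names are placeholders that the caller must supply in the order the enriched TSV actually uses, and treating empty TSV fields as missing values is a guess rather than documented loader behavior.

```python
import json
from typing import Dict, Optional, Sequence


def decode_event(
    payload: bytes,
    fmt: str,
    columns: Sequence[str] = (),  # TSV column names, supplied by the caller (assumption)
) -> Dict[str, Optional[object]]:
    """Decode one message payload published in TSV or JSON format."""
    text = payload.decode("utf-8")
    kind = fmt.upper()
    if kind == "JSON":
        # Flattened JSON: the payload is a single JSON object.
        return json.loads(text)
    if kind == "TSV":
        # Strip only line endings so trailing empty fields are preserved.
        values = text.rstrip("\r\n").split("\t")
        if len(values) != len(columns):
            raise ValueError(
                f"expected {len(columns)} fields, got {len(values)}"
            )
        # Assumption: an empty field means the value is absent.
        return {name: (value or None) for name, value in zip(columns, values)}
    raise ValueError(f"unsupported format: {fmt!r}")


# Usage with placeholder column names (not the full enriched event schema).
example_columns = ["app_id", "platform", "event"]
event = decode_event(b"shop\tweb\tpage_view", "TSV", example_columns)
```

In a real subscriber built on the `google-cloud-pubsub` client, `payload` would typically be the `data` attribute of each received message.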