Loading Data

Amazon Data Firehose

Using Amazon Data Firehose to load data into data lakes, data warehouses, and analytics services.

Amazon Data Firehose is the simplest way to acquire, transform, and deliver data streams to data lakes, data warehouses, and analytics services without running and managing the Snowplow RDB Loader.

Destination: S3

Choose source and destination

  • Source: Amazon Kinesis Data Streams
  • Destination: Amazon S3

Source settings

  • Kinesis data stream: arn:aws:kinesis:us-west-2:xxxxxxxxxxxxx:stream/enriched-good-events
  • Firehose stream name: KDS-S3-TngAd

Destination settings

  • S3 bucket: s3://enriched-json
  • New line delimiter: Enabled
  • Dynamic partitioning: Enabled (optional feature that creates targeted data sets by partitioning the streaming S3 data on partitioning keys)
  • Multi record deaggregation: Enabled
  • Multi record deaggregation type: JSON
  • Inline parsing for JSON: Enabled (required because the S3 bucket prefix below uses partitionKeyFromQuery)
  • S3 bucket prefix: good/!{partitionKeyFromQuery:app_id}/ (partitions objects by app_id under the good prefix)
  • S3 bucket error output prefix: error= (optional)
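As a rough sketch, the source and destination settings above map onto the boto3 create_delivery_stream call as follows. The IAM role ARN is a placeholder, and the JQ expression that extracts app_id for dynamic partitioning is an assumption (the console generates the equivalent processors when you configure inline parsing):

```python
# Sketch only: assembles create_delivery_stream arguments mirroring the
# console settings above. Pass the result to
# boto3.client("firehose").create_delivery_stream(**cfg) to create the stream.
def build_firehose_config(role_arn: str) -> dict:
    return {
        "DeliveryStreamName": "KDS-S3-TngAd",
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": (
                "arn:aws:kinesis:us-west-2:xxxxxxxxxxxxx"
                ":stream/enriched-good-events"
            ),
            "RoleARN": role_arn,  # placeholder: role with Kinesis read access
        },
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,  # placeholder: role with S3 write access
            "BucketARN": "arn:aws:s3:::enriched-json",
            "Prefix": "good/!{partitionKeyFromQuery:app_id}/",
            "ErrorOutputPrefix": "error=",
            "CompressionFormat": "GZIP",
            "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},
            "DynamicPartitioningConfiguration": {"Enabled": True},
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [
                    {
                        # Inline JSON parsing: extract app_id as the
                        # partition key (assumed JQ expression).
                        "Type": "MetadataExtraction",
                        "Parameters": [
                            {"ParameterName": "MetadataExtractionQuery",
                             "ParameterValue": "{app_id: .app_id}"},
                            {"ParameterName": "JsonParsingEngine",
                             "ParameterValue": "JQ-1.6"},
                        ],
                    },
                    {
                        # Adds the newline delimiter between records.
                        "Type": "AppendDelimiterToRecord",
                        "Parameters": [],
                    },
                ],
            },
        },
    }
```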

S3 buffer hints

  • Buffer size: 128 MiB
  • Buffer interval: 300 seconds
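Firehose delivers a buffered batch when either hint is satisfied first: 128 MiB of data, or 300 seconds since the last delivery. A small sketch of that "whichever comes first" rule, useful for estimating delivery latency at a given ingest rate (the rate values in the usage note are hypothetical):

```python
def seconds_until_flush(ingest_mib_per_s: float,
                        size_mib: int = 128,
                        interval_s: int = 300) -> float:
    """Worst-case seconds before Firehose delivers the current buffer:
    the size hint or the interval hint, whichever is satisfied first."""
    if ingest_mib_per_s <= 0:
        # No incoming data, so only the interval bound applies.
        return float(interval_s)
    return min(size_mib / ingest_mib_per_s, float(interval_s))
```

For example, at 1 MiB/s the 128 MiB size hint fires first after 128 seconds; at 0.1 MiB/s the 300-second interval fires first.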

Compression for data records

  • GZIP
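With GZIP compression and the newline delimiter enabled, each delivered S3 object is a GZIP-compressed stream of newline-delimited JSON events. A self-contained sketch of that format follows; the sample events are invented, and in practice a consumer would fetch the object bytes with boto3's get_object:

```python
import gzip
import json

# Invented sample events standing in for Snowplow enriched JSON.
records = [
    {"app_id": "website", "event": "page_view"},
    {"app_id": "mobile", "event": "screen_view"},
]

# What Firehose writes: one JSON document per line, then GZIP-compressed.
body = gzip.compress("".join(json.dumps(r) + "\n" for r in records).encode())

# What a consumer does: decompress, split on newlines, parse each line.
parsed = [
    json.loads(line)
    for line in gzip.decompress(body).decode().splitlines()
    if line
]
```

Without the newline delimiter, consecutive JSON documents would run together in the object, and line-oriented readers (including Athena) could not split them back into individual events.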