Skip to content

Latest commit

 

History

History
64 lines (55 loc) · 3.51 KB

File metadata and controls

64 lines (55 loc) · 3.51 KB

Kinesis

  • Is a scalable streaming service, designed to ingest lots of data
  • Producers send data into a Kinesis stream, the stream being the basic entity of Kinesis
  • Streams can scale from low to near infinite data rates
  • It is a public service and it is highly available in a region by design
  • Persistence: streams store by default a 24H moving window of data
  • Kinesis include storage to be able to ingest and retain it for 24H by default (can be increased to 365 days at additional cost)
  • Multiple consumers can access the data from that moving window

Kinesis Data Streams

  • Kineses Data Streams are using shards architecture for scaling, initially there is one shard, additional shards can be added over time to increase performance
  • Each shard provides its own capacity, each shard has 1MB/s ingestion capacity, 2MB/s consumption capacity
  • Shards directly affect the price of the Kinesis stream, we have to pay for each shard
  • Pricing is also affected by the length of the storage window. By default is 24H, it can be increased to 365 days
  • Data is stored in Kinesis Data Records (1MB), these records are distributed across shards

SQS vs Kinesis Data Streams

  • Is it about ingestion (Kinesis) of data or about decoupling, worker pools (SQS)
  • SQS usually has 1 production group, 1 consumption group
  • SQS is designed for decoupling and asynchronous communication
  • SQS does not have the concept of persistence, no window for persistence
  • Kinesis is designed for huge scale ingestion, having multiple consumers with different rate of consumption
  • Kinesis is recommended for ingestion, analytics, monitoring, click streams

Kinesis Data Firehose

  • Used to provide data ingestion for other AWS services such as S3
  • Fully managed service used to load data for data lakes, data stores and analytics services
  • Data Firehose scales automatically, it is serverless and resilient
  • It is not a real time product, it is a Near Real Time product with a deliver product of ~60 seconds
  • Supports transformation of data on the fly using Lambda. This transformation can add latency
  • Firehose is a pay as you go service, we pay per volume of data
  • Firehose supported destinations:
    • HTTP endpoints (third party providers)
    • Splunk (has direct support for it)
    • RedShift
    • OpenSearch (ElasticSearch)
    • S3
  • Firehose can accept data directly from producers or from Kinesis Data Streams
  • Firehose receives the data in real-time, but the ingestion is buffered
  • Firehose buffer by default waits for 1MB of data in 60 seconds before delivering to consumer. For higher load, it will deliver every time there is an 1MB chunk of data
  • Data is sent directly form Firehose to destination, exception being Redshift, where data is stored in an intermediary S3 bucket
  • Firehose use cases:
    • Persistence for data coming into Kinesis Data Streams
    • Storing data in a different format (ETL)

Kinesis Data Analytics

  • It is a real-time data processing product using SQL
  • The product ingests data from Kinesis Data Streams or Firehose
  • After the data is processed, it can be sent directly to destinations such as:
    • Firehose (data becoming near-real time)
    • Kinesis Data Streams
    • AWS Lambda
  • Kinesis Data Analytics architecture: Kinesis Data Analytics architecture
  • Kinesis Data Analytics use cases:
    • Anything using stream data which needs real-time SQL processing
    • Time-series analytics: election data, e-sports
    • Real-time dashboards: leader boards for games
    • Real-time metrics