Amazon Kinesis

Amazon Kinesis Overview

  • Definition: Amazon Kinesis is a fully managed platform for collecting, processing, and analyzing real-time streaming data at scale, supporting use cases like log analysis, IoT, and real-time analytics.
  • Key Services:
    • Kinesis Data Streams: Ingest and store streaming data for custom processing.
    • Kinesis Data Firehose: Load streaming data into data stores (e.g., S3, Redshift) with transformations.
    • Kinesis Data Analytics: Analyze streaming data using SQL or Apache Flink.
    • Kinesis Video Streams: Ingest and process video streams for analytics or playback.
  • Key Features:
    • Scales to handle terabytes of data per hour.
    • Integrates with Lambda, S3, Redshift, Elasticsearch, and more.
    • Provides serverless or managed options with low-latency processing.
    • Supports monitoring via CloudWatch and security via IAM and KMS.
  • Use Cases: Real-time analytics (e.g., clickstream, logs), IoT data processing, video surveillance, event-driven architectures.
  • Key Updates (2024–2025):
    • Enhanced Kinesis Data Streams: Improved throughput and fan-out (October 2024).
    • Kinesis Data Analytics: Apache Flink performance optimizations (March 2024).
    • FIPS 140-2 Compliance: Enhanced for GovCloud (October 2024).
    • Security Hub Integration: Compliance monitoring for stream configurations (January 2025).

1. Kinesis Core Concepts

Kinesis Services

  • Kinesis Data Streams:
    • Real-time data ingestion and storage (default 24 hours, up to 365 days).
    • Uses shards to scale throughput (1 MB/s write, 2 MB/s read per shard).
    • Consumers (e.g., Lambda, EC2) process data via APIs or KCL (Kinesis Client Library).
    • Explanation: E.g., ingest clickstream data for real-time dashboards.
  • Kinesis Data Firehose:
    • Delivers streaming data to destinations (S3, Redshift, Elasticsearch, Splunk).
    • Supports transformations via Lambda and format conversion (e.g., JSON to Parquet).
    • Buffers data for cost-efficient delivery (size/time-based).
    • Explanation: E.g., load logs to S3 with Lambda transformation (a producer sketch for Data Streams and Firehose follows this list).
  • Kinesis Data Analytics:
    • Analyzes streaming data using SQL or Apache Flink.
    • Outputs results to S3, Redshift, or other streams.
    • Explanation: E.g., compute average click rate every 10 seconds.
  • Kinesis Video Streams:
    • Ingests, stores, and processes video streams (e.g., from cameras).
    • Supports playback and integration with Rekognition for analytics.
    • Explanation: E.g., analyze surveillance video for motion detection.
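
A minimal producer sketch (Python/boto3), assuming a Data Stream named clickstream and a Firehose delivery stream named logs-to-s3 already exist. It shows the two ingestion paths described above: Data Streams takes a data blob plus a partition key, while Firehose takes only the blob and handles buffering and delivery itself.

```python
# Producer sketch: send one event to Data Streams and to Firehose.
import json
import boto3

kinesis = boto3.client("kinesis")
firehose = boto3.client("firehose")

event = {"user_id": "u-123", "action": "click", "page": "/home"}

# Kinesis Data Streams: the partition key (user_id here) picks the target shard.
resp = kinesis.put_record(
    StreamName="clickstream",                # assumed stream name
    Data=json.dumps(event).encode("utf-8"),  # data blob, max 1 MB per record
    PartitionKey=event["user_id"],
)
print(resp["ShardId"], resp["SequenceNumber"])

# Kinesis Data Firehose: no partition key; Firehose buffers and delivers (e.g., to S3).
firehose.put_record(
    DeliveryStreamName="logs-to-s3",         # assumed delivery stream name
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```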

Key Concepts

  • Shards (Data Streams):
    • Unit of throughput (1 MB/s write, 1,000 records/s, 2 MB/s read).
    • Data partitioned across shards using partition keys.
    • Explanation: E.g., 10 shards handle 10 MB/s write.
  • Records:
    • Data unit in streams (data blob + partition key + sequence number).
    • Max 1 MB per record.
    • Explanation: E.g., JSON log entry as a record.
  • Consumers:
    • Applications reading from streams (e.g., Lambda, KCL, SDK).
    • Standard Consumer: Polls shards; the 2 MB/s per-shard read limit is shared across all consumers.
    • Enhanced Fan-Out: Pushes data to consumers with a dedicated 2 MB/s per shard per consumer (lower latency).
    • Explanation: E.g., Lambda consumer processes clickstream (a minimal polling-consumer sketch follows this list).
  • Partition Key:
    • Determines shard assignment for records.
    • Explanation: E.g., use user_id to distribute records.
  • Retention Period (Data Streams):
    • Default 24 hours; extendable up to 7 days (extended retention) or up to 365 days (long-term retention).
    • Explanation: E.g., retain logs for 7 days for auditing.
  • Buffering (Firehose):
    • Buffers data by size (1–128 MB) or time (60–900 seconds) before delivery.
    • Explanation: E.g., buffer 5 MB or 300 seconds to S3.
  • Kinesis Client Library (KCL):
    • Simplifies building scalable consumer applications.
    • Manages shard distribution and checkpointing.
    • Explanation: E.g., KCL app processes 100 shards.
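
A minimal polling-consumer sketch (boto3, assuming the same clickstream stream) tying together shards, shard iterators, partition keys, and sequence numbers. Production consumers would typically use KCL or a Lambda event source mapping rather than a raw GetRecords loop, since those handle shard distribution and checkpointing.

```python
# Polling-consumer sketch: read every shard of a stream from its oldest retained record.
import time
import boto3

kinesis = boto3.client("kinesis")
stream = "clickstream"  # assumed stream name

for shard in kinesis.list_shards(StreamName=stream)["Shards"]:
    iterator = kinesis.get_shard_iterator(
        StreamName=stream,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",   # or "LATEST" to read only new data
    )["ShardIterator"]

    while iterator:
        out = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in out["Records"]:
            # Each record carries the data blob, partition key, and sequence number.
            print(record["PartitionKey"], record["SequenceNumber"], record["Data"])
        iterator = out.get("NextShardIterator")
        if not out["Records"]:
            break               # caught up; a real consumer would keep polling
        time.sleep(0.2)         # stay under the 5 GetRecords calls/s/shard limit
```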

Key Notes:

  • Exam Relevance: Understand Data Streams, Firehose, Analytics, shards, and integrations.
  • Mastery Tip: Compare Kinesis vs. SNS/SQS vs. MSK for streaming.

2. Kinesis Performance Features

Kinesis optimizes real-time processing.

Low Latency

  • Purpose: Near-real-time data handling.
  • Features:
    • Data Streams: Millisecond latency for ingestion.
    • Enhanced fan-out reduces consumer latency (2024).
    • Firehose: Near-real-time delivery with buffering.
    • Analytics: Sub-second SQL/Flink processing.
  • Explanation: E.g., process clickstream in <100 ms.
  • Exam Tip: Highlight low latency for real-time apps.

High Throughput

  • Purpose: Handle large data volumes.
  • Features:
    • Data Streams scale with shards (e.g., 1,000 shards = 1 GB/s).
    • Firehose processes terabytes/hour.
    • Analytics handles high-velocity streams.
  • Explanation: E.g., ingest 10 GB/s of IoT data.
  • Exam Tip: Use for high-volume streaming.

Scalability

  • Purpose: Support growing workloads.
  • Features:
    • Data Streams: Add/remove shards dynamically.
    • Firehose: Auto-scales delivery to destinations.
    • Analytics: Scales Flink tasks automatically.
    • Enhanced throughput for Data Streams (2024).
  • Explanation: E.g., scale from 10 to 100 shards for peak traffic (see the resharding sketch after this list).
  • Exam Tip: Emphasize shard scaling and serverless.
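
A resharding sketch (boto3, stream name assumed): UpdateShardCount with uniform scaling. Note that a single call can at most double (or halve) the current shard count, so a large jump such as 10 to 100 shards takes several calls.

```python
# Resharding sketch: scale a stream to a new target shard count.
import boto3

kinesis = boto3.client("kinesis")

kinesis.update_shard_count(
    StreamName="clickstream",      # assumed stream name
    TargetShardCount=20,           # e.g., double from 10 shards ahead of peak traffic
    ScalingType="UNIFORM_SCALING",
)
```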

Key Notes:

  • Performance: Low latency + high throughput + scalability = efficient streaming.
  • Exam Tip: Optimize with shards and fan-out.

3. Kinesis Resilience Features

Resilience ensures reliable streaming.

Multi-AZ/Region Redundancy

  • Purpose: Survive failures.
  • Features:
    • Kinesis services are Regional with multi-AZ replication.
    • Data Streams replicate data across AZs.
    • Firehose buffers data durably before delivery.
  • Explanation: E.g., stream continues if us-east-1a fails.
  • Exam Tip: Highlight multi-AZ for HA.

Continuous Streaming

  • Purpose: Uninterrupted data flow.
  • Features:
    • Automatic retries for failed writes/reads.
    • Firehose retries failed deliveries.
    • Analytics persists state for Flink apps.
  • Explanation: E.g., retry failed S3 delivery in Firehose (a producer retry sketch follows this list).
  • Exam Tip: Use for 24/7 streaming.
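
A producer-side retry sketch (boto3; the stream name and event fields are assumptions): PutRecords can partially fail, so only the entries that come back with an ErrorCode (e.g., throttling) are resent, with exponential backoff.

```python
# Retry sketch for PutRecords: resend only the records that failed.
import json
import time
import boto3

kinesis = boto3.client("kinesis")

def put_with_retries(stream, entries, attempts=5):
    """entries: list of {"Data": bytes, "PartitionKey": str} dicts."""
    for attempt in range(attempts):
        resp = kinesis.put_records(StreamName=stream, Records=entries)
        if resp["FailedRecordCount"] == 0:
            return
        # Keep only the entries whose parallel result carries an ErrorCode.
        entries = [
            entry
            for entry, result in zip(entries, resp["Records"])
            if "ErrorCode" in result
        ]
        time.sleep(2 ** attempt * 0.1)   # exponential backoff before retrying
    raise RuntimeError(f"{len(entries)} records still failing after {attempts} attempts")

events = [{"sensor": i, "temp": 20 + i} for i in range(3)]
put_with_retries(
    "iot-stream",    # assumed stream name
    [{"Data": json.dumps(e).encode(), "PartitionKey": str(e["sensor"])} for e in events],
)
```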

Monitoring and Recovery

  • Purpose: Detect and resolve issues.
  • Features:
    • CloudWatch metrics (e.g., PutRecord.Success, GetRecords.Latency).
    • CloudTrail logs API calls (e.g., PutRecords).
    • Security Hub detects misconfigured streams (2025).
    • KCL checkpoints progress for recovery.
  • Explanation: E.g., alarm on high WriteProvisionedThroughputExceeded (see the alarm sketch below).
  • Exam Tip: Use CloudWatch and CloudTrail for monitoring.
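
A monitoring sketch (boto3; the alarm name, stream name, and SNS topic ARN are assumptions): a CloudWatch alarm that fires when writes to a stream are throttled, signalling that more shards are needed.

```python
# Alarm sketch: alert when writes to a stream are throttled for 3 consecutive minutes.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="clickstream-write-throttled",          # assumed alarm name
    Namespace="AWS/Kinesis",
    MetricName="WriteProvisionedThroughputExceeded",
    Dimensions=[{"Name": "StreamName", "Value": "clickstream"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=3,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # assumed SNS topic
)
```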

Data Durability

  • Purpose: Protect streaming data.
  • Features:
    • Data Streams: Multi-AZ replication, retention up to 365 days.
    • Firehose: Durable buffering before delivery.
    • Explanation: E.g., recover 7-day-old records from stream.
  • Exam Tip: Highlight retention for durability.

Key Notes:

  • Resilience: Multi-AZ + retries + monitoring + retention = reliable streaming.
  • Exam Tip: Design resilient pipelines with retention and CloudWatch.

4. Kinesis Security Features

Security is a core focus for Kinesis in SAA-C03.

Access Control

  • IAM Policies:
    • Restrict Kinesis actions (kinesis:PutRecord, kinesis:GetRecords).
    • Scope to streams, consumers, or roles.
    • Example: {"Effect": "Allow", "Action": "kinesis:PutRecord", "Resource": "arn:aws:kinesis:::stream/clickstream"}.
  • Resource Policies:
    • Control cross-account access to streams.
    • Explanation: E.g., allow partner account to read stream.
  • Exam Tip: Practice IAM policies for producers/consumers.
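
A policy sketch (boto3; the policy name, region, and account ID are placeholders) expanding the example above into a complete producer-only policy scoped to a single stream ARN.

```python
# Policy sketch: a producer-only IAM policy scoped to one stream.
import json
import boto3

iam = boto3.client("iam")

producer_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
            # Full ARN format: arn:aws:kinesis:<region>:<account-id>:stream/<name>
            "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/clickstream",
        }
    ],
}

iam.create_policy(
    PolicyName="clickstream-producer",          # assumed policy name
    PolicyDocument=json.dumps(producer_policy),
)
```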

Encryption

  • In Transit:
    • HTTPS for API calls and data transfer.
    • Explanation: E.g., secure PutRecords call.
  • At Rest:
    • Data Streams: Server-side encryption with KMS (default or custom keys).
    • Firehose: Encrypts buffered data with KMS.
    • Analytics: Encrypts output with KMS.
    • Explanation: E.g., KMS-encrypted clickstream data (see the encryption sketch below).
  • Exam Tip: Highlight KMS for compliance.
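
An encryption sketch (boto3, stream name assumed): enabling server-side encryption on an existing Data Stream with the AWS-managed Kinesis key; a customer-managed KMS key ARN can be passed instead when compliance requires it.

```python
# Encryption sketch: turn on server-side encryption for an existing stream.
import boto3

kinesis = boto3.client("kinesis")

kinesis.start_stream_encryption(
    StreamName="clickstream",      # assumed stream name
    EncryptionType="KMS",
    KeyId="alias/aws/kinesis",     # AWS-managed key; swap in a customer key ARN if required
)
```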

Compliance

  • Purpose: Meet regulatory standards.
  • Features:
    • Supports HIPAA, PCI, SOC, ISO, GDPR, FIPS 140-2 (GovCloud).
    • Security Hub detects non-compliant streams (2025).
  • Explanation: E.g., process HIPAA-compliant IoT data.
  • Exam Tip: Use KMS and Security Hub for compliance.

Auditing

  • Purpose: Track stream activity.
  • Features:
    • CloudTrail logs API calls.
    • CloudWatch Logs for consumer/application logs.
    • Security Hub monitors compliance (2025).
    • Explanation: E.g., audit PutRecords for unauthorized access.
  • Exam Tip: Use CloudTrail for auditing.

Key Notes:

  • Security: IAM + encryption + compliance + auditing = secure streaming.
  • Exam Tip: Configure KMS, IAM, and CloudTrail for secure Kinesis.

5. Kinesis Cost Optimization

Cost efficiency is a key exam domain.

Pricing

  • Kinesis Data Streams:
    • Shard Hour: $0.015/hour/shard.
    • PUT Payload Unit: $0.014/1M units (25 KB/unit).
    • Extended Retention: $0.02/GB/month (7 days), $0.05/GB/month (365 days).
    • Enhanced Fan-Out: $0.013/consumer-shard-hour.
  • Kinesis Data Firehose:
    • $0.029/GB ingested.
    • Additional costs for transformations (Lambda) or format conversion.
  • Kinesis Data Analytics:
    • $0.11/KPU-hour (Kinesis Processing Unit: 1 vCPU, 4 GB memory).
    • $0.013/GB for Flink storage.
  • Kinesis Video Streams:
    • $0.0085/GB ingested, $0.023/GB stored.
  • Other Costs:
    • S3: $0.023/GB/month.
    • Lambda: $0.20/1M requests.
  • Example (worked through in the sketch after this list):
    • Data Streams: 10 shards, 1M PUT units/day, 7-day retention, 1 TB stored.
    • Firehose: 1 TB ingested.
    • Analytics: 10 KPUs, 1 hour/day.
    • S3: 1 TB storage.
      • Streams: (10 × $0.015 × 720) + (1M × 30 × $0.014/1M) + (1,000 × $0.02 × 7/30) = $108 + $0.42 + $4.67 = ~$113.09.
      • Firehose: 1,000 GB × $0.029 = $29.
      • Analytics: 10 × $0.11 × 30 = $33.
      • S3: 1,000 GB × $0.023 = $23.
      • Total: $113.09 + $29 + $33 + $23 = ~$198.09/month.
  • Free Tier: 1M PUT units, 100 MB ingested (Firehose), 25 GB stored (Video Streams).
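
The example above, reproduced as a worked calculation (Python). Prices are the figures quoted in this section, so treat them as illustrative rather than current list prices.

```python
# Worked version of the monthly estimate above.
HOURS = 720           # ~30 days
DAYS = 30

streams = (
    10 * 0.015 * HOURS                 # 10 shards at $0.015/shard-hour        = $108.00
    + 1_000_000 * DAYS * 0.014 / 1e6   # 1M PUT payload units/day              = $0.42
    + 1_000 * 0.02 * 7 / 30            # 1 TB at $0.02/GB-month, 7-day window  = $4.67
)
firehose = 1_000 * 0.029               # 1 TB ingested                         = $29.00
analytics = 10 * 0.11 * DAYS           # 10 KPUs, 1 hour/day                   = $33.00
s3 = 1_000 * 0.023                     # 1 TB stored in S3                     = $23.00

total = streams + firehose + analytics + s3
print(f"Streams ${streams:.2f}, Firehose ${firehose:.2f}, "
      f"Analytics ${analytics:.2f}, S3 ${s3:.2f}, Total ${total:.2f}")
# -> Streams $113.09, Firehose $29.00, Analytics $33.00, S3 $23.00, Total $198.09
```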

Cost Strategies

  • Optimize Shards:
    • Merge/split shards based on throughput needs.
    • Explanation: E.g., reduce from 10 to 5 shards, saving $54/month.
  • Use Firehose Buffering:
    • Increase buffer size/time for fewer S3 writes.
    • Explanation: E.g., buffer 128 MB vs. 1 MB, saving $10/month.
  • Minimize Analytics KPUs:
    • Optimize Flink/SQL code for fewer KPUs.
    • Explanation: E.g., reduce from 10 to 5 KPUs, saving $16.50/month.
  • Compress Data:
    • Compress records (e.g., GZIP) to reduce PUT units and ingestion costs.
    • Explanation: E.g., compress 1 TB to 200 GB, cutting Firehose ingestion from $29 to $5.80 (a compression sketch follows this list).
  • Use Standard Consumers:
    • Avoid enhanced fan-out unless low latency is critical.
    • Explanation: E.g., skip fan-out, saving $93.60/month (10 consumers).
  • Tagging:
    • Tag streams, Firehose, and S3 buckets for cost tracking.
    • Explanation: E.g., tag stream with “Project:Analytics”.
  • Monitor Usage:
    • Use CloudWatch and Cost Explorer to optimize shards/KPUs.
    • Explanation: E.g., reduce shards to save $50/month.
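
A compression sketch (Python/boto3, delivery stream name assumed): gzip-compressing a batch of JSON events before sending, which shrinks Firehose ingested GB and, for Data Streams, PUT payload units. The downstream reader (or an S3 consumer) must gunzip the delivered objects.

```python
# Compression sketch: gzip a batch of events before sending to Firehose.
import gzip
import json
import boto3

firehose = boto3.client("firehose")

def put_compressed(delivery_stream, events):
    payload = gzip.compress(
        ("\n".join(json.dumps(e) for e in events) + "\n").encode("utf-8")
    )
    firehose.put_record(
        DeliveryStreamName=delivery_stream,
        Record={"Data": payload},   # one compressed blob per call
    )

put_compressed("logs-to-s3", [{"level": "INFO", "msg": f"event {i}"} for i in range(1000)])
```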

Key Notes:

  • Cost Savings: Optimize shards + buffering + compression + tagging = lower costs.
  • Exam Tip: Calculate shard and Firehose costs.

6. Kinesis Advanced Features

Enhanced Data Streams:

  • Purpose: Higher throughput.
  • Features:
    • Improved shard scaling and fan-out (2024).
    • Explanation: E.g., handle 100 MB/s with 50 shards.
  • Exam Tip: Know for high-performance streaming.

Apache Flink Optimizations:

  • Purpose: Advanced analytics.
  • Features:
    • Optimized Flink performance (2024).
    • Supports complex event processing.
    • Explanation: E.g., detect anomalies in IoT data.
  • Exam Tip: Use for real-time analytics.

Security Hub Integration:

  • Purpose: Compliance monitoring.
  • Features:
    • Detects misconfigured streams (2025).
    • Explanation: E.g., flag unencrypted stream.
  • Exam Tip: Use for compliance.

Cross-Account Sharing:

  • Purpose: Multi-account streaming.
  • Features:
    • Resource policies for cross-account access.
    • Explanation: E.g., share stream with partner account (see the resource-policy sketch below).
  • Exam Tip: Know for enterprise setups.
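
A cross-account sketch (boto3; the stream ARN and partner account ID are placeholders, and it assumes the Kinesis PutResourcePolicy API for resource-based stream policies): granting a partner account read access to a stream.

```python
# Cross-account sketch: attach a resource policy letting a partner account read a stream.
import json
import boto3

kinesis = boto3.client("kinesis")
stream_arn = "arn:aws:kinesis:us-east-1:111111111111:stream/clickstream"  # assumed ARN

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::222222222222:root"},  # assumed partner account
            "Action": [
                "kinesis:DescribeStreamSummary",
                "kinesis:ListShards",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
            ],
            "Resource": stream_arn,
        }
    ],
}

kinesis.put_resource_policy(ResourceARN=stream_arn, Policy=json.dumps(policy))
```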

Kinesis Video Streams with Rekognition:

  • Purpose: Video analytics.
  • Features:
    • Integrates with Rekognition for object detection.
    • Explanation: E.g., detect faces in surveillance video.
  • Exam Tip: Use for video use cases.

Key Notes:

  • Flexibility: Enhanced Streams + Flink + Rekognition = advanced streaming.
  • Exam Tip: Master Flink and cross-account sharing.

7. Kinesis Use Cases

Understand practical applications.

Real-Time Analytics

  • Setup: Data Streams → Analytics → S3.
  • Features: SQL/Flink for aggregations.
  • Explanation: E.g., compute clickstream metrics.

Log Processing

  • Setup: Firehose → S3 with Lambda transformation.
  • Features: Buffer and transform logs.
  • Explanation: E.g., load CloudTrail logs to S3.

IoT Data Processing

  • Setup: Data Streams → Lambda.
  • Features: Ingest sensor data, trigger alerts.
  • Explanation: E.g., process temperature sensor data (a minimal Lambda handler sketch follows).
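
A minimal Lambda handler sketch for a Kinesis event source mapping (the temperature threshold and field names are assumptions): record payloads arrive base64-encoded, get decoded, and out-of-range readings trigger an alert.

```python
# Lambda handler sketch for a Kinesis event source mapping.
import base64
import json

def handler(event, context):
    for record in event["Records"]:
        # Kinesis record data is delivered base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Example rule: flag readings above an assumed threshold.
        if payload.get("temperature", 0) > 80:
            print(f"ALERT sensor={payload.get('sensor_id')} temp={payload['temperature']}")
    # Partial-batch-response shape (empty list = whole batch succeeded).
    return {"batchItemFailures": []}
```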

Video Surveillance

  • Setup: Video Streams → Rekognition.
  • Features: Analyze live video feeds.
  • Explanation: E.g., detect motion in security footage.

8. Kinesis vs. Other Streaming Services

| Feature  | Kinesis               | SNS/SQS                 | MSK (Kafka)           |
|----------|-----------------------|-------------------------|-----------------------|
| Type     | Real-time streaming   | Messaging               | Managed Kafka         |
| Focus    | Streaming data        | Pub/sub, queues         | Kafka ecosystem       |
| Latency  | Milliseconds          | Seconds                 | Milliseconds          |
| Cost     | $0.015/shard-hour     | $0.40–$0.50/1M requests | $0.21–$1.08/hour      |
| Use Case | Clickstream analytics | Event notifications     | Kafka-based streaming |

Explanation:

  • Kinesis: Real-time streaming and analytics.
  • SNS/SQS: Messaging and queuing.
  • MSK: Kafka-compatible streaming.

9. Detailed Explanations for Mastery

  • Enhanced Data Streams:
    • Example: Scale to 100 MB/s with fan-out.
    • Why It Matters: High-throughput streaming (2024).
  • Flink Optimizations:
    • Example: Faster anomaly detection.
    • Why It Matters: Advanced analytics (2024).
  • Security Hub:
    • Example: Flag unencrypted stream.
    • Why It Matters: Compliance (2025).

10. Quick Reference Table

| Feature          | Purpose               | Key Detail                       | Exam Relevance       |
|------------------|-----------------------|----------------------------------|----------------------|
| Data Streams     | Ingest streaming data | Shards, retention up to 365 days | Core Concept         |
| Firehose         | Load to data stores   | Buffering, Lambda transforms     | Core Concept         |
| Data Analytics   | Analyze streams       | SQL, Flink (2024)                | Core Concept         |
| Shards           | Scale throughput      | 1 MB/s write, 2 MB/s read        | Performance          |
| Security Hub     | Compliance monitoring | Misconfigured streams (2025)     | Security, Resilience |
| Enhanced Fan-Out | Low-latency consumers | Dedicated throughput (2024)      | Performance          |
| KMS Encryption   | Secure data           | Server-side encryption           | Security             |