Others
1. AWS Data Exchange
- Definition: A service to find, subscribe to, and use third-party data in the AWS Cloud.
- Key Features:
- Access 3,500+ data products from 300+ providers.
- Integrates with S3, Redshift, Athena, and QuickSight for analytics.
- Supports data delivery without infrastructure management.
- Use Cases: Market research, financial analysis, IoT data enrichment.
- Performance: Scales to deliver data to millions of AWS users.
- Resilience: S3-based storage ensures 11 9s durability.
- Security: IAM for access control, KMS for encryption, no sharing of sensitive private data (e.g., medical records).
- Cost: Pay-per-use (subscription + data usage); no infrastructure costs.
- Integrations: S3, Redshift, Athena, QuickSight, Glue.
- Recent Updates: Enhanced data catalog integration (2024).
- Exam Tip: Use for third-party data integration; compare with AWS Glue for ETL.
2. AWS Lake Formation
- Definition: A service to build, manage, and secure data lakes on S3.
- Key Features:
- Creates data lakes by cataloging and moving data to S3.
- Fine-grained access control (row/column-level) via IAM and Lake Formation permissions.
- Integrates with Glue for ETL and Data Catalog.
- Use Cases: Centralized data lakes for analytics, ML, and reporting.
- Performance: Scales to petabytes; optimized for Athena, Redshift, EMR queries.
- Resilience: S3 durability (11 9s), multi-AZ metadata storage.
- Security: KMS encryption, Lake Formation permissions, Security Hub integration (2025).
- Cost: $1/100K catalog objects, $0.44/DPU-hour for Glue ETL; optimize with partitioning.
- Integrations: S3, Glue, Athena, Redshift, QuickSight, EMR, SageMaker.
- Recent Updates: Enhanced row/column access, cross-account sharing (Jan 2025).
- Exam Tip: Use for secure data lakes; master Lake Formation vs. IAM permissions.
3. Amazon Managed Streaming for Apache Kafka (Amazon MSK)
- Definition: A fully managed service for running Apache Kafka to process real-time streaming data.
- Key Features:
- Supports Kafka APIs for producers/consumers.
- Auto-manages brokers, ZooKeeper, and scaling.
- Integrates with Kinesis Firehose for S3 delivery.
- Use Cases: Streaming pipelines, log processing, IoT analytics.
- Performance: Low-latency (milliseconds), scales to GB/s with brokers.
- Resilience: Multi-AZ clusters, auto-recovery of failed brokers.
- Security: IAM, KMS encryption, SASL/SCRAM, Security Hub (2025).
- Cost: $0.21–$1.08/broker-hour, $0.02/GB-month storage; optimize brokers.
- Integrations: Kinesis Firehose, S3, OpenSearch, Lambda, Redshift.
- Recent Updates: Managed S3 delivery via Firehose (2023), throughput enhancements (2024).
- Exam Tip: Compare MSK (Kafka) vs. Kinesis for streaming.
4. AWS Data Pipeline
- Definition: A managed orchestration service for scheduling and automating data movement and transformation.
- Key Features:
- Schedules ETL workflows across AWS services (e.g., S3, RDS, Redshift).
- Supports on-premises data sources.
- Uses tasks runners (EC2, Lambda) for processing.
- Use Cases: Legacy ETL pipelines, periodic data transfers.
- Performance: Limited scalability compared to Glue; suitable for simple workflows.
- Resilience: Retries failed tasks, S3 durability for data.
- Security: IAM roles, KMS encryption for data.
- Cost: $1–$2.50/pipeline/month; optimize with scheduling.
- Integrations: S3, RDS, Redshift, DynamoDB, EMR.
- Recent Updates: No major updates; considered legacy vs. Glue.
- Exam Tip: Use for simple, scheduled ETL; prefer Glue for modern workloads.
5. Amazon OpenSearch Service
- Definition: A managed service for deploying and scaling OpenSearch clusters for search and analytics.
- Key Features:
- Supports OpenSearch (up to 2.17) and legacy Elasticsearch (up to 7.10).
- Ingests logs, metrics, and traces for real-time analytics.
- Zero-ETL integrations with S3, CloudWatch, Security Lake.
- Use Cases: Log analytics, application search, security analytics.
- Performance: Millisecond query latency, scales to petabytes with OR1 instances (2024).
- Resilience: Multi-AZ, auto-replacement of failed nodes, 11 9s S3 durability.
- Security: IAM, KMS, fine-grained access, Security Hub (2025).
- Cost: $0.03–$1.08/instance-hour, $0.024/GB-month storage; use serverless for auto-scaling.
- Integrations: S3, Kinesis, Lambda, QuickSight, CloudWatch, Security Lake.
- Recent Updates: Vector search on disk (2024), OpenSearch 2.17 support (2025).
- Exam Tip: Use for search/log analytics; compare with Athena for SQL queries.
6. Amazon QuickSight
- Definition: A serverless business intelligence (BI) service for creating interactive dashboards and visualizations.
- Key Features:
- Supports data from S3, Redshift, Athena, OpenSearch, and more.
- ML-powered insights (e.g., anomaly detection, forecasting).
- QuickSight Q for natural language queries (e.g., “Top sales in 2024”).
- Use Cases: Business reporting, real-time dashboards, data exploration.
- Performance: SPICE engine for fast queries; scales to thousands of users.
- Resilience: Serverless, multi-AZ, durable S3 storage for datasets.
- Security: IAM, KMS, row-level security, VPC connectivity.
- Cost: $9–$18/user/month (standard), $0.30/GB SPICE; optimize with SPICE caching.
- Integrations: S3, Redshift, Athena, OpenSearch, Lake Formation, Data Exchange.
- Recent Updates: Enhanced Q natural language capabilities (2024).
- Exam Tip: Use for BI; compare with OpenSearch for operational analytics.
Quick Comparison Table
Service | Purpose | Key Use Case | Cost Model | Key Integration |
---|---|---|---|---|
AWS Data Exchange | Third-party data access | Market research | Subscription + usage | S3, Redshift, Athena |
AWS Lake Formation | Secure data lake management | Centralized analytics | $1/100K objects, Glue ETL | Glue, Athena, QuickSight |
Amazon MSK | Managed Kafka streaming | Real-time pipelines | $0.21–$1.08/broker-hour | Firehose, S3, OpenSearch |
AWS Data Pipeline | ETL orchestration | Legacy data transfers | $1–$2.50/pipeline/month | S3, RDS, Redshift |
Amazon OpenSearch | Search and log analytics | Log monitoring | $0.03–$1.08/instance-hour | S3, Kinesis, QuickSight |
Amazon QuickSight | BI and visualizations | Dashboards | $9–$18/user/month | S3, Athena, OpenSearch |
Study Tips
- Hands-On:
- Data Exchange: Subscribe to a dataset, query with Athena.
- Lake Formation: Create a data lake, apply row-level permissions.
- MSK: Set up a Kafka cluster, stream to S3 via Firehose.
- Data Pipeline: Schedule an S3-to-Redshift pipeline.
- OpenSearch: Ingest logs from Kinesis, query with dashboards.
- QuickSight: Build a dashboard from S3 data via Athena.
- Scenarios:
- Third-party data = Data Exchange.
- Secure data lake = Lake Formation.
- Kafka streaming = MSK.
- Legacy ETL = Data Pipeline.
- Log analytics = OpenSearch.
- BI dashboards = QuickSight.
- Memorize: Pricing, integrations, Lake Formation permissions, OpenSearch vs. QuickSight use cases.