Amazon DocumentDB
Amazon DocumentDB Overview
- Definition: Amazon DocumentDB is a fully managed, MongoDB-compatible NoSQL database service designed for scalability, performance, and high availability, optimized for JSON-like document storage.
- Key Features:
- Compatible with MongoDB 3.6, 4.0, and 5.0 APIs.
- Scales compute and storage independently.
- Managed backups, replication, and patching.
- Use Cases: Content management, user profiles, real-time analytics, e-commerce catalogs.
1. DocumentDB Core Concepts
Architecture
- Cluster: One primary instance (read/write) + up to 15 read replicas.
- Explanation: Similar to Aurora—decouples compute from storage for scalability.
- Storage:
- Auto-scales from 10 GB up to 128 TiB (the earlier limit was 64 TiB).
- Replicates 6 copies across 3 Availability Zones (AZs).
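- Explanation: No manual provisioning; the volume grows with data. The 11 9s durability figure is the S3 design target, so it applies to backups rather than to the cluster volume itself.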
- Documents: JSON-like (BSON), up to 16 MB per document.
- Explanation: Flexible schema—e.g., { "userId": "123", "name": "Alice", "orders": [...] }.
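A minimal sketch of the document model described above, assuming a hypothetical user document (the `orders` fields are illustrative only). The size check uses UTF-8 JSON length as a rough stand-in for BSON size, so it is an approximation and not the exact figure the service enforces.

```python
import json

# 16 MB per-document limit taken from the notes above.
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024

# Hypothetical user document; field names beyond userId/name/orders are assumptions.
user_doc = {
    "userId": "123",
    "name": "Alice",
    "orders": [
        {"orderId": "A-1", "total": 42.5},
        {"orderId": "A-2", "total": 17.0},
    ],
}


def approx_size_bytes(doc: dict) -> int:
    """Rough size estimate via UTF-8 JSON; the real BSON size will differ somewhat."""
    return len(json.dumps(doc).encode("utf-8"))


def fits_document_limit(doc: dict) -> bool:
    """Return True when the approximate size is within the 16 MB limit."""
    return approx_size_bytes(doc) <= MAX_DOCUMENT_BYTES
```

Documents that keep growing (e.g., an unbounded `orders` array) are the usual way to approach the limit; splitting such arrays into their own collection avoids it.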
Instances
- Types:
- General Purpose (e.g., db.t3, db.m5): Burstable (t3), balanced (m5).
- Memory Optimized (e.g., db.r5): High memory for large datasets.
- Explanation: db.t3.medium is covered by the one-month DocumentDB free trial for new accounts; r5 suits high-throughput apps.
- Scaling:
- Vertical: Change instance type (brief downtime).
- Horizontal: Add read replicas.
- Explanation: Replicas scale reads; primary handles writes.
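The two scaling paths above map onto two calls on the boto3 `docdb` client: `modify_db_instance` (vertical) and `create_db_instance` (horizontal). The sketch below only builds the keyword arguments, so it can be read and checked without AWS access; the identifiers passed in are hypothetical.

```python
def add_replica_params(cluster_id: str, replica_id: str, instance_class: str) -> dict:
    """Keyword arguments for docdb.create_db_instance: adds a replica to a cluster."""
    return {
        "DBClusterIdentifier": cluster_id,
        "DBInstanceIdentifier": replica_id,
        "DBInstanceClass": instance_class,
        "Engine": "docdb",
    }


def resize_params(instance_id: str, new_class: str) -> dict:
    """Keyword arguments for docdb.modify_db_instance: changes the instance type."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": new_class,
        "ApplyImmediately": True,  # otherwise the change waits for the maintenance window
    }


def scale_out(cluster_id: str, replica_id: str, instance_class: str):
    """Create the replica; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("docdb")
    return client.create_db_instance(
        **add_replica_params(cluster_id, replica_id, instance_class)
    )
```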
MongoDB Compatibility
- Purpose: Run MongoDB workloads with minimal changes.
- Features:
- Supports MongoDB drivers, tools (e.g., mongodump).
- Not every MongoDB feature is supported (e.g., no sharding on instance-based clusters, and only a subset of aggregation pipeline stages and operators).
- Explanation: Drop-in for most apps—check compatibility for advanced features.
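Because standard MongoDB drivers work, connecting is mostly a matter of the connection string. A minimal sketch of building one, assuming the commonly used DocumentDB options (`replicaSet=rs0`, `retryWrites=false`, and the Amazon CA bundle file); the host, user, and password are placeholders.

```python
from urllib.parse import quote_plus


def docdb_uri(
    user: str,
    password: str,
    host: str,
    port: int = 27017,
    ca_file: str = "global-bundle.pem",
    read_preference: str = "secondaryPreferred",
) -> str:
    """Build a MongoDB-style connection URI for a DocumentDB cluster endpoint."""
    # Credentials are percent-encoded so characters like '@' or '/' stay valid.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
        f"?tls=true&tlsCAFile={ca_file}&replicaSet=rs0"
        f"&readPreference={read_preference}&retryWrites=false"
    )
```

The resulting string can be passed to a driver such as `pymongo.MongoClient(uri)`; the cluster is only reachable from inside its VPC.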
Key Notes:
- Exam Relevance: Understand cluster model and MongoDB compatibility.
- Mastery Tip: Compare DocumentDB vs. MongoDB self-managed (e.g., on EC2).
2. DocumentDB Performance Features
DocumentDB supports high-performing NoSQL workloads.
Low Latency
- Purpose: Fast document reads/writes.
- Features: Millisecond latency, SSD-based storage.
- Explanation: Optimized for JSON queries—e.g., find user by userId.
Read Replicas
- Purpose: Scale read traffic.
- Features:
- Up to 15 replicas, sub-second replication lag.
- Auto-scaling replicas based on load.
- Cross-region replicas for global reads.
- Explanation: Offloads queries—e.g., analytics on replica, writes on primary.
Indexing
- Purpose: Speed up queries.
- Types:
- Default index on _id (unique, created automatically).
- Secondary indexes (single field, compound, multikey).
- Explanation: E.g., index on email for fast lookups.
- Limit: 64 indexes per collection.
- Explanation: Keep only the indexes that queries actually use; every extra index adds work to each write.
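The index types listed above can be sketched with the driver's `create_index` call. The helper below accepts any object with a pymongo-style `create_index(keys, **options)` method; the field names (`email`, `country`, `createdAt`, `tags`) are hypothetical, and the 64-index guard mirrors the limit stated in these notes.

```python
# (keys, options) pairs in the form pymongo's Collection.create_index accepts.
INDEXES = [
    ([("email", 1)], {"unique": True}),        # single field, unique
    ([("country", 1), ("createdAt", -1)], {}),  # compound
    ([("tags", 1)], {}),                        # multikey when tags holds an array
]


def ensure_indexes(collection, indexes=INDEXES, limit: int = 64) -> list:
    """Create each index and return whatever names the driver reports."""
    if len(indexes) > limit:
        raise ValueError(f"{len(indexes)} indexes exceeds the per-collection limit of {limit}")
    return [collection.create_index(keys, **options) for keys, options in indexes]
```

With pymongo this would be called as `ensure_indexes(client["app"]["users"])`, where the database and collection names are again placeholders.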
Global Clusters
- Purpose: Low-latency reads across regions.
- How It Works: Primary Region (read/write), secondary Regions (read-only, < 1s lag).
- Explanation: Similar to Aurora Global Database—e.g., U.S. writes, EU reads.
Key Notes:
- Performance: Replicas + indexes = scalable queries.
- Exam Tip: Know read replica setup for read-heavy apps.
3. DocumentDB Resilience Features
Resilience ensures high availability and data recovery.
Multi-AZ Storage
- Purpose: Survive AZ failures.
- How It Works: 6 copies across 3 AZs, tolerates 2 copy losses.
- Explanation: Built-in—no Multi-AZ config needed (unlike RDS).
Failover
- Purpose: High availability.
- How It Works:
- Primary fails, replica promoted in ~30-120 seconds.
- Automatic DNS switch.
- Explanation: Fast due to shared storage—similar to Aurora.
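During the failover window, in-flight operations against the old primary fail and need to be retried once DNS points at the promoted replica. A minimal retry-with-backoff sketch follows; the built-in `ConnectionError` is used as a stand-in, and with pymongo one would likely pass `retry_on=(pymongo.errors.AutoReconnect,)` instead (an assumption about the driver in use).

```python
import time


def with_failover_retry(
    operation,
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep=time.sleep,
    retry_on=(ConnectionError,),
):
    """Call operation(), retrying with exponential backoff on connection errors."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

With the defaults the waits add up to 15 seconds, so covering the upper end of a ~30-120 second failover would need more attempts or a larger `base_delay`.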
Backups
- Automated:
- Continuous; restore to any point within the backup retention period (configurable from 1 to 35 days).
- Stored in S3, incremental.
- Explanation: PITR for granular recovery—e.g., undo bad update.
- Manual Snapshots:
- User-triggered, persist until deleted.
- Explanation: Use for long-term backups, cross-region copy.
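Both backup paths are exposed on the boto3 `docdb` client as `create_db_cluster_snapshot` and `restore_db_cluster_to_point_in_time`. The sketch below builds only the keyword arguments; the cluster and snapshot identifiers are hypothetical. A point-in-time restore creates a new cluster and does not overwrite the source.

```python
from datetime import datetime, timezone
from typing import Optional


def snapshot_params(cluster_id: str, snapshot_id: str) -> dict:
    """Keyword arguments for docdb.create_db_cluster_snapshot (manual snapshot)."""
    return {
        "DBClusterIdentifier": cluster_id,
        "DBClusterSnapshotIdentifier": snapshot_id,
    }


def pitr_params(
    source_cluster_id: str,
    new_cluster_id: str,
    restore_to: Optional[datetime] = None,
) -> dict:
    """Keyword arguments for docdb.restore_db_cluster_to_point_in_time."""
    params = {
        "SourceDBClusterIdentifier": source_cluster_id,
        "DBClusterIdentifier": new_cluster_id,  # the restored copy is a new cluster
    }
    if restore_to is None:
        params["UseLatestRestorableTime"] = True
    else:
        params["RestoreToTime"] = restore_to
    return params
```

For the "undo a bad update" case, `restore_to` would be a timestamp just before the update, e.g. `datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)`.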
Global Clusters
- Purpose: Regional DR.
- Explanation: Promote secondary Region to primary if needed—manual process.
Key Notes:
- Resilience: 6-copy storage + fast failover = robust HA.
- Exam Tip: Compare failover time (~30-120s) vs. RDS (~1-2 min).
4. DocumentDB Security Features
Security aligns with SAA-C03’s secure architecture focus.
Encryption
- At Rest: AWS KMS (AES-256). The setting is chosen when the cluster is created and cannot be changed afterward.
- In Transit: TLS for client connections (MongoDB drivers).
- Explanation: Meets compliance—e.g., GDPR, HIPAA.
Access Control
- Authentication:
- Native MongoDB users (username/password).
- IAM users and roles (instance-based clusters on engine version 5.0).
- Explanation: IAM simplifies app auth—e.g., Lambda to DocumentDB.
- Authorization:
- Role-based access control (RBAC) for collections.
- Explanation: Fine-grained—e.g., read-only for analytics users.
- VPC:
- Deploy in private subnets, control via security groups.
- Example: Allow port 27017 from app subnet only.
- Explanation: No public access by default—secure by design.
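The "port 27017 from the app subnet only" rule can be sketched as the `IpPermissions` structure that EC2's `authorize_security_group_ingress` accepts. The CIDR below is a placeholder for the application subnet.

```python
import ipaddress


def docdb_ingress_rule(app_subnet_cidr: str, port: int = 27017) -> list:
    """Build IpPermissions allowing TCP traffic on the DocumentDB port from one subnet."""
    # Raises ValueError on a malformed CIDR before anything is sent to AWS.
    ipaddress.ip_network(app_subnet_cidr)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [
                {"CidrIp": app_subnet_cidr, "Description": "App subnet to DocumentDB"}
            ],
        }
    ]
```

It would be applied with something like `ec2.authorize_security_group_ingress(GroupId=sg_id, IpPermissions=docdb_ingress_rule("10.0.1.0/24"))`, where `sg_id` is the cluster's security group.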
Audit Logging
- Purpose: Track activity.
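- Features: Auditing records events such as authentication attempts and DDL operations; the profiler separately logs slow operations. Both can export to CloudWatch Logs.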
- Explanation: Monitor for security—e.g., detect unauthorized access.
Key Notes:
- Security: KMS + IAM + VPC = enterprise-grade protection.
- Exam Tip: Practice IAM role for DocumentDB access.
5. DocumentDB Cost Optimization
Cost efficiency is a key exam domain.
Instance Sizing
- Pricing: Pay for instance hours (e.g., db.t3.medium ~$0.07/hour, db.r5.large ~$0.29/hour).
- Cost Tips:
- Use db.t3.medium for dev/test (covered by the one-month free trial for new accounts).
- Reserved Instances (1- or 3-year) for production (~50% savings).
- Explanation: Right-size instances—e.g., t3 for low traffic.
Storage
- Pricing: ~$0.10/GB-month, pay for used capacity.
- Cost Tips: Monitor growth—cannot shrink storage.
- Explanation: Auto-scaling avoids over-provisioning.
Replicas
- Purpose: Scale reads without over-sizing primary.
- Cost Tips: Add replicas only for load or DR—each costs instance price.
- Explanation: One replica often enough for HA.
Backups
- Pricing: ~$0.02/GB-month for snapshots.
- Cost Tips: Set retention to the shortest period that meets recovery requirements (e.g., 7 days), and delete manual snapshots that are no longer needed.
- Explanation: Snapshots add cost—manage carefully.
Key Notes:
- Cost Savings: t3 + minimal replicas + short retention.
- Exam Tip: Calculate cost for cluster with 1 replica vs. primary-only.
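A worked version of the primary-only vs. one-replica comparison, using the approximate rates quoted in this section (~$0.07/hour for db.t3.medium, ~$0.10/GB-month for storage). The 730 hours per month figure is an assumption, and I/O and backup charges are left out of this sketch.

```python
HOURS_PER_MONTH = 730        # assumed average month length
STORAGE_RATE_GB_MONTH = 0.10  # approximate rate from the notes above


def monthly_cost(instances: int, hourly_rate: float, storage_gb: float) -> float:
    """Instance hours for every instance plus storage, which the cluster shares."""
    compute = instances * hourly_rate * HOURS_PER_MONTH
    storage = storage_gb * STORAGE_RATE_GB_MONTH
    return compute + storage


primary_only = monthly_cost(instances=1, hourly_rate=0.07, storage_gb=100)
with_replica = monthly_cost(instances=2, hourly_rate=0.07, storage_gb=100)
```

For 100 GB of data this comes to roughly $61.10 per month for a lone primary and $112.20 with one replica: the replica doubles the compute line but adds no storage cost, because replicas read the same cluster volume.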
6. DocumentDB Advanced Features
Change Streams
- Purpose: Capture real-time updates.
- Features: Monitor insert/update/delete events once change streams are enabled on the collection, database, or cluster.
- Explanation: Event-driven—e.g., trigger Lambda on new order.
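A minimal sketch of consuming a change stream and routing events by type. `route_change` works on plain dictionaries shaped like driver change events (`operationType`, `fullDocument`); `consume` expects a pymongo-style collection exposing `watch()`. The handler and field names such as `orderId` are hypothetical.

```python
def route_change(event: dict, handlers: dict):
    """Dispatch one change event to the handler registered for its operationType."""
    handler = handlers.get(event.get("operationType"))
    return handler(event) if handler else None


def consume(collection, handlers: dict) -> None:
    """Block on the collection's change stream and route each event as it arrives."""
    with collection.watch() as stream:
        for event in stream:
            route_change(event, handlers)


def on_insert(event: dict) -> str:
    """Hypothetical handler: in practice this might invoke a Lambda or update a dashboard."""
    return f"new order {event['fullDocument']['orderId']}"
```

A call such as `consume(orders_collection, {"insert": on_insert})` would run until the stream closes; event types without a handler are ignored.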
Tunable Consistency
- Purpose: Balance performance vs. accuracy.
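- Options: Reads from the primary are strongly consistent (read-after-write); reads from replicas are eventually consistent. The choice is made per connection or per query through the driver's read preference.
- Explanation: The default read preference is primary, so reads are strongly consistent unless they are routed to replicas. Keep critical reads (e.g., inventory) on the primary and send lag-tolerant reads to replicas.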
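In practice the consistency choice is expressed through the driver's read preference: primary reads see the latest write, while replica reads may lag slightly. A sketch assuming pymongo, with the import deferred so the mapping function works on its own; the `critical` flag is an illustrative parameter.

```python
def read_preference_for(critical: bool) -> str:
    """Map a consistency need to a MongoDB read preference mode name."""
    # "primary" gives read-after-write consistency; replicas trade that for read scaling.
    return "primary" if critical else "secondaryPreferred"


def find_one_with_consistency(collection, query: dict, critical: bool):
    """Run find_one on a pymongo collection with a per-query read preference."""
    from pymongo import ReadPreference  # lazy import: only needed when actually querying

    pref = ReadPreference.PRIMARY if critical else ReadPreference.SECONDARY_PREFERRED
    return collection.with_options(read_preference=pref).find_one(query)
```

An inventory check before checkout would use `critical=True`, whereas a product-listing page can usually tolerate `critical=False`.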
Aggregation Pipeline
- Purpose: Transform/query documents.
- Features: Supports most MongoDB operators (e.g., $group, $match).
- Explanation: Complex analytics—e.g., sum orders by region.
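The "sum orders by region" example can be written as a two-stage pipeline using the `$match` and `$group` operators named above. The field names (`status`, `region`, `amount`) are assumptions, and the pure-Python function is included only to show what the pipeline computes.

```python
from collections import defaultdict

# Pipeline as it would be passed to collection.aggregate(pipeline).
pipeline = [
    {"$match": {"status": "completed"}},
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
]


def sum_by_region(orders: list) -> dict:
    """Local equivalent of the pipeline: total amount of completed orders per region."""
    totals = defaultdict(float)
    for order in orders:
        if order["status"] == "completed":
            totals[order["region"]] += order["amount"]
    return dict(totals)


sample_orders = [
    {"region": "US", "amount": 10.0, "status": "completed"},
    {"region": "US", "amount": 5.0, "status": "completed"},
    {"region": "EU", "amount": 7.5, "status": "completed"},
    {"region": "EU", "amount": 100.0, "status": "cancelled"},
]
```

Placing `$match` first keeps the grouping stage from processing documents that would be discarded anyway, and lets the filter use an index on `status` if one exists.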
Cluster Clones
- Purpose: Create instant copies.
- Features: Copy-on-write for testing/dev.
- Explanation: Fast, cost-efficient—e.g., test schema changes.
Key Notes:
- Flexibility: Change Streams + Aggregation = powerful NoSQL.
- Exam Tip: Know Change Streams for event-driven apps.
7. DocumentDB Use Cases
Understand practical applications.
Content Management
- Setup: DocumentDB + ECS.
- Features: Flexible schema for articles, metadata.
- Explanation: CMS—e.g., blog posts with nested comments.
User Profiles
- Setup: DocumentDB + Lambda.
- Features: Fast reads, JSON storage.
- Explanation: Social apps—e.g., user settings, preferences.
E-Commerce Catalogs
- Setup: DocumentDB + Global Clusters.
- Features: Scalable, global reads.
- Explanation: Product data—e.g., nested attributes (size, color).
Real-Time Analytics
- Setup: DocumentDB + Change Streams.
- Features: Event-driven updates.
- Explanation: Dashboards—e.g., live order tracking.
8. DocumentDB vs. Other Databases
Feature | DocumentDB | DynamoDB | Aurora |
---|---|---|---|
Type | NoSQL (Document) | NoSQL (Key-Value) | Relational |
Data Model | JSON/BSON | Key-value, JSON | Tables |
Workload | Document queries | Key lookups, simple queries | SQL transactions |
Storage | 128 TiB | Unlimited | 128 TiB |
Latency | Milliseconds | Milliseconds | Milliseconds |
Cost | Instance + storage | Request-based | Instance + storage |
Explanation:
- DocumentDB: Nested documents, MongoDB apps.
- DynamoDB: Simple key-value, infinite scale.
- Aurora: Structured data, SQL.
Detailed Explanations for Mastery
- Read Replicas:
- Example: Primary in us-east-1, replica in eu-west-1 for EU reads.
- Why It Matters: Scales reads, supports DR—exam tests replica use.
- Change Streams:
- Example: New order → Lambda → update dashboard.
- Why It Matters: Enables real-time—common SAA-C03 pattern.
- Indexes:
- Example: Index email for login queries—speeds up finds.
- Why It Matters: Over-indexing slows writes—balance is key.
Quick Reference Table
Feature | Purpose | Key Detail | Exam Relevance |
---|---|---|---|
Cluster Architecture | Scalability | Primary + up to 15 replicas | Core Concept |
Storage | Data persistence | Auto-scales to 128 TiB | Cost, Performance |
Read Replicas | Read scaling | Sub-second lag, cross-region | Performance, Resilience |
Global Clusters | Global reads | < 1s replication | Resilience, Performance |
Failover | High availability | ~30-120 seconds | Resilience |
Encryption | Security | KMS, TLS | Security |
IAM Auth | Secure access | Engine version 5.0 | Security |
Change Streams | Real-time updates | Event-driven | Performance |