Skip to content

Amazon CloudWatch

Amazon CloudWatch Overview

  • Definition: Amazon CloudWatch is a managed monitoring and observability service that collects, analyzes, and visualizes metrics, logs, and events from AWS resources, applications, and on-premises systems.
  • Key Features:
    • Collects metrics (e.g., EC2 CPU usage), logs (e.g., Lambda logs), and events (e.g., state changes).
    • Provides alarms for automated actions, dashboards for visualization, and Logs Insights for log analysis.
    • Supports CloudWatch Agent for custom metrics/logs and Synthetics for application health checks.
    • Integrates with SNS, Lambda, EventBridge, Auto Scaling, and Security Hub for automation and monitoring.
  • Use Cases: Monitor application performance, trigger auto-scaling, analyze logs for errors, ensure compliance, detect anomalies.
  • Key Updates (2024–2025):
    • Cross-Account Observability: Unified metrics/logs across accounts (October 2024).
    • Improved Anomaly Detection: Enhanced ML-based alerts (March 2024).
    • FIPS 140-2 Compliance: Enhanced for GovCloud (October 2024).
    • Security Hub Integration: Centralized monitoring findings (January 2025).

1. CloudWatch Core Concepts

Components

  • Metrics:
    • Time-series data points representing resource or application performance (e.g., EC2 CPUUtilization).
    • Stored for 15 months; customizable via namespaces.
    • Explanation: E.g., metric for S3 NumberOfObjects.
  • Logs:
    • Text-based records from applications or AWS services (e.g., Lambda execution logs).
    • Stored in Log Groups and Log Streams.
    • Explanation: E.g., log group /aws/lambda/myFunction.
  • Alarms:
    • Monitor metrics and trigger actions (e.g., SNS notification, Auto Scaling) based on thresholds.
    • Explanation: E.g., alarm if EC2 CPUUtilization > 80%.
  • Dashboards:
    • Customizable visualizations of metrics, logs, and alarms.
    • Explanation: E.g., dashboard for EC2, RDS, Lambda metrics.
  • Events:
    • Records of state changes or schedules (e.g., EC2 instance termination, cron jobs).
    • Processed via EventBridge (formerly CloudWatch Events).
    • Explanation: E.g., event triggers Lambda on S3 upload.
  • Logs Insights:
    • Interactive query tool for analyzing log data using SQL-like syntax.
    • Explanation: E.g., query Lambda logs for “ERROR”.
  • CloudWatch Agent:
    • Software to collect custom metrics/logs from EC2, on-premises servers.
    • Explanation: E.g., collect memory usage from EC2.
  • Synthetics:
    • Canary scripts to monitor application endpoints and APIs.
    • Explanation: E.g., synthetic test for website uptime.
  • Contributor Insights:
    • Analyzes log data to identify top contributors (e.g., IP addresses, users).
    • Explanation: E.g., identify top API callers.
  • CloudWatch Evidently:
    • Feature flagging and experimentation for application changes.
    • Explanation: E.g., test new UI feature on 10% of users.

Key Concepts

  • Namespaces:
    • Containers for metrics (e.g., AWS/EC2, Custom/MyApp).
    • Explanation: E.g., MyApp/MemoryUsage for custom metric.
  • Dimensions:
    • Key-value pairs to filter metrics (e.g., InstanceId=i-123).
    • Explanation: E.g., metric for specific EC2 instance.
  • Retention:
    • Metrics: 15 months (1-minute granularity for 3 hours, 5-minute for 14 days, etc.).
    • Logs: Configurable (1 day to indefinite).
    • Explanation: E.g., retain Lambda logs for 30 days.
  • Anomaly Detection:
    • ML-based detection of metric anomalies.
    • Explanation: E.g., detect unusual spike in API latency.
  • Cross-Account Observability:
    • Unified view of metrics/logs across accounts (new 2024).
    • Explanation: E.g., monitor 10 accounts in one dashboard.

Key Notes:

  • Exam Relevance: Understand metrics, logs, alarms, Logs Insights, and cross-account observability.
  • Mastery Tip: Compare CloudWatch vs. CloudTrail vs. X-Ray for monitoring.

2. CloudWatch Performance Features

CloudWatch optimizes monitoring and analysis.

Low Latency

  • Purpose: Real-time monitoring.
  • Features:
    • Metrics updated in seconds (e.g., EC2 CPUUtilization every 5 minutes, optional 1 minute).
    • Logs delivered to Log Groups in near-real-time.
    • Logs Insights queries execute in seconds (improved 2024).
  • Explanation: E.g., Lambda logs appear in <1 second.
  • Exam Tip: Highlight real-time metrics/logs for performance.

High Throughput

  • Purpose: Handle large data volumes.
  • Features:
    • Scales to billions of metrics/logs daily.
    • Supports high-frequency custom metrics via CloudWatch Agent.
  • Explanation: E.g., collect 1 million Lambda logs/hour.
  • Exam Tip: Use for high-traffic apps.

Scalability

  • Purpose: Support growing workloads.
  • Features:
    • Auto-scales for metrics, logs, and alarms.
    • Cross-account observability for multi-account environments (new 2024).
  • Explanation: E.g., monitor 1,000 EC2 instances across 10 accounts.
  • Exam Tip: Use cross-account for enterprise scalability.

Key Notes:

  • Performance: Low latency + high throughput + scalability = efficient monitoring.
  • Exam Tip: Emphasize CloudWatch for real-time, scalable observability.

3. CloudWatch Resilience Features

Resilience ensures reliable monitoring.

Multi-AZ/Region Redundancy

  • Purpose: Survive failures.
  • Features:
    • Metrics/logs stored in Regional, multi-AZ infrastructure.
    • Cross-Region data replication for logs (manual setup).
  • Explanation: E.g., metrics persist if us-east-1a fails.
  • Exam Tip: Highlight multi-AZ for HA.

Continuous Monitoring:

  • Purpose: Uninterrupted observability.
  • Features:
    • Runs 24/7, unaffected by resource failures.
    • Alarms and Synthetics monitor continuously.
  • Explanation: E.g., monitor EC2 during S3 outage.
  • Exam Tip: Use for continuous monitoring.

Monitoring and Recovery:

  • Purpose: Detect and respond to issues.
  • Features:
    • Alarms trigger SNS, Lambda, or Auto Scaling for recovery.
    • CloudWatch Events (EventBridge) for automated workflows.
    • Security Hub detects misconfigured monitoring (new 2025).
    • Explanation: E.g., alarm scales out EC2 on high CPU.
  • Exam Tip: Use alarms and EventBridge for resilience.

Data Retention:

  • Purpose: Ensure historical data availability.
  • Features:
    • Metrics retained for 15 months.
    • Logs configurable (1 day to indefinite).
  • Explanation: E.g., analyze 6-month-old EC2 metrics.
  • Exam Tip: Highlight retention for compliance.

Key Notes:

  • Resilience: Multi-AZ + continuous monitoring + alarms + retention = reliable observability.
  • Exam Tip: Design resilient monitoring with alarms and cross-Region replication.

4. CloudWatch Security Features

Security is a core focus for CloudWatch in SAA-C03.

Access Control

  • IAM Policies:
    • Control access to metrics, logs, alarms (cloudwatch:PutMetricData, logs:CreateLogGroup).
    • Restrict access to specific namespaces or Log Groups.
    • Example: {"Effect": "Allow", "Action": "cloudwatch:PutMetricData", "Resource": "*"}.
  • Resource Policies:
    • Control cross-account access to Log Groups.
    • Explanation: E.g., allow dev account to read prod logs.
  • Exam Tip: Practice IAM and resource policies for access control.

Encryption

  • In Transit:
    • HTTPS for API calls and log/metric delivery.
    • Explanation: E.g., secure PutMetricData call.
  • At Rest:
    • Log Groups encrypted with KMS (optional).
    • Metrics stored securely (no KMS option).
    • Explanation: E.g., KMS key encrypts Lambda logs.
  • Exam Tip: Highlight KMS for compliance.

Compliance:

  • Purpose: Meet regulatory standards.
  • Features:
    • Supports HIPAA, PCI, SOC, ISO, GDPR, FIPS 140-2 (GovCloud).
    • Security Hub detects misconfigured monitoring (new 2025).
  • Explanation: E.g., use CloudWatch for PCI-compliant monitoring.
  • Exam Tip: Highlight compliance certifications.

Auditing and Analysis:

  • Purpose: Track and investigate activity.
  • Features:
    • CloudTrail logs CloudWatch API calls (e.g., CreateAlarm).
    • Logs Insights for log analysis.
    • Contributor Insights for top contributors.
  • Explanation: E.g., query Lambda logs for “timeout”.
  • Exam Tip: Use Logs Insights for security investigations.

Key Notes:

  • Security: IAM + encryption + compliance + auditing = secure monitoring.
  • Exam Tip: Configure KMS, IAM, and Logs Insights for secure observability.

5. CloudWatch Cost Optimization

Cost efficiency is a key exam domain.

Pricing

  • Metrics:
    • $0.30/metric/month (first 10,000 free).
    • $0.01/1,000 metric requests.
  • Logs:
    • $0.50/GB ingested.
    • $0.03/GB stored/month.
    • $1.00/GB for Logs Insights queries.
  • Alarms:
    • $0.10/alarm/month (standard).
    • $0.30/alarm/month (high-resolution).
  • Dashboards:
    • $3/dashboard/month.
  • Synthetics:
    • $0.0012/canary run.
  • Example:
    • 100 metrics, 1M metric requests, 10 GB logs ingested, 10 GB stored, 5 alarms, 1 dashboard, 1,000 canary runs:
      • Metrics: 100 × $0.30 + 1M × $0.01/1,000 = $30 + $10 = $40.
      • Logs: 10 × $0.50 + 10 × $0.03 = $5 + $0.30 = $5.30.
      • Alarms: 5 × $0.10 = $0.50.
      • Dashboard: 1 × $3 = $3.
      • Synthetics: 1,000 × $0.0012 = $1.20.
      • Total: $40 + $5.30 + $0.50 + $3 + $1.20 = ~$50/month.
  • Free Tier: 10 metrics, 3 dashboards, 10 alarms, 5 GB logs/month.

Cost Strategies

  • Limit Metrics:
    • Use default AWS metrics; avoid excessive custom metrics.
    • Explanation: E.g., monitor CPUUtilization only to save $0.30/metric.
  • Optimize Logs:
    • Enable logs only for critical resources (e.g., Lambda errors).
    • Use lifecycle policies to archive/delete logs.
    • Explanation: E.g., archive logs to S3 ($0.023/GB) to save $0.007/GB.
  • Targeted Alarms:
    • Create alarms for critical thresholds only.
    • Explanation: E.g., alarm on CPU > 80% to save $0.10/alarm.
  • Efficient Queries:
    • Use Logs Insights for targeted queries vs. scanning all logs.
    • Explanation: E.g., save $1/GB with precise queries.
  • Tagging:
    • Tag Log Groups, alarms, dashboards for cost tracking.
    • Explanation: E.g., tag Log Group with “Project:App”.
  • Monitor Usage:
    • Use CloudWatch metrics to track ingestion and optimize.
    • Explanation: E.g., reduce log ingestion to save $5/month.

Key Notes:

  • Cost Savings: Limit metrics/logs + lifecycle policies + tagging = lower costs.
  • Exam Tip: Calculate costs for metrics, logs, and alarms.

6. CloudWatch Advanced Features

Cross-Account Observability:

  • Purpose: Unified monitoring.
  • Features:
    • Aggregates metrics/logs across accounts via AWS Organizations (new 2024).
    • Explanation: E.g., dashboard for 10 accounts’ EC2 metrics.
  • Exam Tip: Use for enterprise monitoring.

Improved Anomaly Detection:

  • Purpose: Detect unusual patterns.
  • Features:
    • Enhanced ML-based detection for metrics (new 2024).
    • Explanation: E.g., detect spike in Lambda errors.
  • Exam Tip: Enable for proactive monitoring.

Security Hub Integration:

  • Purpose: Centralized security monitoring.
  • Features:
    • Detects misconfigured CloudWatch setups (e.g., no alarms) (new 2025).
    • Aggregates findings with GuardDuty, Inspector.
  • Explanation: E.g., flag missing EC2 monitoring.
  • Exam Tip: Use Security Hub for compliance.

CloudWatch Synthetics:

  • Purpose: Monitor application health.
  • Features:
    • Canary scripts test endpoints, APIs, and workflows.
    • Explanation: E.g., test API endpoint every 5 minutes.
  • Exam Tip: Use for proactive health checks.

CloudWatch Evidently:

  • Purpose: Feature experimentation.
  • Features:
    • A/B testing and feature flagging for apps.
    • Explanation: E.g., roll out new feature to 20% of users.
  • Exam Tip: Know for application testing.

Contributor Insights:

  • Purpose: Identify log contributors.
  • Features:
    • Analyzes top users, IPs, or resources in logs.
    • Explanation: E.g., find top S3 bucket accessors.
  • Exam Tip: Use for log analysis.

Key Notes:

  • Flexibility: Cross-account + anomaly detection + Synthetics = advanced observability.
  • Exam Tip: Master cross-account observability, Synthetics, and Evidently.

7. CloudWatch Use Cases

Understand practical applications.

Application Monitoring

  • Setup: Metrics for EC2, Lambda; logs for errors; alarms for thresholds.
  • Features: Real-time performance tracking.
  • Explanation: E.g., monitor Lambda latency, alert on errors.

Auto-Scaling

  • Setup: Alarms trigger Auto Scaling policies.
  • Features: Scale EC2 based on CPU or custom metrics.
  • Explanation: E.g., scale out if CPUUtilization > 70%.

Log Analysis

  • Setup: Log Group for app logs, Logs Insights for queries.
  • Features: Identify errors, performance issues.
  • Explanation: E.g., query API Gateway logs for 500 errors.

Compliance Monitoring

  • Setup: Metrics/logs with KMS, Security Hub integration.
  • Features: HIPAA/PCI-compliant monitoring.
  • Explanation: E.g., monitor RDS for PCI compliance.

8. CloudWatch vs. Other Monitoring Services

Feature CloudWatch CloudTrail X-Ray
Type Monitoring/Observability API Audit Application Tracing
Focus Metrics, logs, alarms API calls, events Request tracing
Data Performance, logs Management/data events Traces, latency
Cost $0.30/metric, $0.50/GB Free–$2/100K events $5/1M traces
Use Case Monitor EC2 CPU Audit IAM changes Trace Lambda requests

Explanation:

  • CloudWatch: General monitoring and logging.
  • CloudTrail: API auditing.
  • X-Ray: Application performance tracing.

9. Detailed Explanations for Mastery

  • Cross-Account Observability:
    • Example: Dashboard for EC2 metrics across 10 accounts.
    • Why It Matters: Unified enterprise monitoring—new for 2024.
  • Improved Anomaly Detection:
    • Example: Detect unusual spike in S3 requests.
    • Why It Matters: Proactive alerts—new for 2024.
  • Security Hub Integration:
    • Example: Flag missing Lambda monitoring.
    • Why It Matters: Centralized compliance—new for 2025.

10. Quick Reference Table

Feature Purpose Key Detail Exam Relevance
Metrics Track performance Time-series, 15-month retention Core Concept
Logs Store app logs Log Groups, Insights queries Core Concept
Alarms Trigger actions Threshold-based, SNS/Lambda Core Concept
Cross-Account Unified monitoring Multi-account metrics/logs (2024) Scalability
Anomaly Detection Detect unusual patterns ML-based, improved (2024) Security, Performance
Synthetics Monitor endpoints Canary scripts for health checks Flexibility
Security Hub Compliance monitoring Misconfigured setups (2025) Security, Resilience