CloudWatch Monitoring
Collect metrics, logs, and alarms with CloudWatch to observe your AWS workloads.
Theory
CloudWatch is AWS’s central monitoring service. Its main building blocks are:
- Metrics: numeric time series (EC2 CPU utilization, Lambda invocations), plus custom metrics you publish yourself
- Logs: application and system logs sent to log groups and queried with Logs Insights
- Alarms: fire when a metric breaches a threshold and can trigger SNS notifications, Auto Scaling actions, or automation
- CloudWatch Events (now Amazon EventBridge): react to state changes by routing matching events to targets (rules → targets)
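Custom metrics and event rules can both be created from the AWS CLI. The sketch below wraps each call in a hypothetical helper function. The `MyApp` namespace, the `QueueDepth` metric, and the `ec2-state-changes` rule name are illustrative assumptions, not anything CloudWatch requires. One helper publishes a single custom data point. The other creates an EventBridge rule that matches EC2 instance state changes and sends them to an SNS topic whose ARN you pass in.

```bash
# Sketch only: function names, the MyApp namespace, and the rule name are hypothetical.

# Publish one data point ($1) for a custom metric in your own namespace.
publish_queue_depth() {
  aws cloudwatch put-metric-data \
    --namespace MyApp \
    --metric-name QueueDepth \
    --unit Count \
    --value "$1"
}

# Create an EventBridge rule that matches EC2 instance state changes,
# then attach a target (an SNS topic ARN passed as $1).
route_ec2_state_changes() {
  aws events put-rule \
    --name ec2-state-changes \
    --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Instance State-change Notification"]}'
  aws events put-targets \
    --rule ec2-state-changes \
    --targets "Id=notify,Arn=$1"
}
```

For an SNS target, the topic’s access policy must also allow EventBridge to publish to it. That setup is not shown here.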
Related: CloudTrail records API activity for auditing (who did what, and when), while X-Ray traces individual requests across services. Don’t confuse CloudWatch (metrics and logs) with CloudTrail (audit log).
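The two services answer different questions about the same resource, as these two hypothetical helpers show. Both take an EC2 instance ID and an ISO 8601 time window supplied by the caller. The CloudWatch helper returns how busy the instance was, as average CPU in 5-minute buckets. The CloudTrail helper returns the API calls that referenced the instance. It assumes the calls fall within CloudTrail’s default 90-day event history for the region.

```bash
# Hypothetical helpers: $1 = instance ID, $2/$3 = ISO 8601 start/end times.

# CloudWatch: "How busy was it?" Average CPU per 5-minute period.
cpu_history() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$1" \
    --start-time "$2" --end-time "$3" \
    --period 300 \
    --statistics Average
}

# CloudTrail: "Who did what to it?" Management API calls naming this instance.
api_history() {
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=ResourceName,AttributeValue=$1" \
    --start-time "$2" --end-time "$3" \
    --max-items 20
}
```

For example, `api_history i-0123456789abcdef0 2024-05-01T00:00:00Z 2024-05-01T06:00:00Z` (a placeholder ID and window) would surface calls such as `StopInstances` made against that instance. `cpu_history` over the same window shows only utilization numbers.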
Real-World Example
```bash
# Alarm: notify when average EC2 CPU > 80% over one 5-minute period
# (i-0123456789abcdef0 is a placeholder instance ID)
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu --namespace AWS/EC2 \
  --metric-name CPUUtilization --statistic Average \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --period 300 --threshold 80 --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 --alarm-actions arn:aws:sns:...:ops-alerts
```
Logs Insights query (the 20 most recent log lines containing ERROR):
```
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
```
Hands-On Exercise
- Define an alarm (in words) for a Lambda error-rate spike.
- Explain the difference between CloudWatch and CloudTrail.
- Sketch a Logs Insights query that counts errors per hour.
- Describe how an alarm can trigger Auto Scaling.
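One possible CLI approach to exercises 1 and 3 is sketched below. Try them yourself first. The helper names are hypothetical. The threshold of more than 5 errors in a 5-minute window is an arbitrary example. The alarm counts failed invocations rather than computing a true error rate. A rate would need metric math dividing `Errors` by `Invocations`.

```bash
# Hypothetical helpers sketching answers to exercises 1 and 3.

# Exercise 1: alarm when Lambda function $1 records more than 5 errors
# (failed invocations) in one 5-minute period; notify SNS topic ARN $2.
lambda_error_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "$1-errors" \
    --namespace AWS/Lambda \
    --metric-name Errors \
    --dimensions "Name=FunctionName,Value=$1" \
    --statistic Sum --period 300 --evaluation-periods 1 \
    --threshold 5 --comparison-operator GreaterThanThreshold \
    --treat-missing-data notBreaching \
    --alarm-actions "$2"
}

# Exercise 3: count ERROR lines per hour over the last 24 hours in log group $1.
errors_per_hour() {
  local now query_id
  now=$(date +%s)
  # start-query takes epoch seconds and returns a queryId immediately.
  query_id=$(aws logs start-query \
    --log-group-name "$1" \
    --start-time "$((now - 86400))" --end-time "$now" \
    --query-string 'filter @message like /ERROR/ | stats count(*) as errors by bin(1h)' \
    --query queryId --output text)
  # Queries run asynchronously; a fixed wait keeps the sketch short.
  sleep 5
  aws logs get-query-results --query-id "$query_id"
}
```

For example, `errors_per_hour /aws/lambda/my-function` (a hypothetical function’s log group) prints the query results. A production script would poll `get-query-results` until its `status` is `Complete` instead of sleeping a fixed 5 seconds.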
Cheat Sheet
| Service | Purpose |
|---|---|
| CloudWatch Metrics | Numeric time-series |
| CloudWatch Logs | Log storage + Insights queries |
| CloudWatch Alarms | Threshold-based alerts/actions |
| EventBridge | React to events |
| CloudTrail | API activity audit log |
| X-Ray | Distributed tracing |
Common Interview Questions
What's the difference between CloudWatch and CloudTrail?
CloudWatch monitors performance and health through metrics, logs, and alarms. CloudTrail records API calls for auditing and security: who did what, and when. In short, CloudWatch serves operations, while CloudTrail serves governance.
What can a CloudWatch alarm do when it triggers?
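Send an SNS notification, trigger an Auto Scaling policy, run a Systems Manager automation, or perform an EC2 action (stop, terminate, reboot, or recover the instance). Alarm state changes are also emitted to EventBridge, which can route them to other targets. The result is either an alert to humans or an automatic self-healing action.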
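The Auto Scaling case (also exercise 4) works by making the scaling policy’s ARN the alarm action, as this sketch shows. It assumes the Auto Scaling group already exists. The helper name and the policy and alarm names derived from the group name are hypothetical. The alarm watches the group’s aggregate `CPUUtilization` through the `AutoScalingGroupName` dimension. The 80% threshold over two 5-minute periods is an example value.

```bash
# Hypothetical helper: wire a CPU alarm to a simple scale-out policy
# on an existing Auto Scaling group named $1.
cpu_scale_out() {
  local policy_arn
  # Create (or update) a simple scaling policy that adds one instance; capture its ARN.
  policy_arn=$(aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "$1" \
    --policy-name "$1-scale-out" \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment 1 \
    --query PolicyARN --output text)
  # Alarm on the group's average CPU; the policy ARN is the alarm action.
  aws cloudwatch put-metric-alarm \
    --alarm-name "$1-high-cpu" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=AutoScalingGroupName,Value=$1" \
    --statistic Average --period 300 --evaluation-periods 2 \
    --threshold 80 --comparison-operator GreaterThanThreshold \
    --alarm-actions "$policy_arn"
}
```

Target tracking policies take a different route. Auto Scaling creates and manages their CloudWatch alarms for you, so no explicit `put-metric-alarm` call is needed.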