| 1 | +# CloudWatch Cheatsheet |
| 2 | + |
| 3 | + |
| 4 | + |
| 5 | +Amazon CloudWatch is a monitoring and observability service for AWS, hybrid, and on-premises applications. This cheatsheet runs from basic concepts to advanced configurations, with a focus on performance monitoring, troubleshooting, and operational insight. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## **1. Introduction to CloudWatch** |
| 10 | + |
| 11 | +### What is CloudWatch? |
| 12 | + |
| 13 | +- Amazon CloudWatch is a monitoring and observability service for AWS resources and custom applications. |
| 14 | +- Provides actionable insights through metrics, logs, alarms, and dashboards. |
| 15 | +- Supports both infrastructure and application-level monitoring. |
| 16 | + |
| 17 | +### Key Features: |
| 18 | + |
| 19 | +- **Metrics**: Collect and monitor key performance data. |
| 20 | +- **Logs**: Aggregate, analyze, and search logs. |
| 21 | +- **Alarms**: Set thresholds for metrics to trigger automated actions. |
| 22 | +- **Dashboards**: Visualize data in real time. |
| 23 | +- **CloudWatch Events**: Trigger actions based on changes in AWS resources (now part of Amazon EventBridge). |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +## **2. CloudWatch Architecture Overview** |
| 28 | + |
| 29 | +- **Data Sources**: |
| 30 | + - AWS Services: EC2, RDS, Lambda, etc. |
| 31 | + - On-premises servers or hybrid setups using CloudWatch Agent. |
| 32 | +- **Core Components**: |
| 33 | + - **Metrics**: Quantifiable data points (e.g., CPU utilization). |
| 34 | + - **Logs**: Application and system logs. |
| 35 | + - **Alarms**: Notifications or automated responses. |
| 36 | + - **Dashboards**: Custom visualizations. |
| 37 | + - **Insights**: Advanced log analytics. |
| 38 | + |
| 39 | +--- |
| 40 | + |
| 41 | +## **3. Setting Up CloudWatch** |
| 42 | + |
| 43 | +### Accessing CloudWatch |
| 44 | + |
| 45 | +1. Go to the **AWS Management Console**. |
| 46 | +2. Navigate to **CloudWatch** under the **Management & Governance** section. |
| 47 | + |
| 48 | +### CloudWatch Agent Installation |
| 49 | + |
| 50 | +To monitor custom metrics or on-premises resources: |
| 51 | + |
| 52 | +1. Install the CloudWatch Agent on your instance: |
| 53 | + |
| 54 | + ```bash |
| 55 | + sudo yum install amazon-cloudwatch-agent |
| 56 | + ``` |
| 57 | + |
| 58 | +2. Configure the agent: |
| 59 | + |
| 60 | + ```bash |
| 61 | + sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard |
| 62 | + ``` |
| 63 | + |
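| 64 | +3. Start the agent with the configuration saved by the wizard (use `-m onPremise` instead of `-m ec2` outside EC2): |
| 65 | + |
| 66 | + ```bash |
| 67 | + sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json |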
| 68 | + ``` |
| 69 | + |
| 70 | +### Setting IAM Permissions |
| 71 | + |
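| 72 | +Attach the **CloudWatchAgentServerPolicy** managed policy to the IAM role used by instances that run the agent. Reserve **CloudWatchFullAccess** for users or roles that administer CloudWatch, and prefer narrower policies where possible. |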
| 73 | + |
| 74 | +--- |
| 75 | + |
| 76 | +## **4. Metrics Monitoring** |
| 77 | + |
| 78 | +### Viewing Metrics |
| 79 | + |
| 80 | +1. In the CloudWatch console, go to **Metrics**. |
| 81 | +2. Select a namespace (e.g., `AWS/EC2`, `AWS/Lambda`). |
| 82 | +3. Choose metrics like `CPUUtilization`, `DiskWriteOps`, etc. |
| 83 | + |
| 84 | +### Common Metrics: |
| 85 | + |
| 86 | +- **EC2**: |
| 87 | + - `CPUUtilization` |
| 88 | + - `DiskReadBytes` |
| 89 | + - `NetworkIn`, `NetworkOut` |
| 90 | +- **RDS**: |
| 91 | + - `DatabaseConnections` |
| 92 | + - `ReadIOPS` |
| 93 | + - `WriteLatency` |
| 94 | +- **Lambda**: |
| 95 | + - `Invocations` |
| 96 | + - `Duration` |
| 97 | + - `Errors` |
| 98 | + |
| 99 | +### Custom Metrics |
| 100 | + |
| 101 | +To send custom metrics: |
| 102 | + |
| 103 | +1. Install the AWS CLI. |
| 104 | +2. Publish a metric: |
| 105 | + |
| 106 | + ```bash |
| 107 | + aws cloudwatch put-metric-data --namespace "CustomNamespace" --metric-name "MetricName" --value 100 |
| 108 | + ``` |
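
The same publish step can be sketched in Python. This is a minimal sketch, assuming boto3 is installed and credentials are configured; `build_metric_datum` and `publish` are hypothetical helper names, and the `Service` dimension in the usage comment is only an illustration.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", dimensions=None, timestamp=None):
    """Build one entry for the MetricData list of PutMetricData."""
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Timestamp": timestamp or datetime.now(timezone.utc),
    }
    if dimensions:
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return datum


def publish(namespace, data):
    """Send the datapoints to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=data)


# Usage (hypothetical values):
# publish("CustomNamespace", [build_metric_datum("MetricName", 100,
#                                                dimensions={"Service": "checkout"})])
```

Keeping the payload builder separate from the network call makes the payload easy to inspect before anything is sent.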
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +## **5. CloudWatch Logs** |
| 113 | + |
| 114 | +### Setting Up Log Groups and Streams |
| 115 | + |
| 116 | +1. Navigate to **Logs** in the CloudWatch console. |
| 117 | +2. Create a **Log Group** (e.g., `/aws/lambda/my-function`). |
| 118 | +3. Each application/service writes to a **Log Stream** under the group. |
| 119 | + |
| 120 | +### Exporting Logs to S3 |
| 121 | + |
| 122 | +1. Go to **Logs** → Select a log group. |
| 123 | +2. Click **Actions** → **Export data to Amazon S3**. |
| 124 | +3. Configure the export with the desired time range. |
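
The console export above can also be scripted. The sketch below assumes boto3 and a bucket whose policy already allows CloudWatch Logs to write to it; the helper names and the `exported-logs` prefix are hypothetical. The export API takes the time range as milliseconds since the Unix epoch.

```python
from datetime import datetime, timezone


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return int(dt.timestamp() * 1000)


def build_export_task(log_group, bucket, start, end, prefix="exported-logs"):
    """Build the keyword arguments for the CreateExportTask call."""
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "logGroupName": log_group,
        "fromTime": to_epoch_ms(start),
        "to": to_epoch_ms(end),
        "destination": bucket,  # S3 bucket name
        "destinationPrefix": prefix,
    }


def start_export(params):
    """Start the export and return its task ID (needs boto3 and AWS credentials)."""
    import boto3

    return boto3.client("logs").create_export_task(**params)["taskId"]
```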
| 125 | + |
| 126 | +### Querying Logs with CloudWatch Logs Insights |
| 127 | + |
| 128 | +1. Navigate to **Logs Insights**. |
| 129 | +2. Write queries for analysis: |
| 130 | + |
| 131 | + ```sql |
| 132 | + fields @timestamp, @message |
| 133 | + | filter @message like /ERROR/ |
| 134 | + | sort @timestamp desc |
| 135 | + | limit 20 |
| 136 | + ``` |
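
To run the same query outside the console, a sketch along these lines may help. It assumes boto3 and configured credentials; `build_error_query` and `run_query` are hypothetical helper names. Logs Insights queries are asynchronous, so the code starts the query and then polls for the result.

```python
import time


def build_error_query(pattern="ERROR", limit=20):
    """Return the Logs Insights query shown above, with a configurable pattern and limit."""
    return "\n".join(
        [
            "fields @timestamp, @message",
            f"| filter @message like /{pattern}/",
            "| sort @timestamp desc",
            f"| limit {limit}",
        ]
    )


def run_query(log_group, query, start_epoch_s, end_epoch_s, poll_seconds=1.0):
    """Start a query and poll until it leaves the Scheduled/Running states."""
    import boto3  # imported lazily so the query builder works without boto3

    logs = boto3.client("logs")
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start_epoch_s,  # seconds since the Unix epoch
        endTime=end_epoch_s,
        queryString=query,
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response
        time.sleep(poll_seconds)
```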
| 137 | + |
| 138 | +--- |
| 139 | + |
| 140 | +## **6. CloudWatch Alarms** |
| 141 | + |
| 142 | +### Creating an Alarm |
| 143 | + |
| 144 | +1. Go to **Alarms** in the CloudWatch console. |
| 145 | +2. Click **Create Alarm**. |
| 146 | +3. Select a metric (e.g., `CPUUtilization`). |
| 147 | +4. Set a threshold (e.g., `> 80%` for 5 minutes). |
| 148 | +5. Choose an action (e.g., send an SNS notification). |
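
The console steps above map roughly onto a single `PutMetricAlarm` request. The sketch below assumes boto3; the alarm name format, the helper names, and the choice of the `Average` statistic are assumptions. One 300-second period above the threshold corresponds to the "`> 80%` for 5 minutes" example.

```python
def build_cpu_alarm(instance_id, sns_topic_arn, threshold=80.0, period_seconds=300):
    """Build PutMetricAlarm arguments for a high-CPU alarm on one EC2 instance."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notified when the alarm enters ALARM
    }


def create_alarm(params):
    """Create or update the alarm (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)
```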
| 149 | + |
| 150 | +### Alarm States: |
| 151 | + |
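| 152 | +- **OK**: The metric is within the defined threshold. |
| 153 | +- **ALARM**: The metric has breached the threshold for the configured evaluation periods. |
| 154 | +- **INSUFFICIENT_DATA**: The alarm has just started, the metric is unavailable, or there is not enough data to determine the state. |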
| 155 | + |
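
As a rough mental model of how the three states arise, the toy evaluator below checks the most recent datapoints against a "greater than" threshold. It is a deliberate simplification: real CloudWatch alarms also support "M out of N" datapoints and configurable missing-data handling, neither of which is modelled here. The function name is hypothetical.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods=1):
    """Return a simplified alarm state for a GreaterThanThreshold alarm.

    datapoints is ordered oldest first; None marks a period with no data.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    recent = datapoints[-evaluation_periods:]
    present = [point for point in recent if point is not None]
    if len(recent) < evaluation_periods or not present:
        return "INSUFFICIENT_DATA"
    # In this toy model, every evaluated period must have data and breach.
    if len(present) == evaluation_periods and all(p > threshold for p in present):
        return "ALARM"
    return "OK"
```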
| 156 | +### Advanced Alarm Configurations |
| 157 | + |
| 158 | +- Composite Alarms: Combine multiple alarms. |
| 159 | +- Actions: |
| 160 | + - Notify via SNS. |
| 161 | + - Trigger Lambda functions. |
| 162 | + - Stop, terminate, reboot, or recover EC2 instances. |
| 163 | + |
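
A composite alarm is defined by a rule expression over the states of other alarms. The sketch below builds such an expression; the alarm names are hypothetical, and only the plain `AND`/`OR` forms are covered.

```python
def alarm_rule(alarm_names, operator="AND"):
    """Build a composite-alarm rule that fires when the child alarms are in ALARM."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


def create_composite_alarm(name, rule, sns_topic_arn):
    """Create the composite alarm (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("cloudwatch").put_composite_alarm(
        AlarmName=name, AlarmRule=rule, AlarmActions=[sns_topic_arn]
    )
```

Using `AND` across related alarms is one way to notify only when several symptoms occur together.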
| 164 | +--- |
| 165 | + |
| 166 | +## **7. CloudWatch Dashboards** |
| 167 | + |
| 168 | +### Creating a Dashboard |
| 169 | + |
| 170 | +1. Go to **Dashboards** in the CloudWatch console. |
| 171 | +2. Click **Create Dashboard**. |
| 172 | +3. Add widgets: |
| 173 | + - **Line** for metrics. |
| 174 | + - **Number** for single values. |
| 175 | + - **Text** for notes. |
| 176 | + |
| 177 | +### Customizing Widgets |
| 178 | + |
| 179 | +- Choose metrics from different namespaces. |
| 180 | +- Configure time ranges and granularity. |
| 181 | + |
| 182 | +### Example: Multi-Service Dashboard |
| 183 | + |
| 184 | +- **EC2 Metrics**: CPU, Disk, Network. |
| 185 | +- **RDS Metrics**: Connections, IOPS. |
| 186 | +- **Lambda Metrics**: Invocations, Errors. |
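
A dashboard like this can also be defined as JSON and created through the API. The following is a sketch under assumptions: the resource identifiers, widget titles, layout coordinates, and the `us-east-1` default region are placeholders, and only a few of the metrics listed above are included.

```python
import json


def metric_widget(title, metrics, x, y, view="timeSeries",
                  region="us-east-1", stat="Average", period=300):
    """One metric widget; view is 'timeSeries' (line) or 'singleValue' (number)."""
    return {
        "type": "metric", "x": x, "y": y, "width": 12, "height": 6,
        "properties": {"title": title, "metrics": metrics, "view": view,
                       "region": region, "stat": stat, "period": period},
    }


def text_widget(markdown, x, y):
    """A full-width text widget for notes."""
    return {"type": "text", "x": x, "y": y, "width": 24, "height": 2,
            "properties": {"markdown": markdown}}


def build_dashboard_body(instance_id, db_id, function_name):
    """Return the DashboardBody JSON string for a small multi-service dashboard."""
    widgets = [
        text_widget("# Multi-service overview", 0, 0),
        metric_widget("EC2 CPU",
                      [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]], 0, 2),
        metric_widget("RDS connections",
                      [["AWS/RDS", "DatabaseConnections", "DBInstanceIdentifier", db_id]], 12, 2),
        metric_widget("Lambda errors",
                      [["AWS/Lambda", "Errors", "FunctionName", function_name]], 0, 8,
                      view="singleValue", stat="Sum"),
    ]
    return json.dumps({"widgets": widgets})


def create_dashboard(name, body):
    """Create or replace the dashboard (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("cloudwatch").put_dashboard(DashboardName=name, DashboardBody=body)
```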
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +## **8. CloudWatch Events (EventBridge)** |
| 191 | + |
| 192 | +### Creating Rules |
| 193 | + |
| 194 | +1. Navigate to **Rules** under **Events** in the CloudWatch console (this opens the Amazon EventBridge console). |
| 195 | +2. Create a rule with an event pattern (e.g., EC2 state change). |
| 196 | +3. Add a target (e.g., SNS, Lambda, Step Functions). |
| 197 | + |
| 198 | +### Example: Notify When an Instance Stops |
| 199 | + |
| 200 | +1. Event Pattern: |
| 201 | + |
| 202 | + ```json |
| 203 | + { |
| 204 | + "source": ["aws.ec2"], |
| 205 | + "detail-type": ["EC2 Instance State-change Notification"], |
| 206 | + "detail": { |
| 207 | + "state": ["stopped"] |
| 208 | + } |
| 209 | + } |
| 210 | + ``` |
| 211 | + |
| 212 | +2. Target: Send an SNS notification. |
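
To see how such a pattern selects events, the toy matcher below mimics the basic rule: every key in the pattern must be present in the event, a list names the accepted values, and nested objects are matched recursively. This is a simplified illustration, not the EventBridge implementation; operators such as prefix or numeric matching are not handled, and the sample event is abbreviated.

```python
def matches(pattern, event):
    """Return True if the event satisfies the (simplified) event pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event value must be an object that matches too.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of accepted values.
            return False
    return True


PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}

# Abbreviated sample event with a hypothetical instance ID.
SAMPLE_EVENT = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
```

Fields that the pattern does not mention, such as `instance-id` here, are ignored.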
| 213 | + |
| 214 | +--- |
| 215 | + |
| 216 | +## **9. Advanced Configurations** |
| 217 | + |
| 218 | +### Cross-Account Monitoring |
| 219 | + |
| 220 | +1. Create a cross-account role with permissions to access CloudWatch in the target account. |
| 221 | +2. Grant that role the `cloudwatch:ListMetrics` and `cloudwatch:GetMetricData` actions, then call the corresponding APIs after assuming it. |
| 222 | + |
| 223 | +### Anomaly Detection |
| 224 | + |
| 225 | +Enable anomaly detection for metrics: |
| 226 | + |
| 227 | +1. Go to **Metrics** → Select a metric. |
| 228 | +2. Click **Actions** → **Enable anomaly detection**. |
| 229 | + |
| 230 | +### Metric Math |
| 231 | + |
| 232 | +Perform calculations across metrics: |
| 233 | + |
| 234 | +- Example: Average CPU utilization across two instances, where `m1` and `m2` are the IDs assigned to each instance's metric. |
| 235 | + |
| 236 | + ```text |
| 237 | + (m1+m2)/2 |
| 238 | + ``` |
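
The same expression can be evaluated through the `GetMetricData` API. The sketch below generalises it to any number of instances, under the assumption that each instance's `CPUUtilization` is fetched as `m1`, `m2`, and so on; the helper names and the `avg_cpu` ID are hypothetical.

```python
def average_cpu_queries(instance_ids, period=300):
    """Build MetricDataQueries that average CPUUtilization across instances."""
    if not instance_ids:
        raise ValueError("at least one instance ID is required")
    queries = []
    for index, instance_id in enumerate(instance_ids, start=1):
        queries.append({
            "Id": f"m{index}",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                },
                "Period": period,
                "Stat": "Average",
            },
            "ReturnData": False,  # inputs only; return just the expression
        })
    metric_ids = [query["Id"] for query in queries]
    expression = f"({'+'.join(metric_ids)})/{len(metric_ids)}"
    queries.append({"Id": "avg_cpu", "Expression": expression,
                    "Label": "Average CPU", "ReturnData": True})
    return queries


def fetch(queries, start, end):
    """Run the queries for a time range (needs boto3 and AWS credentials)."""
    import boto3

    return boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=queries, StartTime=start, EndTime=end
    )
```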
| 239 | + |
| 240 | +--- |
| 241 | + |
| 242 | +## **10. Integration with Other Services** |
| 243 | + |
| 244 | +### AWS Lambda |
| 245 | + |
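| 246 | +- In Node.js functions, `console.log()` output is sent to CloudWatch Logs; other runtimes log through standard output and standard error in the same way. |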
| 247 | +- Monitor Lambda-specific metrics like `Errors` and `Throttles`. |
| 248 | + |
| 249 | +### ECS/EKS |
| 250 | + |
| 251 | +- Enable CloudWatch Container Insights for detailed monitoring. |
| 252 | +- Use `awslogs` driver to send container logs to CloudWatch. |
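
For ECS, the `awslogs` driver is configured per container in the task definition. The sketch below builds that `logConfiguration` fragment; the helper name and the group, region, and prefix values in the usage line are placeholders.

```python
import json


def awslogs_configuration(log_group, region, stream_prefix):
    """Build the logConfiguration block for one ECS container definition."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": log_group,
            "awslogs-region": region,
            "awslogs-stream-prefix": stream_prefix,
        },
    }


def as_json(log_group, region, stream_prefix):
    """Render the fragment as JSON for pasting into a task definition."""
    return json.dumps(awslogs_configuration(log_group, region, stream_prefix), indent=2)


# Usage (hypothetical values):
# print(as_json("/ecs/my-service", "us-east-1", "web"))
```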
| 253 | + |
| 254 | +### Integration with Third-Party Tools |
| 255 | + |
| 256 | +- Use **Datadog** or **Grafana** for enhanced visualization. |
| 257 | +- Integrate CloudWatch metrics into these platforms using APIs. |
| 258 | + |
| 259 | +--- |
| 260 | + |
| 261 | +## **11. Security Best Practices** |
| 262 | + |
| 263 | +### Log Retention |
| 264 | + |
| 265 | +- Set retention policies for logs to reduce costs: |
| 266 | + |
| 267 | + ```bash |
| 268 | + aws logs put-retention-policy --log-group-name "/aws/lambda/my-function" --retention-in-days 30 |
| 269 | + ``` |
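
Retention accepts only a fixed set of day counts, so a script that applies policies may want to round a requested value up to the next accepted one. The list below reflects the values documented for `PutRetentionPolicy` as far as I know and should be checked against the current API reference; the helper names are hypothetical.

```python
import bisect

# Day counts accepted by PutRetentionPolicy (assumed; verify against the API reference).
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def nearest_allowed_retention(days):
    """Round a requested retention up to the next accepted value."""
    if days < 1:
        raise ValueError("retention must be at least 1 day")
    index = bisect.bisect_left(ALLOWED_RETENTION_DAYS, days)
    if index == len(ALLOWED_RETENTION_DAYS):
        raise ValueError("requested retention exceeds the largest accepted value")
    return ALLOWED_RETENTION_DAYS[index]


def apply_retention(log_group, days):
    """Set the retention policy (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("logs").put_retention_policy(
        logGroupName=log_group, retentionInDays=nearest_allowed_retention(days)
    )
```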
| 270 | + |
| 271 | +### Fine-Grained Access Control |
| 272 | + |
| 273 | +- Use IAM policies to restrict access to specific metrics, logs, or dashboards. |
| 274 | + |
| 275 | +--- |
| 276 | + |
| 277 | +## **12. CloudWatch Pricing** |
| 278 | + |
| 279 | +### Pricing Model |
| 280 | + |
| 281 | +1. **Metrics**: Charged per metric, per month. |
| 282 | +2. **Logs**: |
| 283 | + - Ingestion: Cost per GB ingested. |
| 284 | + - Storage: Cost per GB stored. |
| 285 | +3. **Dashboards**: Charged per dashboard, per month. |
| 286 | + |
| 287 | +### Cost Optimization Tips |
| 288 | + |
| 289 | +- Limit data collection: publish only the metrics you use and filter out noisy log output before ingestion. |
| 290 | +- Set shorter retention periods for logs. |
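
The pricing model above is a sum of per-unit charges, which makes a rough monthly estimate easy to script. In the sketch below the rates are inputs, not real prices: take current values from the CloudWatch pricing page, and note that free tiers and volume discounts are ignored. The function and key names are hypothetical.

```python
def estimate_monthly_cost(custom_metrics, log_gb_ingested, log_gb_stored, dashboards, rates):
    """Estimate a monthly bill from usage counts and caller-supplied unit rates.

    rates is a dict with the keys 'per_metric', 'per_gb_ingested',
    'per_gb_stored' and 'per_dashboard' (all in the same currency).
    """
    return (
        custom_metrics * rates["per_metric"]
        + log_gb_ingested * rates["per_gb_ingested"]
        + log_gb_stored * rates["per_gb_stored"]
        + dashboards * rates["per_dashboard"]
    )


# Placeholder rates for illustration only; these are NOT actual AWS prices.
EXAMPLE_RATES = {
    "per_metric": 1.0,
    "per_gb_ingested": 2.0,
    "per_gb_stored": 0.5,
    "per_dashboard": 3.0,
}
```

Comparing the ingestion and storage terms shows how much a shorter retention period would save relative to reducing log volume.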
| 291 | + |
| 292 | +--- |
| 293 | + |
| 294 | +## **13. Best Practices** |
| 295 | + |
| 296 | +1. **Organize Log Groups**: |
| 297 | + - Use consistent naming conventions (e.g., `/application/environment/service`). |
| 298 | + |
| 299 | +2. **Use Alarms Wisely**: |
| 300 | + - Avoid too many alarms to prevent alert fatigue. |
| 301 | + - Use composite alarms to group related metrics. |
| 302 | + |
| 303 | +3. **Automate Monitoring**: |
| 304 | + - Automate alert creation and dashboards using CloudFormation or Terraform. |
| 305 | + |
| 306 | +4. **Optimize Log Storage**: |
| 307 | + - Export logs to S3 for long-term storage and analysis. |
| 308 | + |
| 309 | +5. **Enable Anomaly Detection**: |
| 310 | + - Automate anomaly detection for critical metrics. |
| 311 | + |
| 312 | +--- |
| 313 | + |
| 314 | +## **14. References and Resources** |
| 315 | + |
| 316 | +- [CloudWatch Documentation](https://docs.aws.amazon.com/cloudwatch/) |
| 317 | +- [Metric Math Syntax Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html) |
| 318 | +- [CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/) |