This repository was archived by the owner on Jun 23, 2021. It is now read-only.

Serverless Operations

Lu Hong edited this page Oct 10, 2019 · 35 revisions

Operations play an important role in running a production service. This project shows how we follow operational best practices, including setting up alarms, dashboards, CI (Continuous Integration) and CD (Continuous Deployment).

Alarms

Once you have deployed the project by following the Quick Start, the operations component is deployed into your AWS account. To find the alarms that have been created:

  1. Log in to the AWS account that you deployed the project to. If you have not deployed the project yet, see Quick Start.
  2. Go to the CloudWatch console and click "Alarms".
  3. Search for "ops" and you will see four alarms: Api4xxErrors, ApiLatencyP90, ApiAvailability and ApiLatencyP50. These alarms monitor the most critical operational signals: error rate, availability and latency.
  4. Click on one of the alarms to open its detail page.
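
The console search above can also be approximated from a script. The following is a minimal sketch, assuming Python with boto3; the function name `find_alarms` is hypothetical, and the substring match is case-insensitive by choice, which may not match the console's search behavior exactly. It relies on the `describe_alarms` paginator of the CloudWatch client.

```python
def find_alarms(cloudwatch, needle="ops"):
    """Return sorted (alarm name, state) pairs for alarms whose name contains `needle`.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    The match is a case-insensitive substring test (an assumption of this sketch).
    """
    matches = []
    paginator = cloudwatch.get_paginator("describe_alarms")
    for page in paginator.paginate():
        for alarm in page.get("MetricAlarms", []):
            if needle.lower() in alarm["AlarmName"].lower():
                # StateValue is one of "OK", "ALARM" or "INSUFFICIENT_DATA".
                matches.append((alarm["AlarmName"], alarm["StateValue"]))
    return sorted(matches)


# Usage (requires boto3 and AWS credentials for the account you deployed to):
#   import boto3
#   for name, state in find_alarms(boto3.client("cloudwatch")):
#       print(state, name)
```

Passing the client in as an argument keeps the function easy to exercise against a stub, without touching a real account.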

If an alarm goes into the "ALARM" state, a message is sent to the "AlarmsTopic" SNS topic. To receive notifications from the topic, create a subscription to it:

  1. Go to the Amazon SNS console, click "Topics" and choose "AlarmsTopic".
  2. On the detail page, click "Create subscription" and choose the desired protocol and endpoint.
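
The same subscription can be created programmatically. This is a minimal sketch, assuming Python with boto3 and assuming that the deployed topic's name contains the string "AlarmsTopic" (the exact generated name is not given here); `subscribe_to_alarms_topic` is a hypothetical helper. It uses the `list_topics` paginator and the `subscribe` call of the SNS client.

```python
def subscribe_to_alarms_topic(sns, protocol, endpoint, topic_fragment="AlarmsTopic"):
    """Subscribe `endpoint` to the first SNS topic whose name contains `topic_fragment`.

    `sns` is expected to behave like boto3.client("sns").
    Returns the subscription ARN reported by SNS.
    """
    paginator = sns.get_paginator("list_topics")
    for page in paginator.paginate():
        for topic in page.get("Topics", []):
            arn = topic["TopicArn"]
            # The topic name is the last colon-separated field of the ARN.
            topic_name = arn.rsplit(":", 1)[-1]
            if topic_fragment in topic_name:
                response = sns.subscribe(
                    TopicArn=arn,
                    Protocol=protocol,
                    Endpoint=endpoint,
                    # Return the ARN even while the subscription awaits confirmation.
                    ReturnSubscriptionArn=True,
                )
                return response["SubscriptionArn"]
    raise LookupError(f"no SNS topic with a name containing {topic_fragment!r}")


# Usage (requires boto3 and AWS credentials; the address is a placeholder):
#   import boto3
#   subscribe_to_alarms_topic(boto3.client("sns"), "email", "oncall@example.com")
```

For the email protocol, SNS sends a confirmation message to the address, and notifications are delivered only after the recipient confirms.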

Dashboard

  1. Go to the CloudWatch console and click "Dashboards".
  2. Click on the dashboard named "Dashboard-xxx" to open it.
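
To inspect the dashboard without the console, the steps above can be sketched in Python with boto3. The helper name `summarize_dashboards` is hypothetical, and the "Dashboard-" prefix is taken from the name pattern shown above. The sketch uses the `list_dashboards` paginator and `get_dashboard`, whose `DashboardBody` is a JSON string containing a `widgets` list.

```python
import json


def summarize_dashboards(cloudwatch, prefix="Dashboard-"):
    """Map each dashboard name starting with `prefix` to the list of its widget types.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    """
    summary = {}
    paginator = cloudwatch.get_paginator("list_dashboards")
    for page in paginator.paginate(DashboardNamePrefix=prefix):
        for entry in page.get("DashboardEntries", []):
            name = entry["DashboardName"]
            # DashboardBody is a JSON document describing the widgets.
            body = json.loads(cloudwatch.get_dashboard(DashboardName=name)["DashboardBody"])
            summary[name] = [widget.get("type") for widget in body.get("widgets", [])]
    return summary


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(summarize_dashboards(boto3.client("cloudwatch")))
```

Metric graphs typically appear with type "metric" and log query widgets with type "log".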

The dashboard is composed of three parts: API Gateway metrics, API Lambda metrics and CloudWatch Insights queries.

API Gateway metrics

API Gateway metrics include 5XX error count and availability, request count, 4XX error count and latency. They provide a view of API usage and health.

API Lambda metrics

API Lambda metrics include error count and success rate, invocations and latency. They show the usage and health of the Lambda function.

CloudWatch Insights queries

CloudWatch Insights queries include "Top 10 customers by Request Count", "Top 10 Customers Impacted by API 5xx", "Top 10 API 5xx Errors" and "Top 10 API 4xx Errors". They give insight into API performance and into how customers are impacted by performance issues.
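
A "top 10" query of this kind can be sketched as follows. This is not the project's actual query text: the field name `customerId`, the filter `status >= 500` and the helper names `top_n_query` and `run_query` are assumptions for illustration. The sketch assumes Python with boto3 and uses the `start_query` and `get_query_results` calls of the CloudWatch Logs client.

```python
import time


def top_n_query(group_by_field, n=10, filter_expr=None):
    """Build a Logs Insights query that counts log events per `group_by_field`."""
    parts = []
    if filter_expr:
        parts.append(f"filter {filter_expr}")
    parts.append(f"stats count(*) as requestCount by {group_by_field}")
    parts.append("sort requestCount desc")
    parts.append(f"limit {n}")
    return " | ".join(parts)


def run_query(logs, log_group, query, start, end, poll_seconds=1.0):
    """Run `query` against `log_group` and return rows as {field: value} dicts.

    `logs` is expected to behave like boto3.client("logs");
    `start` and `end` are epoch timestamps in seconds.
    """
    query_id = logs.start_query(
        logGroupName=log_group, startTime=start, endTime=end, queryString=query
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        time.sleep(poll_seconds)  # the query is still scheduled or running
    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]


# Usage (requires boto3 and AWS credentials; the log group name is a placeholder):
#   import boto3, time
#   now = int(time.time())
#   query = top_n_query("customerId", filter_expr="status >= 500")
#   rows = run_query(boto3.client("logs"), "/my/api/log-group", query, now - 3600, now)
```

Keeping query construction separate from execution makes it straightforward to reuse the same runner for each of the dashboard's queries.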

CI (Continuous Integration)

CD (Continuous Deployment)
