Terraform AWS Metrics lambda

This is a Python lambda implementation heavily based on YACE.

It is written in much slower, less efficient Python so that it can be packaged as a Terraform module, and it delivers its output as JSON to an SQS queue.

YACE and this lambda can be configured to use the Resource Groups Tagging API to discover AWS resources, allowing for scraping metrics relating to dynamic or ephemeral resources.

NOTE: Be selective about which metrics you scrape; scraping lots of metrics can be expensive. You may want to compare the cost with AWS CloudWatch Metric Streams, though that can also be costly: if not used selectively, it may deliver lots of metrics you do not need.

NOTE: CloudWatch APIs are rate limited, so do not deploy multiple metrics lambdas running on the same schedule. The implementation supports environment variables to limit the concurrency for the different APIs, e.g. TAGGING_API_CONCURRENCY (by account/region).
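
As a rough illustration of how a concurrency cap read from an environment variable can work, the sketch below bounds the number of simultaneous calls with a thread pool. It is not the module's actual code: `concurrency_from_env`, `run_limited`, and the fallback value of 5 are hypothetical; only the variable name `TAGGING_API_CONCURRENCY` comes from this README.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def concurrency_from_env(name, default=5):
    """Read a positive integer limit from the environment.

    The default of 5 is an assumption for this sketch, not the module's value.
    """
    raw = os.environ.get(name, "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def run_limited(tasks, limit):
    """Run zero-argument callables with at most `limit` in flight.

    Results are returned in the same order as the input tasks.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

A lower limit trades scrape duration for fewer throttled API calls, so it is worth considering alongside the `timeout` variable.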

NOTE: In the first iteration of this function the scrape config is passed as a lambda environment variable. There is a total size limit of 4KB for lambda environment variables; if you need more config, the function will need to be updated to support config from S3 or another source.
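
To get a feel for how close a scrape config is to the 4KB limit, something like the following sketch can be run before deploying. The helper names and the `SCRAPE_CONFIG` variable name are hypothetical, and the byte count (UTF-8 length of keys plus values) is only an approximation of how the limit is measured.

```python
import json

# Total limit across all lambda environment variables, per the note above.
LAMBDA_ENV_LIMIT_BYTES = 4 * 1024


def compact_json(config):
    """Serialise without optional whitespace to keep the variable small."""
    return json.dumps(config, separators=(",", ":"))


def env_size_bytes(env):
    """Approximate the footprint of the variables as UTF-8 bytes of keys plus values."""
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in env.items()
    )


def fits_in_lambda_env(env):
    """Return True when the approximate size is within the 4KB limit."""
    return env_size_bytes(env) <= LAMBDA_ENV_LIMIT_BYTES
```

For example, `fits_in_lambda_env({"SCRAPE_CONFIG": compact_json(config)})` gives an early warning, bearing in mind that any additional variables passed via `environment` count towards the same total.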

variables

name	description	default
region	region in which the lambda will be deployed, used to select lambda layers	eu-west-2
name	lambda function name
logs_kms_key_arn	KMS key ARN used to encrypt the lambda CloudWatch log group (NB: you'll need to allow CloudWatch access to this KMS key)	null
scrape_config	json scrape config, see usage below for an example
environment	additional environment variables for the lambda	{}
policy_json	additional policy json attached to the lambda	null
alarm_actions	actions to execute if the lambda triggers an ALARM state (e.g. on error), for example this could be an SNS topic ARN to publish to	[]
timeout	lambda function timeout in seconds; consider this in relation to your schedule expression: if you're trying to scrape a lot of metrics, you may need to scrape less frequently to avoid overlapping executions	20
memory_size	lambda function memory allocation in MiB	128
alarm_description	description for an alarm, if null defaults to `"${var.name} invocation error"`	null
schedule_expression	metrics collection schedule, either cron(...) or rate(...), e.g. rate(1 minute); if null the lambda will not be scheduled	null
enable_lambda_insights	attaches the lambda insights lambda layer	false
lambda_insights_layer_version	layer version from https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights-extension-versionsx86-64.html	56
queue_arn	SQS queue arn to which the metrics will be delivered
queue_url	SQS queue URL (URL of the same queue as the `queue_arn`)
max_concurrency	lambda function max concurrency	1

usage

module "aws-metrics" {

  source = "git::https://github.com/NHSDigital/terraform-aws-metrics-lambda?ref=<git-sha>"

  name      = "aws-metrics"
  queue_arn = aws_sqs_queue.aws-metrics.arn
  queue_url = aws_sqs_queue.aws-metrics.url

  timeout = 40

  scrape_config = jsonencode({
    discovery = {
      jobs = [
        {
          type    = "alb"
          regions = ["eu-west-2"]
          search_dimensions = {
            AvailabilityZone = "^$" # exclude az level dimensions
            LoadBalancer     = ".+"
            TargetGroup      = ".+"
          }

          metrics = [
            {
              name  = "HealthyHostCount"
              stats = ["Sum"]
            },
            {
              name  = "UnHealthyHostCount"
              stats = ["Sum"]
            },
            {
              name  = "HTTPCode_Target_2XX_Count"
              stats = ["Sum"]
            },
            {
              name  = "HTTPCode_Target_3XX_Count"
              stats = ["Sum"]
            },
            {
              name  = "HTTPCode_Target_4XX_Count"
              stats = ["Sum"]
            },
            {
              name  = "HTTPCode_Target_5XX_Count"
              stats = ["Sum"]
            }
          ]
        },
        {
          type    = "ecs-containerinsights"
          regions = ["eu-west-2"]
          search_dimensions = {
            ServiceName = ".+"
            ClusterName = ".+"
          }
          metrics = [
            {
              name  = "CpuUtilized"
              stats = ["Maximum"]
            },
            {
              name  = "CpuReserved"
              stats = ["Maximum"]
            },
            {
              name  = "MemoryUtilized"
              stats = ["Maximum"]
            },
            {
              name  = "MemoryReserved"
              stats = ["Maximum"]
            },
            {
              name  = "RunningTaskCount"
              stats = ["Average"]
            },
            {
              name  = "DesiredTaskCount"
              stats = ["Average"]
            },
            {
              name  = "PendingTaskCount"
              stats = ["Average"]
            },
            {
              name  = "NetworkRxBytes"
              stats = ["Average"]
            },
            {
              name  = "NetworkTxBytes"
              stats = ["Average"]
            }
          ]
        },
        {
          type    = "nlb"
          regions = ["eu-west-2"]
          search_dimensions = {
            AvailabilityZone = "^$" # exclude az level dimensions
            LoadBalancer     = ".+"
            TargetGroup      = ".+"
          }

          metrics = [
            {
              name  = "HealthyHostCount"
              stats = ["Sum", "SampleCount"]
            },
            {
              name  = "UnHealthyHostCount"
              stats = ["Sum", "SampleCount"]
            },
            {
              name  = "ActiveFlowCount"
              stats = ["Sum", "SampleCount"]
            }
          ]
        },
        {
          type    = "vpc-endpoint"
          regions = ["eu-west-2"]
          search_dimensions = {
            "Subnet Id" = "^$" # exclude subnet level
          }

          metrics = [
            {
              name  = "ActiveConnections"
              stats = ["Sum", "SampleCount"]
            }
          ]
        },
        {
          type    = "ebs"
          regions = ["eu-west-2"]

          metrics = [
            {
              name  = "VolumeReadBytes"
              stats = ["Sum", "SampleCount"]
            },
            {
              name  = "VolumeWriteBytes"
              stats = ["Sum", "SampleCount"]
            },
            {
              name  = "VolumeReadOps"
              stats = ["Sum", "SampleCount"]
            },
            {
              name  = "VolumeWriteOps"
              stats = ["Sum", "SampleCount"]
            }

          ]
        }
      ]
    }
  })

  schedule_expression = "rate(1 minute)"
  alarm_actions       = [local.resources.sns.alerts.arn]

}
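
The `search_dimensions` patterns in the example above read as regular expressions applied to dimension values, with `^$` matching only a missing or empty dimension. The sketch below shows one plausible reading of that filter. The function `matches_search_dimensions` and the rule that a missing dimension is treated as an empty string are assumptions inferred from the comments in the example, not taken from the module's source.

```python
import re


def matches_search_dimensions(metric_dimensions, search_dimensions):
    """Return True when every pattern matches the metric's value for that dimension.

    A dimension absent from the metric is treated as the empty string (assumption),
    so "^$" keeps only metrics without it and ".+" requires it to be present.
    """
    return all(
        re.search(pattern, metric_dimensions.get(name, "")) is not None
        for name, pattern in search_dimensions.items()
    )
```

Under this reading, a metric carrying `LoadBalancer` and `TargetGroup` but no `AvailabilityZone` passes the `alb` filter, while the per-AZ variant of the same metric is dropped.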

licence

see LICENCE; as a derivative product of YACE, see also APACHE-LICENCE

contributors:

see CONTRIBUTING

