Add statsd metrics collection for Astronomer Cosmos #1102

tatiana · 2025-11-25T11:39:51Z

Details

Currently, Astronomer Cosmos has an overwhelming number of configurations. These were introduced organically over the last three years. Unfortunately, they lead to customer confusion, increased CRE support, and a lot of work for the DevRel team (they wrote and maintain a 100-page eBook!).

We aim to understand how customers use Cosmos configurations to decide what to keep in Cosmos 2.0, planned for next year. This information will also help shape the features we implement next. Given this context, telemetry is one of the OSS Build squad's Q4 P1s.

I had a meeting with @stuart23 last week, and he shared a few pointers. I hope this is on the right track.

Related Issues

Request for feedback

This is my first statsd exporter mapping contribution, so I'd love any feedback, in particular:

Do we prefer to use * or regex? I thought regex was safer to avoid over-matching, but it is harder to read. The existing mapping includes both.
I made sure that all labels have low cardinality: operator_name has the highest, with around 80 values, and all others should have fewer than 20. Is this acceptable?
I tried to track multiple values to avoid further increasing the number of metrics created/expanded. I assumed this would be cost-effective. Do you think this is a good practice? Is it better to break them down into multiple metric definitions?
For the duration metrics, does it make sense to have three definitions - as it is currently implemented - or would it make more sense to have cosmos_rendering_dbt_nodes_parsing_duration, cosmos_rendering_dbt_nodes_filtering_duration and cosmos_rendering_airflow_dag_generation_duration as a single metric, and have the "operation" ("dbt_nodes_parsing", "dbt_nodes_filtering", "rendering_airflow_dag") as a label?

How this was tested

I installed statsd_exporter by cloning the repo and running make build.

In one terminal, I ran statsd_exporter with the new version of the mapping:

statsd_exporter --statsd.mapping-config=statsd-exporter/include/mappings-gen2.yml --log.level=debug

In another terminal, I sent statsd events, and confirmed there were no errors in the statsd_exporter logs:

echo "cosmos.task.operator_name.DbtRunLocalOperator.is_subclass.False.execution_mode.local.invocation_mode.subprocess.dbt_command.run.install_deps.True.origin.DbtTaskGroup.has_callback.False.status.success.counter:1|c"| nc -w1 -u 127.0.0.1 9125

echo "cosmos.profile.database.bigquery.profile_strategy.yaml_file.profile_mapping_class.None.counter:1|c" | nc -w1 -u 127.0.0.1 9125

echo "cosmos.rendering.used_automatic_load_mode.True.actual_load_mode.dbt_ls_cache.invocation_mode.dbt_runner.install_deps.False.uses_node_converter.False.test_behavior.after_each.source_behavior.none.total_dbt_models.100.selected_dbt_models.8:1|c" | nc -w1 -u 127.0.0.1 9125

echo "cosmos.rendering.actual_load_mode.dbt_ls.duration:34500|ms" | nc -w1 -u 127.0.0.1 9125

echo "cosmos.rendering.actual_load_mode.dbt_ls.dbt_nodes_parsing.duration:34500|ms" | nc -w1 -u 127.0.0.1 9125

echo "cosmos.rendering.actual_load_mode.dbt_ls.dbt_nodes_filtering.duration:30|ms" | nc -w1 -u 127.0.0.1 9125

echo "cosmos.rendering.actual_load_mode.dbt_ls.airflow_dag_generation.duration:140|ms" | nc -w1 -u 127.0.0.1 9125
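The same smoke test can be scripted rather than typed line by line. The following is a minimal Python sketch, assuming the exporter listens on UDP 127.0.0.1:9125 as in the commands above. The helper name send_statsd_lines is hypothetical and not part of Cosmos or statsd_exporter. Unlike echo, it does not append a trailing newline, which is probably why the exporter logs an extra empty line after each echo.

```python
import socket


def send_statsd_lines(lines, host="127.0.0.1", port=9125):
    """Send each StatsD line as its own UDP datagram; return how many were sent.

    Hypothetical helper for local testing only. The default host and port are
    assumptions taken from the manual `nc` commands in this PR description.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("utf-8"), (host, port))
            sent += 1
    return sent


# Three of the duration samples from the manual test above.
SAMPLE_LINES = [
    "cosmos.rendering.actual_load_mode.dbt_ls.dbt_nodes_parsing.duration:34500|ms",
    "cosmos.rendering.actual_load_mode.dbt_ls.dbt_nodes_filtering.duration:30|ms",
    "cosmos.rendering.actual_load_mode.dbt_ls.airflow_dag_generation.duration:140|ms",
]

# Usage (with statsd_exporter running locally):
#   send_statsd_lines(SAMPLE_LINES)
```

UDP is fire-and-forget, so a successful send only means the datagram left the socket. The exporter logs remain the place to confirm it arrived.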

And I confirmed the statsd_exporter logs didn't contain errors:

time=2025-12-08T14:32:33.388Z level=INFO source=main.go:296 msg="Starting StatsD -> Prometheus Exporter" version="(version=0.28.0, branch=master, revision=d63f22b266f72e6d832fbf89bc7341bf625185f6)"
time=2025-12-08T14:32:33.388Z level=INFO source=main.go:297 msg="Build context" context="(go=go1.25.1, platform=darwin/arm64, [email protected], date=20251205-13:20:28, tags=unknown)"
time=2025-12-08T14:32:33.390Z level=INFO source=main.go:346 msg="Accepting StatsD Traffic" udp=:9125 tcp=:9125 unixgram=""
time=2025-12-08T14:32:33.390Z level=INFO source=main.go:347 msg="Accepting Prometheus Requests" addr=:9102
time=2025-12-08T14:32:35.083Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=cosmos.task.operator_name.DbtRunLocalOperator.is_subclass.False.execution_mode.local.invocation_mode.subprocess.dbt_command.run.install_deps.True.origin.DbtTaskGroup.has_callback.False.status.success.counter:1|c
time=2025-12-08T14:32:35.084Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=""


time=2025-12-08T14:32:40.935Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=cosmos.profile.database.bigquery.profile_strategy.yaml_file.profile_mapping_class.None.counter:1|c
time=2025-12-08T14:32:40.935Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=""

time=2025-12-08T14:32:45.220Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=cosmos.rendering.used_automatic_load_mode.True.actual_load_mode.dbt_ls_cache.invocation_mode.dbt_runner.install_deps.False.uses_node_converter.False.test_behavior.after_each.source_behavior.none.total_dbt_models.100.selected_dbt_models.8:1|c
time=2025-12-08T14:32:45.220Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=""

time=2025-12-08T14:32:48.736Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=cosmos.rendering.actual_load_mode.dbt_ls.duration:34500|ms
time=2025-12-08T14:32:48.736Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=""

time=2025-12-08T14:32:52.633Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=cosmos.rendering.actual_load_mode.dbt_ls.dbt_nodes_parsing.duration:34500|ms
time=2025-12-08T14:32:52.633Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=""

time=2025-12-08T14:32:56.413Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=cosmos.rendering.actual_load_mode.dbt_ls.dbt_nodes_filtering.duration:30|ms
time=2025-12-08T14:32:56.413Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=""

time=2025-12-08T14:33:00.178Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=cosmos.rendering.actual_load_mode.dbt_ls.airflow_dag_generation.duration:140|ms
time=2025-12-08T14:33:00.178Z level=DEBUG source=listener.go:96 msg="Incoming line" proto=udp line=""

statsd-exporter/include/mappings-gen2.yml

jpweber · 2025-12-03T18:03:55Z

Added @astronomer/airflow-infra as reviewers, as they consume this image as part of what is deployed in dataplanes.

statsd-exporter/include/mappings-gen2.yml

tatiana · 2025-12-05T15:22:39Z

Thanks, @jpweber, for adding @astronomer/airflow-infra as reviewers.
Also, thanks for the review, @pgvishnuram! I believe I addressed all the comments.
Is there anyone else who should review and eventually approve this? I'd love to get it over the finish line.

Copilot

Pull request overview

This PR adds comprehensive statsd metrics collection for Astronomer Cosmos to understand customer configuration usage patterns and inform Cosmos 2.0 planning. The metrics capture task execution details, profile configurations, and rendering performance characteristics.

Key changes:

Added three counter metrics tracking task execution, profile usage, and rendering configurations
Added three duration metrics tracking parsing, filtering, and DAG generation performance
Incremented statsd-exporter version to reflect the new mappings

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
statsd-exporter/version.txt	Version bump to 0.28.0-4 to reflect new Cosmos metrics
statsd-exporter/include/mappings-gen2.yml	Added six new metric mappings for Cosmos telemetry collection


statsd-exporter/include/mappings-gen2.yml

Co-authored-by: Copilot <[email protected]>

statsd-exporter/include/mappings-gen2.yml

ianbuss

The only concern I have is that this image is shared by APC. Do APC customers also use Cosmos?

statsd-exporter/include/mappings-gen2.yml

tatiana · 2025-12-08T15:08:27Z

> The only concern I have is that this image is shared by APC. Do APC customers also use Cosmos?

@ianbuss thanks a lot for the review!

We proposed these changes in this repo after a call and recommendation from @stuart23 - I didn't realise it was used in multiple parts of our services.

I don't know who the APC customers are, so I don't know which of those use Cosmos. How can we find this information? If not in this repository, in which repo should we define these metrics?

statsd-exporter/version.txt

stuart23 · 2025-12-09T07:40:02Z

statsd-exporter/include/mappings-gen2.yml

+      uses_node_converter: "$5" # True or False
+      test_behavior: "$6" # after_each, after_all, none, build
+      source_behavior: "$7" # all, with_tests_or_freshness, none
+      total_dbt_models: "$8" # Total number of dbt models in the project


Having numerical values as labels isn't really great, because you can't use them in calculations at all, and if they change they end up having an impact on cardinality.

statsd also has a gauge type, could you emit total_dbt_models and selected_dbt_models as gauges please?

Totally makes sense, thanks a lot, @stuart23. I'll do this.
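A minimal sketch of what emitting the two counts as gauges could look like on the Cosmos side. The metric names (cosmos.rendering.total_dbt_models.gauge and cosmos.rendering.selected_dbt_models.gauge) and the helper name are assumptions for illustration. The thread does not settle the final names, and the mapping file would need matching rules.

```python
def render_model_count_gauges(total_dbt_models: int, selected_dbt_models: int) -> list[str]:
    """Format the two model counts as StatsD gauge lines (`<name>:<value>|g`).

    The metric names are hypothetical. Negative values are rejected because
    StatsD treats sign-prefixed gauge values as deltas, not absolute values.
    """
    counts = {
        "total_dbt_models": total_dbt_models,
        "selected_dbt_models": selected_dbt_models,
    }
    lines = []
    for name, value in counts.items():
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        lines.append(f"cosmos.rendering.{name}.gauge:{value}|g")
    return lines
```

With the counts moved out of the metric name, the rendering counter keeps only low-cardinality labels, and the numeric values become available for arithmetic in queries.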

stuart23 · 2025-12-09T07:43:49Z

statsd-exporter/include/mappings-gen2.yml

+  # 2. Durations
+  # These are identified by the suffix ".counter" or ".duration"
+
+  # What is the name of the operator class used to run the task? Did the end-user subclass it?


Just be careful if you're using unsanitized values (e.g. user classes) as labels. Statsd is a very out-of-date and terrible way to move telemetry around, and using period delimiters for "labels" means that if any label has a period in it, you end up in a mess (e.g. airflow emits mapped tasks or task group tasks as dag.task_group.task_name instead of dag.task_name, and it breaks a lot of these rules). TL;DR: just sanitize the labels, please!

Thanks a lot, @stuart23. We are confident the label values we're collecting don't have periods, but we'll add some validation in the Cosmos codebase to prevent this from happening over time.

Are there any better alternatives to period-delimited regex?

Makes sense. Unfortunately this is the only way for now - the vanilla statsd implementation does not officially support labels (ref) so we have to serialize them into the metric name.
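The validation mentioned above could be sketched as follows. This is an illustration only, not the Cosmos implementation. The function name, the allowed character set, and the "unknown" fallback are all assumptions.

```python
import re

# Anything outside this set is replaced. Periods delimit fields in the metric
# name, while ":" and "|" delimit the value and type in a StatsD line.
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_-]")


def sanitize_label_value(value: object) -> str:
    """Return a label value that is safe to embed in a period-delimited metric name.

    Hypothetical helper: unsafe characters become underscores, and an empty
    result falls back to "unknown" so the number of fields never changes.
    """
    cleaned = _UNSAFE_CHARS.sub("_", str(value))
    return cleaned or "unknown"
```

For example, a user-defined operator reported with its module path, such as my_pkg.operators.MyOperator, would become my_pkg_operators_MyOperator and still occupy exactly one field in the metric name.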

stuart23 · 2025-12-09T07:45:29Z

@ianbuss, APC uses statsd-exporter/include/mappings.yml, or they might mount something over the top of it now. Astro still has the mappings hard-coded in mappings-gen2.yml.

@tatiana thanks for doing the testing with manually pushing statsd metrics to the exporter. There is a small testing framework in https://github.com/astronomer/ap-vendor/tree/main/statsd-exporter/test but if you don't want to do that, can you run it again and then curl http://localhost:9102/metrics (or open it in a browser) after you send your test messages to it? It would be good to verify the extracted labels: you should see them as the metric name followed by key-value pairs, e.g.:

airflow_dagrun_duration{dag_id="5_retries_pass_on_5th",quantile="0.5"} 246.767874
airflow_dagrun_duration{dag_id="5_retries_pass_on_5th",quantile="0.9"} 246.767874
airflow_dagrun_duration{dag_id="5_retries_pass_on_5th",quantile="0.99"} 246.767874
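The check described above can also be done programmatically. The sketch below parses sample lines in the format shown and filters them by metric-name prefix. It is a simplified parser written for this illustration: it ignores optional timestamps and does not unescape label values. The cosmos_ prefix and the function names are assumptions.

```python
import re
import urllib.request

_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$"
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition line into (metric name, labels dict, float value)."""
    match = _SAMPLE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(_LABEL.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def samples_with_prefix(text, prefix="cosmos_"):
    """Parse every non-comment line whose metric name starts with `prefix`."""
    return [
        parse_sample(line)
        for line in text.splitlines()
        if line.startswith(prefix)
    ]


def fetch_metrics(url="http://localhost:9102/metrics"):
    """Download the exporter's metrics page (requires a running exporter)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


# Usage (with statsd_exporter running and test messages already sent):
#   for name, labels, value in samples_with_prefix(fetch_metrics()):
#       print(name, labels, value)
```

Each extracted label should show up as a key in the labels dict. A value that appears in the metric name instead of the dict suggests the mapping rule did not match.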

namratachaudhary · 2025-12-09T11:01:13Z

statsd-exporter/include/mappings-gen2.yml

      instance: "$1"
      mount_path: "$2"

+  # ------------------------------------------------------------


This is relying heavily on ([^.]+) which is fine if you control sanitisation, but brittle otherwise.

This goes back to one of my questions in the PR description:

> Do we prefer to use * or regex? I thought regex was safer to avoid over-matching, but it is harder to read. The existing mapping includes both.

If we're confident * is a better standard and wouldn't over-match, I'd be happy to remove the regex patterns ([^.]+)
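The brittleness is easy to reproduce outside the exporter. The sketch below uses Python's re module with a pattern modelled on the ([^.]+) style used in this PR. The exact duration pattern is an assumption, since only the task pattern is quoted in this thread, and Python's regex engine stands in for the exporter's.

```python
import re

# Hypothetical pattern in the same style as the task mapping in this PR.
DURATION_PATTERN = re.compile(
    r"cosmos\.rendering\.actual_load_mode\.([^.]+)\.duration$"
)


def extract_load_mode(metric_name):
    """Return the captured load mode, or None when the name does not match."""
    match = DURATION_PATTERN.match(metric_name)
    return match.group(1) if match else None


# A clean value is captured as expected.
clean = extract_load_mode("cosmos.rendering.actual_load_mode.dbt_ls.duration")

# A value containing a period adds an extra field, so the rule no longer matches.
dotted = extract_load_mode("cosmos.rendering.actual_load_mode.dbt.ls.duration")
```

Here clean is "dbt_ls" and dotted is None. Assuming * in a glob rule also matches exactly one dot-separated field, switching to globs would not change this outcome, so sanitising values at the source matters for either syntax.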

tatiana · 2025-12-18T13:22:19Z

statsd-exporter/include/mappings-gen2.yml

+      dbt_command: "$5" # Example: "run", "build", "test"
+      install_deps: "$6" # True or False
+      origin: "$7" # DbtTaskGroup, DbtDag or StandaloneTask
+      has_callback: "$8" # True or False


Add is_mapped_task so we are consistent with the introduced argument: https://github.com/astronomer/astronomer-cosmos/pull/2195/changes#r2630935633

ianbuss

No objection to adding these, just one comment about the match format.

ianbuss · 2026-01-05T12:28:03Z

statsd-exporter/include/mappings-gen2.yml

+  # What is the name of the operator class used to run the task? Did the end-user subclass it?
+  # Which dbt command was used to run the task?
+  # What execution mode was used? What invocation mode was used?
+  - match: cosmos\.task\.operator_name\.([^.]+)\.is_subclass\.([^.]+)\.execution_mode\.([^.]+)\.invocation_mode\.([^.]+)\.dbt_command\.([^.]+)\.install_deps\.([^.]+)\.origin\.([^.]+)\.has_callback\.([^.]+)\.status\.([^.]+)\.counter$


I think if possible, let's avoid the regex -- we should be able to test it. Also as a general point, do we need the names of the labels in the metric itself? This could probably become something similar to the below:

cosmos.task.counter.*.*.*.*.*.*.*.*.*
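If the label names were dropped from the metric name as suggested, the emitter and the mapping would have to agree on field order. A minimal sketch of the emitting side follows, assuming the order of the nine labels in the regex quoted above. The function name and the constant are hypothetical.

```python
# Field order taken from the regex quoted above; one entry per `*` in the glob.
TASK_LABEL_ORDER = (
    "operator_name",
    "is_subclass",
    "execution_mode",
    "invocation_mode",
    "dbt_command",
    "install_deps",
    "origin",
    "has_callback",
    "status",
)


def build_positional_task_metric(labels: dict) -> str:
    """Build `cosmos.task.counter.<v1>...<v9>` from a dict of label values."""
    missing = [key for key in TASK_LABEL_ORDER if key not in labels]
    if missing:
        raise KeyError(f"missing labels: {missing}")
    values = [str(labels[key]) for key in TASK_LABEL_ORDER]
    for key, value in zip(TASK_LABEL_ORDER, values):
        if "." in value or not value:
            # A period or an empty value would shift every later field.
            raise ValueError(f"invalid value for {key}: {value!r}")
    return ".".join(["cosmos", "task", "counter", *values])
```

The shorter name is easier to read in the mapping file, at the cost that adding or reordering a label changes the meaning of every position, so both sides would need to change together.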

tatiana added 2 commits November 25, 2025 11:26

Add statsd metrics collection for Astronomer Cosmos

fdf446a

Update version.txt to check if CI is happier

cdff169

tatiana commented Nov 25, 2025

pankajkoti self-requested a review November 25, 2025 14:03

tatiana commented Nov 26, 2025

tatiana requested a review from pankajastro November 26, 2025 10:25

Merge branch 'main' into cosmos-telemetry

8e24e90

pgvishnuram marked this pull request as ready for review December 3, 2025 14:02

pgvishnuram requested review from a team as code owners December 3, 2025 14:02

Merge branch 'main' into cosmos-telemetry

d9765f2

danielhoherd requested review from bamcmanus, hegartyk and jpweber December 3, 2025 14:50

jpweber requested a review from a team December 3, 2025 18:03

jpweber removed request for bamcmanus, hegartyk and jpweber December 3, 2025 18:04

tatiana commented Dec 5, 2025

tatiana added 2 commits December 5, 2025 10:35

Apply suggestion from @tatiana

9acd5c1

Apply suggestion from @tatiana

89fd8e1

tatiana commented Dec 5, 2025

tatiana commented Dec 5, 2025

Apply suggestions from code review

aa485b3

tatiana commented Dec 5, 2025

Apply suggestion from @tatiana

a094e44

stuart23 requested a review from Copilot December 5, 2025 23:30

Copilot AI reviewed Dec 5, 2025

tatiana and others added 3 commits December 8, 2025 09:10

Update statsd-exporter/include/mappings-gen2.yml

cf86ce4

Co-authored-by: Copilot <[email protected]>

Update statsd-exporter/include/mappings-gen2.yml

b1d42be

Co-authored-by: Copilot <[email protected]>

Update statsd-exporter/include/mappings-gen2.yml

e0a78e1

Co-authored-by: Copilot <[email protected]>

tatiana commented Dec 8, 2025

tatiana added 3 commits December 8, 2025 09:33

Apply suggestions from code review

8bb5dab

Add install_deps to operator and render_config metrics

3a526b0

Merge branch 'main' into cosmos-telemetry

fa1ec05

ianbuss reviewed Dec 8, 2025

tatiana commented Dec 8, 2025

Apply suggestion from @tatiana

21da913

tatiana commented Dec 8, 2025

tatiana commented Dec 8, 2025

Apply suggestions from code review

70f2c58

tatiana commented Dec 8, 2025

tatiana added 2 commits December 8, 2025 15:08

Apply suggestions from code review

0aabee2

Merge branch 'main' into cosmos-telemetry

f84e0e9

stuart23 requested changes Dec 9, 2025

namratachaudhary reviewed Dec 9, 2025

tatiana commented Dec 18, 2025

ianbuss reviewed Jan 5, 2026

Merge branch 'main' into cosmos-telemetry

043dd58

7 participants
