-
Notifications
You must be signed in to change notification settings - Fork 10
Add statsd metrics collection for Astronomer Cosmos #1102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
fdf446a
cdff169
8e24e90
d9765f2
9acd5c1
89fd8e1
aa485b3
a094e44
cf86ce4
b1d42be
e0a78e1
8bb5dab
3a526b0
fa1ec05
21da913
70f2c58
0aabee2
f84e0e9
043dd58
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -268,6 +268,90 @@ mappings: | |
| instance: "$1" | ||
| mount_path: "$2" | ||
|
|
||
| # ------------------------------------------------------------ | ||
| # Cosmos-related metrics | ||
| # ------------------------------------------------------------ | ||
|
|
||
| # The goal with Cosmos metrics is to capture information about how end-users are using Cosmos, so we can plan our roadmap, in particular, the Cosmos 2.0 release. | ||
| # We want to avoid capturing redundant information that is already being collected by other parts of the Astro stack. | ||
|
|
||
| # There are 2 types of Cosmos metrics: | ||
| # 1. Counters | ||
| # 2. Durations | ||
| # These are identified by the suffix ".counter" or ".duration" | ||
|
|
||
| # What is the name of the operator class used to run the task? Did the end-user subclass it? | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just be careful if you're using unsanitized values (e.g. users classes) as labels. Statsd is a very out-of-date and terrible way to move telemetry around and using period delimination for "labels" means that if any label has a period in it, you end up in a mess (e.g. airflow emits mapped tasks or task group tasks as dag.task_group.task_name instead of dag.task_name, and it breaks a lot of these rules. TLDR, just sanitize the labels please!
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks a lot, @stuart23. We are confident the label values we're collecting don't have periods, but we'll add some validation in the Cosmos codebase to avoid this from happening over time. Are there any other better alternatives than period-delimited regex? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense. Unfortunately this is the only way for now - the vanilla statsd implementation does not officially support labels (ref) so we have to serialize them into the metric name. |
||
| # Which dbt command was used to run the task? | ||
| # What execution mode was used? What invocation mode was used? | ||
| - match: cosmos\.task\.operator_name\.([^.]+)\.is_subclass\.([^.]+)\.execution_mode\.([^.]+)\.invocation_mode\.([^.]+)\.dbt_command\.([^.]+)\.install_deps\.([^.]+)\.origin\.([^.]+)\.has_callback\.([^.]+)\.status\.([^.]+)\.counter$ | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think if possible, let's avoid the regex -- we should be able to test it. Also as a general point, do we need the names of the labels in the metric itself? This could probably become something similar to the below: |
||
| match_type: regex | ||
| name: "cosmos_task_overview" | ||
| labels: | ||
| operator_name: "$1" # Example: "DbtRunLocalOperator", "DbtProducerWatcherOperator" | ||
| is_subclass: "$2" # True or False | ||
| execution_mode: "$3" # Example: "local", "virtualenv", "watcher" | ||
| invocation_mode: "$4" # Subprocess or DbtRunner | ||
| dbt_command: "$5" # Example: "run", "build", "test" | ||
| install_deps: "$6" # True or False | ||
| origin: "$7" # DbtTaskGroup, DbtDag or StandaloneTask | ||
| has_callback: "$8" # True or False | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add |
||
| status: "$9" # success, failure | ||
|
|
||
| # Which database did the user run the transformations against? | ||
| # How did the user define the profile, via a ProfileMapping or YAML file? | ||
| # If Profile mapping, which class was used to define the profile? | ||
| - match: cosmos\.profile\.database\.([^.]+)\.profile_strategy\.([^.]+)\.profile_mapping_class\.([^.]+)\.counter$ | ||
| match_type: regex | ||
| name: "cosmos_profile_overview" | ||
| labels: | ||
| database: "$1" # postgres, snowflake, databricks, bigquery, etc | ||
| profile_strategy: "$2" # yaml_file or mapping | ||
| profile_mapping_class: "$3" # None, SnowflakeEncryptedPrivateKeyPemProfileMapping, PostgresUserPasswordProfileMapping, DatabricksTokenProfileMapping, etc | ||
|
|
||
| # If using `DbtDag` or `DbtTaskGroup`, how did the user parse the dbt project? If using dbt_ls, was `dbt` in the same Python virtualenv as Airflow? | ||
| # Did the user specify a custom load converter? | ||
| # What were the source and node behaviors? | ||
| # How many dbt nodes were in the project? | ||
| # How many dbt nodes were selected by the user? | ||
| # How long did the parse operation take? | ||
| # How long did the filtering operation take? | ||
| - match: cosmos\.rendering\.used_automatic_load_mode\.([^.]+)\.actual_load_mode\.([^.]+)\.invocation_mode\.([^.]+)\.install_deps\.([^.]+)\.uses_node_converter\.([^.]+)\.test_behavior\.([^.]+)\.source_behavior\.([^.]+)\.total_dbt_models\.([^.]+)\.selected_dbt_models\.([^.]+)\.counter$ | ||
| match_type: regex | ||
| name: "cosmos_rendering_overview" | ||
| labels: | ||
| used_automatic_load_mode: "$1" # True or False | ||
| actual_load_mode: "$2" # dbt_ls, dbt_ls_cache, dbt_ls_file, dbt_manifest, custom | ||
| invocation_mode: "$3" # subprocess or dbt_runner | ||
| install_deps: "$4" # True or False | ||
| uses_node_converter: "$5" # True or False | ||
| test_behavior: "$6" # after_each, after_all, none, build | ||
| source_behavior: "$7" # all, with_tests_or_freshness, none | ||
| total_dbt_models: "$8" # Total number of dbt models in the project | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Having numerical values as labels isn't really great, because you can't use them in calculations at all, and if they change they end up having an impact on cardinality. statsd also has a gauge type, could you emit
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Totally makes sense, thanks a lot, @stuart23 , I'll do this |
||
| selected_dbt_models: "$9" # Total number of dbt models selected by the user | ||
|
|
||
| # How long did Cosmos take to parse the dbt project? | ||
| - match: cosmos\.rendering\.actual_load_mode\.([^.]+)\.dbt_nodes_parsing\.duration$ | ||
| match_type: regex | ||
| name: "cosmos_rendering_dbt_nodes_parsing_duration" | ||
| labels: | ||
| actual_load_mode: "$1" | ||
|
|
||
| # How long did Cosmos take to filter the dbt project after it was parsed? | ||
| - match: cosmos\.rendering\.actual_load_mode\.([^.]+)\.dbt_nodes_filtering\.duration$ | ||
| match_type: regex | ||
| name: "cosmos_rendering_dbt_nodes_filtering_duration" | ||
| labels: | ||
| actual_load_mode: "$1" | ||
|
|
||
| # How long did Cosmos take to build the Airflow DAG dynamically, after the dbt project has been parsed and filtered? | ||
| - match: cosmos\.rendering\.actual_load_mode\.([^.]+)\.airflow_dag_generation\.duration$ | ||
| match_type: regex | ||
| name: "cosmos_rendering_airflow_dag_generation_duration" | ||
| labels: | ||
| actual_load_mode: "$1" | ||
|
|
||
| # ------------------------------------------------------------ | ||
|
|
||
| # drop any metric not matched | ||
| - match: "." | ||
| match_type: regex | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1 @@ | ||
| 0.28.0-3 | ||
| 0.28.0-5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is relying heavily on
([^.]+)which is fine if you control sanitisation, but brittle otherwise.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes back to one of my questions in the PR description:
If we're confident
*is a better standard and wouldn't over-match, I'd be happy to remove the regex patterns([^.]+)