Commit d8b3ec2

Merge pull request #260538 from ambika-garg/airflow-metrics
Airflow metrics
2 parents 66e0c78 + b0cb254 commit d8b3ec2

1 file changed

+74
-0
lines changed

articles/data-factory/how-to-diagnostic-logs-and-metrics-for-managed-airflow.md

Lines changed: 74 additions & 0 deletions
@@ -103,4 +103,78 @@ Azure Data Factory offers comprehensive metrics for Airflow Integration Runtimes
6. Select Save to dashboard once your chart is complete; otherwise, your chart disappears.
:::image type="content" source="media/diagnostics-logs-and-metrics-for-managed-airflow/save-to-dashboard.png" alt-text="Screenshot that shows save to dashboard." lightbox="media/diagnostics-logs-and-metrics-for-managed-airflow/save-to-dashboard.png":::

## Airflow Metrics

The following table lists the metrics available for Managed Airflow.

Table headings

- **Metric**: The metric display name as it appears in the Azure portal.
- **Name in REST API**: The metric name as referred to in the REST API.
- **Description**: What the metric measures.
- **Unit**: Unit of measure.
- **Aggregation**: The default aggregation type. Valid values: Average, Minimum, Maximum, Total, Count.
- **Dimensions**: Dimensions available for the metric.
- **Time Grains**: Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
- **DS Export**: Whether the metric is exportable to Azure Monitor Logs via Diagnostic Settings.
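
The time grain values follow the ISO 8601 duration notation. As a minimal sketch, assuming only the time designators `H`, `M`, and `S` appear (as in the examples above), a value such as `PT1M` can be converted to a Python `timedelta`. The helper name `parse_time_grain` is hypothetical and isn't part of any Azure SDK.

```python
import re
from datetime import timedelta

# Matches time-only ISO 8601 durations such as PT1M, PT30M, or PT1H.
# Assumption: date designators (days, months, years) aren't needed here.
_GRAIN_PATTERN = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_time_grain(grain: str) -> timedelta:
    """Convert a time grain such as 'PT1M' into a timedelta."""
    match = _GRAIN_PATTERN.match(grain)
    # Reject strings that don't match, and a bare 'PT' with no components.
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported time grain: {grain!r}")
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    for grain in ("PT1M", "PT30M", "PT1H"):
        print(grain, "->", parse_time_grain(grain))
```

A value in a different form (for example, one that uses day designators) raises `ValueError` instead of being silently misread.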

|Metric|Name in REST API|Description|Unit|Aggregation|Dimensions|Time Grains|DS Export|
|---|---|---|---|---|---|---|---|
|**Airflow Integration Runtime Celery Task Timeout Error** |`AirflowIntegrationRuntimeCeleryTaskTimeoutError` |Number of `AirflowTaskTimeout` errors raised when publishing a task to the Celery broker. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Collect DB Dags** |`AirflowIntegrationRuntimeCollectDBDags` |Milliseconds taken to fetch all serialized DAGs from the database. |Milliseconds |Average |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Cpu Percentage** |`AirflowIntegrationRuntimeCpuPercentage` |CPU usage percentage of the Airflow integration runtime. |Percent |Average |`IntegrationRuntimeName`, `ContainerName`|PT1M |No|
|**Airflow Integration Runtime Cpu Usage** |`AirflowIntegrationRuntimeCpuUsage` |Millicores consumed by the Airflow integration runtime, indicating the CPU resources used in thousandths of a CPU core. |Millicores |Average |`IntegrationRuntimeName`, `ContainerName`|PT1M |Yes|
|**Airflow Integration Runtime Dag Bag Size** |`AirflowIntegrationRuntimeDagBagSize` |Number of DAGs found when the scheduler ran a scan based on its configuration. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Dag Callback Exceptions** |`AirflowIntegrationRuntimeDagCallbackExceptions` |Number of exceptions raised from DAG callbacks. A nonzero value means that a DAG callback isn't working. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime DAG File Refresh Error** |`AirflowIntegrationRuntimeDAGFileRefreshError` |Number of failures loading any DAG files. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime DAG Processing Import Errors** |`AirflowIntegrationRuntimeDAGProcessingImportErrors` |Number of errors from trying to parse DAG files. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime DAG Processing Last Duration** |`AirflowIntegrationRuntimeDAGProcessingLastDuration` |Seconds taken to load the given DAG file. |Milliseconds |Average |`IntegrationRuntimeName`, `DagFile`|PT1M |No|
|**Airflow Integration Runtime DAG Processing Last Run Seconds Ago** |`AirflowIntegrationRuntimeDAGProcessingLastRunSecondsAgo` |Seconds since `<dag_file>` was last processed. |Seconds |Average |`IntegrationRuntimeName`, `DagFile`|PT1M |No|
|**Airflow Integration Runtime DAG ProcessingManager Stalls** |`AirflowIntegrationRuntimeDAGProcessingManagerStalls` |Number of stalled DagFileProcessorManager. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime DAG Processing Processes** |`AirflowIntegrationRuntimeDAGProcessingProcesses` |Relative number of currently running DAG parsing processes (that is, the delta is negative when processes have completed since the last metric was sent). |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime DAG Processing Processor Timeouts** |`AirflowIntegrationRuntimeDAGProcessingProcessorTimeouts` |Number of file processors that have been killed due to taking too long. |Seconds |Average |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime DAG Processing Total Parse Time** |`AirflowIntegrationRuntimeDAGProcessingTotalParseTime` |Seconds taken to scan and import `dag_processing.file_path_queue_size` DAG files. |Seconds |Average |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime DAG Run Dependency Check** |`AirflowIntegrationRuntimeDAGRunDependencyCheck` |Milliseconds taken to check DAG dependencies. |Milliseconds |Average |`IntegrationRuntimeName`, `DagId`|PT1M |No|
|**Airflow Integration Runtime DAG Run Duration Failed** |`AirflowIntegrationRuntimeDAGRunDurationFailed` |Seconds taken for a DagRun to reach failed state. |Milliseconds |Average |`IntegrationRuntimeName`, `DagId`|PT1M |No|
|**Airflow Integration Runtime DAG Run Duration Success** |`AirflowIntegrationRuntimeDAGRunDurationSuccess` |Seconds taken for a DagRun to reach success state. |Milliseconds |Average |`IntegrationRuntimeName`, `DagId`|PT1M |No|
|**Airflow Integration Runtime DAG Run First Task Scheduling Delay** |`AirflowIntegrationRuntimeDAGRunFirstTaskSchedulingDelay` |Seconds elapsed between the first task's `start_date` and the DagRun's expected start. |Milliseconds |Average |`IntegrationRuntimeName`, `DagId`|PT1M |No|
|**Airflow Integration Runtime DAG Run Schedule Delay** |`AirflowIntegrationRuntimeDAGRunScheduleDelay` |Seconds of delay between the scheduled DagRun start date and the actual DagRun start date. |Milliseconds |Average |`IntegrationRuntimeName`, `DagId`|PT1M |No|
|**Airflow Integration Runtime Executor Open Slots** |`AirflowIntegrationRuntimeExecutorOpenSlots` |Number of open slots on executor. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Executor Queued Tasks** |`AirflowIntegrationRuntimeExecutorQueuedTasks` |Number of queued tasks on executor. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Executor Running Tasks** |`AirflowIntegrationRuntimeExecutorRunningTasks` |Number of running tasks on executor. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Job End** |`AirflowIntegrationRuntimeJobEnd` |Number of ended `<job_name>` jobs, for example, `SchedulerJob` or `LocalTaskJob`. |Count |Total |`IntegrationRuntimeName`, `Job`|PT1M |No|
|**Airflow Integration Runtime Heartbeat Failure** |`AirflowIntegrationRuntimeJobHeartbeatFailure` |Number of failed heartbeats for a `<job_name>` job, for example, `SchedulerJob` or `LocalTaskJob`. |Count |Total |`IntegrationRuntimeName`, `Job`|PT1M |No|
|**Airflow Integration Runtime Job Start** |`AirflowIntegrationRuntimeJobStart` |Number of started `<job_name>` jobs, for example, `SchedulerJob` or `LocalTaskJob`. |Count |Total |`IntegrationRuntimeName`, `Job`|PT1M |No|
|**Airflow Integration Runtime Memory Percentage** |`AirflowIntegrationRuntimeMemoryPercentage` |Percentage of memory used by Airflow integration runtime environments. |Percent |Average |`IntegrationRuntimeName`, `ContainerName`|PT1M |Yes|
|**Airflow Integration Runtime Node Count** |`AirflowIntegrationRuntimeNodeCount` | |Count |Average |`IntegrationRuntimeName`, `ComputeNodeSize`|PT1M |Yes|
|**Airflow Integration Runtime Operator Failures** |`AirflowIntegrationRuntimeOperatorFailures` |Total Operator failures. |Count |Total |`IntegrationRuntimeName`, `Operator`|PT1M |No|
|**Airflow Integration Runtime Operator Successes** |`AirflowIntegrationRuntimeOperatorSuccesses` |Total Operator successes. |Count |Total |`IntegrationRuntimeName`, `Operator`|PT1M |No|
|**Airflow Integration Runtime Pool Open Slots** |`AirflowIntegrationRuntimePoolOpenSlots` |Number of open slots in the pool. |Count |Total |`IntegrationRuntimeName`, `Pool`|PT1M |No|
|**Airflow Integration Runtime Pool Queued Slots** |`AirflowIntegrationRuntimePoolQueuedSlots` |Number of queued slots in the pool. |Count |Total |`IntegrationRuntimeName`, `Pool`|PT1M |No|
|**Airflow Integration Runtime Pool Running Slots** |`AirflowIntegrationRuntimePoolRunningSlots` |Number of running slots in the pool. |Count |Total |`IntegrationRuntimeName`, `Pool`|PT1M |No|
|**Airflow Integration Runtime Pool Starving Tasks** |`AirflowIntegrationRuntimePoolStarvingTasks` |Number of starving tasks in the pool. |Count |Total |`IntegrationRuntimeName`, `Pool`|PT1M |No|
|**Airflow Integration Runtime Scheduler Critical Section Busy** |`AirflowIntegrationRuntimeSchedulerCriticalSectionBusy` |Count of times a scheduler process tried to get a lock on the critical section (needed to send tasks to the executor) and found it locked by another process. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Scheduler Critical Section Duration** |`AirflowIntegrationRuntimeSchedulerCriticalSectionDuration` |Milliseconds spent in the critical section of the scheduler loop. Only a single scheduler can enter this loop at a time. |Milliseconds |Average |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Scheduler Failed SLA Email Attempts** |`AirflowIntegrationRuntimeSchedulerFailedSLAEmailAttempts` |Number of failed SLA miss email notification attempts. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Scheduler Heartbeats** |`AirflowIntegrationRuntimeSchedulerHeartbeat` |Scheduler heartbeats. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Scheduler Orphaned Tasks Adopted** |`AirflowIntegrationRuntimeSchedulerOrphanedTasksAdopted` |Number of Orphaned tasks adopted by the Scheduler. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Scheduler Orphaned Tasks Cleared** |`AirflowIntegrationRuntimeSchedulerOrphanedTasksCleared` |Number of Orphaned tasks cleared by the Scheduler. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Scheduler Tasks Executable** |`AirflowIntegrationRuntimeSchedulerTasksExecutable` |Number of tasks that are ready for execution (set to queued) with respect to pool limits, DAG concurrency, executor state, and priority. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Scheduler Tasks Killed Externally** |`AirflowIntegrationRuntimeSchedulerTasksKilledExternally` |Number of tasks killed externally. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Scheduler Tasks Running** |`AirflowIntegrationRuntimeSchedulerTasksRunning` | |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Scheduler Tasks Starving** |`AirflowIntegrationRuntimeSchedulerTasksStarving` |Number of tasks that can't be scheduled because the pool has no open slots. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Started Task Instances** |`AirflowIntegrationRuntimeStartedTaskInstances` | |Count |Total |`IntegrationRuntimeName`, `DagId`, `TaskId`|PT1M |No|
|**Airflow Integration Runtime Task Instance Created Using Operator** |`AirflowIntegrationRuntimeTaskInstanceCreatedUsingOperator` |Number of task instances created for a given Operator. |Count |Total |`IntegrationRuntimeName`, `Operator`|PT1M |No|
|**Airflow Integration Runtime Task Instance Duration** |`AirflowIntegrationRuntimeTaskInstanceDuration` | |Milliseconds |Average |`IntegrationRuntimeName`, `DagId`, `TaskID`|PT1M |No|
|**Airflow Integration Runtime Task Instance Failures** |`AirflowIntegrationRuntimeTaskInstanceFailures` |Overall task instance failures. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Task Instance Finished** |`AirflowIntegrationRuntimeTaskInstanceFinished` |Overall task instances finished. |Count |Total |`IntegrationRuntimeName`, `DagId`, `TaskId`, `State`|PT1M |No|
|**Airflow Integration Runtime Task Instance Previously Succeeded** |`AirflowIntegrationRuntimeTaskInstancePreviouslySucceeded` |Number of previously succeeded task instances. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Task Instance Successes** |`AirflowIntegrationRuntimeTaskInstanceSuccesses` |Overall task instance successes. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Task Removed From DAG** |`AirflowIntegrationRuntimeTaskRemovedFromDAG` |Number of tasks removed for a given DAG (that is, the task no longer exists in the DAG). |Count |Total |`IntegrationRuntimeName`, `DagId`|PT1M |No|
|**Airflow Integration Runtime Task Restored To DAG** |`AirflowIntegrationRuntimeTaskRestoredToDAG` |Number of tasks restored for a given DAG (that is, a task instance that was previously in the REMOVED state in the database is added to the DAG file). |Count |Total |`IntegrationRuntimeName`, `DagId`|PT1M |No|
|**Airflow Integration Runtime Triggers Blocked Main Thread** |`AirflowIntegrationRuntimeTriggersBlockedMainThread` |Number of triggers that blocked the main thread (likely due to not being fully asynchronous). |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Triggers Failed** |`AirflowIntegrationRuntimeTriggersFailed` |Number of triggers that errored before they could fire an event. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Triggers Running** |`AirflowIntegrationRuntimeTriggersRunning` |Number of triggers currently running for a triggerer (described by hostname). |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Triggers Succeeded** |`AirflowIntegrationRuntimeTriggersSucceeded` |Number of triggers that have fired at least one event. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
|**Airflow Integration Runtime Zombie Tasks Killed** |`AirflowIntegrationRuntimeZombiesKilled` |Number of zombie tasks killed. |Count |Total |`IntegrationRuntimeName`|PT1M |No|
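
The DAG Processing Processes row in the table reports a delta rather than an absolute count. As a rough sketch of how such values could be read, assuming each sample is the net change since the previous one, a running sum shows how the number of parser processes moved over a window. The sample values below are made up for illustration and the helper `running_change` is hypothetical.

```python
from typing import Iterable, List


def running_change(deltas: Iterable[int]) -> List[int]:
    """Return the cumulative net change after each delta sample."""
    totals = []
    current = 0
    for delta in deltas:
        # A negative delta means processes completed since the last sample.
        current += delta
        totals.append(current)
    return totals


if __name__ == "__main__":
    # Hypothetical per-minute (PT1M) samples, not real measurements.
    samples = [3, 0, -1, 2, -4]
    print(running_change(samples))  # [3, 3, 2, 4, 0]
```

The result is relative to the start of the window, so it describes the change in running processes, not necessarily their absolute number.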

For more information, see [https://learn.microsoft.com/azure/azure-monitor/reference/supported-metrics/microsoft-datafactory-factories-metrics](/azure/azure-monitor/reference/supported-metrics/microsoft-datafactory-factories-metrics).
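
The REST API names, aggregations, dimensions, and time grains from the table map onto the query parameters of an Azure Monitor metrics request. The following sketch only builds the request URL and doesn't call the service. The subscription, resource group, factory, and integration runtime names are placeholders, the `api-version` value is an assumption to verify against the current Azure Monitor REST reference, and `build_metrics_url` is a hypothetical helper. A real request also needs an authorization header, which isn't shown.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_metrics_url(
    resource_id: str,
    metric_name: str,
    aggregation: str,
    timespan: str,
    interval: str = "PT1M",
    dimension_filter: Optional[str] = None,
    api_version: str = "2018-01-01",  # assumption: check the current API version
) -> str:
    """Build an Azure Monitor metrics query URL for one metric."""
    params = {
        "api-version": api_version,
        "metricnames": metric_name,   # "Name in REST API" column
        "aggregation": aggregation,   # "Aggregation" column
        "timespan": timespan,         # start/end in ISO 8601, separated by "/"
        "interval": interval,         # "Time Grains" column
    }
    if dimension_filter:
        params["$filter"] = dimension_filter  # built from the "Dimensions" column
    # Percent-encode values; keep "$" readable in the "$filter" key.
    query = urlencode(params, quote_via=quote, safe="$")
    return (
        f"https://management.azure.com{resource_id}"
        f"/providers/Microsoft.Insights/metrics?{query}"
    )


if __name__ == "__main__":
    # Placeholder resource ID; replace each segment with your own values.
    factory_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/my-resource-group"
        "/providers/Microsoft.DataFactory/factories/my-data-factory"
    )
    print(
        build_metrics_url(
            factory_id,
            metric_name="AirflowIntegrationRuntimeDagBagSize",
            aggregation="Total",
            timespan="2024-01-01T00:00:00Z/2024-01-01T01:00:00Z",
            dimension_filter="IntegrationRuntimeName eq 'my-airflow-ir'",
        )
    )
```

Keeping URL construction separate from the HTTP call makes it easy to inspect the query before sending it with whichever HTTP client and credential flow you use.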
