docs/apache-airflow/core-concepts/tasks.rst (13 additions, 13 deletions)

@@ -165,35 +165,35 @@ If you want to control your task's state from within custom Task/Operator code,

These can be useful if your code has extra knowledge about its environment and wants to fail/skip faster - e.g., skipping when it knows there's no data available, or fast-failing when it detects its API key is invalid (as that will not be fixed by a retry).
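The fail-or-skip-faster pattern can be sketched as follows. ``AirflowSkipException`` and ``AirflowFailException`` are real Airflow exceptions; the helper function and the ``ImportError`` stubs are illustrative only, so the sketch runs even without Airflow installed:

```python
# Illustrative sketch: failing or skipping fast from inside a task.
# AirflowSkipException / AirflowFailException come from Airflow itself; the
# stub classes below only make this sketch self-contained without Airflow.
try:
    from airflow.exceptions import AirflowFailException, AirflowSkipException
except ImportError:
    class AirflowFailException(Exception):
        """Stub: in Airflow, fails the task without consuming remaining retries."""

    class AirflowSkipException(Exception):
        """Stub: in Airflow, marks the current task as skipped."""


def extract(api_key, rows):
    """Hypothetical task callable with extra knowledge of its environment."""
    if not api_key:
        # A bad credential will not be fixed by a retry: fail immediately.
        raise AirflowFailException("invalid API key - retrying will not help")
    if not rows:
        # No data available for this run: skip rather than fail.
        raise AirflowSkipException("no data available")
    return [r["value"] for r in rows]
```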

-.. _concepts:zombies:
+.. _concepts:task-instance-heartbeat-timeout:

-Zombie Tasks
-------------
+Task Instance Heartbeat Timeout
+-------------------------------

No system runs perfectly, and task instances are expected to die once in a while.

-*Zombie tasks* are ``TaskInstances`` stuck in a ``running`` state despite their associated jobs being inactive
-(e.g. their process did not send a recent heartbeat as it got killed, or the machine died). Airflow will find these
-periodically, clean them up, and either fail or retry the task depending on its settings. Tasks can become zombies for
+``TaskInstances`` may get stuck in a ``running`` state despite their associated jobs being inactive
+(for example if the ``TaskInstance``'s worker ran out of memory). Such tasks were formerly known as zombie tasks. Airflow will find these
+periodically, clean them up, and mark the ``TaskInstance`` as failed or retry it if it has available retries. The ``TaskInstance``'s heartbeat can time out for
many reasons, including:

* The Airflow worker ran out of memory and was OOMKilled.
* The Airflow worker failed its liveness probe, so the system (for example, Kubernetes) restarted the worker.
* The system (for example, Kubernetes) scaled down and moved an Airflow worker from one node to another.
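
The detection this section describes can be illustrated with a small sketch. This is not Airflow's actual implementation; the threshold value, state strings, and function names are simplified assumptions:

```python
from datetime import datetime, timedelta, timezone

# Simplified sketch of heartbeat-timeout detection: a task instance still
# marked "running" whose last heartbeat is older than a threshold is treated
# as dead, then either retried or failed. Threshold value is illustrative.
HEARTBEAT_TIMEOUT = timedelta(seconds=300)


def is_heartbeat_timed_out(state, latest_heartbeat, now=None):
    """Return True if a 'running' task instance has stopped heartbeating."""
    now = now or datetime.now(timezone.utc)
    return state == "running" and now - latest_heartbeat > HEARTBEAT_TIMEOUT


def resolve(retries_remaining):
    """Decide what happens to a timed-out task instance (sketch)."""
    return "up_for_retry" if retries_remaining > 0 else "failed"
```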

docs/apache-airflow/faq.rst (1 addition, 1 deletion)

@@ -453,7 +453,7 @@ Why did my task fail with no logs in the UI?

Logs are :ref:`typically served when a task reaches a terminal state <serving-worker-trigger-logs>`. Sometimes, a task's normal lifecycle is disrupted, and the task's
worker is unable to write the task's logs. This typically happens for one of two reasons:
2. Tasks failed after getting stuck in queued (Airflow 2.6.0+). Tasks that are in queued for longer than :ref:`scheduler.task_queued_timeout <config:scheduler__task_queued_timeout>` will be marked as failed, and there will be no task logs in the Airflow UI.

Setting retries for each task drastically reduces the chance that either of these problems impact a workflow.
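
Setting retries can be sketched as below. ``retries`` and ``retry_delay`` are real Airflow task arguments; the values and DAG shape are illustrative only:

```python
from datetime import timedelta

# Illustrative default_args giving every task in a DAG a few retries, so a
# one-off worker death or heartbeat timeout does not fail the whole run.
default_args = {
    "retries": 3,                         # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),  # wait between attempts
}

# In a real DAG file this dict would be passed to the DAG constructor, e.g.:
#   with DAG("my_dag", default_args=default_args, ...):
#       ...
```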

docs/apache-airflow/troubleshooting.rst (2 additions, 2 deletions)

@@ -32,7 +32,7 @@ Below are some example scenarios that could cause a task's state to change by a

- If a task's DAG failed to parse on the worker, the scheduler may mark the task as failed. If confirmed, consider increasing :ref:`core.dagbag_import_timeout <config:core__dagbag_import_timeout>` and :ref:`dag_processor.dag_file_processor_timeout <config:dag_processor__dag_file_processor_timeout>`.
- The scheduler will mark a task as failed if the task has been queued for longer than :ref:`scheduler.task_queued_timeout <config:scheduler__task_queued_timeout>`.
-- If a task becomes a :ref:`zombie <concepts:zombies>`, it will be marked failed by the scheduler.
+- If a :ref:`task instance's heartbeat times out <concepts:task-instance-heartbeat-timeout>`, it will be marked failed by the scheduler.
- A user marked the task as successful or failed in the Airflow UI.
- An external script or process used the :doc:`Airflow REST API <stable-rest-api-ref>` to change the state of a task.
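
The timeouts referenced in the list above live in Airflow's configuration. A hypothetical ``airflow.cfg`` fragment might look like this (the option names are the ones referenced above; the values are examples only, not recommendations):

```ini
[core]
; Seconds allowed for importing a DAG file before the import is aborted.
dagbag_import_timeout = 60

[dag_processor]
; Seconds allowed for processing a single DAG file.
dag_file_processor_timeout = 120

[scheduler]
; Tasks queued longer than this many seconds are marked failed.
task_queued_timeout = 600
```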

@@ -45,4 +45,4 @@ Here are some examples that could cause such an event:

- A DAG run timeout, specified by ``dagrun_timeout`` in the DAG's definition.
- An Airflow worker running out of memory
-  - Usually, Airflow workers that run out of memory receive a SIGKILL and are marked as a zombie and failed by the scheduler. However, in some scenarios, Airflow kills the task before that happens.
+  - Usually, Airflow workers that run out of memory receive a SIGKILL, and the scheduler will fail the corresponding task instance for not having a heartbeat. However, in some scenarios, Airflow kills the task before that happens.
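
A ``dagrun_timeout`` like the one mentioned in the list above is set in the DAG definition. A minimal sketch (``dagrun_timeout`` is a real DAG argument; the DAG id and timeout value are hypothetical):

```python
from datetime import timedelta

# Illustrative DAG kwargs: dagrun_timeout bounds how long a whole DAG run may
# take; a run exceeding it is failed, which externally kills running tasks.
dag_kwargs = {
    "dag_id": "example_with_run_timeout",  # hypothetical DAG id
    "dagrun_timeout": timedelta(hours=2),
}

# In a real DAG file: with DAG(**dag_kwargs, ...): ...
```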