[Core] Fix task name inconsistency in RUNNING vs FINISHED metrics (#59893)

yuanjiewei · Benjaminyuan · web-flow · commit be77f0aa7669 · 2026-01-07T09:23:56.000-06:00
## Description

Fix inconsistent task name in metrics between RUNNING and FINISHED
states.

When a Ray task is defined with a custom name via
`.options(name="custom_name")`, the `ray_tasks` metrics show
inconsistent names:
- **RUNNING** state: shows the original function name (e.g., `RemoteFn`)
- **FINISHED/FAILED** state: shows the custom name (e.g., `test`)

**Root cause:** The RUNNING task counter in `CoreWorker` uses
`FunctionDescriptor()->CallString()` to get the task name, while
finished task events correctly use `TaskSpecification::GetName()`.

**Fix:** In `core_worker.cc`, `ExecuteTask` now calls
`task_spec.GetName()` and `HandlePushTask` reads
`request.task_spec().name()`, so both counters report the custom name
when one is set.
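
The mismatch can be illustrated with a small standalone model. This is only a sketch, not Ray's implementation: `TaskSpec`, `call_string`, `get_name`, and `run_task` are hypothetical stand-ins, and the fallback from custom name to function name is an assumption based on the description above. It reuses the example names `RemoteFn` and `test`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """Hypothetical stand-in for a task spec; not Ray's real class."""

    function_name: str
    custom_name: str = ""

    def call_string(self) -> str:
        # Models the old label source: always the function name.
        return self.function_name

    def get_name(self) -> str:
        # Assumption: the custom name wins when set, else the function name.
        return self.custom_name or self.function_name


def run_task(spec, running_label):
    """Record one RUNNING and one FINISHED sample keyed by (name, state)."""
    series = Counter()
    series[(running_label(spec), "RUNNING")] += 1
    # Finished tasks were already labeled with the task name.
    series[(spec.get_name(), "FINISHED")] += 1
    return series


spec = TaskSpec(function_name="RemoteFn", custom_name="test")

# Old behavior: RUNNING is labeled by the function name.
before = run_task(spec, TaskSpec.call_string)
# New behavior: RUNNING is labeled by the task name.
after = run_task(spec, TaskSpec.get_name)

names_before = {name for name, _ in before}
names_after = {name for name, _ in after}
```

In this model the old labeling yields two distinct names for one task, while the new labeling yields a single name across both states, which is the property the added test checks against real metrics.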

## Related issues

None - this PR addresses a newly discovered bug.

## Additional information

**Files changed:**
- `src/ray/core_worker/core_worker.cc` - Use `GetName()` instead of
`FunctionDescriptor()->CallString()` for metrics
- `python/ray/tests/test_task_metrics.py` - Added test
`test_task_custom_name_metrics` to verify custom names appear correctly
in metrics

Signed-off-by: Yuan Jiewei <jieweihh.yuan@gmail.com>
Co-authored-by: Yuan Jiewei <jieweihh.yuan@gmail.com>
diff --git a/python/ray/tests/test_task_metrics.py b/python/ray/tests/test_task_metrics.py
@@ -109,6 +109,60 @@ def c():
     proc.kill()
 
 
+@pytest.mark.skipif(sys.platform == "win32", reason="Flaky on Windows.")
+def test_task_custom_name_metrics(shutdown_only):
+    """Verify that custom task names set via .options(name=...) are used in metrics.
+
+    This tests that RUNNING tasks use the custom name consistently with
+    FINISHED/FAILED tasks. Previously there was a bug where RUNNING metrics used
+    the function name (FunctionDescriptor->CallString()) but FINISHED/FAILED used
+    the custom name (TaskSpec::GetName()).
+    """
+    info = ray.init(num_cpus=2, **METRIC_CONFIG)
+
+    driver = """
+import ray
+import time
+
+ray.init("auto")
+
+@ray.remote
+def my_function():
+    time.sleep(999)
+
+# Submit tasks with custom names
+a = [my_function.options(name="custom_task_name").remote() for _ in range(4)]
+ray.get(a)
+"""
+    proc = run_string_as_driver_nonblocking(driver)
+    timeseries = PrometheusTimeseries()
+
+    # Verify that RUNNING tasks use the custom name, not the function name.
+    # With 2 CPUs, 2 tasks should be running and 2 should be pending.
+    expected = {
+        ("custom_task_name", "RUNNING"): 2.0,
+        ("custom_task_name", "PENDING_NODE_ASSIGNMENT"): 2.0,
+    }
+    wait_for_condition(
+        lambda: tasks_by_name_and_state(info, timeseries) == expected,
+        timeout=20,
+        retry_interval_ms=500,
+    )
+
+    # Verify the original function name is NOT used in metrics
+    breakdown = tasks_by_name_and_state(info, timeseries)
+    assert (
+        "my_function",
+        "RUNNING",
+    ) not in breakdown, "RUNNING tasks should use custom name, not function name"
+    assert (
+        "my_function",
+        "PENDING_NODE_ASSIGNMENT",
+    ) not in breakdown, "PENDING tasks should use custom name, not function name"
+
+    proc.kill()
+
+
 def test_task_job_ids(shutdown_only):
     info = ray.init(num_cpus=2, **METRIC_CONFIG)
     timeseries = PrometheusTimeseries()
diff --git a/src/ray/core_worker/core_worker.cc b/src/ray/core_worker/core_worker.cc
@@ -2807,8 +2807,11 @@ Status CoreWorker::ExecuteTask(
   // about any IDs that we are still borrowing by the time the task completes.
   std::vector<ObjectID> borrowed_ids;
 
-  // Extract function name and retry status for metrics reporting.
-  std::string func_name = task_spec.FunctionDescriptor()->CallString();
+  // Extract task name and retry status for metrics reporting.
+  // Use GetName() which returns the custom task name if set via .options(name="..."),
+  // otherwise falls back to the function descriptor's call string. This ensures
+  // consistency with task events reported to the State API / Dashboard.
+  std::string func_name = task_spec.GetName();
   bool is_retry = task_spec.IsRetry();
 
   ++num_get_pin_args_in_flight_;
@@ -3434,10 +3437,10 @@ void CoreWorker::HandlePushTask(rpc::PushTaskRequest request,
   }
 
   // Increment the task_queue_length and per function counter.
+  // Use task name which includes custom name from .options(name="...") if set,
+  // ensuring consistency with task events reported to the State API / Dashboard.
   task_queue_length_ += 1;
-  std::string func_name =
-      FunctionDescriptorBuilder::FromProto(request.task_spec().function_descriptor())
-          ->CallString();
+  std::string func_name = request.task_spec().name();
   task_counter_.IncPending(func_name, request.task_spec().attempt_number() > 0);
 
   // For actor tasks, we just need to post a HandleActorTask instance to the task