feat(dashboard): enhance dashboard UI and fix Ray runner state reporting #6008

Jay-ju · 2026-01-11T12:20:01Z

Frontend Enhancements:
- All Queries Page: Updated table header to use white background (bg-white) with black text and grey separators, improving readability.
- Query Detail Page:
  - Added Entrypoint (command line) and Engine (Swordfish/Flotilla) fields to the metadata section.
  - Added a direct link to the Ray Dashboard for Ray-based queries.
  - Improved metadata visibility by using high-contrast text (text-zinc-100).
  - Progress Table: Refined table headers with a dark theme (bg-zinc-800), white text, and clear column separators. Added hover effects for better interactivity.
- Engine Naming: Standardized engine display names (Native -> Swordfish, Ray -> Flotilla).
Backend Fixes & Improvements:
- State Management: Fixed an issue where failed Ray queries did not report their terminal state to the dashboard, causing 400 errors. The backend now accepts transitions to the Failed state from any active state.
- Metadata Propagation: Updated RayRunner to capture the entrypoint and ray_dashboard_url and send them to the dashboard backend.
- Python API: Exposed repr_json on DistributedPhysicalPlan in __init__.pyi to fix mypy errors and support plan visualization.
Code Cleanup:
- Removed unused imports and debug logging.
- Standardized sys and os imports in ray_runner.py.
- Fixed mypy type definition errors in daft/__init__.pyi related to context notification methods.
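
The metadata propagation described above can be sketched as follows. This is a minimal illustration with several assumptions. It assumes that the entrypoint is the shell-quoted `sys.argv` of the driver process and that the extra fields travel as a plain dict. `build_query_metadata` is a hypothetical helper, not Daft's API; the real runners pass a `PyQueryMetadata` object.

```python
import shlex
import sys
from typing import Optional


def build_query_metadata(
    runner: str,
    argv: Optional[list[str]] = None,
    ray_dashboard_url: Optional[str] = None,
) -> dict[str, Optional[str]]:
    """Collect the extra per-query fields sent to the dashboard (hypothetical helper)."""
    args = sys.argv if argv is None else argv
    return {
        # Assumed raw values "Native" / "Ray"; the frontend shows Swordfish / Flotilla.
        "runner": runner,
        # Shell-quoted command line, so arguments containing spaces stay intact.
        "entrypoint": shlex.join(args) if args else None,
        # Only Ray-based queries have a Ray dashboard to link to.
        "ray_dashboard_url": ray_dashboard_url,
    }


meta = build_query_metadata(
    "Ray", ["python", "etl.py", "--date", "2026-01-11"], "http://127.0.0.1:8265"
)
print(meta["entrypoint"])  # python etl.py --date 2026-01-11
```

Quoting with `shlex.join` rather than `" ".join` is a design choice here. It keeps the displayed entrypoint copy-pasteable when an argument contains spaces.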

Changes Made

Related Issues

greptile-apps · 2026-01-11T12:25:15Z

Greptile Overview

Greptile Summary

This PR enhances the Daft dashboard with improved UI and fixes critical state reporting issues for Ray queries.

Key Changes

Backend Improvements:

Fixes Ray runner state management by allowing terminal state (Failed/Canceled) transitions from active states (Executing, Setup, Optimizing), resolving 400 errors when queries fail
Changes timestamp precision from u64 to f64 throughout the stack for millisecond-level accuracy
Adds runner, ray_dashboard_url, and entrypoint fields to query metadata for better tracking
Removes the Ray runner restriction from dashboard subscriber
Adds comprehensive query lifecycle notifications (notify_exec_start, notify_exec_end, notify_exec_operator_start, etc.)
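
To see why the u64 to f64 change matters, here is a small Python illustration. It assumes timestamps are seconds since the Unix epoch; the PR does not state the unit. Storing whole seconds discards the fractional part, so two events in the same second become indistinguishable.

```python
import math

# Assumed unit: seconds since the Unix epoch. Two events 750 ms apart, same second.
start, end = 1768134001.120, 1768134001.870

# u64-style: floor to whole seconds, so the fractional part is discarded.
u64_elapsed = math.floor(end) - math.floor(start)

# f64-style: keep the float and convert the difference to milliseconds.
f64_elapsed_ms = round((end - start) * 1000)

print(u64_elapsed)     # 0
print(f64_elapsed_ms)  # 750
```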

Frontend Enhancements:

Adds Duration, Entrypoint, Engine, and Ray UI columns to the queries table
Implements direct Ray Dashboard links for Ray-based queries with job ID appending
Improves table styling with white headers, better borders, and hover effects
Standardizes engine naming (Native → Swordfish, Ray → Flotilla)
Enhances timestamp formatting to show milliseconds
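
The Duration column and the millisecond timestamps reduce to simple arithmetic, sketched below. The real frontend is TypeScript, so this Python version only illustrates the logic. It assumes f64 epoch seconds rendered in UTC. `format_ts` and `format_duration` are hypothetical names, and the `1m 15.250s` display style is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def format_ts(ts: float) -> str:
    """Render epoch seconds as an ISO-8601 UTC string with millisecond precision."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(timespec="milliseconds")


def format_duration(start: float, end: Optional[float]) -> str:
    """Duration column: a placeholder while the query has no end time yet."""
    if end is None:
        return "-"
    total_ms = round((end - start) * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    seconds, ms = divmod(rem_ms, 1000)
    return f"{minutes}m {seconds}.{ms:03d}s" if minutes else f"{seconds}.{ms:03d}s"


print(format_ts(0.5))                # 1970-01-01T00:00:00.500+00:00
print(format_duration(0.0, 75.25))   # 1m 15.250s
```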

Code Quality:

Exposes repr_json() on DistributedPhysicalPlan (currently returns dummy JSON)
Updates Python type stubs to match new API

Implementation Notes

The core fix addresses a state machine issue where Ray queries that failed couldn't transition to the Failed state, causing backend 400 errors. The solution makes plan_info and exec_info optional in Failed/Canceled states and allows transitions from any active state (lines 330-346 in engine.rs).
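
The relaxed transition rule can be modelled as a small state machine. This Python sketch illustrates the idea under stated assumptions; it is not the Rust code in engine.rs. The state names come from the summary above, and `Pending` is a hypothetical initial state added for completeness. `plan_info` and `exec_info` are optional because a query can fail before either is recorded.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QueryState(Enum):
    PENDING = "Pending"  # hypothetical initial state, not named in the PR
    OPTIMIZING = "Optimizing"
    SETUP = "Setup"
    EXECUTING = "Executing"
    FINISHED = "Finished"
    FAILED = "Failed"
    CANCELED = "Canceled"


ACTIVE = {QueryState.OPTIMIZING, QueryState.SETUP, QueryState.EXECUTING}
ERROR_TERMINAL = {QueryState.FAILED, QueryState.CANCELED}

# Happy-path forward transitions.
FORWARD = {
    QueryState.PENDING: QueryState.OPTIMIZING,
    QueryState.OPTIMIZING: QueryState.SETUP,
    QueryState.SETUP: QueryState.EXECUTING,
    QueryState.EXECUTING: QueryState.FINISHED,
}


@dataclass
class QueryInfo:
    state: QueryState = QueryState.PENDING
    plan_info: Optional[str] = None  # may be absent if the query failed while optimizing
    exec_info: Optional[str] = None  # may be absent if the query failed before executing


def transition(query: QueryInfo, new: QueryState) -> int:
    """Return an HTTP-style status: 200 if the transition is accepted, 400 otherwise."""
    if new in ERROR_TERMINAL and query.state in ACTIVE:
        query.state = new  # Failed/Canceled accepted from any active state
        return 200
    if FORWARD.get(query.state) == new:
        query.state = new
        return 200
    return 400
```

Under this rule, a query that fails while still `Optimizing` reaches `Failed` with `plan_info` unset, instead of being rejected with a 400.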

The Ray dashboard URL extraction uses ray.worker.get_dashboard_url() and attempts to append the job ID when available, falling back gracefully on errors.
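
The best-effort URL resolution might look like the sketch below. To keep it runnable without Ray, the URL getter and job-ID lookup are passed in as callables; in the runner they would wrap something like `ray.worker.get_dashboard_url()`. The `/#/jobs/<job_id>` route and the `http://` default scheme are assumptions, not confirmed by the PR.

```python
from typing import Callable, Optional


def resolve_ray_dashboard_url(
    get_dashboard_url: Callable[[], Optional[str]],
    get_job_id: Callable[[], Optional[str]],
) -> Optional[str]:
    """Best-effort Ray dashboard link; returns None instead of raising."""
    try:
        base = get_dashboard_url()
    except Exception:
        return None  # any failure means "no link", never a failed query
    if not base:
        return None
    if "://" not in base:
        base = f"http://{base}"  # assumption: the getter may return bare host:port
    base = base.rstrip("/")
    try:
        job_id = get_job_id()
    except Exception:
        job_id = None  # fall back to the dashboard root
    # Assumed route for a job page; verify against the Ray version in use.
    return f"{base}/#/jobs/{job_id}" if job_id else base
```

Swallowing exceptions here is deliberate. A missing dashboard link should degrade the UI, not fail the user's query.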

Minor Issues

All findings are non-blocking style/documentation issues (see inline comments for details).

Confidence Score: 4/5

Safe to merge with minor style improvements recommended
The core functionality changes are sound: the state transition fix properly addresses the Ray query failure reporting issue, metadata propagation is implemented consistently across the stack, and frontend changes are purely additive UI enhancements. The timestamp precision change from u64 to f64 is handled correctly throughout. However, there are several minor style issues: inline import in native_runner.py violates project guidelines, misleading comment about commented-out code that actually executes, debug logging left in production code, @ts-ignore suppressing type errors, and undocumented gravitino import removal. These are all non-blocking style/cleanup issues that don't affect correctness.
Files needing attention: daft/runners/native_runner.py (inline import and misleading comment), daft/__init__.py (undocumented gravitino change), src/daft-dashboard/frontend/src/app/queries/page.tsx (@ts-ignore)

Important Files Changed

File Analysis

Filename	Score	Overview
daft/runners/native_runner.py	3/5	Adds entrypoint tracking and query lifecycle notifications; contains inline import violation and misleading comment about code that is actually executing
daft/runners/ray_runner.py	4/5	Adds comprehensive query lifecycle tracking with Ray dashboard URL extraction and proper error handling
daft/__init__.py	4/5	Comments out gravitino imports (unrelated change not mentioned in PR description)
src/daft-dashboard/src/engine.rs	4/5	Changes timestamps to f64, adds new metadata fields, relaxes state transition requirements for terminal states, includes debug logging
src/daft-dashboard/src/state.rs	5/5	Updates state structs to use f64 timestamps and makes plan_info/exec_info optional for Failed/Canceled states
src/daft-dashboard/frontend/src/app/queries/page.tsx	3/5	Adds new columns for duration, entrypoint, engine, and Ray UI link; includes @ts-ignore for type error

Sequence Diagram

sequenceDiagram
    participant User
    participant Runner as Runner (Native/Ray)
    participant Context as DaftContext
    participant Subscriber as DashboardSubscriber
    participant Backend as Dashboard Backend
    participant Frontend as Dashboard Frontend
    
    User->>Runner: Execute query
    Runner->>Context: _notify_query_start(query_id, metadata)
    Note over Runner: metadata includes runner, entrypoint, ray_dashboard_url
    Context->>Subscriber: on_query_start(query_id, metadata)
    Subscriber->>Backend: POST /query/{id}/start
    Backend->>Frontend: WebSocket update
    
    Runner->>Context: _notify_optimization_start(query_id)
    Context->>Subscriber: on_optimization_start(query_id)
    Subscriber->>Backend: POST /query/{id}/plan/start
    Backend->>Frontend: WebSocket update (status: Optimizing)
    
    Runner->>Runner: Optimize plan
    Runner->>Context: _notify_optimization_end(query_id, optimized_plan)
    Context->>Subscriber: on_optimization_end(query_id, plan)
    Subscriber->>Backend: POST /query/{id}/plan/end
    Backend->>Frontend: WebSocket update (status: Setup)
    
    Runner->>Context: _notify_exec_start(query_id, physical_plan)
    Context->>Subscriber: on_exec_start(query_id, physical_plan)
    Subscriber->>Backend: POST /query/{id}/exec/start
    Backend->>Frontend: WebSocket update (status: Executing)
    
    loop For each result
        Runner->>Context: _notify_exec_emit_stats(query_id, node_id, stats)
        Context->>Subscriber: on_exec_emit_stats(query_id, stats)
        Subscriber->>Backend: POST /query/{id}/exec/op/{op_id}/emit_stats
        Backend->>Frontend: WebSocket update (progress data)
    end
    
    alt Success
        Runner->>Context: _notify_query_end(query_id, Finished)
        Context->>Subscriber: on_query_end(query_id, result)
        Subscriber->>Backend: POST /query/{id}/end (Finished)
        Backend->>Frontend: WebSocket update (status: Finished)
    else Failure
        Runner->>Context: _notify_query_end(query_id, Failed)
        Context->>Subscriber: on_query_end(query_id, result)
        Subscriber->>Backend: POST /query/{id}/end (Failed)
        Note over Backend: Accepts Failed from Executing state
        Backend->>Frontend: WebSocket update (status: Failed)
    end
    
    Frontend->>User: Display query status and Ray dashboard link
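
The notification order in the diagram can be checked with an in-memory test double. The sketch below is hypothetical. The `on_*` method names mirror the diagram, but `RecordingSubscriber` is not Daft's subscriber base class. `drive_query` only imitates the order in which a runner fires notifications.

```python
from typing import Any


class RecordingSubscriber:
    """Records lifecycle callbacks in arrival order (hypothetical test double)."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str]] = []

    def on_query_start(self, query_id: str, metadata: dict[str, Any]) -> None:
        self.events.append(("query_start", query_id))

    def on_optimization_start(self, query_id: str) -> None:
        self.events.append(("optimization_start", query_id))

    def on_optimization_end(self, query_id: str, plan: str) -> None:
        self.events.append(("optimization_end", query_id))

    def on_exec_start(self, query_id: str, physical_plan: str) -> None:
        self.events.append(("exec_start", query_id))

    def on_exec_emit_stats(self, query_id: str, stats: dict[str, Any]) -> None:
        self.events.append(("emit_stats", query_id))

    def on_query_end(self, query_id: str, result: str) -> None:
        self.events.append((f"query_end:{result}", query_id))


def drive_query(sub: RecordingSubscriber, query_id: str, batches: int, fail: bool) -> None:
    """Fire notifications in the order shown in the sequence diagram."""
    sub.on_query_start(query_id, {"runner": "Ray"})
    sub.on_optimization_start(query_id)
    sub.on_optimization_end(query_id, "optimized-plan")
    sub.on_exec_start(query_id, "physical-plan")
    for _ in range(batches):
        sub.on_exec_emit_stats(query_id, {})
    sub.on_query_end(query_id, "Failed" if fail else "Finished")
```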

greptile-apps

5 files reviewed, 5 comments


greptile-apps · 2026-01-11T12:25:19Z

daft/runners/native_runner.py


        # Optimize the logical plan.
-        ctx._notify_query_start(query_id, PyQueryMetadata(output_schema._schema, builder.repr_json()))
+        import sys


[P2] Import statements should be at the top of the file. Move import sys to the imports section at the beginning of the file (around line 3-4) per the project's import style guidelines.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI

This is a comment left during a code review. Path: daft/runners/native_runner.py Line: 99:99 Comment: [P2] Import statements should be at the top of the file. Move `import sys` to the imports section at the beginning of the file (around line 3-4) per the project's import style guidelines. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-11T12:25:20Z

daft/runners/native_runner.py

+                # Try to emit stats for operator 0?
+                # If NativeExecutor doesn't start operator 0, this might fail or be ignored.
+                # Let's comment it out for now to see baseline behavior.
+                ctx._notify_exec_emit_stats(query_id, 0, {"rows in": total_rows, "rows out": total_rows})


[P3] The comment says "Let's comment it out for now to see baseline behavior" but the code on line 129 is actually executing. Either remove this misleading comment or comment out the code if it was intended to be disabled.

Prompt To Fix With AI

This is a comment left during a code review. Path: daft/runners/native_runner.py Line: 126:129 Comment: [P3] The comment says "Let's comment it out for now to see baseline behavior" but the code on line 129 is actually executing. Either remove this misleading comment or comment out the code if it was intended to be disabled. How can I resolve this? If you propose a fix, please make it concise.

Stats shouldn't be emitted here; they're already emitted within the NativeExecutor itself.

greptile-apps · 2026-01-11T12:25:20Z

src/daft-dashboard/src/engine.rs

+    tracing::info!("Received exec_start for query {}", query_id);
    let query_info = state.queries.get_mut(&query_id);
    let Some(mut query_info) = query_info else {
+        tracing::error!("Query {} not found in exec_start", query_id);
        return StatusCode::BAD_REQUEST;
    };
+
+    // Debug state
+    tracing::info!("Query {} state: {:?}", query_id, query_info.state);


[P3] Debug logging statements appear to have been left in from development. The comment "Debug state" at line 181 and tracing::info! calls at lines 174 and 182 look like temporary debugging code. Consider removing or converting to tracing::debug! if this level of verbosity isn't needed in production.


Prompt To Fix With AI

This is a comment left during a code review. Path: src/daft-dashboard/src/engine.rs Line: 174:182 Comment: [P3] Debug logging statements appear to have been left in from development. The comment "Debug state" at line 181 and `tracing::info!` calls at lines 174 and 182 look like temporary debugging code. Consider removing or converting to `tracing::debug!` if this level of verbosity isn't needed in production. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-11T12:25:22Z

src/daft-dashboard/frontend/src/app/queries/page.tsx

+  // @ts-ignore
+  columnHelper.accessor("ray_dashboard_url", {


[P2] Using @ts-ignore suppresses TypeScript type checking instead of fixing the underlying type issue. The ray_dashboard_url property is properly defined as optional in the QuerySummary type. Consider properly typing the accessor or using @ts-expect-error with a specific explanation if this is a known limitation of the column helper library.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/daft-dashboard/frontend/src/app/queries/page.tsx Line: 123:124 Comment: [P2] Using `@ts-ignore` suppresses TypeScript type checking instead of fixing the underlying type issue. The `ray_dashboard_url` property is properly defined as optional in the `QuerySummary` type. Consider properly typing the accessor or using `@ts-expect-error` with a specific explanation if this is a known limitation of the column helper library. How can I resolve this? If you propose a fix, please make it concise.

daft/__init__.py

kevinzwang · 2026-01-12T19:04:27Z

@srilman tagging you on this one

srilman

Could you clarify a couple of points?

daft/__init__.py

daft/io/__init__.py

srilman · 2026-01-13T00:12:40Z

daft/runners/native_runner.py

+                # Try to emit stats for operator 0?
+                # If NativeExecutor doesn't start operator 0, this might fail or be ignored.
+                # Let's comment it out for now to see baseline behavior.
+                ctx._notify_exec_emit_stats(query_id, 0, {"rows in": total_rows, "rows out": total_rows})


Stats shouldn't be emitted here; they're already emitted within the NativeExecutor itself.

srilman · 2026-01-13T00:18:13Z

daft/runners/ray_runner.py

+            ):
+                if result.metadata() is not None:
+                    total_rows += result.metadata().num_rows
+                    ctx._notify_exec_emit_stats(query_id, 0, {"rows in": total_rows, "rows out": total_rows})


Similarly, we shouldn't emit stats here either, because they are expected to follow a specific per-operator format.

This has been updated. Could you please check again whether it meets expectations?

srilman · 2026-01-13T00:35:41Z

daft/daft/__init__.pyi

    output_schema: PySchema
    unoptimized_plan: str
+    runner: str
+    ray_dashboard_url: str | None


Why emit the ray dashboard URL?

It provides a link to the Ray task.

Ah, makes sense. Good idea.

codecov · 2026-01-14T10:46:08Z

Codecov Report

❌ Patch coverage is 24.89209% with 522 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.58%. Comparing base (1d7b41a) to head (0867a8e).
⚠️ Report is 31 commits behind head on main.

Files with missing lines	Patch %	Lines
src/daft-dashboard/src/engine.rs	0.00%	194 Missing ⚠️
src/daft-context/src/subscribers/dashboard.rs	0.00%	127 Missing ⚠️
src/daft-distributed/src/python/dashboard.rs	0.00%	62 Missing ⚠️
src/daft-context/src/lib.rs	35.71%	54 Missing ⚠️
src/daft-context/src/python.rs	48.71%	40 Missing ⚠️
src/daft-dashboard/src/state.rs	0.00%	15 Missing ⚠️
daft/runners/ray_runner.py	75.47%	13 Missing ⚠️
src/daft-distributed/src/statistics/stats.rs	0.00%	8 Missing ⚠️
src/daft-distributed/src/python/mod.rs	80.00%	3 Missing ⚠️
daft/context.py	80.00%	2 Missing ⚠️
... and 2 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6008      +/-   ##
==========================================
+ Coverage   72.95%   73.58%   +0.62%     
==========================================
  Files         970      972       +2     
  Lines      126744   126949     +205     
==========================================
+ Hits        92471    93416     +945     
+ Misses      34273    33533     -740
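
The headline figures follow from the diff table, where coverage is hits divided by lines. Rounding would give 72.96% and 73.59%, so the reported percentages are consistent with truncating to two decimals instead. The patch figure implies that 173 of 695 changed lines are covered. The function name below is illustrative.

```python
import math


def pct_truncated(numerator: float, denominator: float) -> float:
    """Percentage with two decimals, truncated (floored) instead of rounded."""
    return math.floor(numerator / denominator * 10_000) / 100


base = pct_truncated(92_471, 126_744)  # main: hits / lines
head = pct_truncated(93_416, 126_949)  # PR head: hits / lines
delta = math.floor((93_416 / 126_949 - 92_471 / 126_744) * 10_000) / 100

# Patch coverage: 522 missing lines at 24.89209% implies 695 patch lines, 173 covered.
patch_lines = round(522 / (1 - 0.2489209))
patch_pct = round((patch_lines - 522) / patch_lines * 100, 5)

print(base, head, delta)       # 72.95 73.58 0.62
print(patch_lines, patch_pct)  # 695 24.89209
```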

Files with missing lines	Coverage Δ
daft/runners/native_runner.py	`84.52% <100.00%> (+0.77%)`	⬆️
src/daft-context/src/subscribers/mod.rs	`62.50% <100.00%> (+22.50%)`	⬆️
src/daft-distributed/src/pipeline_node/mod.rs	`31.77% <100.00%> (+4.49%)`	⬆️
...l-execution/src/runtime_stats/subscribers/query.rs	`91.66% <100.00%> (+2.19%)`	⬆️
daft/context.py	`87.95% <80.00%> (-1.09%)`	⬇️
daft/runners/flotilla.py	`47.39% <50.00%> (-0.25%)`	⬇️
src/daft-local-execution/src/runtime_stats/mod.rs	`91.84% <88.88%> (-0.11%)`	⬇️
src/daft-distributed/src/python/mod.rs	`42.53% <80.00%> (+5.34%)`	⬆️
src/daft-distributed/src/statistics/stats.rs	`32.96% <0.00%> (-3.18%)`	⬇️
daft/runners/ray_runner.py	`68.06% <75.47%> (+0.65%)`	⬆️
... and 6 more

... and 126 files with indirect coverage changes


Jay-ju · 2026-01-14T12:37:04Z

@srilman I have addressed all your comments. Please let me know if there are any other issues.

srilman

Just had a couple of clarifying questions

srilman · 2026-01-16T01:18:48Z

daft/runners/native_runner.py

        )

        try:
+            total_rows = 0


Why collect this?

This was debug code; I've removed it.

srilman · 2026-01-16T01:19:40Z

daft/runners/ray_runner.py

+        # Log Dashboard URL if configured
+        dashboard_url = os.environ.get("DAFT_DASHBOARD_URL")
+        if dashboard_url:
+            print(f"Daft Dashboard: {dashboard_url}/query/{query_id}")


Remove print

I changed the print to a logger call, mainly to show users clearly how to access the dashboard. What do you think?
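
A logger-based version of the dashboard hint could look like this minimal sketch. It keeps the `DAFT_DASHBOARD_URL` variable and the `/query/<id>` URL shape from the diff above. The function name, the module-level logger, and the trailing-slash handling are assumptions.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)


def log_dashboard_link(query_id: str) -> Optional[str]:
    """Log where the query can be viewed, if a dashboard URL is configured."""
    dashboard_url = os.environ.get("DAFT_DASHBOARD_URL")
    if not dashboard_url:
        return None
    # Strip a trailing slash so the link never contains "//query".
    link = f"{dashboard_url.rstrip('/')}/query/{query_id}"
    # %-style arguments: the log message is only formatted if INFO is enabled.
    logger.info("Daft Dashboard: %s", link)
    return link
```

Unlike `print`, this respects the user's logging configuration. Users can silence the message or route it elsewhere.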

srilman · 2026-01-16T01:23:26Z

src/daft-context/src/subscribers/dashboard.rs

-        )))
-        .await?;
+    fn on_exec_start(&self, query_id: QueryID, physical_plan: QueryPlan) -> DaftResult<()> {
+        let execution_id = format!("{}-driver", query_id);


I'm not sure I understand the point of execution_id when it's just query_id with a fixed or randomly generated tag, and it's only created once. Why not just store query_id directly if it's already a random UUID?

Yes, I agree. I've made the change.

tests/test_context.py

srilman · 2026-01-16T01:24:41Z

tests/integration/test_dashboard.py

+from daft import udf
+
+
+@pytest.fixture(scope="module")


How would these tests work without anything to actually launch a dashboard server or mocking the server for testing?
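
One way to address this is to run a throwaway HTTP server in-process and assert on the requests it receives. The sketch below uses only the standard library. The `/query/<id>/...` paths come from the sequence diagram. `start_recording_server` is a hypothetical helper. In a pytest suite, it would typically sit behind a module-scoped fixture.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def start_recording_server() -> tuple[ThreadingHTTPServer, list[tuple[str, bytes]]]:
    """Serve on an ephemeral port and record every POST (path, body)."""
    received: list[tuple[str, bytes]] = []

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            received.append((self.path, self.rfile.read(length)))
            self.send_response(200)
            self.end_headers()

        def log_message(self, format: str, *args: object) -> None:
            pass  # silence per-request logging in test output

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, received
```

The tests would then point the dashboard URL setting at `http://127.0.0.1:<port>` and assert on `received`. Whether the subscriber reads `DAFT_DASHBOARD_URL` for this purpose is an assumption to verify. Afterwards, they would call `server.shutdown()`.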

srilman · 2026-01-16T01:25:23Z

src/daft-local-execution/src/runtime_stats/mod.rs

                        for node_id in &active_nodes {
                            let runtime_stats = &node_stats_map[node_id];
-                            let event = runtime_stats.snapshot();
+                            let event = runtime_stats.flush();


This shouldn't flush; flushing forces atomic synchronization, which is less efficient.

srilman · 2026-01-16T01:27:06Z

src/daft-local-execution/src/runtime_stats/mod.rs

                            );
                        }
+
+                        // Emit final stats to all subscribers before finishing


This shouldn't be necessary, since the finalize_node step should already emit the final stats for that node.

srilman · 2026-01-16T01:29:38Z

Also, @Jay-ju, if possible, could we split this PR into smaller pieces? It modifies a lot of small aspects of observability, and since we're also actively working on that area, this PR will end up with a lot of merge conflicts.

Jay-ju · 2026-01-21T15:25:21Z


@srilman I have split this PR into two: one for the backend and one for the frontend. I have also resolved some conflicts:
- backend: #6008
- frontend: #6063

Could you please take another look when you have time?

github-actions bot added the feat label Jan 11, 2026

greptile-apps bot reviewed Jan 11, 2026


Jay-ju force-pushed the jay/dashboard-ui-improvements branch from 7b32c5d to 1a47668 January 12, 2026 12:33

kevinzwang requested a review from srilman January 12, 2026 19:04

srilman requested changes Jan 13, 2026


Jay-ju force-pushed the jay/dashboard-ui-improvements branch 5 times, most recently from 0b386f5 to 9838c15 January 14, 2026 10:05

Jay-ju force-pushed the jay/dashboard-ui-improvements branch from 9838c15 to 0867a8e January 14, 2026 12:05

srilman reviewed Jan 16, 2026


feat(dashboard): enhance dashboard UI and fix Ray runner state reporting

1789f71

Jay-ju force-pushed the jay/dashboard-ui-improvements branch from 0867a8e to 1789f71 January 16, 2026 07:18

Jay-ju added 2 commits January 17, 2026 00:09

tmp

2f4d16d

format

fba83ae

Jay-ju mentioned this pull request Jan 21, 2026

feat(frontend): enhance dashboard UI and fix Ray runner state reporting #6063

Open
