@Jay-ju
Contributor

Jay-ju commented Jan 19, 2026

Changes Made

Related Issues

Jay-ju changed the title from "feat" to "feat(frontend): enhance dashboard UI and fix Ray runner state reporting" on Jan 19, 2026
github-actions bot added the feat label on Jan 19, 2026
@greptile-apps
Contributor

greptile-apps bot commented Jan 19, 2026

Greptile Summary

This PR implements a comprehensive dashboard feature for monitoring Daft query execution across both Ray (Flotilla) and Native (Swordfish) runners. The implementation adds real-time tracking of query lifecycle events, operator-level statistics, and a web-based frontend for visualization.

Key Changes:

  • Backend infrastructure for tracking query states (Pending → Optimizing → Setup → Executing → Finalizing → Finished/Failed/Canceled)
  • Dashboard subscriber system with HTTP API endpoints for state updates and statistics aggregation
  • Frontend improvements including query list table with resizable columns, duration display, entrypoint tracking, and Ray dashboard links
  • Concurrent subscriber notifications using futures::join_all for improved performance (addresses custom rule 30c842e9-0965-4f11-9744-dfa978729589)
  • Worker identification for distributed execution tracking
  • Integration tests for both Ray and Native runners
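The lifecycle in the first bullet reads as a small state machine. The following is a minimal sketch under two assumptions that the PR does not state: each phase advances only to the next one, and any non-terminal state may jump directly to Failed or Canceled. `QueryState` and `can_transition_to` are hypothetical names, not the API in src/daft-dashboard/src/engine.rs.

```rust
/// Hypothetical model of the query lifecycle (Pending -> ... -> Finished/Failed/Canceled).
/// The transition rules below are assumptions for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryState {
    Pending,
    Optimizing,
    Setup,
    Executing,
    Finalizing,
    Finished,
    Failed,
    Canceled,
}

impl QueryState {
    /// Terminal states accept no further transitions.
    pub fn is_terminal(self) -> bool {
        matches!(self, Self::Finished | Self::Failed | Self::Canceled)
    }

    /// Allowed (assumed): one step forward along the happy path, or from
    /// any non-terminal state straight to Failed or Canceled.
    pub fn can_transition_to(self, next: QueryState) -> bool {
        if self.is_terminal() {
            return false;
        }
        matches!(
            (self, next),
            (Self::Pending, Self::Optimizing)
                | (Self::Optimizing, Self::Setup)
                | (Self::Setup, Self::Executing)
                | (Self::Executing, Self::Finalizing)
                | (Self::Finalizing, Self::Finished)
                | (_, Self::Failed)
                | (_, Self::Canceled)
        )
    }
}
```

Keeping the rule in one function lets out-of-order updates be rejected in a single place rather than at every endpoint.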

Issues Found:

  • Inconsistent lock ordering in src/daft-distributed/src/python/dashboard.rs (lines 64-65 vs 54) that could cause deadlock

Confidence Score: 3/5

  • Safe to merge after addressing the lock ordering issue in dashboard.rs
  • The implementation is well-structured with proper error handling, comprehensive state management, and good test coverage. However, the lock ordering issue in the new DashboardStatisticsSubscriber could cause deadlocks in production under concurrent load. Once this is fixed, the PR should be safe to merge.
  • Pay close attention to src/daft-distributed/src/python/dashboard.rs due to the lock ordering issue that needs to be resolved

Important Files Changed

| Filename | Overview |
|---|---|
| src/daft-distributed/src/python/dashboard.rs | New dashboard statistics subscriber with potential deadlock from inconsistent lock ordering (lines 64-65 vs 54) |
| src/daft-dashboard/src/engine.rs | Added complex state management for query execution with proper error handling and state transitions |
| src/daft-context/src/subscribers/dashboard.rs | Enhanced dashboard subscriber with worker ID support, execution tracking, and improved async error handling |
| src/daft-local-execution/src/runtime_stats/mod.rs | Refactored to use futures::join_all for concurrent subscriber notifications, improving performance |
| daft/runners/ray_runner.py | Added dashboard URL logging, Ray dashboard link construction, and improved query lifecycle notifications |
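`futures::join_all` polls all notification futures concurrently and resolves once every one has completed, so one slow subscriber no longer delays the others. The std-only sketch below shows the same fan-out-then-collect shape using scoped threads (`std::thread::scope`, Rust 1.63+) as a stand-in for `join_all`; it is not the runtime_stats code. The `Subscriber` trait and `notify_all` function are hypothetical and far simpler than Daft's async subscriber interface.

```rust
use std::thread;

/// Hypothetical, synchronous subscriber trait (Daft's real one is async).
pub trait Subscriber {
    fn on_exec_emit_stats(&self, query_id: &str, rows: u64) -> Result<(), String>;
}

/// Fans one event out to every subscriber on its own scoped thread, then
/// waits for all of them and returns the results in subscriber order.
pub fn notify_all(
    subs: &[Box<dyn Subscriber + Sync>],
    query_id: &str,
    rows: u64,
) -> Vec<Result<(), String>> {
    thread::scope(|s| {
        // Spawn every notification first so they overlap in time...
        let handles: Vec<_> = subs
            .iter()
            .map(|sub| s.spawn(move || sub.on_exec_emit_stats(query_id, rows)))
            .collect();
        // ...then wait for all of them, analogous to awaiting join_all.
        handles
            .into_iter()
            .map(|h| h.join().expect("subscriber panicked"))
            .collect()
    })
}
```

Because every result is collected, one failing endpoint can be logged per subscriber without aborting delivery to the rest.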

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Runner as Ray/Native Runner
    participant Context as DaftContext
    participant DashSub as DashboardSubscriber
    participant Engine as Dashboard Engine
    participant Frontend as Dashboard Frontend

    User->>Runner: df.collect()
    Runner->>Context: notify_query_start(query_id, metadata)
    Context->>DashSub: on_query_start()
    DashSub->>Engine: POST /engine/query/{id}/start
    Engine->>Frontend: Broadcast query summary

    Runner->>Context: notify_optimization_start(query_id)
    Context->>DashSub: on_optimization_start()
    DashSub->>Engine: POST /engine/query/{id}/plan_start

    Runner->>Context: notify_optimization_end(query_id, plan)
    Context->>DashSub: on_optimization_end()
    DashSub->>Engine: POST /engine/query/{id}/plan_end

    Runner->>Context: notify_exec_start(query_id, physical_plan)
    Context->>DashSub: on_exec_start()
    DashSub->>Engine: POST /engine/query/{id}/exec/start

    loop During Execution
        Runner->>Context: notify_exec_operator_start(node_id)
        Context->>DashSub: on_exec_operator_start()
        DashSub->>Engine: POST /engine/query/{id}/exec/{op_id}/start

        Runner->>Context: notify_exec_emit_stats(stats)
        Context->>DashSub: on_exec_emit_stats()
        DashSub->>Engine: POST /engine/query/{id}/exec/emit_stats
        Engine->>Frontend: Update operator stats

        Runner->>Context: notify_exec_operator_end(node_id)
        Context->>DashSub: on_exec_operator_end()
        DashSub->>Engine: POST /engine/query/{id}/exec/{op_id}/end
    end

    Runner->>Context: notify_exec_end(query_id)
    Context->>DashSub: on_exec_end()
    DashSub->>Engine: POST /engine/query/{id}/exec/end

    Runner->>Context: notify_query_end(query_id, result)
    Context->>DashSub: on_query_end()
    DashSub->>Engine: POST /engine/query/{id}/end
    Engine->>Frontend: Broadcast final status

    Frontend-->>User: Display query results
```
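The request flow in the sequence diagram comes down to one POST path per lifecycle event. A minimal sketch of that mapping follows, assuming numeric operator IDs. `DashboardEvent` and `endpoint` are hypothetical helpers, and the paths are copied from the diagram rather than from the engine's actual router.

```rust
/// Hypothetical event type mirroring the POSTs in the sequence diagram.
/// Operator IDs are assumed to be u64.
pub enum DashboardEvent {
    QueryStart,
    PlanStart,
    PlanEnd,
    ExecStart,
    OperatorStart(u64),
    EmitStats,
    OperatorEnd(u64),
    ExecEnd,
    QueryEnd,
}

/// Builds the POST path for an event, following the diagram's URL layout.
pub fn endpoint(query_id: &str, event: &DashboardEvent) -> String {
    let base = format!("/engine/query/{}", query_id);
    match event {
        DashboardEvent::QueryStart => format!("{}/start", base),
        DashboardEvent::PlanStart => format!("{}/plan_start", base),
        DashboardEvent::PlanEnd => format!("{}/plan_end", base),
        DashboardEvent::ExecStart => format!("{}/exec/start", base),
        DashboardEvent::OperatorStart(op) => format!("{}/exec/{}/start", base, op),
        DashboardEvent::EmitStats => format!("{}/exec/emit_stats", base),
        DashboardEvent::OperatorEnd(op) => format!("{}/exec/{}/end", base, op),
        DashboardEvent::ExecEnd => format!("{}/exec/end", base),
        DashboardEvent::QueryEnd => format!("{}/end", base),
    }
}
```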

Contributor

greptile-apps bot left a comment


33 files reviewed, 1 comment


Comment on lines 64 to 65 of src/daft-distributed/src/python/dashboard.rs:

```rust
let mut accumulated = self.operator_stats.lock().unwrap();
let mut started = self.started_operators.lock().unwrap();
```

logic: Acquire started lock before accumulated for consistency

Line 54 acquires only `started`, while lines 64-65 acquire `accumulated` first and then `started`. Taking the two locks in one consistent order everywhere (`started` before `accumulated`) rules out a lock-order deadlock if any code path comes to hold both.

Suggested change:

```diff
-let mut accumulated = self.operator_stats.lock().unwrap();
-let mut started = self.started_operators.lock().unwrap();
+let mut started = self.started_operators.lock().unwrap();
+let mut accumulated = self.operator_stats.lock().unwrap();
```
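A lock-order deadlock needs two code paths that each hold one lock while waiting for the other, having acquired them in opposite orders. Fixing a single global order for every path that takes both locks makes that cycle impossible. The sketch below applies the suggested order (`started` before `accumulated`). `StatsSubscriber` and its methods are hypothetical; only the field names `started_operators` and `operator_stats` come from the snippet under review, and their types are assumed.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

/// Hypothetical stand-in for the subscriber under review.
/// Field types (operator IDs as u64, row counts as u64) are assumptions.
#[derive(Default)]
pub struct StatsSubscriber {
    started_operators: Mutex<HashSet<u64>>,
    operator_stats: Mutex<HashMap<u64, u64>>,
}

impl StatsSubscriber {
    /// Takes only the `started` lock, like line 54 in the review.
    pub fn on_operator_start(&self, op: u64) {
        self.started_operators.lock().unwrap().insert(op);
    }

    /// Needs both locks: always `started` first, then `accumulated`,
    /// the same order every other two-lock path must follow.
    pub fn on_emit_stats(&self, op: u64, rows: u64) {
        let mut started = self.started_operators.lock().unwrap();
        let mut accumulated = self.operator_stats.lock().unwrap();
        started.insert(op);
        *accumulated.entry(op).or_insert(0) += rows;
    }

    /// Reads the accumulated row count; holds a single lock only.
    pub fn rows_for(&self, op: u64) -> u64 {
        self.operator_stats.lock().unwrap().get(&op).copied().unwrap_or(0)
    }
}
```

Paths that hold only one lock at a time, such as `on_operator_start`, cannot form such a cycle by themselves; the ordering rule matters for every path that holds both.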

Jay-ju force-pushed the jay/dashboard-frontend branch from 54363ea to bdaf3a1 on January 19, 2026 07:19
@codecov

codecov bot commented Jan 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.90%. Comparing base (b896dc8) to head (bdaf3a1).
⚠️ Report is 13 commits behind head on main.

Additional details and impacted files


```diff
@@            Coverage Diff             @@
##             main    #6063      +/-   ##
==========================================
+ Coverage   72.80%   72.90%   +0.10%     
==========================================
  Files         971      973       +2     
  Lines      127119   126166     -953     
==========================================
- Hits        92545    91982     -563     
+ Misses      34574    34184     -390     
```

see 69 files with indirect coverage changes


@universalmind303
Member

@Jay-ju, for UI changes, I usually suggest adding some before/after screenshots. It makes it a lot easier and faster for reviewers. Thanks!

@Jay-ju
Contributor Author

Jay-ju commented Jan 21, 2026

Before:

[2 screenshots]

After:

[3 screenshots]

@universalmind303 I have added before/after screenshots. The back-end PR corresponding to this front-end change is #6062.

cc @srilman As per your request, I have split #6008 into two parts: the front-end and back-end implementations.

@Jay-ju
Copy link
Contributor Author

Jay-ju commented Jan 22, 2026

@universalmind303 Are there any other issues with this PR?
