Merged
Adds a Training/Inference toggle in Step 1 of the job submission wizard. When Inference is selected, the config upload step is skipped and the user goes directly to a file selection view, where they browse the worker filesystem for model checkpoint directories and a .slp data file.

Dashboard changes:
- Job type toggle (Training/Inference) in Step 1
- Step indicator adapts: 3 steps for training, 2 for inference
- New inference file selection view with model cards and data input
- File browser reused from training flow with inference-specific routing
- submitJob() sends TrackJobSpec (type: "track") for inference
- Status view hides epoch/metrics and Stop Early for inference
- Status label shows "Running inference..." for track jobs

No worker changes needed: TrackJobSpec, build_track_command(), and RelayChannel inference message handling all already exist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
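A minimal Python sketch of the job-type branching described above, for illustration only (the real logic lives in dashboard/app.js). The step identifiers, the `"train"` type value, and the payload field names are hypothetical; only the 3-versus-2 step counts, the `"track"` type for inference, and the "Running inference..." label come from the commit.

```python
from __future__ import annotations

from typing import Any

# Hypothetical step identifiers; the commit only fixes the counts (3 for training, 2 for inference).
TRAINING_STEPS = ["job_type", "upload_config", "review_and_submit"]
INFERENCE_STEPS = ["job_type", "select_models_and_data"]


def wizard_steps(job_type: str) -> list[str]:
    """Steps shown by the step indicator; inference skips config upload."""
    return INFERENCE_STEPS if job_type == "inference" else TRAINING_STEPS


def build_job_spec(job_type: str, **fields: Any) -> dict[str, Any]:
    """Build the submission payload. "track" is from the commit; "train" is an assumed value."""
    spec_type = "track" if job_type == "inference" else "train"
    return {"type": spec_type, **fields}


def status_label(spec: dict[str, Any]) -> str | None:
    """Inference has no epochs or metrics, so it gets a fixed label.

    None means "fall back to the epoch/metrics view" used for training.
    """
    return "Running inference..." if spec["type"] == "track" else None
```

Keying everything off the single spec `type` field keeps the status view free of a separate "is inference" flag.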
…r to dashboard

File browser UX:
- Replaced double-click model selection with an explicit "Select Folder" / "Select File" button that appears when an item is clicked
- The button confirms the selection and closes the file browser

Inference log forwarding:
- Track (inference) jobs now forward stderr lines to the relay channel so inference progress appears in the dashboard worker logs
- RelayChannel handles [stderr] lines as running status with message
- Dashboard _sjHandleJobStatus appends message to worker log panel

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Inference output:
- Track jobs now use stderr=STDOUT to merge all output into one stream. Rich/tqdm suppress output when stderr is a pipe (not a TTY), but merging into stdout captures everything through the existing stdout reader, which already forwards to the relay channel.
- Skip the stderr streaming task when stderr is merged (prevents a crash)

File browser:
- Add sj-inference-file-browser div to the HTML (it was dynamically created but not found by the column renderer)
- Simplify browse handlers since the container exists in the HTML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
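The stream merge can be sketched with Python's asyncio subprocess API, assuming the worker launches the track command with `asyncio.create_subprocess_exec` (the commit does not say which API job_executor.py uses). Passing `stderr=asyncio.subprocess.STDOUT` sends both streams through one pipe, and `proc.stderr` is then `None`; that is one plausible reason a separate stderr reader task has to be skipped. `run_merged` is a hypothetical helper name.

```python
from __future__ import annotations

import asyncio
import sys


async def run_merged(argv: list[str]) -> list[str]:
    """Run a command with stderr merged into stdout and collect every output line."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # fd 2 is redirected into the stdout pipe
    )
    # With stderr merged there is no separate stderr pipe (proc.stderr is None),
    # so only the stdout reader runs.
    assert proc.stdout is not None
    lines: list[str] = []
    async for raw in proc.stdout:
        # A worker would forward each line to the relay channel here.
        lines.append(raw.decode(errors="replace").rstrip("\r\n"))
    await proc.wait()
    return lines


if __name__ == "__main__":
    child = (
        "import sys; print('to stdout', flush=True); "
        "print('to stderr', file=sys.stderr, flush=True)"
    )
    for line in asyncio.run(run_merged([sys.executable, "-c", child])):
        print(line)
```

Both the stdout and the stderr line from the child arrive through the single reader loop.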
…avigation
- Don't reset _sjSelectedInferencePath when loading subfolder contents (only reset on a fresh browse at colIndex 0)
- Model browsing: clicking a folder shows the Select Folder button AND navigates into it; the button persists while browsing deeper
- Data browsing: clicking a folder only navigates and shows no Select button; only .slp files get the Select File button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
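The selection rules above can be expressed as a small decision function plus one piece of state. This is a hypothetical Python model of the dashboard behavior: `Entry`, `ClickResult`, `on_click`, and `BrowserState` are illustrative names, not identifiers from app.js, and the case-insensitive `.slp` check and the "no button" result for plain files in model mode are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    is_dir: bool


@dataclass
class ClickResult:
    navigate: bool            # open the folder in the next column
    select_label: str | None  # label of the confirm button, or None for no button


def on_click(mode: str, entry: Entry) -> ClickResult:
    if mode == "model":
        # Checkpoints are directories: offer selection AND keep browsing into them.
        if entry.is_dir:
            return ClickResult(navigate=True, select_label="Select Folder")
        return ClickResult(navigate=False, select_label=None)
    if mode == "data":
        if entry.is_dir:
            return ClickResult(navigate=True, select_label=None)  # navigate only
        if entry.name.lower().endswith(".slp"):
            return ClickResult(navigate=False, select_label="Select File")
        return ClickResult(navigate=False, select_label=None)
    raise ValueError(f"unknown browse mode: {mode!r}")


class BrowserState:
    def __init__(self) -> None:
        self.selected_path: str | None = None

    def load_column(self, col_index: int) -> None:
        # Only a fresh browse (column 0) clears the selection; subfolder loads keep it,
        # so the Select Folder button survives deeper navigation.
        if col_index == 0:
            self.selected_path = None
```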
The RelayChannel only forwarded known message types (JOB_ACCEPTED, JOB_PROGRESS, INFERENCE_BEGIN, etc.). Plain text lines from inference stdout (e.g., "Started inference at...") and CR:: tqdm progress updates fell through and were silently dropped.

Added a catch-all handler that forwards any unrecognized text line as a running status with a message field, so all output appears in the dashboard worker logs. CR:: tqdm progress updates are forwarded as well.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
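A sketch of the routing order, assuming known message types are recognized by a line prefix. The prefix list is partial, `route_line` is a hypothetical name, and the payload returned for known types is a placeholder for the existing typed handlers; only the `status: running` plus `message` shape for forwarded text follows the commits.

```python
from __future__ import annotations

# Partial list taken from the commit message; treating them as line prefixes is an assumption.
KNOWN_PREFIXES = ("JOB_ACCEPTED", "JOB_PROGRESS", "INFERENCE_BEGIN")
STDERR_PREFIX = "[stderr]"
CR_PREFIX = "CR::"


def route_line(line: str) -> dict | None:
    """Map one line of worker output to a relay payload, or None to drop it."""
    text = line.rstrip("\r\n")
    if not text.strip():
        return None  # blank lines carry nothing worth relaying
    if text.startswith(KNOWN_PREFIXES):
        # Stand-in for the existing typed handlers (payload shape is hypothetical).
        return {"type": text.split("::", 1)[0], "raw": text}
    if text.startswith(STDERR_PREFIX):
        return {"status": "running", "message": text[len(STDERR_PREFIX):].lstrip()}
    if text.startswith(CR_PREFIX):
        # tqdm progress redraw: forward only the progress text.
        return {"status": "running", "message": text[len(CR_PREFIX):]}
    # Catch-all: unrecognized text used to be dropped; now it reaches the worker logs.
    return {"status": "running", "message": text}
```

Checking typed prefixes first means the catch-all can never swallow a structured message.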
Inference logs arriving via _sjHandleJobStatus with status=running and a message field were appended to the visible log panel but not saved to the activeJobs.logs array, so the Training Job Summary showed no logs after inference completed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
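A Python analogue of the fix, assuming each entry in `activeJobs` keeps a `logs` list keyed by job ID. `handle_job_status` is a hypothetical stand-in for the JavaScript `_sjHandleJobStatus`; the point is that one incoming line is written to both the visible panel and the persisted job record.

```python
from __future__ import annotations


def handle_job_status(active_jobs: dict[str, dict], log_panel: list[str], msg: dict) -> None:
    """Append a running-status message to the visible panel and to the job's stored logs."""
    if msg.get("status") != "running" or "message" not in msg:
        return
    text = msg["message"]
    log_panel.append(text)  # what the user sees while the job runs
    job = active_jobs.get(msg.get("job_id"))
    if job is not None:  # tolerate unknown or missing job IDs
        # Persist the line too, so the job summary still has it after completion.
        job.setdefault("logs", []).append(text)
```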
alicup29 added a commit that referenced this pull request on Apr 2, 2026
Training jobs were flooding the dashboard with config dumps, model architecture, and other startup logs because the catch-all text line handler (added in PR #76 for inference stdout) forwarded everything. Now only inference/track jobs forward unrecognized text lines. Training jobs use PROGRESS_REPORT:: for epoch-level updates as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
alicup29 added a commit that referenced this pull request on Apr 2, 2026
During training, Lightning's tqdm progress bar writes to stderr using carriage returns, flooding the relay with per-batch updates. These handlers (added in PR #76 for inference stdout) should only forward for track/inference jobs. Training uses PROGRESS_REPORT:: epoch-level events instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
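Taken together with the catch-all change, the gating rule reduces to one predicate. A minimal sketch, assuming the worker knows the job type as `"track"` or `"train"` (the latter value is an assumption) and that structured messages are identified by prefix; `should_forward` and the prefix list are illustrative.

```python
from __future__ import annotations

# Structured messages that are relayed for every job type (partial list; prefix format assumed).
STRUCTURED_PREFIXES = ("PROGRESS_REPORT::", "JOB_ACCEPTED", "JOB_PROGRESS", "INFERENCE_BEGIN")


def should_forward(line: str, job_type: str) -> bool:
    """Decide whether a worker output line is relayed to the dashboard."""
    if line.startswith(STRUCTURED_PREFIXES):
        # Epoch-level PROGRESS_REPORT:: events keep flowing for training.
        return True
    # [stderr] lines, CR:: tqdm redraws and other unrecognized text are inference-only;
    # for training they are per-batch noise or startup dumps.
    return job_type == "track"
```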
alicup29 added a commit that referenced this pull request on Apr 2, 2026
* Add E2E encryption for relay transport (ECDH P-256 + AES-256-GCM)

Encrypts all relay messages between dashboard and workers so the signaling server cannot read message payloads. Uses ephemeral ECDH P-256 key exchange + HKDF + AES-256-GCM with zero new dependencies.

Python (worker):
- New sleap_rtc/encryption/ module (ecdh.py, envelope.py)
- mesh_coordinator: key exchange handler, decrypt incoming, encrypt outgoing
- RelayChannel.send() encrypts job status/progress when an E2E session is active
- Session key storage with 24h pruning

JavaScript (dashboard):
- Web Crypto API: ECDH P-256 key generation, HKDF, AES-GCM encrypt/decrypt
- Key exchange initiated on "Next →" (sjEnterStep3 / sjGoToInferenceStep)
- apiWorkerMessage() transparently encrypts when E2E is ready
- apiFsList/apiJobSubmit rerouted through the encrypted path
- sseConnect() decrypts encrypted_relay events and re-dispatches them

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix E2E encryption: re-apply mesh_coordinator changes + fix job_id generation

- Re-apply all E2E encryption integration to mesh_coordinator.py (key exchange handler, decrypt incoming, encrypt outgoing, session management, _send_relay_response helper)
- Fix apiJobSubmit: generate job_id client-side when E2E is active, since the dedicated /api/jobs/submit endpoint is bypassed to avoid exposing config to the signaling server
- Re-apply apiFsList E2E routing through apiWorkerMessage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix crash in _sjRenderWorkerList when activeJobs has undefined keys

Guard against undefined job IDs in the debug log line. This can happen when a previous E2E test stored an undefined job_id in activeJobs (persisted in localStorage).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Limit RelayChannel catch-all to inference jobs only

Training jobs were flooding the dashboard with config dumps, model architecture, and other startup logs because the catch-all text line handler (added in PR #76 for inference stdout) forwarded everything. Now only inference/track jobs forward unrecognized text lines. Training jobs use PROGRESS_REPORT:: for epoch-level updates as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Gate [stderr] and CR:: relay handlers to inference jobs only

During training, Lightning's tqdm progress bar writes to stderr using carriage returns, flooding the relay with per-batch updates. These handlers (added in PR #76 for inference stdout) should only forward for track/inference jobs. Training uses PROGRESS_REPORT:: epoch-level events instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Re-add E2E encryption JS code stripped by linter

The linter repeatedly strips the E2E encryption code from app.js. Re-adds all required components:
- Constructor state variables (_e2eSessionId, _e2ePrivateKey, etc.)
- Web Crypto functions (keypair gen, ECDH, HKDF, AES-GCM encrypt/decrypt)
- Key exchange orchestration with timeout + retry
- SSE decryption handler in sseConnect()
- Transparent encryption in apiWorkerMessage()
- Key exchange triggers in sjEnterStep3() and sjGoToInferenceStep()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Format mesh_coordinator.py with Black

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
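The HKDF step of the key exchange needs nothing beyond the Python standard library. The sketch below implements HKDF-SHA256 as specified in RFC 5869; the ECDH P-256 agreement and AES-256-GCM encryption are not shown, since they need a crypto library on the worker side and Web Crypto in the dashboard. Feeding the 32-byte ECDH shared secret in as `ikm` and using the 32-byte output as the AES-256-GCM key is the usual pattern, but the salt and `info` values sleap_rtc actually uses are not stated here, and `hkdf_sha256` is a hypothetical helper name.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size  # 32 bytes
    if length > 255 * hash_len:
        raise ValueError("HKDF-SHA256 output is limited to 255 * 32 bytes")
    if not salt:
        salt = b"\x00" * hash_len  # RFC 5869: a missing salt is HashLen zero bytes
    # Extract: PRK = HMAC(salt, IKM)
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) | info | i), concatenated and truncated to `length`
    okm, block = b"", b""
    for counter in range(1, -(-length // hash_len) + 1):
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
    return okm[:length]
```

For example, `hkdf_sha256(shared_secret, salt=b"", info=b"relay-e2e-v1")` (placeholder `info` label) yields a 32-byte key. Binding a context string into `info` keeps keys derived for different purposes independent even when they share the same ECDH secret.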
Summary
Adds inference (track) job submission to the dashboard, letting users run sleap-nn inference on remote workers directly from the web UI.
Motivation
Previously the dashboard only supported training job submission. Users who wanted to run inference on trained models had to use the SLEAP GUI, sleap-app, or the CLI. This PR enables the full inference workflow from the dashboard: select models, select data, submit, and monitor — matching the training job experience.
Key Changes
Dashboard — Job type selector
Dashboard — Inference file selection
Dashboard — Inference progress
Worker — Inference output forwarding
- Track jobs merge stderr into stdout (stderr=STDOUT) so rich/tqdm output is captured
Files changed
- dashboard/app.js
- dashboard/index.html
- dashboard/styles.css
- sleap_rtc/worker/job_executor.py
- sleap_rtc/worker/mesh_coordinator.py
Test plan
🤖 Generated with Claude Code