
Conversation

@gleb-chipiga commented Jan 16, 2026

Why

  • The app did synchronous stdout/stderr writes inside async code paths, which can introduce jitter and block the Tokio runtime.
  • TUI rendering and input handling were running in async tasks and invoked blocking terminal I/O, which can stall worker threads.
  • Responsibilities (export/save, network info gathering, run orchestration) were mixed into UI/CLI code.

What changed

  • CLI output (text/json/errors) is routed through a dedicated writer running in spawn_blocking, removing synchronous I/O from hot async paths (sketched after this list).
  • TUI is moved into a dedicated thread; it owns the terminal and performs blocking draw/input without touching Tokio workers.
  • Event flow uses unbounded Tokio channels to avoid backpressure and task switching in the measurement hot path.
  • Added a RunCompleted event (boxed) to deliver results to the UI thread cleanly.
  • Added cancellation status feedback (e.g., “Cancelling…”, “Still cancelling…”) to keep UI responsive during long cancels.
  • Fixed a potential race where a JoinHandle could be dropped before completion was observed.
  • Split orchestration into controller (run lifecycle) and post-processing (export/auto-save/history), with UI only sending commands.
  • Gathered NetworkInfo outside the UI thread and passed it in (UI only consumes results).
  • Refactored text-mode summaries into a separate builder for presentation-only output.
  • Shifted Summary phase update to occur right after upload timing ends to avoid UI “stuck on upload” delays.
  • Text mode output now flushes per line (LineWriter) for real-time progress.
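
A minimal sketch of the writer pattern described in the bullets above, assuming Tokio's mpsc and a hypothetical OutputMsg type (identifiers are illustrative, not the PR's actual names):

```rust
use std::io::{LineWriter, Write};
use tokio::sync::mpsc;

// Hypothetical message type; the real code's names may differ.
enum OutputMsg {
    Line(String),
    Shutdown,
}

// Spawns the writer on Tokio's blocking pool and returns a sender that
// async code can use without ever awaiting.
fn spawn_output_writer() -> mpsc::UnboundedSender<OutputMsg> {
    let (tx, mut rx) = mpsc::unbounded_channel::<OutputMsg>();

    tokio::task::spawn_blocking(move || {
        // LineWriter flushes on every '\n', so progress lines appear immediately.
        let mut out = LineWriter::new(std::io::stdout());
        while let Some(msg) = rx.blocking_recv() {
            match msg {
                OutputMsg::Line(s) => {
                    let _ = writeln!(out, "{s}");
                }
                OutputMsg::Shutdown => break,
            }
        }
        let _ = out.flush();
    });

    tx
}
```

Async callers then use tx.send(OutputMsg::Line(...)), which is synchronous on an unbounded channel, so the measurement path never waits on stdout.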

Why these choices

  • spawn_blocking is the recommended way to perform blocking I/O from async code.
  • A dedicated UI thread is the safest way to keep terminal I/O fully outside Tokio.
  • Unbounded channels fit our bounded event volume and avoid artificial throttling of measurements.
  • Splitting orchestration keeps UI/CLI focused on presentation and reduces cross-layer coupling.

Impact

  • Measurements are less likely to be affected by output rendering or logging.
  • UI remains responsive and isolated from async scheduling.
  • Code paths are clearer: async for network/engine, sync for UI/terminal.
  • Text mode now renders incremental output reliably.

@gleb-chipiga force-pushed the output-offload-tui-thread branch from bf361dd to 4badae4 on January 16, 2026 at 18:05
@gleb-chipiga changed the title from "Decouple output and move TUI off Tokio" to "Decouple output, move TUI off Tokio, and centralize orchestration" on Jan 16, 2026
@gleb-chipiga force-pushed the output-offload-tui-thread branch 6 times, most recently from a6e277a to 3c786ad, on January 16, 2026 at 18:51
@gleb-chipiga changed the title from "Decouple output, move TUI off Tokio, and centralize orchestration" to "Split orchestration and decouple UI/output" on Jan 16, 2026
Run TUI in a dedicated thread, route CLI output through a writer task,
split orchestration into controller and post‑process modules, and keep
UI/CLI layers focused on presentation with unbounded channels and early
Summary phase updates
@gleb-chipiga force-pushed the output-offload-tui-thread branch from 3c786ad to 2bf3174 on January 16, 2026 at 18:56
@kavehtehrani (Owner) commented Jan 16, 2026

Thanks for the detailed PR. I have some concerns about the changes though.

  • TUI thread change: The original code used crossterm::event::EventStream, which I believe is already async-native, so it wasn't blocking Tokio workers. Moving to a dedicated OS thread adds complexity (thread spawning, joining, extra channels) without clear benefit to me. Could you please let me know why you did this?

  • Output writer: Routing println!/eprintln! through a channel + spawn_blocking is overkill for a CLI that prints a few lines; its default operating mode is likely not even text, but rather the TUI, plus JSON output when headless. stdout writes for small outputs aren't meaningfully blocking.

  • Unbounded channels: Changing from bounded to unbounded channels removes backpressure, which I generally keep as a habit unless there's a reason not to. Given that this app exchanges only a few events per run, I'm not sure there is any reason to make the channel unbounded.

  • Added complexity: The new modules (orchestrator, text_summary), parameter structs (LatencyProbeParams, ChartRenderParams), and restructuring add a few hundred lines and more indirection. This is adding a ton of abstraction and I'm not sure the extra complexity is warranted for a simple tool that would benefit from simplicity.

I'd like to understand what specific issue you observed that motivated these changes. If there was actual UI jitter or blocked workers, I'd love to see that addressed in a more targeted way. Could you please provide a controlled test that demonstrates jitter or that the worker threads were actually stalled?

Much appreciated,
Kaveh

@gleb-chipiga (Author) commented Jan 17, 2026

Hi Kaveh, thanks for the thoughtful review.

> TUI thread change: The original code used crossterm::event::EventStream, which I believe is already async-native, so it wasn't blocking Tokio workers. Moving to a dedicated OS thread adds complexity (thread spawning, joining, extra channels) without clear benefit to me. Could you please let me know why you did this?

The motivation wasn't EventStream; it was the synchronous terminal operations (especially Terminal::draw). Those block the current thread, and if the UI runs inside an async task, it blocks that task and can add jitter to other tasks sharing that worker. Moving the UI to a dedicated OS thread is the strictest way to keep Tokio free of such blocking calls. Yes, it adds complexity, but it's a deliberate trade-off toward the goal of "zero blocking in async tasks", to minimize UI impact on measurements.
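
As a rough illustration of the intended structure (a sketch only; the event names and placeholder render/input calls are mine, not the PR's actual code):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Placeholder for the app's result payload.
struct RunResults;

// Hypothetical events delivered from the async side to the UI thread.
enum UiEvent {
    Progress(f64),
    RunCompleted(Box<RunResults>), // boxed, as the PR description mentions
    Quit,
}

fn spawn_ui_thread(rx: mpsc::Receiver<UiEvent>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // This thread owns the terminal; blocking calls such as Terminal::draw and
        // crossterm::event::poll/read would live here, never on Tokio workers.
        loop {
            // Drain pending events from the async side without blocking.
            while let Ok(ev) = rx.try_recv() {
                match ev {
                    UiEvent::Quit => return,
                    UiEvent::Progress(_) | UiEvent::RunCompleted(_) => {
                        // ...update UI state...
                    }
                }
            }
            // terminal.draw(|frame| { ... })                    -- blocking render
            // crossterm::event::poll(Duration::from_millis(50)) -- blocking input wait
            thread::sleep(Duration::from_millis(16)); // pacing placeholder for the sketch
        }
    })
}
```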

> Output writer: Routing println!/eprintln! through a channel + spawn_blocking is overkill for a CLI that prints a few lines; its default operating mode is likely not even text, but rather the TUI, plus JSON output when headless. stdout writes for small outputs aren't meaningfully blocking.

This was driven by worst‑case rather than average‑case. stdout/stderr are synchronous, and under slow downstream (pty/ssh/tmux) write() can block. For measurements that’s a jitter risk even if it’s “usually fast”. So output was moved to a dedicated writer to keep blocking I/O out of async hot paths and preserve timing stability.

> Unbounded channels: Changing from bounded to unbounded channels removes backpressure, which I generally keep as a habit unless there's a reason not to. Given that this app exchanges only a few events per run, I'm not sure there is any reason to make the channel unbounded.

Unbounded was a deliberate choice: the event volume is strictly bounded by durations/intervals (hundreds per run), and that bound is deterministic. When the volume is small and guaranteed, unbounded doesn't pose a practical OOM risk, and it avoids backpressure and send().await stalls on the hot measurement path.
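
For reference, the difference between the two send paths in Tokio looks roughly like this (a sketch, not the PR's code):

```rust
use tokio::sync::mpsc;

// Bounded: send() is async and parks the sender when the buffer is full,
// so a slow consumer can stall the measurement task.
async fn emit_bounded(tx: &mpsc::Sender<u64>, sample: u64) {
    let _ = tx.send(sample).await;
}

// Unbounded: send() is synchronous and never waits. With event volume capped
// by run duration and intervals, the queue stays small in practice.
fn emit_unbounded(tx: &mpsc::UnboundedSender<u64>, sample: u64) {
    let _ = tx.send(sample);
}
```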

> Added complexity: The new modules (orchestrator, text_summary), parameter structs (LatencyProbeParams, ChartRenderParams), and restructuring add a few hundred lines and more indirection. This is adding a ton of abstraction and I'm not sure the extra complexity is warranted for a simple tool that would benefit from simplicity.

Modularity was increased deliberately to reduce coupling: orchestration and post-processing were separated so UI/CLI remain presentation-only, and the domain logic and run lifecycle don't depend on the interface. This improves testability and makes future changes safer (e.g., swapping the UI or storage/export without touching measurements). It's not abstraction for its own sake, but a deliberate separation of responsibilities for maintainability.

> I'd like to understand what specific issue you observed that motivated these changes. If there was actual UI jitter or blocked workers, I'd love to see that addressed in a more targeted way. Could you please provide a controlled test that demonstrates jitter or that the worker threads were actually stalled?

A controlled scenario is to run under tmux/ssh and enter tmux copy‑mode during the test. In that mode tmux stops actively reading from the PTY and stops refreshing the screen, so the output buffer fills up. Once the buffer is full, normal write() calls to stdout/stderr block, and the writing thread stalls on write(). A similar effect can happen over slow SSH links: if the remote terminal/network can’t keep up with output, the PTY buffer fills and write() starts blocking. This is reproducible and affects timing, showing that “usually fast” doesn’t guarantee non‑blocking output under slow downstream conditions.
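
If it helps, a standalone probe along these lines (not part of the PR, just a way to reproduce the effect) makes the stall measurable:

```rust
use std::io::Write;
use std::time::Instant;

// Run this inside tmux over ssh and enter copy-mode mid-run: per-write latency
// jumps from microseconds to however long the PTY stays un-drained.
fn main() {
    let mut stdout = std::io::stdout();
    let payload = "x".repeat(512);
    for i in 0..100_000u32 {
        let start = Instant::now();
        let _ = writeln!(stdout, "{i} {payload}"); // blocks once the PTY buffer fills
        let _ = stdout.flush();
        let elapsed = start.elapsed();
        if elapsed.as_millis() > 10 {
            eprintln!("write {i} stalled for {elapsed:?}");
        }
    }
}
```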

@kavehtehrani (Owner) commented

Thanks for sending the PR. I did provide a sample code snippet in your other PR that would be helpful in demonstrating actual potential issues. Unfortunately this seems too theoretical, and I'm guessing LLM-generated, more concerned with "correctness" than with real-world utility. This is almost 1000 lines of code for things you can measure yourself; I'd be happy to take feedback if an actual problem were demonstrated. For instance, terminal::draw takes microseconds and isn't blocking anything that would be noticeable by a human.
