
Conversation

@gleb-chipiga commented Jan 16, 2026

Why

  • The app did synchronous stdout/stderr writes inside async code paths, which can introduce jitter and block the Tokio runtime.
  • TUI rendering and input handling were running in async tasks and invoked blocking terminal I/O, which can stall worker threads.
  • Responsibilities (export/save, network info gathering, run orchestration) were mixed into UI/CLI code.

What changed

  • CLI output (text/json/errors) is routed through a dedicated writer running in spawn_blocking, removing synchronous I/O from hot async paths (sketched after this list).
  • TUI is moved into a dedicated thread; it owns the terminal and performs blocking draw/input without touching Tokio workers.
  • Event flow uses unbounded Tokio channels to avoid backpressure and task switching in the measurement hot path.
  • Added a RunCompleted event (boxed) to deliver results to the UI thread cleanly.
  • Added cancellation status feedback (e.g., “Cancelling…”, “Still cancelling…”) to keep UI responsive during long cancels.
  • Fixed a potential race where a JoinHandle could be dropped before completion was observed.
  • Split orchestration into controller (run lifecycle) and post-processing (export/auto-save/history), with UI only sending commands.
  • Gathered NetworkInfo outside the UI thread and passed it in (UI only consumes results).
  • Refactored text-mode summaries into a separate builder for presentation-only output.
  • Shifted Summary phase update to occur right after upload timing ends to avoid UI “stuck on upload” delays.
  • Text mode output now flushes per line (LineWriter) for real-time progress.
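
A minimal sketch of the writer pattern described in the bullets above, assuming Tokio's mpsc and a hypothetical OutputMsg type (identifiers are illustrative, not the PR's actual names):

```rust
use std::io::{LineWriter, Write};
use tokio::sync::mpsc;

// Hypothetical message type; the real code's names may differ.
enum OutputMsg {
    Line(String),
    Shutdown,
}

// Spawns the writer on Tokio's blocking pool and returns a sender that
// async code can use without ever awaiting.
fn spawn_output_writer() -> mpsc::UnboundedSender<OutputMsg> {
    let (tx, mut rx) = mpsc::unbounded_channel::<OutputMsg>();

    tokio::task::spawn_blocking(move || {
        // LineWriter flushes on every '\n', so progress lines appear immediately.
        let mut out = LineWriter::new(std::io::stdout());
        while let Some(msg) = rx.blocking_recv() {
            match msg {
                OutputMsg::Line(s) => {
                    let _ = writeln!(out, "{s}");
                }
                OutputMsg::Shutdown => break,
            }
        }
        let _ = out.flush();
    });

    tx
}
```

Async callers then use tx.send(OutputMsg::Line(...)), which is synchronous on an unbounded channel, so the measurement path never waits on stdout.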

Why these choices

  • spawn_blocking is the recommended way to perform blocking I/O from async code.
  • A dedicated UI thread is the safest way to keep terminal I/O fully outside Tokio.
  • Unbounded channels fit our bounded event volume and avoid artificial throttling of measurements.
  • Splitting orchestration keeps UI/CLI focused on presentation and reduces cross-layer coupling.

Impact

  • Measurements are less likely to be affected by output rendering or logging.
  • UI remains responsive and isolated from async scheduling.
  • Code paths are clearer: async for network/engine, sync for UI/terminal.
  • Text mode now renders incremental output reliably.

@gleb-chipiga force-pushed the output-offload-tui-thread branch from bf361dd to 4badae4 on January 16, 2026 at 18:05
@gleb-chipiga changed the title from "Decouple output and move TUI off Tokio" to "Decouple output, move TUI off Tokio, and centralize orchestration" on Jan 16, 2026
@gleb-chipiga force-pushed the output-offload-tui-thread branch 6 times, most recently from a6e277a to 3c786ad, on January 16, 2026 at 18:51
@gleb-chipiga changed the title from "Decouple output, move TUI off Tokio, and centralize orchestration" to "Split orchestration and decouple UI/output" on Jan 16, 2026
Run TUI in a dedicated thread, route CLI output through a writer task,
split orchestration into controller and post‑process modules, and keep
UI/CLI layers focused on presentation with unbounded channels and early
Summary phase updates
@gleb-chipiga force-pushed the output-offload-tui-thread branch from 3c786ad to 2bf3174 on January 16, 2026 at 18:56
@kavehtehrani (Owner) commented Jan 16, 2026

Thanks for the detailed PR. I have some concerns about the changes though.

  • TUI thread change: The original code used crossterm::event::EventStream, which I believe is already async-native, so it wasn't blocking Tokio workers. Moving to a dedicated OS thread adds complexity (thread spawning, joining, extra channels) without clear benefit to me. Could you please let me know why you did this?

  • Output writer: Routing println!/eprintln! through a channel + spawn_blocking is overkill for a CLI that prints a few lines; its default operating mode is likely not even text, but rather the TUI, plus JSON output when headless. stdout writes for small outputs aren't meaningfully blocking.

  • Unbounded channels: Changing from bounded to unbounded channels removes backpressure, which I generally keep as a habit unless there's a reason not to. Given that this app exchanges only a few events per run, I'm not sure there is any reason to make the channel unbounded.

  • Added complexity: The new modules (orchestrator, text_summary), parameter structs (LatencyProbeParams, ChartRenderParams), and restructuring add a few hundred lines and more indirection. This is adding a ton of abstraction and I'm not sure the extra complexity is warranted for a simple tool that would benefit from simplicity.

I'd like to understand what specific issue you observed that motivated these changes. If there was actual UI jitter or blocked workers, I'd love to see that addressed in a more targeted way. Could you please provide a controlled test that demonstrates jitter or that the worker threads were actually stalled?

Much appreciated,
Kaveh

@gleb-chipiga (Author) commented Jan 17, 2026

Hi Kaveh, thanks for the thoughtful review.

> TUI thread change: The original code used crossterm::event::EventStream, which I believe is already async-native, so it wasn't blocking Tokio workers. Moving to a dedicated OS thread adds complexity (thread spawning, joining, extra channels) without clear benefit to me. Could you please let me know why you did this?

The motivation wasn't EventStream; it was the synchronous terminal operations (especially Terminal::draw). Those block the current thread, and if the UI runs inside an async task, it blocks that task and can add jitter to other tasks sharing that worker. Moving the UI to a dedicated OS thread is the strictest way to keep Tokio free of such blocking calls. Yes, it adds complexity, but it's a deliberate trade-off toward the goal of "zero blocking in async tasks", to minimize UI impact on measurements.
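
As a rough illustration of the intended structure (a sketch only; the event names and placeholder render/input calls are mine, not the PR's actual code):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Placeholder for the app's result payload.
struct RunResults;

// Hypothetical events delivered from the async side to the UI thread.
enum UiEvent {
    Progress(f64),
    RunCompleted(Box<RunResults>), // boxed, as the PR description mentions
    Quit,
}

fn spawn_ui_thread(rx: mpsc::Receiver<UiEvent>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // This thread owns the terminal; blocking calls such as Terminal::draw and
        // crossterm::event::poll/read would live here, never on Tokio workers.
        loop {
            // Drain pending events from the async side without blocking.
            while let Ok(ev) = rx.try_recv() {
                match ev {
                    UiEvent::Quit => return,
                    UiEvent::Progress(_) | UiEvent::RunCompleted(_) => {
                        // ...update UI state...
                    }
                }
            }
            // terminal.draw(|frame| { ... })                    -- blocking render
            // crossterm::event::poll(Duration::from_millis(50)) -- blocking input wait
            thread::sleep(Duration::from_millis(16)); // pacing placeholder for the sketch
        }
    })
}
```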

> Output writer: Routing println!/eprintln! through a channel + spawn_blocking is overkill for a CLI that prints a few lines; its default operating mode is likely not even text, but rather the TUI, plus JSON output when headless. stdout writes for small outputs aren't meaningfully blocking.

This was driven by worst‑case rather than average‑case. stdout/stderr are synchronous, and under slow downstream (pty/ssh/tmux) write() can block. For measurements that’s a jitter risk even if it’s “usually fast”. So output was moved to a dedicated writer to keep blocking I/O out of async hot paths and preserve timing stability.

> Unbounded channels: Changing from bounded to unbounded channels removes backpressure, which I generally keep as a habit unless there's a reason not to. Given that this app exchanges only a few events per run, I'm not sure there is any reason to make the channel unbounded.

Unbounded was a deliberate choice: the event volume is strictly bounded by durations/intervals (hundreds per run), and that bound is deterministic. When the volume is small and guaranteed, unbounded doesn't pose a practical OOM risk, and it avoids backpressure and send().await stalls on the hot measurement path.
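
For reference, the difference between the two send paths in Tokio looks roughly like this (a sketch, not the PR's code):

```rust
use tokio::sync::mpsc;

// Bounded: send() is async and parks the sender when the buffer is full,
// so a slow consumer can stall the measurement task.
async fn emit_bounded(tx: &mpsc::Sender<u64>, sample: u64) {
    let _ = tx.send(sample).await;
}

// Unbounded: send() is synchronous and never waits. With event volume capped
// by run duration and intervals, the queue stays small in practice.
fn emit_unbounded(tx: &mpsc::UnboundedSender<u64>, sample: u64) {
    let _ = tx.send(sample);
}
```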

> Added complexity: The new modules (orchestrator, text_summary), parameter structs (LatencyProbeParams, ChartRenderParams), and restructuring add a few hundred lines and more indirection. This is adding a ton of abstraction and I'm not sure the extra complexity is warranted for a simple tool that would benefit from simplicity.

Modularity was increased deliberately to reduce coupling: orchestration and post-processing were separated so UI/CLI remain presentation-only, and the domain logic and run lifecycle don't depend on the interface. This improves testability and makes future changes safer (e.g., swapping the UI or storage/export without touching measurements). It's not abstraction for its own sake, but a deliberate separation of responsibilities for maintainability.

> I'd like to understand what specific issue you observed that motivated these changes. If there was actual UI jitter or blocked workers, I'd love to see that addressed in a more targeted way. Could you please provide a controlled test that demonstrates jitter or that the worker threads were actually stalled?

A controlled scenario is to run under tmux/ssh and enter tmux copy‑mode during the test. In that mode tmux stops actively reading from the PTY and stops refreshing the screen, so the output buffer fills up. Once the buffer is full, normal write() calls to stdout/stderr block, and the writing thread stalls on write(). A similar effect can happen over slow SSH links: if the remote terminal/network can’t keep up with output, the PTY buffer fills and write() starts blocking. This is reproducible and affects timing, showing that “usually fast” doesn’t guarantee non‑blocking output under slow downstream conditions.
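
If it helps, a standalone probe along these lines (not part of the PR, just a way to reproduce the effect) makes the stall measurable:

```rust
use std::io::Write;
use std::time::Instant;

// Run this inside tmux over ssh and enter copy-mode mid-run: per-write latency
// jumps from microseconds to however long the PTY stays un-drained.
fn main() {
    let mut stdout = std::io::stdout();
    let payload = "x".repeat(512);
    for i in 0..100_000u32 {
        let start = Instant::now();
        let _ = writeln!(stdout, "{i} {payload}"); // blocks once the PTY buffer fills
        let _ = stdout.flush();
        let elapsed = start.elapsed();
        if elapsed.as_millis() > 10 {
            eprintln!("write {i} stalled for {elapsed:?}");
        }
    }
}
```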

@kavehtehrani (Owner) commented

Thanks for sending the PR. I did provide a sample code snippet in your other PR that would be helpful in demonstrating actual potential issues. Unfortunately this seems too theoretical, and I'm guessing LLM-generated, more concerned with "correctness" than with real-world utility. This is almost 1000 lines of code for things you can measure yourself; I'd be happy to take feedback if an actual problem were demonstrated. For instance, terminal::draw takes microseconds and isn't blocking anything that would be noticeable by a human.
