Commit 021e9ee

emeryberger, claude, and github-advanced-security[bot] authored
Add async-aware profiling (#1006)
* Add async-aware profiling with --async flag

  Adds await-time attribution so users can see how much wall-clock time each `await` line spends waiting, displayed as pie charts in the GUI and a new column in CLI output. Pies rotate clockwise so each row's wedge starts where the previous row's ended. Also improves GPU pies to use two-wedge (filled + remaining) rendering for consistency.

  New modules: scalene_async.py (coroutine tracking, dual-strategy instrumentation for 3.9+ polling and 3.12+ sys.monitoring), replacement_asyncio.py (event loop shim). Extends statistics, JSON output, CLI viewer, and GUI with async await data, concurrency metrics, and task name attribution.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Remove unused type: ignore comment to fix mypy linter

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix ruff linter: use lowercase type annotations, remove quotes

  Replace typing.Dict/List/Set/Tuple/Optional with builtins (safe with from __future__ import annotations). Remove quoted forward reference for RunningStats.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add async profiling demo script

  Example program for testing --async profiling with fast I/O, slow I/O, CPU-bound, and mixed async workloads.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Cap async data structures to prevent unbounded memory growth

  - Cap async_task_names sets at 100 entries per (file, line) to prevent unbounded growth when programs create many uniquely-named tasks
  - Cap _suspended_tasks dict at 10000 entries; clear when exceeded to prune stale entries from tasks that yielded but never resumed

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Potential fix for code scanning alert no. 913: Empty except

  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 914: Empty except

  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Add developer guide docs for debugging and GUI patterns

  Link new Scalene-Debugging.md and Scalene-GUI.md from CLAUDE.md, documenting signal handler gotchas, async profiling debugging, profile output pipeline, unbounded growth prevention, and GUI column/chart patterns learned during async profiling work.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Enable async profiling by default

  Async profiling has near-zero overhead for non-async code (one frozenset lookup per signal, sys.monitoring callbacks never fire without coroutines) and provides essential data for async code. Users can disable with --no-async if needed.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
1 parent a29af53 commit 021e9ee

16 files changed, +1647 -915 lines changed

CLAUDE.md

Lines changed: 8 additions & 0 deletions
@@ -608,3 +608,11 @@ This breaks `std::vsnprintf` in `<string>` and other headers. **Fix:** Include C
 ## Profiling Guide

 See [Scalene-Agents.md](Scalene-Agents.md) for detailed information about interpreting Scalene's profiling output, including Python vs C time, memory metrics, and optimization strategies.
+
+## Debugging Guide
+
+See [Scalene-Debugging.md](Scalene-Debugging.md) for signal handler debugging, async profiling debugging, the profile output pipeline (three separate renderers!), and unbounded growth prevention patterns.
+
+## GUI Development Guide
+
+See [Scalene-GUI.md](Scalene-GUI.md) for adding new columns, Vega-Lite chart types, pie chart best practices (two-wedge rendering, rotating pies), and the chart rendering flow.

Scalene-Debugging.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+# Debugging Patterns for Scalene
+
+## Signal Handler Debugging
+
+The CPU signal handler (`cpu_signal_handler`) receives `this_frame` as the raw frame parameter. The `compute_frames_to_record()` function filters this down to user-only frames via `_should_trace()`.
+
+**Critical gotcha**: When the main thread is idle in the event loop (the exact case async profiling needs to detect), there are NO user frames — asyncio/selector frames are all filtered out. So `frames` from `compute_frames_to_record()` is empty. Must use `this_frame` directly for event loop detection.
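Review note: the gotcha above can be illustrated with a small standalone sketch. It does not use Scalene's `is_in_event_loop()` or `cpu_signal_handler`; the helper names `frame_is_idle_event_loop` and `sample_idle_loop` are hypothetical, and the rule "the innermost frame lives in `asyncio` or `selectors`" is an assumption about how an idle loop looks from the raw frame. A background thread stands in for the signal handler by inspecting another thread's raw frame via `sys._current_frames()`.

```python
import asyncio
import os
import selectors
import sys
import threading
import time

# Locations of the standard-library event loop machinery.
_ASYNCIO_DIR = os.path.dirname(asyncio.__file__) + os.sep
_SELECTORS_FILE = selectors.__file__


def frame_is_idle_event_loop(frame):
    """Return True if the innermost (raw) frame belongs to asyncio/selectors.

    Hypothetical helper: a user-only frame filter would discard exactly
    these frames, which is why the raw frame has to be inspected instead.
    """
    filename = frame.f_code.co_filename
    return filename == _SELECTORS_FILE or filename.startswith(_ASYNCIO_DIR)


def sample_idle_loop():
    """Sample a thread whose event loop is idle, waiting on a timer."""
    worker = threading.Thread(target=lambda: asyncio.run(asyncio.sleep(0.5)))
    worker.start()
    time.sleep(0.2)  # give the loop time to block in its selector
    raw_frame = sys._current_frames()[worker.ident]
    idle = frame_is_idle_event_loop(raw_frame)
    worker.join()
    return idle
```

Calling `sample_idle_loop()` samples the worker while its only task is sleeping, so the innermost Python frame is expected to sit in the selector or asyncio code rather than in user code.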
+
+## Async Profiling Debugging
+
+To verify async profiling is working:
+1. Create a test program with known behavior (fast I/O, slow I/O, CPU-bound)
+2. Profile: `python3 -m scalene run --async test/test_async_demo.py`
+3. Check JSON: `python3 -m scalene view --cli` should show Await % column
+4. Verify proportions: slow_io >> mixed_work >> fast_io, cpu_work = 0% await
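Review note: the contents of `test/test_async_demo.py` are not shown on this page, so the following is only a minimal sketch of a test program with "known behavior" in the sense of step 1. The function names follow step 4 (`fast_io`, `slow_io`, `cpu_work`, `mixed_work`); the sleep durations, loop sizes, and the `timed` helper are assumptions chosen for illustration.

```python
import asyncio
import time


async def fast_io():
    # Short wait: should show a small await share.
    await asyncio.sleep(0.01)


async def slow_io():
    # Long wait: should dominate await time.
    await asyncio.sleep(0.2)


async def cpu_work():
    # No await at all: expected to show 0% await.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


async def mixed_work():
    # Some CPU work followed by a medium wait.
    sum(i * i for i in range(50_000))
    await asyncio.sleep(0.05)


async def timed(coro_fn):
    """Hypothetical helper: wall-clock seconds spent in one coroutine."""
    start = time.perf_counter()
    await coro_fn()
    return time.perf_counter() - start


async def main():
    elapsed = {}
    for fn in (fast_io, slow_io, cpu_work, mixed_work):
        elapsed[fn.__name__] = await timed(fn)
    return elapsed


if __name__ == "__main__":
    asyncio.run(main())
```

Because each coroutine has a known wait pattern, the ordering of the measured wall-clock times gives a rough baseline to compare against the Await % column.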
+
+To debug zero-await-data:
+- Check `is_in_event_loop()` returns True when event loop is idle
+- Check `_poll_suspended_tasks()` finds tasks
+- Verify the signal handler is using `this_frame`, not filtered `frames`
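Review note: when checking whether polling finds tasks, it can help to see what `asyncio.all_tasks()` reports for suspended coroutines in isolation. The sketch below is not Scalene's `_poll_suspended_tasks()`; `snapshot_suspended_tasks`, `waiter`, and `demo` are hypothetical names, and the use of `Task.get_stack(limit=1)` to find the suspended line is one possible approach, assumed here for illustration.

```python
import asyncio


def snapshot_suspended_tasks():
    """Return (task_name, filename, lineno) for every pending task
    except the one that is currently running.

    Must be called while an event loop is running in this thread.
    """
    current = asyncio.current_task()
    snapshot = []
    for task in asyncio.all_tasks():
        if task is current or task.done():
            continue
        stack = task.get_stack(limit=1)  # frame where the coroutine is suspended
        if not stack:
            continue
        frame = stack[0]
        snapshot.append(
            (task.get_name(), frame.f_code.co_filename, frame.f_lineno)
        )
    return snapshot


async def waiter(delay):
    await asyncio.sleep(delay)  # tasks are suspended on this line


async def demo():
    tasks = [
        asyncio.create_task(waiter(0.05), name=f"waiter-{i}") for i in range(3)
    ]
    await asyncio.sleep(0.01)  # let the tasks start and suspend
    snapshot = snapshot_suspended_tasks()
    await asyncio.gather(*tasks)
    return snapshot
```

If a snapshot like this comes back empty while tasks are known to be waiting, the poll is probably running at the wrong time or against the wrong loop.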
+
+## Profile Output Pipeline
+
+There are **three separate renderers** for profile output. All must be updated when adding new columns:
+
+- **JSON output**: `scalene_json.py:output_profiles()` → `output_profile_line()`
+- **CLI viewer**: `scalene_parseargs.py:_display_profile_cli()` — used by `scalene view --cli`
+- **HTML/GUI output**: `scalene_output.py:output_profiles()` — used by `scalene view --html`
+- **GUI (browser)**: `scalene-gui.ts:makeProfileLine()` → embedded Vega-Lite charts via `vegaEmbed()`
+- **Standalone HTML**: `scalene_utility.py:generate_html(standalone=True)` embeds all assets inline
+
+Note: `_display_profile_cli()` in `scalene_parseargs.py` is completely separate from `scalene_output.py`. This is easy to miss.
+
+## Unbounded Growth Prevention
+
+Any dict or set that accumulates per-sample data must be bounded:
+
+- **Dicts keyed by (filename, lineno)**: Inherently bounded by source code size — OK.
+- **Sets of names/strings**: Must be capped (e.g., `async_task_names` capped at 100 per location).
+- **Tracking dicts** (e.g., `_suspended_tasks`): Must be capped and cleared when exceeded.
+- **`RunningStats`**: Fixed-size (count, mean, M2) — OK.
+- **`ScaleneSigQueue`**: Uses `SimpleQueue` with continuous consumer drain — OK.
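Review note: the bounding rules above can be sketched as follows. This is not the code from `scalene_async.py` or Scalene's `RunningStats`; all names here are hypothetical, the limits 100 and 10000 come from the commit message, and clearing once the limit is reached (rather than strictly exceeded) is an assumption. The statistics class uses Welford's update as one way to keep only (count, mean, M2).

```python
from collections import defaultdict

MAX_NAMES_PER_LOCATION = 100   # cap taken from the commit message
MAX_SUSPENDED_TASKS = 10_000   # cap taken from the commit message

task_names = defaultdict(set)  # (filename, lineno) -> set of task names
suspended = {}                 # task id -> bookkeeping info


def record_task_name(location, name):
    """Add a task name unless this location already holds the maximum."""
    names = task_names[location]
    if len(names) < MAX_NAMES_PER_LOCATION:
        names.add(name)


def record_suspended(task_id, info):
    """Track a suspended task; drop all stale entries once the cap is hit."""
    if len(suspended) >= MAX_SUSPENDED_TASKS:
        suspended.clear()
    suspended[task_id] = info


class RunningStatsSketch:
    """Fixed-size running statistics (count, mean, M2) via Welford's update."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def push(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; defined as 0.0 for fewer than two samples.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0
```

Memory use stays constant for the statistics object no matter how many samples are pushed, while the set and dict are bounded by their explicit caps.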

Scalene-GUI.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+# GUI Development Patterns
+
+## Adding a New Column
+
+1. **`gui-elements.ts`**: Add chart function (e.g., `makeAwaitPie`) following existing patterns
+2. **`scalene-gui.ts`**:
+   - Add to imports
+   - Add chart array (e.g., `const await_pies: (unknown | null)[] = []`)
+   - Add column in `makeTableHeader()`
+   - Add cell rendering in `makeProfileLine()` — push chart specs to array
+   - Pass array through both call sites (line profile loop + function profile loop)
+   - Add `embedCharts(await_pies, "await_pie")` at the end
+3. **Rebuild**: `npx esbuild scalene-gui.ts --bundle --outfile=scalene-gui-bundle.js --format=iife --global-name=ScaleneGUI`
+4. **`scalene_json.py`**: Add field to `FunctionDetail`, compute in payload
+5. **`scalene_output.py`**: Add column for CLI `--html` output
+6. **`scalene_parseargs.py`**: Add column in `_display_profile_cli()` for `scalene view --cli`
+
+## Chart Types (Vega-Lite)
+
+All charts are Vega-Lite specs rendered via `vegaEmbed()` after DOM insertion.
+
+- **Bar**: `makeBar()` — stacked horizontal bar (CPU time: Python/native/system)
+- **Pie**: `makeGPUPie()`, `makeAwaitPie()`, `makeMemoryPie()` — arc charts
+- **Sparkline**: `makeSparkline()` — line chart for memory timeline
+- **NRT/NC bars**: `makeNRTBar()`, `makeNCTimeBar()` — Neuron time bars
+- **Standard dimensions**: 20px height, various widths
+
+## Pie Chart Best Practices
+
+- Always use **two data values** (filled + remaining) for a complete circle. Single-value pies with `scale: { domain: [0, 100] }` show partial arcs with gaps — looks bad.
+- For **rotating pies** (each row's wedge starts where previous ended): use `scale: { range: [startAngle, startAngle + 2*PI] }` on the theta encoding. Track cumulative angle:
+  ```typescript
+  pieAngles.await += (pct / 100) * 2 * Math.PI;
+  ```
+- Reset angle state per table (line profile and function profile tables get separate `pieAngles` objects).
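Review note: the cumulative-angle arithmetic for rotating pies can be checked outside the GUI. The sketch below restates the TypeScript update in Python purely as arithmetic; `rotating_theta_ranges` is a hypothetical name and is not part of `gui-elements.ts` or `scalene-gui.ts`.

```python
import math


def rotating_theta_ranges(percentages):
    """Return the theta scale range (start, start + 2*pi) for each row.

    Each row's start angle is the sum of the wedge angles of all
    previous rows, so a wedge begins where the previous one ended.
    """
    start = 0.0
    ranges = []
    for pct in percentages:
        ranges.append((start, start + 2 * math.pi))
        start += (pct / 100) * 2 * math.pi  # same update as pieAngles.await
    return ranges
```

For rows with 25%, 50%, and 25%, the start angles come out as 0, π/2, and 3π/2, so the three wedges together tile one full circle. Using a fresh call per table corresponds to resetting the angle state per table.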
+
+## Chart Rendering Flow
+
+1. `makeProfileLine()` builds HTML string with `<span id="chart_name${index}">` placeholders
+2. Chart specs are pushed to arrays (e.g., `cpu_bars`, `gpu_pies`, `await_pies`)
+3. After all HTML is inserted into DOM, `embedCharts(array, "prefix")` calls `vegaEmbed()` for each spec
+4. SVGs render asynchronously — Selenium tests need explicit waits to verify SVG content
+
+## makeProfileLine Call Sites
+
+This function has many parameters. When adding new ones, append to the end with defaults. The two call sites are:
+- Line profile loop: creates `linePieAngles = { await: 0, gpu: 0 }` before the loop
+- Function profile loop: creates `fnPieAngles = { await: 0, gpu: 0 }` before the loop

scalene/replacement_asyncio.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+"""Scalene replacement for asyncio event loop instrumentation.
+
+Follows the existing replacement_*.py pattern using @Scalene.shim.
+When async profiling is enabled, this module activates the
+ScaleneAsync instrumentation (sys.monitoring on 3.12+, polling on older).
+"""
+
+from scalene.scalene_profiler import Scalene
+
+
+@Scalene.shim
+def replacement_asyncio(scalene: Scalene) -> None:
+    """Activate async profiling instrumentation when --async is enabled.
+
+    This is called during profiler initialization. The actual enable/disable
+    is controlled by Scalene.__init__ based on the --async flag.
+    ScaleneAsync.enable() installs sys.monitoring callbacks on 3.12+.
+    On 3.9-3.11, polling via asyncio.all_tasks() is used instead,
+    triggered from the signal queue processor.
+    """
+    # Nothing to do here - activation is handled by the profiler
+    # based on the --async flag. This module exists as a placeholder
+    # following the replacement_*.py convention, and can be extended
+    # with additional event loop wrapping if needed.
+    pass

scalene/scalene-gui/gui-elements.ts

Lines changed: 71 additions & 5 deletions
@@ -186,10 +186,66 @@ export function makeBar(
   };
 }

+export function makeAwaitPie(
+  await_pct: number,
+  params: ChartParams,
+  startAngle: number = 0
+): object {
+  return {
+    $schema: "https://vega.github.io/schema/vega-lite/v5.json",
+    config: {
+      view: {
+        stroke: "transparent",
+      },
+    },
+    autosize: {
+      contains: "padding",
+    },
+    width: params.width,
+    height: params.height,
+    padding: 0,
+    data: {
+      values: [
+        {
+          category: "await",
+          value: await_pct.toFixed(1),
+          c: "await: " + await_pct.toFixed(1) + "%",
+        },
+        {
+          category: "other",
+          value: (100 - await_pct).toFixed(1),
+          c: "",
+        },
+      ],
+    },
+    mark: "arc",
+    encoding: {
+      theta: {
+        field: "value",
+        type: "quantitative",
+        scale: {
+          range: [startAngle, startAngle + 2 * Math.PI],
+        },
+      },
+      color: {
+        field: "category",
+        type: "nominal",
+        legend: false,
+        scale: {
+          domain: ["await", "other"],
+          range: ["darkcyan", "#e0f2f1"],
+        },
+      },
+      tooltip: [{ field: "c", type: "nominal", title: "await" }],
+    },
+  };
+}
+
 export function makeGPUPie(
   util: number,
   gpu_device: string,
-  params: ChartParams
+  params: ChartParams,
+  startAngle: number = 0
 ): object {
   return {
     $schema: "https://vega.github.io/schema/vega-lite/v5.json",
@@ -207,24 +263,34 @@ export function makeGPUPie(
     data: {
       values: [
         {
-          category: 1,
+          category: "in use",
           value: util.toFixed(1),
           c: "in use: " + util.toFixed(1) + "%",
         },
+        {
+          category: "idle",
+          value: (100 - util).toFixed(1),
+          c: "",
+        },
       ],
     },
     mark: "arc",
     encoding: {
       theta: {
         field: "value",
         type: "quantitative",
-        scale: { domain: [0, 100] },
+        scale: {
+          range: [startAngle, startAngle + 2 * Math.PI],
+        },
       },
       color: {
-        field: "c",
+        field: "category",
         type: "nominal",
         legend: false,
-        scale: { range: ["goldenrod", "#f4e6c2"] },
+        scale: {
+          domain: ["in use", "idle"],
+          range: ["goldenrod", "#f4e6c2"],
+        },
       },
       tooltip: [{ field: "c", type: "nominal", title: gpu_device }],
     },
