Releases: johnmarktaylor91/torchlens

v0.21.3 (2026-03-11)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • tests: Make SIGALRM signal safety test deterministic (b3fc461)

Replace timer-based SIGALRM with direct os.kill() inside forward() so the signal always fires mid-logging. Eliminates flaky skips when the forward pass completes before the timer.
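The pattern described can be sketched as follows — a POSIX-only toy with a plain function standing in for the model's forward(), not torchlens's actual test code:

```python
import os
import signal

fired = []

def handler(signum, frame):
    fired.append(signum)

signal.signal(signal.SIGALRM, handler)

def forward(x):
    # Deliver SIGALRM to our own process mid-call. Unlike a signal.alarm()
    # timer, this cannot race with the forward pass finishing first.
    os.kill(os.getpid(), signal.SIGALRM)
    return x * 2

result = forward(21)
```

Because os.kill targets the current process directly, the handler is guaranteed to run before forward() returns, so the test can assert on the mid-logging state deterministically.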

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.21.2...v0.21.3


v0.21.2 (2026-03-09)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • vis: Avoid graphviz.Digraph memory bomb when ELK fails on large graphs (f5563ee)

When ELK layout fails (OOM/timeout) on 1M+ node graphs, the fallback path previously built a graphviz.Digraph in Python — nested subgraph body-list copies exploded memory and hung indefinitely. Now render_elk_direct handles the failure internally: reuses already-collected Phase 1 data to generate DOT text without positions and renders directly with sfdp, bypassing graphviz.Digraph entirely.
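A rough sketch of the technique — building DOT source as plain text and piping it straight to sfdp; the function name and graph data here are illustrative, not torchlens's actual render_elk_direct:

```python
import shutil
import subprocess

def build_dot_text(edges):
    """Emit DOT source as a single string, bypassing graphviz.Digraph
    (no nested-subgraph body-list copies, no position attributes)."""
    lines = ["digraph G {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)

dot = build_dot_text([("conv1", "relu1"), ("relu1", "fc")])

# Render directly with sfdp if it is on PATH (illustrative only).
if shutil.which("sfdp"):
    subprocess.run(["sfdp", "-Tsvg", "-o", "graph.svg"],
                   input=dot.encode(), check=True)
```

String concatenation keeps memory proportional to the DOT text itself, rather than duplicating subgraph bodies the way nested graphviz.Digraph objects do.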

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Bypass ELK for large graphs — use Python topological layout (37cce3a)

ELK's stress algorithm allocates TWO O(n²) distance matrices (n² × 16 bytes). At 100k nodes that's 160 GB, at 1M nodes it's 16 TB — the root cause of the std::bad_alloc. The old >150k stress switch could never work.
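The quoted figures follow directly from 16 bytes per node pair (two double-precision matrices); a quick sanity check:

```python
def stress_matrix_bytes(n, bytes_per_pair=16):
    """Memory for two n x n double-precision distance matrices:
    2 matrices x 8 bytes = 16 bytes per node pair."""
    return n * n * bytes_per_pair

GB, TB = 10**9, 10**12
print(stress_matrix_bytes(100_000) / GB)    # 160.0 (GB)
print(stress_matrix_bytes(1_000_000) / TB)  # 16.0 (TB)
```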

For graphs above 100k nodes, we now skip ELK entirely and compute a topological rank layout in Python (Kahn's algorithm, O(n+m)). Module bounding boxes are computed from node positions. The result feeds into the same neato -n rendering path, preserving cluster boxes.
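The O(n+m) rank assignment can be sketched with Kahn's algorithm, where each node's layer is one more than the deepest of its parents; names and data shapes here are illustrative, not the actual torchlens layout code:

```python
from collections import deque

def topo_ranks(nodes, edges):
    """Assign each node a layer rank via Kahn's algorithm in O(n + m).
    A node's rank is one more than the max rank of its parents."""
    children = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indeg[dst] += 1
    rank = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:
        n = queue.popleft()
        for c in children[n]:
            rank[c] = max(rank[c], rank[n] + 1)
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return rank

ranks = topo_ranks(["in", "conv", "skip", "add"],
                   [("in", "conv"), ("in", "skip"),
                    ("conv", "add"), ("skip", "add")])
```

From here, a y coordinate per rank and an x coordinate per position within a rank yield node positions that neato -n can render directly.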

If ELK fails for smaller graphs, the Python layout is also used as a fallback instead of the old sfdp path that built a graphviz.Digraph (which exploded on nested subgraph body-list copies).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.21.1...v0.21.2


v0.21.1 (2026-03-09)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • postprocess: Fix mypy type errors in _build_module_param_info (11ea006)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Performance Improvements

  • postprocess: Optimize pipeline for large models (a211417)

  • Per-step verbose timing: unwrap grouped _vtimed blocks into individual step timing with a graph-stats summary, so users can identify which specific step is slow (O16)
  • Cache module_str by containing_modules tuple to avoid redundant string joins in Step 6 (O8)
  • Early-continue guards in _undecorate_all_saved_tensors to skip BFS on layers with empty captured_args/kwargs (O5)
  • Pre-compute a buffer_layers_by_module dict in _build_module_logs, eliminating the O(modules × buffers) scan per module (O6)
  • Single-pass arglist rebuild in the Step 11 rename, replacing the 3-pass enumerate + index-set + filter pattern (O2)
  • Replace OrderedDict with dict in _trim_and_reorder (Python 3.7+ preserves insertion order) for lower allocation overhead (O4)
  • Reverse-index approach in _refine_iso_groups: O(members × neighbors) instead of O(members²) all-pairs combinations (O9)
  • Pre-compute param types per subgraph as a frozenset before the pair loop in _merge_iso_groups_to_layers (O10)
  • Set-based O(n) collision detection replacing O(n²) .count() calls in _find_isomorphic_matches (O12)
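The O12 change swaps per-element .count() calls for a seen-set; a generic sketch of the idea (not the actual _find_isomorphic_matches code):

```python
def has_duplicates(items):
    """O(n) duplicate detection with a seen-set, instead of calling
    items.count(x) per element (O(n) each, O(n^2) total)."""
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False

def duplicate_values(items):
    """Return the set of values appearing more than once, in one pass."""
    seen, dupes = set(), set()
    for x in items:
        (dupes if x in seen else seen).add(x)
    return dupes
```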

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.21.0...v0.21.1


v0.21.0 (2026-03-09)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • capture: Fix mypy type errors in output_tensors field dict (d54e9a9)

Annotate fields_dict as Dict[str, Any] and extract param_shapes with proper type to satisfy mypy strict inference.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Pass heap limits to ELK Worker thread to prevent OOM on 1M nodes (23ef8d8)

The Node.js Worker running ELK layout had no explicit maxOldGenerationSizeMb in its resourceLimits — only stackSizeMb was set. The --max-old-space-size flag controls the main thread's V8 isolate, not the Worker's. This caused the Worker to OOM at ~16GB on 1M-node graphs despite the main thread being configured for up to 64GB.

  • Add maxOldGenerationSizeMb and maxYoungGenerationSizeMb to Worker
    resourceLimits, passed via _TL_HEAP_MB env var
  • Add _available_memory_mb() to detect system RAM and cap heap allocation
    to (available - 4GB), preventing competition with Python process
  • Include available system memory in OOM diagnostic messages

Also includes field/param renames from feat/grand-rename branch.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Documentation

  • Update all CLAUDE.md files with deepdive session 4 findings (b15c5bf)

Sync all project and subpackage documentation with current codebase:

  • Updated line counts across all 36 modules
  • Added elk_layout.py documentation to visualization/
  • Added arg_positions.py and salient_args.py to capture/
  • Documented 13 new bugs (ELK-IF-THEN, BFLOAT16-TOL, etc.)
  • Updated test counts (1,004 tests across 16 files)
  • Added known bugs sections to validation/, utils/, decoration/
  • Updated data_classes/ with new fields and properties

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • Rename all data structure fields and function args for clarity (f0d7452)

Rename ~68 fields across all 8 data structures (ModelLog, LayerPassLog, LayerLog, ParamLog, ModuleLog, BufferLog, ModulePassLog, FuncCallLocation) plus user-facing function arguments. Key changes:

  • tensor_contents → activation, grad_contents → gradient
  • All _fsize_memory (e.g. tensor_fsize → tensor_memory)
  • func_applied_name → func_name, gradfunc → grad_fn_name
  • is_bottom_level_submodule_output → is_leaf_module_output
  • containing_module_origin → containing_module
  • spouse_layers → co_parent_layers, orig_ancestors → root_ancestors
  • model_is_recurrent → is_recurrent, elapsed_time_* → time_*
  • vis_opt → vis_mode, save_only → vis_save_only
  • Fix typo: output_descendents → output_descendants

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.20.5...v0.21.0


v0.20.5 (2026-03-09)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • vis: Prevent OOM kill on 1M-node ELK render (#128, d9a1525)

The 1M-node render was OOM-killed at ~74GB RSS because:

  1. Model params (~8-10GB) stayed alive during the ELK subprocess
  2. preexec_fn forced fork+exec, COW-doubling the 74GB process
  3. The heap/stack formulas produced absurd values (5.6TB heap, 15GB stack)
  4. No memory was cleaned up before the subprocess launch

Changes:

  • render_large_graph.py: separate log_forward_pass from render_graph,
    free model/autograd before ELK render
  • elk_layout.py: cap heap at 64GB, stack floor 4096MB/cap 8192MB,
    write JSON to temp file (free string before subprocess), gc.collect
    before subprocess, set RLIMIT_STACK at module level (removes
    preexec_fn and the forced fork+exec)
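The cleanup sequence can be sketched roughly as follows; the function name and structure are hypothetical, not the actual elk_layout.py code:

```python
import gc
import json
import tempfile

def prepare_layout_subprocess(graph_dict, model=None):
    """Free large Python objects before launching the layout subprocess
    and hand the graph over via a temp file, so no giant JSON string
    stays alive across the subprocess's lifetime."""
    if model is not None:
        del model  # drop this reference; the caller must release theirs too
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(graph_dict, f)
        path = f.name
    gc.collect()  # reclaim reference cycles before the fork/spawn
    # With no preexec_fn, subprocess can use its fast spawn path instead
    # of a forced fork+exec that COW-doubles the parent's memory.
    return ["node", "elk_layout.js", path]

cmd = prepare_layout_subprocess({"nodes": [], "edges": []})
```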

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.20.4...v0.20.5


v0.20.4 (2026-03-09)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • postprocess: Backward-only flood in conditional branch detection + THEN labeling (#88, d737828)

Bug #88: _mark_conditional_branches flooded bidirectionally (parents + children), causing non-conditional nodes' children to be falsely marked as in_cond_branch. Fix restricts flooding to parent_layers only.

Additionally adds THEN branch detection via AST analysis when save_source_context=True, with IF/THEN edge labels in visualization. Includes 8 new test models, 22 new tests, and fixes missing 'verbose' in MODEL_LOG_FIELD_ORDER.
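The parent-only flood can be sketched as a plain BFS; field names follow the notes, but the data structures are illustrative, not the actual _mark_conditional_branches:

```python
from collections import deque

def mark_conditional_branch(start, parent_layers):
    """Flood backward through parent_layers only. The bug was also
    flooding through children, so descendants of non-conditional nodes
    were falsely marked as in_cond_branch."""
    in_cond_branch = set()
    queue = deque([start])
    while queue:
        layer = queue.popleft()
        if layer in in_cond_branch:
            continue
        in_cond_branch.add(layer)
        queue.extend(parent_layers.get(layer, ()))
    return in_cond_branch

parents = {"d": ["b"], "b": ["a"], "c": ["a"]}
marked = mark_conditional_branch("d", parents)
```

Here "c" is a child of "a" outside the branch: the backward-only flood leaves it unmarked, where a bidirectional flood would have reached it through "a".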

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Use Worker thread for ELK layout to fix stack overflow on large graphs (3fe6a84)

V8's --stack-size flag silently caps at values well below what's requested, causing "Maximum call stack size exceeded" on 1M+ node graphs. Switch to Node.js Worker threads with resourceLimits.stackSizeMb, which reliably delivers the requested stack size at the V8 isolate level.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.20.3...v0.20.4


v0.20.3 (2026-03-08)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • vis: Increase ELK Node.js stack floor to 4GB for large graphs (29af94e)

128MB was insufficient for ELK's recursive layout on 500k+ node graphs.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Raise OS stack limit for ELK Node.js subprocess (da82c9d)

The OS soft stack limit (ulimit -s) was smaller than the --stack-size value passed to Node.js, causing a segfault on large graphs (500k+ nodes) instead of allowing V8 to use the requested stack. Uses preexec_fn to set RLIMIT_STACK to unlimited in the child process only.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Performance Improvements

  • decoration: Optimize model prep and move session attrs to ModelLog dicts (b63a4fa)

Five performance fixes for _prepare_model_session and related setup code:

  • PERF-38: Replace O(N²) list concat in _traverse_model_modules with a deque
  • PERF-37: Cache user_methods per class in _get_class_metadata; move the _pytorch_internal set to a module-level frozenset
  • PERF-36: Iterate module._parameters directly instead of rsplit on named_parameters addresses plus a lookup dict
  • PERF-39: Skip patch_model_instance for already-prepared models
  • Move 4 session-scoped module attrs (tl_module_pass_num, tl_module_pass_labels, tl_tensors_entered_labels, tl_tensors_exited_labels) from nn.Module instances to ModelLog dicts keyed by id(module); remove tl_source_model_log (dead code). This eliminates the per-module cleanup iteration in _cleanup_model_session.

At 10K modules: ensure_prepared repeat calls drop from ~48ms to ~0.4ms (111x), session setup ~1.3x faster, cleanup ~1.4x faster.
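The PERF-38 deque swap can be sketched generically (tree data illustrative, not the actual _traverse_model_modules):

```python
from collections import deque

def traverse_modules(root, children_of):
    """BFS over a module tree with a deque: appending children is O(1)
    per module, whereas rebuilding the queue with list concatenation
    (e.g. `queue = queue[1:] + children`) is O(N^2) overall."""
    order = []
    queue = deque([root])
    while queue:
        mod = queue.popleft()
        order.append(mod)
        queue.extend(children_of.get(mod, ()))
    return order

tree = {"model": ["encoder", "decoder"], "encoder": ["attn", "mlp"]}
order = traverse_modules("model", tree)
```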

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.20.2...v0.20.3


v0.20.2 (2026-03-08)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • vis: Increase ELK Node.js stack size to prevent overflow (b8edbc8)

Bump --stack-size floor from 64MB to 128MB and multiplier from 16x to 48x (matching heap scaling) to prevent "Maximum call stack size exceeded" in elkjs on large graphs.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Chores

  • scripts: Enable loop detection in render_large_graph (803e16f)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.20.1...v0.20.2


v0.20.1 (2026-03-08)

This release is published under the GPL-3.0-only License.

Chores

  • scripts: Use log_forward_pass vis_opt instead of separate render call (d2aea0f)

Let verbose mode handle all phase timing instead of manual timestamps. Use log_forward_pass's built-in vis_opt to render in one call.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Performance Improvements

  • model_prep: Optimize _prepare_model_session for large models (2892323)

  • Hoist set(dir(nn.Module)) to module-level constant _NN_MODULE_ATTRS

  • Replace dir(module) MRO walk with dict scans for attrs and methods

  • Pre-build address→module dict to eliminate per-parameter tree walks

  • Use model.modules() with cached tl_module_address instead of second DFS
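The hoisting pattern from the first two bullets can be sketched with a plain base class standing in for nn.Module (the constant name mirrors the notes; everything else is illustrative):

```python
class Base:
    def builtin_method(self):
        pass

# Hoisted once at import time instead of calling set(dir(...)) per model.
_BASE_ATTRS = frozenset(dir(Base))

class UserModel(Base):
    def __init__(self):
        self.weight_scale = 2.0

    def user_method(self):
        return self.weight_scale

def user_defined_names(obj):
    """Scan the instance and class dicts rather than walking the whole
    MRO with dir(); subtract the precomputed base-class attribute set."""
    names = set(vars(obj)) | set(vars(type(obj)))
    return {n for n in names - _BASE_ATTRS if not n.startswith("__")}

m = UserModel()
```

The dir() call and set construction happen once per process rather than once per module, and the per-object scan touches only the two relevant __dict__s.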

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.20.0...v0.20.1


v0.20.0 (2026-03-08)

This release is published under the GPL-3.0-only License.

Chores

  • scripts: Unify large graph render scripts into single parameterized script (07a8186)

Replace run_250k.py and run_1M.py with render_large_graph.py, which accepts any node count as a CLI argument, plus --format, --seed, and --outdir options.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • logging: Add verbose mode for timed progress messages (0603f10)

Add verbose: bool = False parameter to log_forward_pass, show_model_graph, and internal pipeline functions. When enabled, prints [torchlens]-prefixed progress at each major pipeline stage with timing. Also fixes _trim_and_reorder_model_history_fields to preserve all non-ordered attributes (not just private ones).
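A minimal sketch of the timed-progress pattern (_vtimed is named in a later release's notes; this body and the exact output format are assumptions):

```python
import time
from contextlib import contextmanager

@contextmanager
def _vtimed(stage, verbose=False):
    """Print a '[torchlens]'-prefixed timed progress line around a
    pipeline stage when verbose is on; a no-op otherwise."""
    start = time.perf_counter()
    try:
        yield
    finally:
        if verbose:
            elapsed = time.perf_counter() - start
            print(f"[torchlens] {stage} ({elapsed:.2f}s)")

with _vtimed("postprocess", verbose=True):
    total = sum(range(1000))
```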

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.19.0...v0.20.0