Commit 2857db7

Merge pull request #121 from johnmarktaylor91/docs/update-results-and-test-counts
2 parents cf40112 + 75fa346 commit 2857db7

2 files changed: +65 -22 lines changed

RESULTS.md

Lines changed: 48 additions & 8 deletions
@@ -2,18 +2,18 @@

 Public summary of TorchLens test suite outcomes. Updated after each release.

-**Last updated**: v0.15.14 · 2026-03-06 · PyTorch 2.8 · CPU + CUDA
+**Last updated**: v0.18.0 · 2026-03-08 · PyTorch 2.8 · CPU + CUDA

 ---

 ## Suite Overview

 | Metric | Value |
 |--------|-------|
-| Total tests | 892 |
+| Total tests | 951 |
 | Smoke tests (`-m smoke`) | 18 |
-| Test files | 14 |
-| Example models (toy) | 249 |
+| Test files | 15 |
+| Example models (toy) | 250 |
 | Real-world models | 185 |

 **Run the suite:**
@@ -30,28 +30,29 @@ pytest tests/test_profiling.py -vs # profiling report

 | File | Tests | What it covers |
 |------|------:|----------------|
-| test_toy_models.py | 250 | API coverage on 249 example models (log, validate, visualize, metadata) |
+| test_toy_models.py | 258 | API coverage on 250 example models (log, validate, visualize, metadata) |
 | test_real_world_models.py | 185 | Real-world architectures: validation + visualization |
 | test_metadata.py | 107 | Field invariants, FLOPs, timing, RNG, func_call_location, corruption detection |
 | test_param_log.py | 70 | ParamLog, ParamAccessor, shared params, grad metadata |
 | test_decoration.py | 61 | Toggle state, detached imports, pause_logging, JIT compat, signal safety |
 | test_validation.py | 59 | Perturbation checks, metadata invariants, edge cases |
+| test_large_graphs.py | 51 | Large graph rendering, RandomGraphModel, ELK layout engine |
 | test_module_log.py | 45 | ModuleLog, ModulePassLog, ModuleAccessor, module hierarchy |
 | test_internals.py | 36 | Field order sync, safe_copy, internal algorithms |
 | test_layer_log.py | 34 | LayerLog aggregates, multi-pass delegation, loop detection |
 | test_save_new_activations.py | 21 | Fast re-logging, state reset, buffer handling |
 | test_output_aesthetics.py | 12 | Visual report generation (PDF/TeX/text) |
 | test_gc.py | 10 | GC correctness, memory leak detection, param ref release |
-| test_profiling.py | 1 | Overhead benchmarks (generates profiling_report.txt) |
+| test_profiling.py | 1 | Overhead benchmarks + decoration overhead (generates profiling_report.txt) |
 | test_arg_positions.py | 1 | ArgSpec lookup table coverage (runs last) |

 ---

 ## Model Compatibility

-### Toy Models (249 architectures)
+### Toy Models (250 architectures)

-All 249 example models in `tests/example_models.py` pass `validate_forward_pass`.
+All 250 example models in `tests/example_models.py` pass `validate_forward_pass`.

 **Core patterns:** simple feedforward (incl. LeNet-5), branching, conditionals,
 48 loop/recurrence variants, in-place ops, view mutations, edge cases.
@@ -270,6 +271,45 @@ Overhead of `log_forward_pass` vs raw `model.forward()`. See `test_outputs/repor

 *Overhead is dominated by per-operation bookkeeping. Large models with fewer, heavier ops (VGG16) show lower relative overhead. Small models with many lightweight ops show higher relative overhead. All measurements on CPU.*

+### Decoration Overhead (logging disabled)
+
+TorchLens permanently wraps all ~2000 PyTorch functions at import time. When logging
+is disabled, each wrapper is a single bool check (`if not _logging_enabled: return func(...)`).
+
+| Function | Original | Decorated | Overhead |
+|----------|----------|-----------|----------|
+| torch.relu | ~8μs | ~8μs | +3–10% |
+| torch.add | ~10μs | ~10μs | +3–14% |
+| torch.cat | ~4μs | ~5μs | +10–30% |
+| F.linear | ~13μs | ~13μs | +3–5% |
+| F.conv2d | ~4.5ms | ~4.5ms | <0.2% |
+| F.batch_norm | ~32μs | ~33μs | ~1.5% |
+| F.layer_norm | ~43μs | ~43μs | <1% |
+| torch.matmul (512×512) | ~320μs | ~320μs | <1% |
+| F.scaled_dot_product_attention | ~180μs | ~180μs | ~1% |
+
+*Heavy ops (conv2d, matmul, SDPA) show <1% overhead — within measurement noise. The ~600ns wrapper cost is only visible on sub-10μs elementwise ops. In practice, decoration has negligible impact on model inference speed.*
+
+---
+
+## Large Graph Scaling
+
+TorchLens supports visualization of very large computational graphs using the ELK layout engine
+(auto-selected above 3,500 nodes, or via `vis_node_placement="elk"`).
+
+| Scale | Nodes | Status |
+|-------|------:|--------|
+| 100 | 100 | Tested (Graphviz dot) |
+| 500 | 500 | Tested (Graphviz dot) |
+| 1,000 | 1,000 | Tested (auto-switches to ELK) |
+| 5,000 | 5,000 | Tested (ELK, hierarchical) |
+| 25,000 | 25,000 | Tested (ELK, stress + topo seeding) |
+| 250,000 | 250,000 | In progress |
+| 1,000,000 | 1,000,000 | In progress |
+
+Tests in `test_large_graphs.py` cover ELK engine selection, hierarchical layout, node count
+validation, and rendering at multiple scales.
+
 ---

 ## Coverage
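
Reviewer note: the "Decoration Overhead" section added in this file describes each wrapper as a single bool check when logging is disabled. The pattern can be sketched in plain Python as below. This is only an illustration of a toggle-gated wrapper, not TorchLens's actual code: `_logging_enabled` is named after the flag quoted in the diff, while `set_logging`, `decorate`, `call_log`, and `add` are hypothetical names introduced for the example.

```python
import functools

# Hypothetical module-level toggle, named after the flag quoted in the diff.
_logging_enabled = False
# Hypothetical stand-in for whatever a real logger would record.
call_log = []


def set_logging(enabled):
    """Flip the global toggle that every wrapper checks."""
    global _logging_enabled
    _logging_enabled = enabled


def decorate(func):
    """Wrap func once; the wrapper stays installed whether or not logging is on."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not _logging_enabled:
            # Fast path: one bool check, then straight to the original function.
            return func(*args, **kwargs)
        # Slow path: call the function and record that it ran.
        result = func(*args, **kwargs)
        call_log.append(func.__name__)
        return result
    return wrapper


@decorate
def add(a, b):
    return a + b


print(add(1, 2), call_log)  # logging off: prints "3 []"
set_logging(True)
print(add(3, 4), call_log)  # logging on: prints "7 ['add']"
```

With a fixed per-call cost of about 600 ns, as stated in the diff, the relative overhead is roughly 600 ns / 8 μs ≈ 7.5% for an op like `torch.relu` and roughly 600 ns / 4.5 ms ≈ 0.01% for `F.conv2d`, which is consistent with the ranges in the table.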

tests/CLAUDE.md

Lines changed: 17 additions & 14 deletions
@@ -1,26 +1,29 @@
 # tests/ — Test Suite

 ## Overview
-~690 tests across 12 test files. Uses pytest with deterministic torch seeding.
+~951 tests across 15 test files. Uses pytest with deterministic torch seeding.

 ## Test Files

 | File | Tests | What It Covers |
 |------|-------|----------------|
 | `conftest.py` || Fixtures, deterministic seeding, output directory setup, coverage reporting |
-| `example_models.py` || 170+ toy model class definitions for controlled testing |
-| `test_toy_models.py` | ~165 | Validation + visualization for toy models (14 sections by category) |
-| `test_real_world_models.py` | ~87 | Real-world architectures (20 fast, 67 `@pytest.mark.slow`) |
-| `test_metadata.py` | ~102 | Field-level coverage for ModelLog and LayerPassLog |
-| `test_module_log.py` | ~44 | ModuleLog/ModulePassLog/ModuleAccessor |
-| `test_param_log.py` | ~68 | ParamLog/ParamAccessor |
-| `test_decoration.py` | ~61 | Permanent decoration architecture (toggle, crawl, JIT, signals) |
-| `test_validation.py` | ~50 | Validation subpackage (registries, perturbation, invariants A-R) |
-| `test_layer_log.py` || LayerLog aggregate class |
-| `test_internals.py` || Internal implementation details |
-| `test_save_new_activations.py` || `save_new_activations()` re-logging |
-| `test_profiling.py` || Performance profiling |
-| `test_output_aesthetics.py` | ~9 | Aesthetic report + vis PDFs for human review |
+| `example_models.py` || 250 toy model class definitions for controlled testing |
+| `test_toy_models.py` | 258 | Validation + visualization for toy models (14 sections by category) |
+| `test_real_world_models.py` | 185 | Real-world architectures (20 fast, 165 `@pytest.mark.slow`) |
+| `test_metadata.py` | 107 | Field-level coverage for ModelLog and LayerPassLog |
+| `test_module_log.py` | 45 | ModuleLog/ModulePassLog/ModuleAccessor |
+| `test_param_log.py` | 70 | ParamLog/ParamAccessor |
+| `test_decoration.py` | 61 | Permanent decoration architecture (toggle, crawl, JIT, signals) |
+| `test_validation.py` | 59 | Validation subpackage (registries, perturbation, invariants A-R) |
+| `test_large_graphs.py` | 51 | Large graph rendering, RandomGraphModel, ELK layout engine |
+| `test_layer_log.py` | 34 | LayerLog aggregate class |
+| `test_internals.py` | 36 | Internal implementation details |
+| `test_save_new_activations.py` | 21 | `save_new_activations()` re-logging |
+| `test_profiling.py` | 1 | Performance profiling + decoration overhead benchmarks |
+| `test_output_aesthetics.py` | 12 | Aesthetic report + vis PDFs for human review |
+| `test_gc.py` | 10 | GC correctness, memory leak detection, param ref release |
+| `test_arg_positions.py` | 1 | ArgSpec lookup table coverage (runs last) |

 ## Running Tests
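
Reviewer note: the updated per-file counts can be checked against the new headline figure of 951 tests across 15 files. The snippet below is a quick arithmetic check, with the counts copied by hand from the added rows of the table in this diff; the dictionary name `tests_per_file` is made up for the example and nothing here reads the real test suite.

```python
# Per-file test counts, copied from the "+" rows of the table in this diff.
tests_per_file = {
    "test_toy_models.py": 258,
    "test_real_world_models.py": 185,
    "test_metadata.py": 107,
    "test_module_log.py": 45,
    "test_param_log.py": 70,
    "test_decoration.py": 61,
    "test_validation.py": 59,
    "test_large_graphs.py": 51,
    "test_layer_log.py": 34,
    "test_internals.py": 36,
    "test_save_new_activations.py": 21,
    "test_profiling.py": 1,
    "test_output_aesthetics.py": 12,
    "test_gc.py": 10,
    "test_arg_positions.py": 1,
}

total_tests = sum(tests_per_file.values())
total_files = len(tests_per_file)
print(total_tests, total_files)  # prints "951 15"
```

The sum matches the "Total tests" and "Test files" values in the updated `RESULTS.md` overview table.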
