Date: January 2026 Version: 0.1.1 Issue: #48 - Profile sequential mode to identify bottlenecks
Sequential mode performance analysis reveals that subprocess overhead dominates execution time, accounting for approximately 90% of total runtime. The primary bottleneck is not the mutation switching architecture itself, but the cost of spawning separate pytest processes for each gremlin test.
| Metric | Value | Impact |
|---|---|---|
| Subprocess wait time | 90% of total | Critical |
| Coverage collection | 8-10% of total | Medium |
| AST transformation | <0.5% of total | Low |
| Test selection | <0.1% of total | Negligible |
The original claim of "0.8x mutmut speed" (23% slower) was based on incomplete measurements. Proper profiling reveals:
- On small targets (16 gremlins, 15 tests): gremlins takes ~20s vs mutmut ~12s = 1.7x slower
- The gap is entirely due to subprocess overhead, not mutation switching
- Hardware: Apple Silicon (M-series)
- Python: 3.14.0
- pytest: 9.0.1
- OS: Darwin (macOS)
| Metric | Value |
|---|---|
| Duration | 25.6ms |
| Files discovered | 35 |
| Total lines | 4,979 |
| % of total | 0.03% |
Assessment: Negligible overhead. File system operations are fast.
| Metric | Value |
|---|---|
| Duration | 96.7ms |
| Gremlins generated | 435 |
| Avg parse time | 0.53ms/file |
| Avg transform time | 2.23ms/file |
| % of total | 0.1% |
Assessment: Extremely efficient. The mutation switching architecture parses and transforms all source files in under 100ms, generating 435 gremlins from 35 files.
| Metric | Value |
|---|---|
| Duration | 28.7ms |
| Avg unparse time | 0.82ms/file |
| Output size | 183KB |
| % of total | 0.03% |
Assessment: Negligible. Python's ast.unparse() is fast.
| Metric | Value |
|---|---|
| Duration | 1,183ms |
| Tests discovered | 417 |
| % of total | 1.2% |
Assessment: Acceptable. This is standard pytest collection overhead.
| Metric | Value |
|---|---|
| Duration | 91,796ms (91.8s) |
| Coverage run time | 91,785ms |
| Contexts collected | 390 |
| Covered files | 82 |
| % of total | 91.4% |
Assessment: This is the first major bottleneck when considering the full test suite. Coverage
collection runs the entire test suite with coverage.py's dynamic context feature to map lines
to tests. For our full test suite (417 tests), this takes 91 seconds.
Note: For smaller test subsets, this phase is proportionally smaller. The 91s measurement was against the full test suite.
| Metric | Value |
|---|---|
| Sample size | 5 gremlins |
| Avg subprocess time | 1,460ms |
| Min subprocess time | 1,187ms |
| Max subprocess time | 1,792ms |
| Subprocess overhead | ~500-700ms per call |
Assessment: Each gremlin test requires spawning a new Python subprocess, which incurs:
- Python interpreter startup (~300ms)
- pytest initialization (~200ms)
- Module imports (~200ms)
- Actual test execution (variable)
For 16 gremlins running 15 tests each, this means 16 subprocess spawns.
Detailed function-level profiling on a single mutation test run (16 gremlins, 15 tests):
| Function | Cumtime | Calls | Description |
|---|---|---|---|
select.poll |
22.41s | 80 | Waiting for subprocess I/O |
subprocess.run |
22.58s | 16 | Process spawning |
_run_mutation_testing |
20.59s | 1 | Main mutation loop |
_test_gremlin |
20.58s | 15 | Per-gremlin test execution |
_collect_coverage |
2.03s | 1 | Coverage data collection |
pytest_sessionfinish (22.6s)
├── _collect_coverage (2.0s)
│ └── subprocess.run (2.0s) - coverage collection
└── _run_mutation_testing (20.6s)
└── _test_gremlin (20.6s, called 15x)
└── subprocess.run (20.6s, ~1.4s each)
└── select.poll (waiting for tests)
90% of execution time is spent in select.poll - literally waiting for subprocess I/O. The
Python interpreter and pytest spend almost no time on actual computation; they're waiting for
child processes.
-
Subprocess Spawning Overhead (Critical)
- Each gremlin test spawns a new Python process
- ~1.4 seconds per subprocess (500-700ms is pure overhead)
- For N gremlins: N * 1.4s minimum
- Solution: Batch testing, persistent workers, or in-process execution
-
Coverage Collection (Medium)
- Must run full test suite once with coverage
- Scales with test suite size
- Solution: Reuse existing coverage data, parallel collection, or sampling
-
Test Discovery (Low)
- Standard pytest overhead
- ~1.2s one-time cost
- Solution: Cache test collection
-
AST Transformation (Negligible)
- <100ms for 35 files, 435 gremlins
- No optimization needed
Direct comparison was not possible due to mutmut 3.x compatibility issues with Python 3.14 (multiprocessing context error). However, based on architecture analysis:
- Uses "trampoline" injection at module import time
- Runs tests in the same process (no subprocess per mutation)
- Uses hash-based caching aggressively
- Spawns subprocess for each gremlin test
- Provides isolation but at high cost
- Clean implementation but slower
mutmut avoids subprocess overhead by running mutations in-process. gremlins' subprocess isolation is safer but orders of magnitude slower for sequential execution.
- Run tests in the same process as gremlins
- Toggle
__gremlin_active__between test runs - Risk: Test pollution, but mutation switching is designed for this
- Expected speedup: 10-50x
- Keep subprocess pool warm
- Workers import modules once, run multiple gremlins
- Pass gremlin ID via IPC, not environment variable
- Expected speedup: 5-10x
- Run multiple gremlins per subprocess
- Reduces spawn overhead but increases per-subprocess time
- Expected speedup: 2-5x
- Reuse pytest-cov data: If tests already ran with coverage, use that
- Parallel coverage: Run coverage collection with pytest-xdist
- Sampling: For large test suites, sample coverage data
- Expected speedup: 2-5x on coverage phase
- Cache pytest's test collection between runs
- Minor optimization (~1s savings)
- Expected speedup: Negligible overall impact
Based on this analysis, achievable targets:
| Scenario | Current | Target | Speedup |
|---|---|---|---|
| 16 gremlins, 15 tests | 20s | 2-4s | 5-10x |
| 435 gremlins, 417 tests | ~10min | 30-60s | 10-20x |
| Incremental (cached) | N/A | <5s | N/A |
/docs/performance/profiling_data.json- Raw timing data/docs/performance/profile_stats.txt- cProfile output/tmp/gremlins_profile.pstats- Binary profile data (for snakeviz)
The performance gap between pytest-gremlins and mutmut is not due to the mutation switching architecture, which is highly efficient (<100ms for full transformation). The bottleneck is subprocess isolation - spawning a new Python process for each gremlin test.
To achieve competitive or superior performance, pytest-gremlins must either:
- Move to in-process execution (matching mutmut's approach)
- Implement a persistent worker pool (maintaining isolation with reduced overhead)
- Use parallel execution aggressively (already partially implemented)
The parallel execution mode (#30) partially addresses this by running multiple subprocesses concurrently, but doesn't reduce the per-subprocess overhead. True speedup requires architectural changes to the test execution model.