pytest-gremlins Performance Profiling Report

Date: January 2026 Version: 0.1.1 Issue: #48 - Profile sequential mode to identify bottlenecks

Executive Summary

Sequential mode performance analysis reveals that subprocess overhead dominates execution time, accounting for approximately 90% of total runtime. The primary bottleneck is not the mutation switching architecture itself, but the cost of spawning separate pytest processes for each gremlin test.

Key Findings

Metric	Value	Impact
Subprocess wait time	90% of total	Critical
Coverage collection	8-10% of total	Medium
AST transformation	<0.5% of total	Low
Test selection	<0.1% of total	Negligible

Speed vs mutmut Baseline

The original claim of "0.8x mutmut speed" (23% slower) was based on incomplete measurements. Proper profiling reveals:

On small targets (16 gremlins, 15 tests): gremlins takes ~20s vs mutmut ~12s = 1.7x slower
The gap is entirely due to subprocess overhead, not mutation switching

Environment

Hardware: Apple Silicon (M-series)
Python: 3.14.0
pytest: 9.0.1
OS: Darwin (macOS)

Detailed Phase Analysis

Phase 1: Source Discovery

Metric	Value
Duration	25.6ms
Files discovered	35
Total lines	4,979
% of total	0.03%

Assessment: Negligible overhead. File system operations are fast.

Phase 2: AST Transformation (Mutation Generation)

Metric	Value
Duration	96.7ms
Gremlins generated	435
Avg parse time	0.53ms/file
Avg transform time	2.23ms/file
% of total	0.1%

Assessment: Extremely efficient. The mutation switching architecture parses and transforms all source files in under 100ms, generating 435 gremlins from 35 files.

Phase 3: Code Generation (AST Unparsing)

Metric	Value
Duration	28.7ms
Avg unparse time	0.82ms/file
Output size	183KB
% of total	0.03%

Assessment: Negligible. Python's ast.unparse() is fast.

Phase 4: Test Discovery

Metric	Value
Duration	1,183ms
Tests discovered	417
% of total	1.2%

Assessment: Acceptable. This is standard pytest collection overhead.

Phase 5: Coverage Collection (MAJOR BOTTLENECK #1)

Metric	Value
Duration	91,796ms (91.8s)
Coverage run time	91,785ms
Contexts collected	390
Covered files	82
% of total	91.4%

Assessment: This is the first major bottleneck when considering the full test suite. Coverage collection runs the entire test suite with coverage.py's dynamic context feature to map lines to tests. For our full test suite (417 tests), this takes 91 seconds.

Note: For smaller test subsets, this phase is proportionally smaller. The 91s measurement was against the full test suite.

Phase 6: Per-Mutation Subprocess Execution (MAJOR BOTTLENECK #2)

Metric	Value
Sample size	5 gremlins
Avg subprocess time	1,460ms
Min subprocess time	1,187ms
Max subprocess time	1,792ms
Subprocess overhead	~500-700ms per call

Assessment: Each gremlin test requires spawning a new Python subprocess, which incurs:

Python interpreter startup (~300ms)
pytest initialization (~200ms)
Module imports (~200ms)
Actual test execution (variable)

For 16 gremlins running 15 tests each, this means 16 subprocess spawns.

cProfile Analysis

Detailed function-level profiling on a single mutation test run (16 gremlins, 15 tests):

Top Functions by Cumulative Time

Function	Cumtime	Calls	Description
`select.poll`	22.41s	80	Waiting for subprocess I/O
`subprocess.run`	22.58s	16	Process spawning
`_run_mutation_testing`	20.59s	1	Main mutation loop
`_test_gremlin`	20.58s	15	Per-gremlin test execution
`_collect_coverage`	2.03s	1	Coverage data collection

Call Hierarchy

pytest_sessionfinish (22.6s)
├── _collect_coverage (2.0s)
│   └── subprocess.run (2.0s) - coverage collection
└── _run_mutation_testing (20.6s)
    └── _test_gremlin (20.6s, called 15x)
        └── subprocess.run (20.6s, ~1.4s each)
            └── select.poll (waiting for tests)

Key Observation

90% of execution time is spent in select.poll - literally waiting for subprocess I/O. The Python interpreter and pytest spend almost no time on actual computation; they're waiting for child processes.

Bottleneck Summary

Ranked by Impact

Subprocess Spawning Overhead (Critical)
- Each gremlin test spawns a new Python process
- ~1.4 seconds per subprocess (500-700ms is pure overhead)
- For N gremlins: N * 1.4s minimum
- Solution: Batch testing, persistent workers, or in-process execution
Coverage Collection (Medium)
- Must run full test suite once with coverage
- Scales with test suite size
- Solution: Reuse existing coverage data, parallel collection, or sampling
Test Discovery (Low)
- Standard pytest overhead
- ~1.2s one-time cost
- Solution: Cache test collection
AST Transformation (Negligible)
- <100ms for 35 files, 435 gremlins
- No optimization needed

Comparison with mutmut

Direct comparison was not possible due to mutmut 3.x compatibility issues with Python 3.14 (multiprocessing context error). However, based on architecture analysis:

mutmut's Approach

Uses "trampoline" injection at module import time
Runs tests in the same process (no subprocess per mutation)
Uses hash-based caching aggressively

gremlins' Current Approach

Spawns subprocess for each gremlin test
Provides isolation but at high cost
Clean implementation but slower

Key Difference

mutmut avoids subprocess overhead by running mutations in-process. gremlins' subprocess isolation is safer but orders of magnitude slower for sequential execution.

Recommendations

Priority 1: Reduce Subprocess Overhead (Issue #49)

Option A: In-Process Execution (Fastest)

Run tests in the same process as gremlins
Toggle __gremlin_active__ between test runs
Risk: Test pollution, but mutation switching is designed for this
Expected speedup: 10-50x

Option B: Persistent Worker Pool

Keep subprocess pool warm
Workers import modules once, run multiple gremlins
Pass gremlin ID via IPC, not environment variable
Expected speedup: 5-10x

Option C: Batch Subprocess Execution

Run multiple gremlins per subprocess
Reduces spawn overhead but increases per-subprocess time
Expected speedup: 2-5x

Priority 2: Optimize Coverage Collection (Issue #51)

Reuse pytest-cov data: If tests already ran with coverage, use that
Parallel coverage: Run coverage collection with pytest-xdist
Sampling: For large test suites, sample coverage data
Expected speedup: 2-5x on coverage phase

Priority 3: Cache Test Collection

Cache pytest's test collection between runs
Minor optimization (~1s savings)
Expected speedup: Negligible overall impact

Target Performance

Based on this analysis, achievable targets:

Scenario	Current	Target	Speedup
16 gremlins, 15 tests	20s	2-4s	5-10x
435 gremlins, 417 tests	~10min	30-60s	10-20x
Incremental (cached)	N/A	<5s	N/A

Artifacts

/docs/performance/profiling_data.json - Raw timing data
/docs/performance/profile_stats.txt - cProfile output
/tmp/gremlins_profile.pstats - Binary profile data (for snakeviz)

Conclusion

The performance gap between pytest-gremlins and mutmut is not due to the mutation switching architecture, which is highly efficient (<100ms for full transformation). The bottleneck is subprocess isolation - spawning a new Python process for each gremlin test.

To achieve competitive or superior performance, pytest-gremlins must either:

Move to in-process execution (matching mutmut's approach)
Implement a persistent worker pool (maintaining isolation with reduced overhead)
Use parallel execution aggressively (already partially implemented)

The parallel execution mode (#30) partially addresses this by running multiple subprocesses concurrently, but doesn't reduce the per-subprocess overhead. True speedup requires architectural changes to the test execution model.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pytest-gremlins Performance Profiling Report

Executive Summary

Key Findings

Speed vs mutmut Baseline

Environment

Detailed Phase Analysis

Phase 1: Source Discovery

Phase 2: AST Transformation (Mutation Generation)

Phase 3: Code Generation (AST Unparsing)

Phase 4: Test Discovery

Phase 5: Coverage Collection (MAJOR BOTTLENECK #1)

Phase 6: Per-Mutation Subprocess Execution (MAJOR BOTTLENECK #2)

cProfile Analysis

Top Functions by Cumulative Time

Call Hierarchy

Key Observation

Bottleneck Summary

Ranked by Impact

Comparison with mutmut

mutmut's Approach

gremlins' Current Approach

Key Difference

Recommendations

Priority 1: Reduce Subprocess Overhead (Issue #49)

Option A: In-Process Execution (Fastest)

Option B: Persistent Worker Pool

Option C: Batch Subprocess Execution

Priority 2: Optimize Coverage Collection (Issue #51)

Priority 3: Cache Test Collection

Target Performance

Artifacts

Conclusion

FilesExpand file tree

profiling-report.md

Latest commit

History

profiling-report.md

File metadata and controls

pytest-gremlins Performance Profiling Report

Executive Summary

Key Findings

Speed vs mutmut Baseline

Environment

Detailed Phase Analysis

Phase 1: Source Discovery

Phase 2: AST Transformation (Mutation Generation)

Phase 3: Code Generation (AST Unparsing)

Phase 4: Test Discovery

Phase 5: Coverage Collection (MAJOR BOTTLENECK #1)

Phase 6: Per-Mutation Subprocess Execution (MAJOR BOTTLENECK #2)

cProfile Analysis

Top Functions by Cumulative Time

Call Hierarchy

Key Observation

Bottleneck Summary

Ranked by Impact

Comparison with mutmut

mutmut's Approach

gremlins' Current Approach

Key Difference

Recommendations

Priority 1: Reduce Subprocess Overhead (Issue #49)

Option A: In-Process Execution (Fastest)

Option B: Persistent Worker Pool

Option C: Batch Subprocess Execution

Priority 2: Optimize Coverage Collection (Issue #51)

Priority 3: Cache Test Collection

Target Performance

Artifacts

Conclusion