Fast-first mutation testing for pytest. Speed that makes mutation testing practical for everyday TDD.
Let the gremlins loose. See which ones survive.
- Speed-First Architecture - Mutation switching eliminates file I/O and module reloads. Run gremlins in seconds, not hours.
- Native pytest Integration - Zero configuration to start. Just add --gremlins to your pytest command.
- Coverage-Guided Selection - Only runs tests that actually cover the mutated code. 10-100x fewer test executions in well-modularized codebases.
- Incremental Caching - Results cached by content hash. Unchanged code skips re-testing entirely.
- Parallel Execution - Distribute gremlins across CPU cores for linear speedup.
# Install
pip install pytest-gremlins
# Run mutation testing
pytest --gremlins

That's it. pytest-gremlins will instrument your code, release the gremlins, and report which ones your tests zapped (good!) and which survived (test gaps!).
Code coverage lies. It tells you what code your tests execute, but not whether your tests would catch bugs.
Mutation testing answers a harder question: If I introduce a bug, will my tests fail?
| Tool | Limitation |
|---|---|
| mutmut | Single-threaded by default, no incremental analysis |
| Cosmic Ray | Complex setup; distributed mode requires Celery |
| MutPy | Unmaintained (last update 2019), Python 3.4-3.7 only |
| mutatest | Unmaintained (last update 2022) |
pytest-gremlins is fast because of how it works, not just parallelization:
- Mutation Switching - Instrument once, toggle mutations via environment variable
- Coverage Guidance - Only run tests that cover the mutated code
- Incremental Analysis - Skip unchanged code on repeat runs
- Parallel Execution - Safe parallelization with no shared state
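The mutation switching idea in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's real instrumentation: the environment variable name `ACTIVE_GREMLIN`, the gremlin ID `g001`, and the `is_adult` function are all made up for the example.

```python
import os

# Hypothetical variable name; the real plugin's variable may differ.
ACTIVE_GREMLIN_ENV = "ACTIVE_GREMLIN"


def active_gremlin():
    """Read the currently enabled gremlin ID from the environment (None if unset)."""
    return os.environ.get(ACTIVE_GREMLIN_ENV)


def is_adult(age):
    """Instrumented form of `return age >= 18` with one embedded mutation."""
    if active_gremlin() == "g001":
        return age > 18  # gremlin g001: >= -> >
    return age >= 18  # mogwai (original) behaviour


def boundary_test_passes():
    """A boundary test: 18 must count as adult."""
    return is_adult(18) is True


# Original code: no gremlin active, so the test passes.
os.environ.pop(ACTIVE_GREMLIN_ENV, None)
baseline = boundary_test_passes()

# Toggle the gremlin without rewriting or reloading the module.
os.environ[ACTIVE_GREMLIN_ENV] = "g001"
zapped = not boundary_test_passes()  # a failing test means the gremlin was zapped

os.environ.pop(ACTIVE_GREMLIN_ENV, None)
print(baseline, zapped)
```

Because the mutated and original branches live in the same instrumented code, switching gremlins only changes an environment variable, which is what avoids the file I/O and module reloads.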
Benchmarked against mutmut on a synthetic project:
| Mode | Time | vs mutmut | Speedup |
|---|---|---|---|
| --gremlins (sequential) | 17.79s | 14.90s | 0.84x (see note) |
| --gremlins --gremlin-parallel | 3.99s | 14.90s | 3.73x faster |
| --gremlins --gremlin-parallel --gremlin-cache | 1.08s | 14.90s | 13.82x faster |
Key findings:
- Sequential mode is slower than mutmut because of subprocess isolation overhead; detailed profiling shows it running 1.7x slower on small targets
- Parallel mode delivers a 3.73x speedup over mutmut
- With caching, subsequent runs are 13.82x faster than mutmut
- pytest-gremlins found 117 mutations vs mutmut's 86, with 98% kill rate vs 86%
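The caching result depends on skipping work when nothing has changed. A minimal sketch of content-hash caching is shown below. It assumes the key is derived from both the source text and the test text, which is a guess for illustration; `content_key`, `run_gremlins`, and `fake_runner` are hypothetical names and not part of the plugin.

```python
import hashlib


def content_key(source: str, tests: str) -> str:
    """Hash the source and test contents together into one cache key."""
    digest = hashlib.sha256()
    digest.update(source.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(tests.encode("utf-8"))
    return digest.hexdigest()


cache = {}
runs = []  # records every real (uncached) run, for demonstration only


def fake_runner(source, tests):
    """Stand-in for an expensive mutation run (hypothetical)."""
    runs.append(source)
    return "zapped"


def run_gremlins(source, tests, runner):
    """Return (result, was_cached); only call the runner on a cache miss."""
    key = content_key(source, tests)
    if key in cache:
        return cache[key], True
    result = runner(source, tests)
    cache[key] = result
    return result, False


first = run_gremlins("return a >= b", "assert f(1, 1)", fake_runner)
second = run_gremlins("return a >= b", "assert f(1, 1)", fake_runner)
changed = run_gremlins("return a > b", "assert f(1, 1)", fake_runner)
print(first, second, changed, len(runs))
```

Keying on content rather than timestamps means touching a file without changing it does not invalidate the cached result.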
================== pytest-gremlins mutation report ==================
Zapped: 142 gremlins (85%)
Survived: 18 gremlins (11%)
Timeout: 5 gremlins (3%)
Error: 2 gremlins (1%)
Top surviving gremlins:
src/auth.py:42 >= -> > (comparison)
src/utils.py:17 + -> - (arithmetic)
src/api.py:88 True -> False (boolean)
Run with --gremlin-report=html for detailed report.
=====================================================================
Timeout and Error categories are only shown when their count is greater than zero.
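The percentages in the sample report can be reproduced from the raw counts. The sketch below assumes each percentage is the category's share of all gremlins, rounded to the nearest whole number, and that the mutation score compared against a threshold is the zapped share. Both are assumptions for illustration; the plugin's exact scoring rules may differ.

```python
# Counts taken from the sample report above.
counts = {"Zapped": 142, "Survived": 18, "Timeout": 5, "Error": 2}


def percentages(category_counts):
    """Share of each non-empty category, rounded to a whole percent."""
    total = sum(category_counts.values())
    return {
        name: round(100 * count / total)
        for name, count in category_counts.items()
        if count > 0  # empty categories are omitted, as in the report
    }


def passes_threshold(category_counts, min_score):
    """Assumed rule: the zapped share must reach min_score."""
    total = sum(category_counts.values())
    return 100 * category_counts["Zapped"] / total >= min_score


shares = percentages(counts)
print(shares)
print(passes_threshold(counts, 80))
```

With 167 gremlins in total, the shares come out as 85, 11, 3, and 1 percent, matching the report.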
# With pip
pip install pytest-gremlins
# With uv
uv add pytest-gremlins
# With poetry
poetry add pytest-gremlins

Requires Python 3.11+
Zero configuration required for most projects. The plugin auto-discovers source paths from your
pyproject.toml setuptools config (e.g., [tool.setuptools.packages.find]). If auto-discovery
doesn't find your code, configure paths explicitly:
[tool.pytest-gremlins]
# Operators to use (default: all)
operators = ["comparison", "arithmetic", "boolean"]
# Paths to mutate (optional -- auto-discovered from setuptools metadata)
paths = ["src"]
# Patterns to exclude
exclude = ["**/migrations/*", "**/test_*"]
# Minimum mutation score to pass
min_score = 80

We use Gremlins movie references as our domain language:
| Traditional Term | Gremlin Term | Meaning |
|---|---|---|
| Original code | Mogwai | Your clean, untouched source code |
| Start mutation testing | Feed after midnight | Begin the mutation process |
| Mutant | Gremlin | A mutation injected into your code |
| Kill mutant | Zap | Your test caught the mutation |
| Surviving mutant | Survivor | Mutation your tests missed |
Full documentation: pytest-gremlins.readthedocs.io
- pytest-test-categories - Enforce Google test size standards in Python
- dioxide - Rust-backed dependency injection for Python
Contributions welcome! See our Contributing Guide.
This project uses strict TDD discipline with BDD/Gherkin scenarios. All contributions must include tests written before implementation.
Note on code coverage: We target 69% coverage due to inherent limitations in measuring pytest plugins (import timing, subprocess execution). See CONTRIBUTING.md for details.
MIT License. See LICENSE.
See CHANGELOG.md for release history.