Tach Roadmap

Current Version: See CHANGELOG.md for the latest release and version history.

This document outlines the planned development trajectory for Tach. Items are aspirational and subject to change based on community feedback and technical discoveries.


Version Overview

gantt
    title Tach Development Phases
    dateFormat  YYYY-MM
    section Foundation
    0.1.x Core Infrastructure    :done, 2026-01, 2026-02
    section Compatibility
    0.2.x Plugin Ecosystem       :done, 2026-02, 2026-04
    0.3.x Database Integration   :2026-04, 2026-06
    section Fixtures
    0.4.x Hierarchical Zygotes   :done, 2026-06, 2026-08
    0.5.x Developer Experience   :2026-08, 2026-10
    section Performance
    0.6.x Configuration          :2026-10, 2026-12
    0.7.x Memory Snapshotting    :2027-01, 2027-03
    section Platform
    0.8.x CI/CD + Sub-Interpreters :2027-03, 2027-06
    0.9.x Stability              :active, 2027-06, 2027-08
    section Release
    0.10.x Beta 1                :2027-08, 2027-09
    0.11.x Beta 2 + RC           :2027-09, 2027-10
    1.0.0 Production             :milestone, 2027-10, 0d

Complete Development Flow

flowchart TB
    subgraph Phase1["Phase 1: Foundation ✅ COMPLETE"]
        direction TB
        P1_1["0.1.1 Docs & Polish"]
        P1_2["0.1.2 Test Compatibility"]
        P1_3["0.1.3 Error Handling"]
        P1_4["0.1.4 Dependency Updates"]
        P1_5["0.1.5 Tooling Research"]

        P1_1 --> P1_2
        P1_2 --> P1_3
        P1_3 --> P1_4
        P1_4 --> P1_5
    end

    subgraph Phase2["Phase 2: Plugin Compatibility ✅ COMPLETE"]
        direction TB
        P2_0["0.2.0 Hook Framework ✅"]
        P2_1["0.2.1 pytest-django ✅"]
        P2_2["0.2.2 pytest-asyncio ✅"]
        P2_3["0.2.3 pytest-mock/env/timeout + Django Markers ✅"]
        P2_4["0.2.4 Landlock V4-V6 ✅"]
        P2_5["0.2.5 Plugin Stabilization ✅"]

        P2_0 --> P2_1
        P2_0 --> P2_2
        P2_0 --> P2_3
        P2_1 --> P2_3
        P2_0 --> P2_4
        P2_1 --> P2_5
        P2_2 --> P2_5
        P2_3 --> P2_5
        P2_4 -.->|optional| P2_5
    end

    subgraph Phase3["Phase 3: Database Integration"]
        direction TB
        P3_0["0.3.0 Django DB<br>(Transaction Rollback)"]
        P3_1["0.3.1 SQLAlchemy<br>(Session Mgmt)"]
        P3_2["0.3.2 Connection Mgmt<br>(FD Teleportation)"]
        P3_3["0.3.3 Additional DBs<br>(Postgres/MySQL/SQLite)"]

        P3_0 --> P3_2
        P3_1 --> P3_2
        P3_2 --> P3_3
    end

    subgraph Phase4["Phase 4: Fixture Lifecycle"]
        direction TB
        P4_0["0.4.0 Session Fixtures<br>(Shared Memory Cache) ✅"]
        P4_1["0.4.1 Module Fixtures<br>(Boundary Detection) ✅"]
        P4_2["0.4.2 Class Fixtures ✅"]
        P4_3["0.4.3 Autouse Injection<br>(Auto-inject autouse=True)"]
        P4_4["0.4.4 Parametrized Fixtures<br>(Expand params at discovery)"]
        P4_5["0.4.5 Zygote Warmup<br>(Configurable pre-imports)"]
        P4_6["0.4.6 Zygote Pool<br>(Per-scope pools)"]

        P4_0 --> P4_1
        P4_1 --> P4_2
        P4_2 --> P4_5
        P4_3 --> P4_5
        P4_4 --> P4_5
        P4_5 --> P4_6
    end

    subgraph Phase5["Phase 5: Developer Experience"]
        direction TB
        P5_0["0.5.0 Enhanced Tracebacks ✅<br>(Colorization done)"]
        P5_1["0.5.1 Debug Mode"]
        P5_2["0.5.2 Interactive Debugging<br>(pdb/breakpoint)"]
        P5_3["0.5.3 Watch Mode Enhancements<br>(Targeted re-discovery)"]
        P5_4["0.5.4 Smart Watch Filtering<br>(.tachignore support)"]
        P5_5["0.5.5 Log Capture<br>(Structured parsing)"]
        P5_6["0.5.6 Coverage Optimization ✅<br>(PEP 669 done)"]

        P5_1 --> P5_2
        P5_3 --> P5_4
    end

    subgraph Phase6["Phase 6: Configuration"]
        direction TB
        P6_0["0.6.0 pyproject.toml Schema"]
        P6_1["0.6.1 ENV_DENYLIST<br>(Security filtering)"]
        P6_2["0.6.2 Toxicity Config<br>(Configurable blocklist)"]
        P6_3["0.6.3 Plugin Config<br>(Priority/disabled)"]
        P6_4["0.6.4 Scheduler Persistence<br>(Resume interrupted runs)"]
        P6_5["0.6.5 Config Profiles"]

        P6_0 --> P6_1
        P6_0 --> P6_2
        P6_0 --> P6_3
        P6_0 --> P6_4
        P6_4 --> P6_5
    end

    subgraph Phase7["Phase 7: Performance"]
        direction TB
        P7_0["0.7.0 Test History Store ✅<br>(SQLite duration cache)"]
        P7_1["0.7.1 Memory Optimization<br>(Snapshot Compression)"]
        P7_2["0.7.2 UFFD Write-Protect<br>(Dirty Page Tracking)"]
        P7_3["0.7.3 Vectorized Restore<br>(Batch UFFDIO_COPY)"]
        P7_4["0.7.4 TLS Calibration ✅<br>(Sentinel scan done)"]
        P7_5["0.7.5 Adaptive Scheduling ✅<br>(Duration Prediction)"]
        P7_6["0.7.6 Lazy Loading<br>(On-demand Import)"]
        P7_7["0.7.7 Advanced Snapshots<br>(Kernel LKM Research)"]
        P7_8["0.7.8 UFFD_EVENT_FORK<br>(Fork Tracking)"]
        P7_9["0.7.9 UFFD_EVENT_REMAP<br>(mremap Tracking)"]

        P7_0 --> P7_5
        P7_1 --> P7_2
        P7_2 --> P7_3
        P7_3 --> P7_7
        P7_1 --> P7_8
        P7_8 --> P7_9
    end

    subgraph Phase8["Phase 8: Platform Integration"]
        direction TB
        P8_0["0.8.0 GitHub Actions<br>(Annotations/Summary)"]
        P8_1["0.8.1 JUnit XML ✅<br>(Already implemented)"]
        P8_2["0.8.2 Other CI Platforms<br>(TeamCity/Azure DevOps)"]
        P8_3["0.8.3 Coverage Formats<br>(Cobertura/HTML)"]
        P8_4["0.8.4 Sub-Interp Architecture<br>(Design: Zygote hybrid)"]
        P8_5["0.8.5 Sub-Interpreters<br>(PEP 684 Experimental)"]
        P8_6["0.8.6 Sub-Interp State Reset<br>(Module re-init)"]

        P8_0 --> P8_2
        P8_2 --> P8_3
        P8_4 --> P8_5
        P8_5 --> P8_6
    end

    subgraph Phase9["Phase 9: Stability"]
        direction TB
        P9_0["0.9.0 Crash Recovery ✅<br>(SIGCHLD detection)"]
        P9_1["0.9.1 Signal Routing<br>(Debug mode handling)"]
        P9_2["0.9.2 CleanupGuard<br>(Mutex poison immunity)"]
        P9_3["0.9.3 UFFD FD Limits<br>(Per-worker tracking)"]
        P9_4["0.9.4 Snapshot Memory<br>(Golden page budget)"]
        P9_5["0.9.5 OverlayFS Cleanup<br>(Upperdir pruning)"]
        P9_6["0.9.6 Seccomp Limits<br>(BPF instruction count)"]
        P9_7["0.9.7 Protocol Versioning<br>(Upgrade path)"]
        P9_8["0.9.8 Stress Testing<br>(10k+ Tests)"]

        P9_0 --> P9_1
        P9_1 --> P9_2
        P9_2 --> P9_3
        P9_3 --> P9_4
        P9_4 --> P9_5
        P9_5 --> P9_6
        P9_6 --> P9_7
        P9_7 --> P9_8
    end

    subgraph Phase10["Phase 10: Release 🔵 MILESTONE"]
        direction TB
        P10_0["0.10.0 Beta 1<br>(Feature Freeze)"]
        P10_1["0.10.1 Beta 1 Fixes"]
        P10_2["0.11.0 Beta 2"]
        P10_3["0.11.1 RC1"]
        P10_4["0.11.2 RC2"]
        P10_5["1.0.0 Production<br>(API Stability)"]

        P10_0 --> P10_1
        P10_1 --> P10_2
        P10_2 --> P10_3
        P10_3 --> P10_4
        P10_4 --> P10_5
    end

    subgraph Future["Future (Post-1.0)"]
        %% Details in "Future Phases (Post-1.0)" table below
        direction LR
        F0["1.1.x Maintenance"]
        F1["1.2.x Features"]
        F2["0.12.x Remote Execution"]
        F3["0.13.x Test Sharding ✅<br>(Shipped in 0.9.0)"]
        F4["0.14.x Visual Testing"]
        F5["0.15.x AI-Powered"]
        F6["0.16.x Mutation Testing"]
        F7["0.17.x Property-Based"]
        F8["0.18.x Contract Testing"]
        F9["0.19.x Benchmarking"]
        F10["0.20.x Observability"]
    end

    %% Phase Dependencies (Phase 1 enables parallel work)
    Phase1 --> Phase2
    Phase1 --> Phase5
    Phase1 --> Phase6
    Phase1 --> Phase7
    Phase1 --> Phase8
    Phase1 --> Phase9
    Phase2 --> Phase3
    Phase3 --> Phase4
    Phase4 --> Phase10
    Phase9 --> Phase10
    Phase10 --> Future

    %% Cross-phase dependencies
    P2_1 -.->|"Django fixtures"| P3_0
    P3_0 -.->|"DB transactions"| P4_0
    P5_6 -.->|"PEP 669 coverage"| P8_3
    P7_8 -.->|"fork tracking"| P9_0

    %% Note: Some nodes (P5_0, P5_5, P5_6, P7_4, P7_6, P8_1) are intentionally disconnected
    %% They represent completed standalone work or items that can start independently

    %% Styling
    classDef done fill:#22c55e,stroke:#16a34a,color:#fff
    classDef inProgress fill:#f59e0b,stroke:#d97706,color:#fff
    classDef canStart fill:#3b82f6,stroke:#1d4ed8,color:#fff
    classDef pending fill:#94a3b8,stroke:#64748b,color:#fff
    classDef milestone fill:#8b5cf6,stroke:#7c3aed,color:#fff,stroke-width:2px

    class P1_1,P1_2,P1_3,P1_4,P1_5 done
    class P2_0,P2_1,P2_2,P2_3,P5_0,P5_6,P7_4,P8_1 done
    class P6_0,P6_1,P6_2,P6_4,P8_0,P9_2,P9_5,P9_7 done
    class P5_1 done
    class P5_3,P5_5,P6_3,P6_5,P7_1,P7_6,P9_6 canStart
    class P2_4,P2_5 done
    class P3_0,P3_1 done
    class P3_2,P3_3 pending
    class P4_0,P4_1,P4_2 done
    class P4_3,P4_4,P4_5,P4_6 pending
    class P5_2,P5_4 pending
    class P7_0,P7_5 done
    class P7_2,P7_3,P7_7,P7_8,P7_9 pending
    class P8_2,P8_3,P8_4,P8_5,P8_6 pending
    class P9_0 done
    class P9_1,P9_3,P9_4,P9_8 pending
    class P10_0,P10_1,P10_2,P10_3,P10_4 pending
    class P10_5 milestone
    class F0,F1,F2,F4,F5,F6,F7,F8,F9,F10 pending
    class F3 done

Legend: 🟢 Done | 🟠 In Progress | 🔵 Can Start Now | ⚪ Pending | 🟣 Milestone

Current Status (v0.9.0):

  • Phase 1 (0.1.x): Complete
  • Phase 2 (0.2.x): Complete
  • Phase 3 (0.3.x): 0.3.0 + 0.3.1 done
  • Phase 4 (0.4.x): 0.4.0-0.4.2 done (scope-aware fixture lifecycle for session/module/class)
  • Phase 5 (0.5.x): 0.5.0-0.5.1 + 0.5.6 done
  • Phase 6 (0.6.x): 0.6.0-0.6.4 done
  • Phase 7 (0.7.x): 0.7.0 (history store) + 0.7.4 (TLS calibration) + 0.7.5 (adaptive scheduling) done
  • Phase 8 (0.8.x): 0.8.0-0.8.6 done (GitHub Actions, JUnit XML, bench, ~68 CLI flags)
  • Phase 9 (0.9.x): 0.9.0 (SIGCHLD crash detection) + 0.9.2 (CleanupGuard) + 0.9.5 (stale cleanup) + 0.9.7 (protocol versioning) done
  • Test sharding (--shard) shipped early in 0.9.0 (originally planned post-1.0)
  • 1083 tests, scope-aware fixtures, SIGCHLD crash detection, adaptive scheduling
  • Phase 10: Not started

Strategic Context

Research Foundation: This roadmap is informed by 12 research papers and competitive analysis of 10+ Rust-Python test tools. See research/README.md for paper analysis and implementation mapping, and external-research.md for competitive landscape.

Competitive Landscape Summary

| Tool | Approach | Startup | Tach Advantage |
|------|----------|---------|----------------|
| pytest-xdist | execnet workers | ~50-100ms | 1000x faster isolation |
| pytest-forked | fork() per test | ~500-1000μs | 10x faster reset |
| Maelstrom | Container per test | 50-100ms | 1000x faster startup |
| rtest/karva | No isolation | N/A | Full isolation + fixtures |
| snob | Test selection only | N/A | Full execution engine |

Key Insight: No existing tool combines Tach's speed (<50μs reset), isolation (userfaultfd), and compatibility (full pytest fixtures). See external-research.md §2.3 for detailed analysis.

Competitive Feature Matrix

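| Feature | Tach | pytest-xdist | Maelstrom | snob | rtest |
|---------|------|--------------|-----------|------|-------|
| Per-test isolation | userfaultfd | None | Containers | None | None |
| Reset time | <50μs | N/A | 50-100ms | N/A | N/A |
| Full fixtures | Yes | Yes | Limited | N/A | No |
| Test selection | Planned | No | No | Yes | No |
| Distributed | Planned | Yes | Yes | No | No |
| Mutation testing | Planned | No | No | No | No |
| Static discovery | Yes | No | Yes | Yes | Yes |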

Key Differentiator: Tach is the only tool combining sub-millisecond isolation with full pytest fixture support. Competitors sacrifice either speed (Maelstrom) or compatibility (rtest, karva).

Container Compatibility

Full Matrix: See container-compatibility.md for Docker, Podman, and Kubernetes configurations with capability requirements.

Python Version Compatibility

Full Matrix: See ../python-compatibility.md for Python 3.10-3.14 support, PyPy status, and free-threading implications.

Kernel Version Requirements

Full Matrix: See isolation-landlock.md for Landlock ABI V1-V6 requirements and isolation-userfaultfd.md for userfaultfd kernel requirements.

What Tach Must Implement for pytest Parity

Critical (Blocking Adoption):

  • Plugin Shim (0.2.x): pytest-django, pytest-asyncio, pytest-mock support

    Complete. Session effects, hook interception, marker extraction all working.

  • Database Rollback (0.3.x): Transaction savepoint/rollback for Django ORM, SQLAlchemy

    Database tests are ~40% of enterprise test suites. Memory snapshots don't restore DB state.

  • Session/Module Fixtures (0.4.x): Fixtures persisting across tests

    Complete in 0.9.0. Scope-aware scheduling with skip_reset preserves fixture state across module/class boundaries.

Important (Adoption Friction):

  • pytest.raises/warns: Exception and warning assertion helpers
  • Parametrized Fixtures: @pytest.fixture(params=[...])
  • Marker Expressions: Full -m expression support (-m "slow and not db")
  • conftest.py Hooks: pytest_configure, pytest_collection_modifyitems

Nice-to-Have (Competitive Edge):

  • Test Impact Analysis: snob-style "only run affected tests" mode

    Ref: alexpasmantier/snob - dependency graph analysis

    Implementation approach:

    1. Build code-to-test dependency graph during discovery
    2. Track which source files affect which tests via import analysis
    3. Integrate with git diff for "affected tests only" mode
    4. Cache dependency graph with file hash invalidation
    5. Provide --affected CLI flag for CI integration
  • Flaky Test Detection: nextest-style retry and flakiness tracking
  • Distributed Execution: Maelstrom-style cluster mode for CI farms
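
The import-analysis steps listed under Test Impact Analysis can be illustrated with a short Python sketch. It is not Tach's or snob's implementation: `imported_modules` and `affected_tests` are hypothetical names, the changed-module set would normally come from a git diff, and only direct absolute imports are followed rather than a transitive graph.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            modules.add(node.module.split(".")[0])
    return modules


def affected_tests(test_sources: dict[str, str], changed: set[str]) -> set[str]:
    """Select test files whose direct imports intersect the changed modules."""
    return {
        path
        for path, source in test_sources.items()
        if imported_modules(source) & changed
    }


# Hypothetical test files, given as path -> source text.
tests = {
    "tests/test_billing.py": "import billing\nfrom shared.util import money\n",
    "tests/test_auth.py": "from auth import login\n",
}
selected = affected_tests(tests, changed={"billing"})
```

A real implementation would also need to resolve transitive and relative imports, which is where a cached dependency graph pays off.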

What Tach Must Learn From Competitors

Based on external-research.md §24:

From snob (Test Impact Analysis):

  • Dependency graph analysis for test selection
  • Git commit range integration (--affected --commit-range HEAD~5..HEAD)
  • Cache dependency graph with file hash invalidation
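
File hash invalidation can be sketched as a cache that stores a content digest next to each result and recomputes only when the digest changes. The class and method names below (`GraphCache`, `get_or_compute`) are hypothetical, and `len` stands in for the real per-file analysis.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Digest used to decide whether a cached entry is still valid."""
    return hashlib.sha256(data).hexdigest()


class GraphCache:
    """Caches per-file analysis results, validated by content hash."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[str, object]] = {}
        self.misses = 0

    def get_or_compute(self, path: str, data: bytes, compute):
        digest = content_hash(data)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]  # File unchanged: reuse the stored result.
        self.misses += 1
        result = compute(data)
        self._entries[path] = (digest, result)
        return result


cache = GraphCache()
first = cache.get_or_compute("a.py", b"import os", len)
again = cache.get_or_compute("a.py", b"import os", len)
changed = cache.get_or_compute("a.py", b"import os, sys", len)
```

Hashing content rather than comparing modification times keeps the cache valid across fresh CI checkouts, where timestamps are not meaningful.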

From nextest (CI Integration):

  • Test partitioning for parallel CI jobs
  • Flaky test detection with automatic retry
  • Progress reporting UX patterns
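
One common way to partition tests across CI jobs is to hash each test ID into a shard. The sketch below shows that idea only; this document does not describe how Tach's `--shard` assigns tests, and `shard_for` and `select_shard` are hypothetical names.

```python
import hashlib


def shard_for(test_id: str, total_shards: int) -> int:
    """Return a 1-based shard index that is stable across machines and runs."""
    # The built-in hash() is randomized per process for strings, so use SHA-256.
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_shards + 1


def select_shard(test_ids: list[str], shard: int, total_shards: int) -> list[str]:
    """Keep only the tests that belong to the given shard."""
    return [t for t in test_ids if shard_for(t, total_shards) == shard]


ids = [f"tests/test_mod.py::test_{i}" for i in range(50)]
shards = [select_shard(ids, n, 4) for n in range(1, 5)]
```

Hash-based assignment needs no coordination between jobs, but it ignores test duration, so shards can finish at different times.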

From Maelstrom (Distribution):

  • Broker/worker architecture for cluster mode
  • OCI-like container images for reproducibility
  • Cross-node result aggregation

From pymute (Quality):

  • Mutation testing integration
  • Parallel mutant execution patterns
  • Quality score reporting

Research-to-Implementation Mapping

| Version | Research Phase | Primary Paper | Key Deliverable |
|---------|----------------|---------------|-----------------|
| 0.1.x | Static Discovery | Python Testing Engine Rust Breakthroughs | AST-based test discovery eliminating "Import Tax" |
| 0.2.x | Plugin Isolation | Project Tach Compatibility Layer Blueprint | Shadow plugin shim with syscall interception |
| 0.3.x | Database Safety | Fork Safety of Python C-Extensions | Transactional rollback, connection dispose pattern |
| 0.4.x | Zygote Hierarchy | Forklift, Python Monorepo Zygote Tree Design | DAAC clustering for hierarchical pre-initialization |
| 0.5.x | Observability | Rust-CPython Execution Blueprint Research | PEP 669 low-impact monitoring integration |
| 0.6.x | Zero-Copy Loading | Zero-Copy Python Module Loading | mmap-based bytecode loading bypassing importlib |
| 0.7.x | Memory Snapshots | Python Memory Snapshotting with Userfaultfd | userfaultfd + MADV_DONTNEED microsecond reset |
| 0.8.x+ | Cross-Platform | Cross-Platform Process Cloning Research | mach_vm_remap (macOS), NT Section Objects (Windows) |

Research Verification Checklist

Before 1.0.0, verify all critical research requirements are met.

Tooling and Container Compatibility (Q1 2026):

| Requirement | Status | Documentation |
|-------------|--------|---------------|
| .ignore File Interactions | Done | tooling-conflicts.md |
| Container Sandbox Behavior | Done | container-compatibility.md |
| Ignored Test Categories (24 total) | Done | test-discovery-analysis.md |

Original Research Requirements:

| Requirement | Research Source | External Ref | Status |
|-------------|-----------------|--------------|--------|
| Allocator Quiesce (thread.tcache.flush) | Memory Snapshotting with Userfaultfd | jemalloc mallctl | Pending |
| Toxicity Detection (fork-unsafe patterns) | Static Analysis for Toxic Python Modules | POSIX fork() | Pending |
| Namespace Isolation (CLONE_NEWNS/NET) | Compatibility Layer Blueprint | Landlock docs | Pending |
| Database Dispose (connection pools) | Fork Safety of Python C-Extensions | | Pending |
| TLS Restoration (mimalloc, Python 3.13+) | Userfaultfd and CPython Allocator | mimalloc | Done |
| TLS Calibration (sentinel scan) | Userfaultfd and CPython Allocator | See src/isolation/calibration.rs | Done |
| Landlock Path Canonicalization | Compatibility Layer Blueprint | PathFd TOCTOU safety | Done |
| Seccomp Blacklist (22 syscalls) | Compatibility Layer Blueprint | See src/isolation/sandbox.rs | Done |
| Iron Dome Integration | Compatibility Layer Blueprint | apply_iron_dome() in sandbox.rs | Done |
| Graceful Degradation (kernel < 5.13) | Compatibility Layer Blueprint | SandboxStatus::NotEnforced handling | Done |
| GIL Management (py.allow_threads()) | | PyO3 Parallelism | Pending |
| PyO3 0.26+ API Migration | | PyO3 Migration | Pending |
| TLS Segment Registration (fs_base) | Userfaultfd and CPython Allocator | arch_prctl(2) | Pending |
| Free-Threaded Python (3.13t/3.14t) | | py-free-threading | Pending |

Documentation Index

Complete Index: See README.md for the full documentation map.

| Category | Count | Key Documents |
|----------|-------|---------------|
| Deep Dives | 7 | isolation-deep-dive.md, discovery-deep-dive.md, execution-deep-dive.md |
| Isolation Modules | 4 | isolation-landlock.md, isolation-seccomp.md, isolation-userfaultfd.md |
| Research & Analysis | 6 | external-research.md, topic-archive.md, container-compatibility.md |
| User Documentation | 7 | ../quickstart.md, ../configuration.md, ../troubleshooting.md |

Future Phases (Post-1.0)

Detailed specs in external-research.md and topic-archive.md

| Version | Feature | Learn From |
|---------|---------|------------|
| 0.12.x | Remote Execution | Maelstrom broker/worker |
| 0.13.x | Test Sharding | nextest --shard N/M |
| 0.14.x | Visual Testing | Playwright snapshots |
| 0.15.x | AI-Powered | Flaky detection, test gen |
| 0.16.x | Mutation Testing | pymute patterns |
| 0.17.x | Property-Based | hypothesis integration |
| 0.18.x | Contract Testing | OpenAPI validation |
| 0.19.x | Benchmarking | @benchmark marker |
| 0.20.x | Observability | OpenTelemetry, Prometheus |

0.1.x - Foundation (Complete)

Status: All 5 milestones delivered. See CHANGELOG.md for release details.

Research Foundation: Implements the "Kineton" engine from Python Testing Engine Rust Breakthroughs.

Delivered Features

| Version | Focus | Key Deliverables |
|---------|-------|------------------|
| 0.1.1 | Docs & Polish | Examples directory, quickstart guide, shell completions, --dry-run |
| 0.1.2 | Test Compatibility | pytest.raises/warns/approx, traceback formatting, timeout handling |
| 0.1.3 | Error Handling | Error categorization (E001-E020), --diagnose flag, remediation suggestions |
| 0.1.4 | Dependencies | PyO3 0.27.2, Rust 2024 Edition, Python 3.14 support |
| 0.1.5 | Tooling Research | .ignore conflicts, container compatibility, test discovery analysis |

Implementation Details: For complete task breakdown and research references, see git history for v0.1.1-v0.1.5 tags.


0.2.x - Plugin Compatibility

Focus: Shadow plugin shim for pytest ecosystem integration without full pluggy support.

Research Foundation: Implements the "Matrix Layer" from Project Tach Compatibility Layer Blueprint for syscall isolation.

  • "Isolation without overhead requires moving from userspace interception to kernel-level integration—combined with a pragmatic plugin shim that records and replays pytest internals" — Project Tach Compatibility Layer Blueprint
  • "Every syscall that modifies global state is transparently isolated per-worker with <5% overhead" — Project Tach Compatibility Layer Blueprint

The 0.2.x series introduces a plugin compatibility layer that intercepts common pytest plugin hooks. This is NOT full pluggy support; instead, Tach implements targeted shims for the most popular plugins.

Development Flow: See the main flowchart at the top of this document for task dependencies. Items 0.2.1-0.2.4 can be developed in parallel after 0.2.0 is complete.

0.2.0 - Hook Interception Framework

Target: Core infrastructure for intercepting pytest hooks.

Status: Complete

Completed:

  • Hook registry types with Serde
  • 10 builtin hook specs
  • Hook detection in conftest.py
  • Marker extraction from decorators (with JSON output)
  • Autouse fixture detection
  • Path canonicalization for hook matching
  • SysPathAction enum (type-safe)
  • Session effects IPC bridge (Zygote → Supervisor → Workers)
  • Debug logging for effect application
  • pytest_sessionstart in SESSION_HOOKS
  • HookEffect enum with all variants
  • Toxicity integration for global-state-modifying hooks
  • Conftest inheritance resolution
  • Effect recording for pytest_configure/sessionstart
  • Effect replay in workers
  • IPC protocol extension
  • Plugin detection and warning system
  • HookResult type and aggregation strategies
  • HookCaller with PyO3 bridge
  • Hook dependency graph
  • Plugin shim registry

Hook System Architecture

  • Design hook interception architecture

    Ref: "Most pytest plugins perform one of three actions: Metadata modification, Fixture setup, or Reporting. Only (1) and (2) must be captured" — Project Tach Compatibility Layer Blueprint

    • Hook registry for tracking available hooks
    • Hook types with Serde derives for IPC serialization
    • 10 builtin hook specs (pytest_configure, pytest_runtest_*, etc.)
    • Hook caller that invokes registered handlers
    • Hook result aggregation (first-result, all-results)
    • Hook wrapper specifications
  • Implement conftest.py discovery and loading
    • Scan for conftest.py in test directories (existing)
    • Parse hook function definitions
    • Extract pytest markers from @pytest.mark.* decorators
    • Detect autouse fixtures
    • Build hook dependency graph
    • Handle conftest inheritance
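
The conftest.py parsing items above (hook definitions, `@pytest.mark.*` markers, autouse fixtures) can all be read from the syntax tree without importing the file. Tach does this in its Rust scanner; the Python sketch below only illustrates the idea, `scan_source` is a hypothetical name, and it handles just the `pytest.`-qualified decorator spellings.

```python
import ast


def _dotted_name(node: ast.expr) -> str:
    """Render a Name/Attribute chain such as pytest.mark.slow as a dotted string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{_dotted_name(node.value)}.{node.attr}"
    return ""


def scan_source(source: str) -> dict[str, list[str]]:
    """Collect hook names, marker names, and autouse fixtures from top-level functions."""
    found: dict[str, list[str]] = {"hooks": [], "markers": [], "autouse": []}
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("pytest_"):
            found["hooks"].append(node.name)
        for decorator in node.decorator_list:
            # Decorators may be bare (@pytest.mark.slow) or called (@pytest.mark.timeout(30)).
            target = decorator.func if isinstance(decorator, ast.Call) else decorator
            name = _dotted_name(target)
            if name.startswith("pytest.mark."):
                found["markers"].append(name.removeprefix("pytest.mark."))
            elif name == "pytest.fixture" and isinstance(decorator, ast.Call):
                for kw in decorator.keywords:
                    if (
                        kw.arg == "autouse"
                        and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True
                    ):
                        found["autouse"].append(node.name)
    return found


SAMPLE = '''
import pytest

def pytest_configure(config):
    pass

@pytest.fixture(autouse=True)
def clean_env():
    yield

@pytest.mark.slow
@pytest.mark.timeout(30)
def test_report():
    pass
'''
result = scan_source(SAMPLE)
```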

Core Hook Support

  • pytest_configure(config) - Plugin configuration
  • pytest_collection_modifyitems(items) - Test collection modification

    Ref: "By recording effects in the parent and replaying them in the child, Tach avoids the need to re-run complex plugin logic in every worker" — Project Tach Compatibility Layer Blueprint

  • pytest_runtest_setup(item) - Pre-test setup
  • pytest_runtest_teardown(item) - Post-test teardown
  • pytest_runtest_makereport(item, call) - Result reporting
  • pytest_sessionstart(session) - Session initialization
  • pytest_sessionfinish(session) - Session cleanup
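
The record-and-replay idea quoted above can be sketched in Python: run a hook once, capture its observable side effects as data, and apply that data elsewhere without re-running the hook. This is an illustration under assumptions, not Tach's IPC protocol: `record_effects` and `replay_effects` are hypothetical names, and only environment variables that were set and `sys.path` entries that were added are tracked.

```python
import json
import os
import sys


def record_effects(hook) -> str:
    """Run `hook` once and return its environment and sys.path changes as JSON."""
    env_before = dict(os.environ)
    path_before = list(sys.path)
    hook()
    env_set = {k: v for k, v in os.environ.items() if env_before.get(k) != v}
    path_added = [p for p in sys.path if p not in path_before]
    return json.dumps({"env_set": env_set, "sys_path_added": path_added})


def replay_effects(payload: str) -> None:
    """Apply previously recorded effects without re-running the hook."""
    effects = json.loads(payload)
    os.environ.update(effects["env_set"])
    for entry in effects["sys_path_added"]:
        if entry not in sys.path:
            sys.path.append(entry)


def fake_configure() -> None:
    """Stand-in for a plugin's pytest_configure side effects."""
    os.environ["TACH_SKETCH_MODE"] = "1"
    sys.path.append("/tmp/tach-sketch-plugins")


payload = record_effects(fake_configure)

# Simulate a fresh worker by undoing the changes, then replaying them.
del os.environ["TACH_SKETCH_MODE"]
sys.path.remove("/tmp/tach-sketch-plugins")
replay_effects(payload)
```

Because the payload is plain JSON, it can cross a process boundary, which is what makes replaying cheaper than re-running plugin logic in every worker.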

Plugin Registration

  • Detect installed pytest plugins via pkg_resources
  • Create plugin shim registry

    Ref: "The Tach supervisor creates a per-worker isolated namespace at clone time" — Project Tach Compatibility Layer Blueprint

  • Log warnings for unsupported plugins
  • Allow disabling specific plugins via config
  • Support plugin ordering/priority

0.2.1 - pytest-django Support

Status: ✅ COMPLETE (Core Infrastructure)

Target: First-class Django test support.

Parallelization: Can be developed in parallel with 0.2.2, 0.2.3, and 0.2.3.1. Only requires 0.2.0 (hook framework) to be complete. No dependencies on other 0.2.x versions.

Note: Marker detection (django_db, urls, etc.) is already implemented in core discovery. Tests marked with @pytest.mark.django_db are detected and the marker name is available in TestCase.markers. The items below are about executing the marker behavior.

Implemented (v0.2.1)

  • @pytest.mark.django_db - Basic marker detection and savepoint isolation
  • DjangoDbSetup HookEffect for database configuration
  • SAVEPOINT-based transaction rollback in harness
  • pytest-django registered as "Supported" plugin
  • Integration tests in tests/gauntlet_django/

Deferred to 0.3.x (Database Integration)

See GitHub issues for tracking:

  • transaction=True - Use real transactions (#40)
  • reset_sequences=True - Reset auto-increment (#36)
  • databases=['default', 'secondary'] - Multi-db (#38)
  • @pytest.mark.urls('myapp.test_urls') - URL override (#35)
  • @pytest.mark.ignore_template_errors - Template error handling (#35)

Django Fixtures (Deferred to 0.3.x/0.4.x)

See #39 for tracking:

  • client - Django test client
  • rf - Request factory
  • admin_client - Logged-in admin client
  • admin_user - Admin user instance
  • django_user_model - User model class
  • django_username_field - Username field name
  • settings - Settings override context manager
  • live_server - Live server URL
  • db - Database access fixture
  • transactional_db - Transactional database

Database Handling (Deferred to 0.3.x)

  • Hook into Django's transaction management (savepoint-based)
  • Preserve database connections across test resets
  • Handle database migrations in test database
  • Support --reuse-db flag for faster test runs (#37)
  • Support --create-db flag for fresh database (#37)
  • Handle multi-database configurations (#38)
  • Support database aliases (#38)

0.2.2 - pytest-asyncio Support

Target: Native async/await test support.

Parallelization: Can be developed in parallel with 0.2.1, 0.2.3, and 0.2.3.1. Only requires 0.2.0 (hook framework) to be complete. No dependencies on other 0.2.x versions.

Async Detection

  • Detect async test functions (async def test_...)

    Already implemented in core discovery - TestCase.is_async field

  • Detect async fixtures (@pytest.fixture on async functions)
  • Handle sync tests that use async fixtures
  • Support async context managers
  • Handle async generators

Event Loop Management

  • Create event loop per test (default)

    Ref: "To solve this, we employ tokio::task::LocalSet to pin interpreter-specific tasks to their originating thread" — Rust-CPython Execution Blueprint Research

  • Support session-scoped event loop via marker
  • Properly clean up event loop after test
  • Handle asyncio.run() calls within tests
  • Support custom event loop policies
  • Handle uvloop integration
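
A per-test event loop with cleanup and a timeout can be sketched with the standard `asyncio` API. This is only an illustration of the default "one loop per test" policy, not Tach's runner; `run_async_test` is a hypothetical name.

```python
import asyncio


def run_async_test(test_fn, timeout: float):
    """Run one coroutine test in a fresh event loop, then tear the loop down."""
    loop = asyncio.new_event_loop()
    try:
        # wait_for cancels the test coroutine if it exceeds the timeout.
        return loop.run_until_complete(asyncio.wait_for(test_fn(), timeout))
    finally:
        # Finalize async generators so no cleanup leaks into the next test.
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()


async def passing_test():
    await asyncio.sleep(0)
    return "ok"


async def hanging_test():
    await asyncio.sleep(60)


outcome = run_async_test(passing_test, timeout=1.0)
```

Creating a new loop for every test costs a little setup time but guarantees that pending tasks or callbacks from one test cannot run during another.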

Marker Support

  • @pytest.mark.asyncio - Mark async tests
  • @pytest.mark.asyncio(loop_scope="session") - Shared loop
  • @pytest.mark.asyncio(loop_scope="module") - Module loop
  • Automatic async test detection mode

Coroutine Execution

  • Run async tests with proper timeout handling
  • Support await in async fixtures
  • Handle async context managers in fixtures
  • Proper cancellation on test timeout
  • Support gather/wait patterns
  • Handle TaskGroup cleanup

0.2.3 - Additional Plugin Support + Django Markers

Target: Support for commonly used pytest plugins and additional Django markers.

Status: ✅ COMPLETE

Parallelization: Can be developed in parallel with 0.2.1 and 0.2.2. Only requires 0.2.0 (hook framework) to be complete.

pytest-mock

  • mocker fixture providing unittest.mock wrappers

    Works natively via pytest's fixture resolution. Tach does not intercept.

  • mocker.patch() context manager
  • mocker.patch.object() method
  • mocker.patch.dict() dictionary patching
  • mocker.spy() for call tracking
  • mocker.stub() for stub creation
  • Automatic mock cleanup after each test

    Handled by pytest-mock's built-in teardown

  • Support mocker.stopall()

pytest-env

  • Read [pytest_env] from pyproject.toml
  • Set environment variables before test collection
  • Support variable expansion ({VAR})

    Note: Uses {VAR} format per pytest-env convention, not ${VAR}

  • Preserve original values for restoration

    Note: pytest-env does NOT restore values by design. This requirement was incorrect.

  • Support conditional env vars

    Basic support via expansion. Full conditional syntax deferred.
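
The `{VAR}` expansion can be sketched as a placeholder substitution over the current environment. This is an illustration, not pytest-env's or Tach's code: `expand` and `apply_pytest_env` are hypothetical names, and leaving unknown placeholders untouched is an assumption made for the sketch.

```python
import re

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand(value: str, env: dict[str, str]) -> str:
    """Replace {NAME} with env[NAME]; unknown names are left untouched."""
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def apply_pytest_env(settings: dict[str, str], env: dict[str, str]) -> dict[str, str]:
    """Apply settings in order so later values can reference earlier ones."""
    result = dict(env)
    for key, raw in settings.items():
        result[key] = expand(raw, result)
    return result


resolved = apply_pytest_env(
    {"DATA_DIR": "{HOME}/data", "CACHE_DIR": "{DATA_DIR}/cache"},
    {"HOME": "/home/ci"},
)
```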

pytest-timeout

  • @pytest.mark.timeout(30) marker support
  • Global timeout via config
  • Timeout methods: signal, thread

    Note: Tach uses supervisor-level process termination (SIGTERM/SIGKILL)

  • Timeout callback for custom handling

    Implemented via timeout_hook in config

  • Per-phase timeouts (setup, call, teardown)

    Note: Current implementation is aggregate timeout. Per-phase is future enhancement.
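
Supervisor-level termination means the timeout is enforced from outside the test process: send SIGTERM, wait a grace period, then send SIGKILL. The sketch below shows that escalation with `subprocess` on POSIX. `run_with_timeout` and the grace period are hypothetical, and Tach's supervisor is not implemented this way in Python.

```python
import signal
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout: float, grace: float = 1.0) -> tuple[int, bool]:
    """Run cmd; return (returncode, timed_out), escalating SIGTERM to SIGKILL."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM: give the process a chance to exit cleanly.
        try:
            return proc.wait(timeout=grace), True
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL: cannot be caught or ignored.
            return proc.wait(), True


slow = [sys.executable, "-c", "import time; time.sleep(30)"]
returncode, timed_out = run_with_timeout(slow, timeout=0.2)
```

On POSIX, a negative return code is the number of the signal that ended the process, so a test killed by the first stage reports `-signal.SIGTERM`.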

Django URL and Template Markers (Issue #35)

  • @pytest.mark.urls('myapp.test_urls') - Override ROOT_URLCONF per test
  • @pytest.mark.ignore_template_errors - Suppress template errors
  • Positional argument extraction in Rust scanner
  • URL cache clearing on override/restore
  • Template debug mode toggle

Deferred to 0.3.x (Database Integration)

The following django_db marker options require deeper database transaction support:

  • transaction=True - Use real transactions (not savepoints)
  • reset_sequences=True - Reset auto-increment sequences
  • databases=['default', 'secondary'] - Multi-database support

pytest-cov (Deferred)

  • Detect pytest-cov and warn about Tach's native coverage

    Ref: "employs PEP 669 (Low-Impact Monitoring) to achieve observability with negligible overhead" — Rust-CPython Execution Blueprint Research

  • Suggest using --coverage flag instead
  • Disable pytest-cov when Tach coverage is active
  • Support coverage configuration options

pytest-xdist (Compatibility)

  • Detect pytest-xdist and warn about Tach's native parallelism

    Ref: "Objects passed between orchestrator and worker processes must be serialized, a CPU-intensive operation that often negates the benefits of parallelism for short-running tests" — Python Testing Engine Rust Breakthroughs

  • Support -n flag as alias for --workers
  • Ignore xdist-specific markers gracefully

0.2.4 - Landlock V4-V6 Network Isolation (Kernel 6.7+)

Target: Use Landlock for network isolation when available, reducing reliance on CLONE_NEWNET.

Status: ✅ COMPLETE

Parallelization: Fully independent. Can be developed at any time after 0.2.0. This is a kernel feature enhancement with no dependencies on plugin shims (0.2.1-0.2.3).

Network Restriction Rules

  • Detect Landlock ABI V4+ at runtime
  • Implement TCP bind restrictions per worker

    Workers should only bind to assigned port ranges

  • Implement TCP connect restrictions

    Block outbound connections except to localhost and configured hosts

  • Graceful fallback to CLONE_NEWNET on older kernels

Configuration

```toml
[tool.tach.network]
allow_localhost = true
allow_connect = ["api.example.com:443"]
allow_bind_ports = [8000, 8080]  # Empty = no binding allowed
```
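
The intended policy semantics of this configuration can be sketched in Python. The dictionary mirrors the TOML keys shown above; the helper names are hypothetical, and the sketch models only the allow/deny decision, not how the rules are installed in the kernel.

```python
NETWORK = {
    "allow_localhost": True,
    "allow_connect": ["api.example.com:443"],
    "allow_bind_ports": [8000, 8080],
}

_LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def parse_endpoint(entry: str) -> tuple[str, int]:
    """Split 'host:port' into its parts, validating the port range."""
    host, sep, port = entry.rpartition(":")
    if not sep or not host or not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"expected 'host:port', got {entry!r}")
    return host, int(port)


def connect_allowed(config: dict, host: str, port: int) -> bool:
    """Allow localhost (if enabled) and exact host:port matches only."""
    if config.get("allow_localhost") and host in _LOCAL_HOSTS:
        return True
    allowed = {parse_endpoint(e) for e in config.get("allow_connect", [])}
    return (host, port) in allowed


def bind_allowed(config: dict, port: int) -> bool:
    """An empty or missing allow_bind_ports list means no binding is allowed."""
    return port in config.get("allow_bind_ports", [])
```

Validating entries up front lets a malformed `allow_connect` value fail at configuration time instead of silently permitting or blocking traffic.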

External Ref: Landlock Kernel Docs - Network

0.2.5 - Plugin Testing and Stabilization

Target: Ensure plugin shims work correctly with real-world projects.

Status: ✅ COMPLETE

Parallelization: SEQUENTIAL - Must wait for 0.2.1, 0.2.2, and 0.2.3 to complete. This version tests and stabilizes all plugin shims, so the plugins must exist first.

Testing

  • Create plugin compatibility test suite
  • Test against popular open-source Django projects
  • Test against popular async projects (FastAPI, aiohttp)
  • Document plugin compatibility matrix
  • Create plugin integration tests

Performance

  • Benchmark plugin overhead
  • Optimize hook dispatch path
  • Cache conftest.py parsing results
  • Lazy-load plugin shims

0.3.x - Database Integration

Focus: Transaction rollback and connection handling for database-heavy test suites.

Research Foundation: Addresses the "Fork-Safety Paradox" from Fork Safety of Python C-Extensions and database isolation from Rust-Python Test Isolation Blueprint.

  • "The fundamental assumptions of fork()—specifically regarding memory isolation and state duplication—are incompatible with the complex internal threading pools, global state mutexes, and hardware contexts managed by modern C libraries" — Fork Safety of Python C-Extensions
  • "Ensure that any connection pool created in the parent is explicitly discarded in the child process immediately after startup" — Fork Safety of Python C-Extensions
  • "Injecting SAVEPOINT and ROLLBACK TO SAVEPOINT to make DB tests I/O-free" — Rust-Python Test Isolation Blueprint

The 0.3.x series focuses on database test isolation. The key insight is that database state cannot be restored via memory snapshots, so Tach must hook in at the database driver level to roll back transactions.

0.3.0 - Django Database Support

Target: Django ORM transaction rollback.

Transaction Management

  • Hook into django.db.transaction.atomic()

    Ref: "Regardless of success or failure, Tach injects ROLLBACK TO SAVEPOINT tachtest_start. This instantly reverts the database state" — Rust-Python Test Isolation Blueprint

  • Wrap each test in a savepoint
  • Roll back to the savepoint after test completion
  • Handle nested transactions correctly
  • Support transaction.on_commit() hooks
  • Handle transaction.non_atomic_requests
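
The savepoint wrapping described above can be sketched with the standard-library `sqlite3` module. This is only an illustration of the SAVEPOINT / ROLLBACK TO SAVEPOINT pattern, not Tach's Django hook: the helper `rolled_back` and the savepoint name `tach_sketch_sp` are hypothetical, and the connection runs in autocommit mode so the sketch controls the transaction itself.

```python
import sqlite3
from contextlib import contextmanager

SAVEPOINT_NAME = "tach_sketch_sp"  # hypothetical name, chosen for this sketch


@contextmanager
def rolled_back(conn):
    """Run a block inside a savepoint and always undo its writes."""
    conn.execute(f"SAVEPOINT {SAVEPOINT_NAME}")
    try:
        yield conn
    finally:
        # Undo everything since the savepoint, whether the test passed or failed,
        # then pop the savepoint off the stack.
        conn.execute(f"ROLLBACK TO SAVEPOINT {SAVEPOINT_NAME}")
        conn.execute(f"RELEASE SAVEPOINT {SAVEPOINT_NAME}")


# isolation_level=None disables sqlite3's implicit transaction handling.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('seed')")

with rolled_back(conn):
    conn.execute("INSERT INTO users VALUES ('temp')")
    rows_during_test = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

rows_after_test = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The row written inside the block is visible while the "test" runs and gone afterwards, without recreating the database.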

Multi-Database Support

  • Track all database aliases in use
  • Apply transaction wrapping to all databases
  • Handle cross-database queries
  • Support database routers
  • Handle read replicas

Connection Preservation

  • Keep database connections alive across tests

    Ref: "Ensure that any connection pool created in the parent is explicitly discarded in the child process immediately after startup" — Fork Safety of Python C-Extensions

  • Reset connection state without closing
  • Handle connection pool exhaustion
  • Reconnect on connection drop
  • Monitor connection health

Migration Handling

  • Detect migration state at startup
  • Skip migration if test database exists and is current
  • Support --create-db flag to force recreation
  • Handle migration conflicts gracefully
  • Support migration squashing

0.3.1 - SQLAlchemy Support

Target: SQLAlchemy session management.

Status: ✅ COMPLETE

Session Management

  • Hook into Session.commit() to prevent actual commits
  • Wrap sessions in nested transactions (savepoints)
  • Handle Session.rollback() within tests
  • Support scoped session patterns
  • Handle session-per-request patterns

Engine Configuration

  • Detect SQLAlchemy engine configuration
  • Apply connection pooling optimizations
  • Handle multiple engines (read replicas, etc.)
  • Support async SQLAlchemy (asyncpg, aiosqlite)
  • Handle engine disposal

Alembic Integration

  • Detect Alembic migration configuration
  • Verify migration state matches expected
  • Support running migrations before tests
  • Handle migration downgrade on test database
  • Support migration branching

0.3.2 - Connection Management

Target: Advanced connection pool handling.

Connection Pool Preservation

  • Keep connection pools alive across worker restarts
  • Implement FD handover via SCM_RIGHTS

    Ref: "Pass FDs to worker processes via Unix sockets. Reconstruct connection objects from FDs" — Project Tach Compatibility Layer Blueprint

  • Handle pool size limits correctly
  • Monitor connection health
  • Support connection aging

Database FD Handover

  • Capture database connection file descriptors
  • Pass FDs to worker processes via Unix sockets
  • Reconstruct connection objects from FDs
  • Handle SSL connections specially

    Ref: "SSL error: decryption failed or bad record mac" — Fork Safety of Python C-Extensions

  • Support connection metadata transfer
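
A minimal sketch of the descriptor handover, assuming Python 3.9+ on a Unix platform, where `socket.send_fds` and `socket.recv_fds` wrap the SCM_RIGHTS ancillary message. A pipe stands in for a database socket and both ends live in one process for brevity; the helper names and the `db-conn-0` label are hypothetical.

```python
import os
import socket


def send_fd(sock, fd, label):
    """Send one descriptor plus a small metadata label over a Unix socket."""
    socket.send_fds(sock, [label], [fd])


def recv_fd(sock):
    """Receive the label and a duplicated descriptor on the other end."""
    data, fds, _flags, _address = socket.recv_fds(sock, 1024, 1)
    return data, fds[0]


supervisor_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Stand-in for a live database connection: a pipe with pending data.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"ping")

send_fd(supervisor_end, read_fd, b"db-conn-0")
label, received_fd = recv_fd(worker_end)
payload = os.read(received_fd, 4)  # the received descriptor refers to the same pipe

for fd in (read_fd, write_fd, received_fd):
    os.close(fd)
supervisor_end.close()
worker_end.close()
```

Only the kernel-level descriptor is transferred. Any user-space state attached to the connection, such as a TLS session, does not travel with it, which is why the list above treats SSL connections separately.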

Health Checks

  • Verify connection validity before test
  • Detect stale connections
  • Reconnect automatically on failure
  • Log connection pool statistics
  • Emit metrics for monitoring

0.3.3 - Additional Database Support

Target: Support for other database systems.

PostgreSQL Specific

  • Support PostgreSQL savepoints natively
  • Handle advisory locks
  • Support LISTEN/NOTIFY cleanup
  • Handle temp tables correctly
  • Support PostgreSQL-specific types
  • Handle pg_dump/pg_restore for fixtures

MySQL/MariaDB Specific

  • Support MySQL savepoints
  • Handle MySQL-specific locking
  • Support MySQL 8.0+ features
  • Handle character set issues
  • Support MariaDB extensions

SQLite Specific

  • In-memory database optimization
  • File-based database snapshotting
  • Handle WAL mode correctly
  • Support shared cache mode
  • Handle SQLite concurrent access

MongoDB (Experimental)

  • Hook into PyMongo sessions
  • Transaction support (requires replica set)
  • Collection cleanup approach for non-transactional (standalone) deployments
  • Document limitations
  • Support Motor (async MongoDB)

Redis (Experimental)

  • Support Redis transactions
  • Handle Redis pub/sub cleanup
  • Support Redis Cluster
  • Handle connection pooling

gRPC Fork Safety

  • Auto-detect gRPC usage in test dependencies
  • Set GRPC_ENABLE_FORK_SUPPORT=1 environment variable

    Ref: "gRPC fork safety requires GRPC_ENABLE_FORK_SUPPORT=1 and epoll1 polling" — Fork Safety of Python C-Extensions

  • Verify epoll1 polling engine compatibility
  • Warn if active RPCs detected before fork

    gRPC fork support only works with no active RPCs

  • Document gRPC-specific test patterns
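
The detection and environment setup could look roughly like the sketch below. It assumes the check runs before `grpc` is imported, since gRPC reads these variables at initialization. `configure_grpc_fork_support` is a hypothetical helper; `GRPC_ENABLE_FORK_SUPPORT` and `GRPC_POLL_STRATEGY` are the environment variables gRPC itself documents.

```python
import importlib.util
import os


def configure_grpc_fork_support(environ=None, grpc_installed=None):
    """Set gRPC fork-related variables if gRPC is importable.

    Returns True when the variables were applied. Existing user values win,
    because setdefault never overwrites them.
    """
    environ = os.environ if environ is None else environ
    if grpc_installed is None:
        # find_spec locates the package without importing it.
        grpc_installed = importlib.util.find_spec("grpc") is not None
    if not grpc_installed:
        return False
    environ.setdefault("GRPC_ENABLE_FORK_SUPPORT", "1")
    environ.setdefault("GRPC_POLL_STRATEGY", "epoll1")
    return True
```

Passing `environ` and `grpc_installed` explicitly keeps the helper testable without touching the real process environment.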

External Ref: gRPC Fork Support


0.4.x - Fixture Lifecycle

Focus: Proper handling of session-scoped and module-scoped fixtures.

Research Foundation: Implements "Hierarchical Zygote Trees" from Forklift and Python Monorepo Zygote Tree Design using DAAC clustering.

  • "By moving beyond the traditional single-zygote model to a tiered, hierarchical structure, the proposed system maximizes memory sharing via Copy-on-Write (CoW) mechanisms" — Python Monorepo Zygote Tree Design
  • "The root node contains universally shared modules (e.g., os, sys). Child nodes branch off to specialize (e.g., a 'Data Science Zygote' adds numpy)" — Python Monorepo Zygote Tree Design
  • "A novel 'Dependency-Aware Agglomerative Clustering' (DAAC) algorithm that synthesizes the dependency graph into an optimal initialization tree" — Python Monorepo Zygote Tree Design

The 0.4.x series addresses one of the biggest gaps in the current implementation: fixtures that should persist across multiple tests. Session-scoped fixtures in particular are tricky because they must survive worker restarts.

0.4.0 - Session-Scoped Fixtures

Target: Fixtures that persist for the entire test session.

Status: Complete (v0.9.0) - Session-scoped autouse fixtures execute in zygote before fork.

Session Fixture Caching

  • Identify session-scoped fixtures at discovery time

    Ref: "The forked process receives the list of modules to add via a pipe. It imports them. This process becomes the 'DataScience Zygote'" — Python Monorepo Zygote Tree Design

  • Execute session fixtures before any tests run
  • Store fixture values in shared memory

    Ref: "This 'Zero-Copy' approach reduces the overhead of data transfer from O(N) (serialization) to O(1) (pointer passing)" — Rust-Python Test Isolation Blueprint

  • Make values available to all workers
  • Handle fixture dependencies

Serialization Strategy

  • Define serialization protocol for fixture values
  • Handle pickle-able objects directly

    Ref: "Objects passed between orchestrator and worker processes must be serialized (pickled) and deserialized, a CPU-intensive operation" — Python Testing Engine Rust Breakthroughs

  • Support custom serializers for complex objects
  • Handle non-serializable fixtures (connections, etc.)
  • Support cloudpickle for lambda functions
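
One way to separate shareable fixture values from those that must be rebuilt per worker is simply to attempt pickling. The sketch below uses only the standard library; the `try_serialize` helper and the example fixtures are hypothetical.

```python
import pickle
import threading


def try_serialize(value):
    """Return pickled bytes, or None if the value cannot be pickled."""
    try:
        return pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
    except (pickle.PicklingError, TypeError, AttributeError):
        return None


# A plain dict pickles fine; a lock (like sockets or DB connections) does not.
fixtures = {"settings": {"debug": True}, "lock": threading.Lock()}
serialized = {name: try_serialize(value) for name, value in fixtures.items()}

shareable = sorted(name for name, blob in serialized.items() if blob is not None)
worker_local = sorted(name for name, blob in serialized.items() if blob is None)
```

Values in `worker_local` would need a custom serializer or per-worker re-creation, which corresponds to the "non-serializable fixtures" item above.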

Finalization

  • Track session fixture finalizers
  • Run finalizers after all tests complete
  • Handle finalizer errors gracefully
  • Support async finalizers
  • Ensure finalizer ordering
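
Finalizer ordering and error handling can be sketched as a last-in, first-out loop that keeps going after a failure. The `run_finalizers` helper is hypothetical and covers synchronous finalizers only.

```python
def run_finalizers(finalizers):
    """Run (name, callable) finalizers in reverse registration order.

    A failing finalizer is recorded and does not stop the remaining ones.
    """
    errors = []
    for name, finalize in reversed(finalizers):
        try:
            finalize()
        except Exception as exc:  # collect and continue
            errors.append((name, exc))
    return errors


log = []


def failing_finalizer():
    raise RuntimeError("cache flush failed")


registered = [
    ("database", lambda: log.append("database closed")),
    ("cache", failing_finalizer),
    ("server", lambda: log.append("server stopped")),
]
finalizer_errors = run_finalizers(registered)
```

Here the server (registered last) is stopped first, the cache error is recorded, and the database is still closed.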

0.4.1 - Module-Scoped Fixtures

Target: Fixtures that persist for a single module.

Status: Complete (v0.9.0) - Scheduler groups tests by module and dispatches sequentially with skip_reset.

Module Boundary Detection

  • Group tests by module at scheduling time

    Ref: "In this model, zygotes are specialized at different levels of a dependency tree. A root zygote might hold the OS-level dependencies; a second-level zygote might import pandas and numpy" — Rust Static Analysis for Toxic Python Modules

  • Track module transitions during execution
  • Trigger fixture finalization on module change
  • Handle module re-entry
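
Grouping by module can be illustrated with pytest-style node IDs, where the part before the first `::` is the module path. This is a sketch of the grouping step only; `group_by_module` is a hypothetical helper, not the scheduler's actual code.

```python
def group_by_module(node_ids):
    """Group node IDs by module path, keeping first-seen module order."""
    groups = {}
    for node_id in node_ids:
        module = node_id.split("::", 1)[0]
        groups.setdefault(module, []).append(node_id)
    return groups


collected = [
    "tests/test_users.py::test_create",
    "tests/test_orders.py::TestCart::test_add",
    "tests/test_users.py::test_delete",
]
batches = group_by_module(collected)
```

Because all tests of a module end up in one batch, a module that appears non-contiguously in the collection order is still entered once.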

Fixture Lifecycle

  • Set up module fixtures before first test in module
  • Cache fixture values during module execution
  • Tear down fixtures when leaving module
  • Handle module import errors gracefully
  • Support fixture reuse within module

Optimization

  • Batch tests from same module to same worker

    Ref: "We define a Weight Vector W where W[j] corresponds to the estimated cost of module m_j. These weights are derived from heuristics or optional historical profiling data" — Python Monorepo Zygote Tree Design

  • Minimize fixture setup/teardown overhead
  • Share module fixtures between workers when safe
  • Prefetch module fixtures

0.4.2 - Class-Scoped Fixtures

Target: Fixtures that persist for a test class.

Status: Complete (v0.9.0) - Scheduler groups tests by class and dispatches sequentially with skip_reset.

Class Boundary Detection

  • Group tests by class at scheduling time
  • Track class transitions during execution
  • Handle class inheritance correctly
  • Support nested test classes

Fixture Lifecycle

  • Set up class fixtures before first test in class
  • Cache fixture values during class execution
  • Tear down fixtures when leaving class
  • Handle setup_class/teardown_class methods

0.4.3 - Advanced Fixture Features

Target: Complete fixture compatibility with pytest.

Autouse Fixtures

  • Detect @pytest.fixture(autouse=True)
  • Automatically apply to matching tests
  • Respect fixture scope for autouse
  • Handle autouse in conftest.py
  • Support conditional autouse

Fixture Finalization Order

  • Build fixture dependency graph

    Ref: "A novel 'Dependency-Aware Agglomerative Clustering' (DAAC) algorithm that synthesizes the dependency graph into an optimal initialization tree" — Python Monorepo Zygote Tree Design

  • Tear down in reverse dependency order
  • Handle circular dependencies
  • Support yield fixtures correctly
  • Handle generator fixtures
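
The ordering rules above map naturally onto the standard-library `graphlib` module (Python 3.9+). In this sketch the graph maps each fixture to the fixtures it depends on; the fixture names and the `teardown_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def teardown_order(dependencies):
    """Return fixtures in teardown order: dependents first, dependencies last."""
    setup_order = list(TopologicalSorter(dependencies).static_order())
    return list(reversed(setup_order))


# session depends on engine, which depends on config.
graph = {"session": {"engine"}, "engine": {"config"}, "config": set()}
order = teardown_order(graph)

# A circular dependency is reported instead of producing a bogus order.
try:
    teardown_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

Setup order is the topological order of the graph, so teardown is its exact reverse.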

Parametrized Fixtures

  • Support @pytest.fixture(params=[...])
  • Generate test variants for each param
  • Handle fixture param ids
  • Support indirect parametrization
  • Support fixture param marks

Fixture Visualization

  • Add --fixtures flag to show available fixtures
  • Add --fixture-graph to visualize dependencies

    Ref: "The Rust resolver calculates the module's fully qualified name based on its file path relative to the nearest __init__.py or namespace root" — Python Monorepo Zygote Tree Design

  • Show fixture scope and autouse status
  • Indicate where fixtures are defined
  • Export fixture graph as DOT/Mermaid

0.5.x - Developer Experience

Focus: Better error messages, debugging tools, and developer ergonomics.

Research Foundation: Integrates PEP 669 low-impact monitoring from Rust-CPython Execution Blueprint Research for observability.

  • "employs PEP 669 (Low-Impact Monitoring) to achieve observability with negligible overhead" — Rust-CPython Execution Blueprint Research
  • "the runner is a high-performance native binary—constructed in Rust—that acts as a hypervisor for the Python runtime" — Rust-CPython Execution Blueprint Research

The 0.5.x series focuses on making Tach a joy to use. Better error messages, powerful debugging tools, and smoother integration with development workflows.

0.5.0 - Enhanced Tracebacks

Target: pytest-quality error output.

Traceback Formatting

  • Implement pytest-style short tracebacks
  • Show only relevant frames (hide internal frames)
  • Highlight the assertion line
  • Support --tb=short, --tb=long, --tb=native
  • Support --tb=line for one-line summaries
  • Support --tb=no to disable tracebacks

Local Variable Display

  • Capture local variables at assertion failure

    Ref: "The evaluator inspects the f_code of the frame. It checks a high-performance Rust hash map to see if a mock has been registered" — Python Testing Engine Rust Breakthroughs

  • Display variable values inline with traceback
  • Truncate large values intelligently
  • Support --showlocals flag
  • Color-code variable types

Assertion Introspection

  • Parse assertion expressions

    Ref: "The AST visitor walks the tree of a function. It serializes the nodes into a byte stream, deliberately excluding: Docstrings, Type hints, and Formatting" — Python Testing Engine Rust Breakthroughs

  • Show sub-expression values
  • Support comparison operators (==, !=, <, etc.)
  • Handle complex expressions (assert x in y)
  • Support assert with messages
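
As a rough illustration of sub-expression reporting, the sketch below parses one `assert` statement with the `ast` module and evaluates each operand of a comparison separately. It assumes Python 3.9+ for `ast.unparse`; `explain_assert` is a hypothetical helper, and a real implementation would reuse values already computed rather than re-evaluate expressions that may have side effects.

```python
import ast


def explain_assert(source, namespace):
    """Map each operand of an assert comparison to its evaluated value."""
    statement = ast.parse(source).body[0]
    if not isinstance(statement, ast.Assert):
        raise ValueError("expected a single assert statement")
    test = statement.test
    if isinstance(test, ast.Compare):
        # Covers ==, !=, <, in, and chained comparisons.
        operands = [test.left, *test.comparators]
    else:
        operands = [test]
    values = {}
    for operand in operands:
        code = compile(ast.Expression(body=operand), "<assert>", "eval")
        values[ast.unparse(operand)] = eval(code, {}, dict(namespace))
    return values


report = explain_assert("assert total == len(items)", {"total": 2, "items": [1, 2, 3]})
```

The resulting mapping is what a traceback formatter would print next to the failing assertion line.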

Diff Display

  • Show diffs for string comparisons
  • Show diffs for dict comparisons
  • Show diffs for list comparisons
  • Color-code additions/deletions
  • Support unified diff format
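
A unified diff for structured values can be produced by pretty-printing both sides one element per line and handing the lines to `difflib`. This is a sketch of the idea using only the standard library; `diff_values` is a hypothetical helper and color-coding is left out.

```python
import difflib
import pprint


def diff_values(expected, actual):
    """Return unified-diff lines between two pretty-printed values."""
    # width=1 forces one container element per line, which keeps diffs small.
    left = pprint.pformat(expected, width=1).splitlines()
    right = pprint.pformat(actual, width=1).splitlines()
    return list(difflib.unified_diff(left, right, "expected", "actual", lineterm=""))


diff = diff_values({"a": 1, "b": 2}, {"a": 1, "b": 3})
```

Lines starting with `-` belong to the expected value and lines starting with `+` to the actual one, so a formatter can color them independently.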

0.5.1 - Debug Mode

Status: 🔨 IN PROGRESS

Target: Deep visibility into Tach internals.

Verbose Logging

  • --debug flag for detailed logging
  • Log syscall activity (userfaultfd, fork, etc.)

    Ref: "The userfaultfd subsystem fundamentally alters the contract between the memory management unit (MMU) and the user-space application" — Python Memory Snapshotting with Userfaultfd

  • Log worker lifecycle events
  • Log memory snapshot timing
  • Log IPC message flow

Worker Visualization

  • Show worker status in real-time
  • Display which test each worker is running
  • Show queue depth and scheduling decisions
  • Indicate safe vs toxic workers

    Ref: "The result is a binary classification for every module in the monorepo: Safe or Toxic" — Rust Static Analysis for Toxic Python Modules

  • Show worker memory usage

Performance Profiling

  • Measure time in discovery, execution, reporting
  • Show per-test timing breakdown
  • Identify slow fixture setup
  • Profile memory snapshot overhead

    Ref: "If a 1GB heap is snapshotted, but the subsequent execution only touches 50KB, only those 50KB are physically copied and mapped" — Python Memory Snapshotting with Userfaultfd

  • Generate flamegraphs

0.5.2 - Interactive Debugging

Target: Seamless debugger integration.

pdb Support

  • --pdb flag to drop into debugger on failure
  • Detect breakpoint() calls in tests
  • Disable worker isolation when debugging

    Ref: "The Supervisor sets the user's physical terminal to Raw Mode. It enters a loop where it reads bytes from the user's stdin and writes them directly to the worker's PTY master" — Project Tach Compatibility Layer Blueprint

  • Support --pdb-first for first failure only
  • Support custom debuggers (ipdb, pudb)

Post-Mortem Debugging

  • Capture exception state for post-mortem
  • Support pytest.set_trace() equivalent
  • Handle debugger in forked workers
  • Serialize debugger context if needed

IDE Integration

  • Document VS Code launch configurations
  • Document PyCharm run configurations
  • Support remote debugging
  • Handle debugger attach to workers
  • Support DAP (Debug Adapter Protocol)

0.5.3 - Output Customization

Target: Flexible output formatting.

Output Formats

  • Support --color=auto/always/never
  • Support --no-header for minimal output
  • Support --quiet for summary only
  • Support --verbose levels (-v, -vv, -vvv)
  • Support custom output templates

Progress Display

  • Support different progress styles (bar, dots, verbose)
  • Support --no-progress for CI
  • Show ETA for test completion
  • Show test rate (tests/second)

0.5.4 - Coverage Optimization

Target: Near-zero overhead coverage using SlipCover patterns.

Research Foundation: SlipCover achieves 5% overhead vs 218% for coverage.py via runtime de-instrumentation.

De-instrumentation Strategy

  • Implement line-level de-instrumentation after first execution

    Ref: "Periodically de-instrument covered lines. Overhead proportional to uncovered code" — SlipCover Paper

  • Branch de-instrumentation for already-covered branches
  • Hot-path detection to skip instrumentation entirely
  • Incremental coverage mode (only instrument changed files)

PEP 669 Integration

  • Use sys.monitoring.DISABLE return value for one-shot events

    Ref: "Events can be disabled after first firing" — PEP 669

  • Benchmark against coverage.py and SlipCover
  • Target: <5% overhead for typical test suites
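
The one-shot behaviour can be sketched with `sys.monitoring` directly (Python 3.12+). Returning `sys.monitoring.DISABLE` from a LINE callback switches that event off for the location that fired, so a hot loop body reports once rather than on every iteration. The helper `trace_lines_once` and the tool name are hypothetical, and the sketch assumes no other coverage tool currently holds `COVERAGE_ID`.

```python
import sys


def trace_lines_once(func, *args):
    """Run func, recording executed lines with one-shot LINE events."""
    mon = sys.monitoring
    tool_id = mon.COVERAGE_ID
    covered = set()
    callbacks = 0

    def on_line(code, line_number):
        nonlocal callbacks
        callbacks += 1
        covered.add(line_number - code.co_firstlineno)  # offset from the def line
        return mon.DISABLE  # do not report this location again

    mon.use_tool_id(tool_id, "tach-coverage-sketch")
    try:
        mon.register_callback(tool_id, mon.events.LINE, on_line)
        mon.set_local_events(tool_id, func.__code__, mon.events.LINE)
        result = func(*args)
    finally:
        mon.set_local_events(tool_id, func.__code__, mon.events.NO_EVENTS)
        mon.register_callback(tool_id, mon.events.LINE, None)
        mon.free_tool_id(tool_id)
    return result, covered, callbacks


def sample(n):
    total = 0
    for i in range(n):
        total += i
    return total
```

Calling `trace_lines_once(sample, 1000)` runs the loop body a thousand times while the callback fires only a handful of times, which is the property the de-instrumentation strategy relies on.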



0.6.x - Configuration

Focus: Complete configuration system with pyproject.toml support.

Research Foundation: Enables "Zero-Copy" module loading configuration from Zero-Copy Python Module Loading.

  • "architecture treats the Python interpreter not as a standalone application that discovers code, but as an embedded execution engine that is fed pre-validated code objects" — Zero-Copy Python Module Loading
  • "This approach effectively shifts the computational costs of I/O, parsing, and compilation from the critical path of the Python process startup to a pre-computation phase" — Zero-Copy Python Module Loading

The 0.6.x series implements a full configuration system. Currently Tach has limited configuration - this series adds comprehensive pyproject.toml support.

0.6.0 - pyproject.toml Schema

Target: Full configuration via pyproject.toml.

Schema Definition

  • Define complete [tool.tach] schema

    Ref: "The Rust supervisor must pre-calculate the dependency graph of the modules and load them in Topological Order" — Zero-Copy Python Module Loading

  • Document all configuration options
  • Provide JSON schema for IDE completion
  • Validate configuration on startup
  • Support schema versioning

Core Options

[tool.tach]
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
norecursedirs = [".git", "node_modules", ".venv"]

Execution Options

[tool.tach.execution]
workers = "auto"  # or integer
timeout = 60
exitfirst = false
maxfail = 0

0.6.1 - Test Configuration

Target: Fine-grained test behavior configuration.

Per-Test Timeout

  • Support timeout in markers
  • Support timeout in config by pattern
  • Override global timeout per-test
  • Handle timeout inheritance

Directory-Specific Settings

  • Support tach.toml in subdirectories
  • Merge settings from parent directories
  • Override parent settings locally
  • Document precedence rules
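
The merge rule (child settings override the parent, nested tables merge key by key) can be sketched as a small recursive function. `merge_settings` is a hypothetical helper and the precedence shown is an assumption until the rules are documented.

```python
def merge_settings(parent, child):
    """Return parent settings overlaid with child settings, merging nested tables."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value  # the more specific directory wins
    return merged


root = {"testpaths": ["tests"], "execution": {"workers": "auto", "timeout": 60}}
subdir = {"execution": {"timeout": 5}}
effective = merge_settings(root, subdir)
```

The subdirectory only changes the timeout; every other setting is inherited from the parent unchanged.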

Marker-Based Configuration

  • Configure behavior based on markers

    Ref: "The visitor flags a module as Tier 3 if it encounters: Network I/O, Concurrency, System Mutation, or Global Locks" — Python Monorepo Zygote Tree Design

  • Set default markers via config
  • Filter tests by marker expression
  • Support custom marker definitions

0.6.2 - Execution Configuration

Target: Control test execution behavior.

Test Ordering

  • Random order: --random-order
  • Dependency order: respect @pytest.mark.dependency
  • Duration order: fastest first

    Ref: "We profile packages and give more weight to those with slow module imports. We implement priority by replacing the 1's in the binary calls matrix with the weight values" — Forklift

  • Reverse order: --reverse
  • Alphabetical order
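
Most of these orderings reduce to a sort key or a seeded shuffle. The sketch below is illustrative only: `order_tests` and its mode names are hypothetical, dependency ordering is omitted, and placing tests without a recorded duration last is an assumption.

```python
import random


def order_tests(tests, mode, durations=None, seed=None):
    """Return tests ordered according to the requested mode."""
    if mode == "random":
        shuffled = list(tests)
        random.Random(seed).shuffle(shuffled)  # same seed gives the same order
        return shuffled
    if mode == "duration":
        known = durations or {}
        # Fastest first; tests with no history sort last.
        return sorted(tests, key=lambda test: known.get(test, float("inf")))
    if mode == "reverse":
        return list(reversed(tests))
    if mode == "alphabetical":
        return sorted(tests)
    raise ValueError(f"unknown ordering mode: {mode!r}")


tests = ["test_b", "test_a", "test_c"]
by_duration = order_tests(tests, "duration", durations={"test_b": 0.5, "test_a": 2.0})
```

Seeding the shuffle matters in practice: a failure found under a random order can only be reproduced if the order can be regenerated.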

Environment Variables

  • Define env vars in config
  • Support env var files (.env)
  • Expand variables in values
  • Protect sensitive values
  • Support per-environment configs

Isolation Modes

  • Full isolation (default)

    Ref: "Namespaces provide complete, kernel-enforced isolation with acceptable overhead. Every syscall is isolated at kernel level" — Project Tach Compatibility Layer Blueprint

  • Relaxed isolation (faster, less safe)
  • No isolation (--no-isolation)
  • Per-test isolation override

0.6.3 - Configuration Profiles

Target: Support different configurations for different scenarios.

Profile System

  • Define named profiles in config
  • Switch profiles via --profile flag
  • Support profile inheritance
  • Document common profile patterns

Environment Detection

  • Auto-detect CI environment
  • Apply CI-specific defaults
  • Support environment-based profiles
  • Handle Docker/container detection
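
CI detection usually comes down to checking well-known environment variables. The sketch below covers a few platforms that set such a variable; the returned profile names and both helper functions are hypothetical, and the `/.dockerenv` check is a common heuristic rather than a guarantee.

```python
import os

# Variables set by the respective CI platforms.
CI_VARIABLES = {
    "GITHUB_ACTIONS": "github-actions",
    "GITLAB_CI": "gitlab-ci",
    "CIRCLECI": "circleci",
    "BUILDKITE": "buildkite",
}


def detect_ci(environ):
    """Return a profile name for the detected CI platform, or None."""
    for variable, profile in CI_VARIABLES.items():
        if environ.get(variable):
            return profile
    # Many platforms set the generic CI variable as well.
    return "generic-ci" if environ.get("CI") else None


def in_docker(marker="/.dockerenv"):
    """Heuristic container check: Docker creates this file in containers."""
    return os.path.exists(marker)
```

Taking the environment as a parameter, for example `detect_ci(os.environ)`, keeps the detection logic testable.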

0.7.x - Performance

Focus: Memory optimization, adaptive scheduling, and parallelism improvements.

Research Foundation: Implements microsecond-scale memory reset using userfaultfd from Python Memory Snapshotting with Userfaultfd and Userfaultfd and CPython Allocator Interaction.

  • "By 'snapshotting' the virtual memory state of a process and lazily restoring it upon access, engineers can achieve reset times measured in microseconds rather than milliseconds" — Python Memory Snapshotting with Userfaultfd
  • "If a 1GB heap is snapshotted, but the subsequent execution only touches 50KB, only those 50KB are physically copied and mapped. This O(N) cost... is the primary driver of UFFD's performance advantage" — Python Memory Snapshotting with Userfaultfd
  • "leverages jemalloc's manual cache flushing capabilities to establish a stable, high-performance test runner" — Python Memory Snapshotting with Userfaultfd

The 0.7.x series focuses on performance at scale. As test suites grow to thousands of tests, we need smarter scheduling and better memory management.

0.7.0 - Memory Optimization

Target: Reduce memory footprint and improve snapshot efficiency.

Memory Profiling

  • Add --memory-profile flag
  • Track memory usage per test
  • Identify memory leaks
  • Report peak memory usage
  • Generate memory reports

Snapshot Optimization

  • Reduce snapshot size via compression
  • Implement incremental snapshots

    Ref: "The kernel iterates over the Page Table Entries corresponding to the address range. It clears the 'Present' bit, effectively unmapping the physical pages" — Python Memory Snapshotting with Userfaultfd

  • Skip unchanged memory regions
  • Use copy-on-write more effectively

    Ref: "workers inherit the parent's memory state without duplication, only copying physical pages when they are modified" — Cross-Platform Process Cloning Research

  • Optimize page table handling

Memory Pressure Handling

  • Detect low memory conditions
  • Reduce worker count under pressure
  • Trigger garbage collection proactively

    Ref: "If a snapshot is taken while the GC is traversing the object graph and modifying gc_refs, a subsequent restore will leave the GC in an inconsistent state" — Userfaultfd and CPython Allocator Interaction

  • Fail gracefully on OOM
  • Support memory limits

0.7.1 - Adaptive Scheduling

Target: Smart test scheduling based on historical data.

Duration Prediction

  • Track test durations over time

    Ref: "The significant skew in package popularity indicates that relatively few zygotes could provide substantial benefit. The top 15 packages alone account for more than 50% of the files" — Forklift

  • Store duration data in cache file
  • Predict duration for new tests
  • Balance worker load based on predictions
  • Handle duration variance

Hot/Cold Classification

  • Identify frequently-run tests
  • Prioritize cold tests for early execution
  • Cache compilation for hot tests
  • Optimize discovery for hot paths

Load Balancing

  • Distribute tests evenly by predicted duration
  • Handle stragglers (tests slower than predicted)
  • Support test stealing between workers
  • Minimize total wall-clock time
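
One simple way to balance by predicted duration is the greedy longest-processing-time heuristic: sort tests from slowest to fastest and always give the next test to the least-loaded worker. This is a sketch of that heuristic, not a statement of the scheduler Tach will ship; `assign_tests` is a hypothetical helper.

```python
import heapq


def assign_tests(durations, workers):
    """Greedy longest-first assignment of tests to workers."""
    heap = [(0.0, index) for index in range(workers)]  # (current load, worker)
    assignment = [[] for _ in range(workers)]
    slowest_first = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for test, duration in slowest_first:
        load, index = heapq.heappop(heap)  # least-loaded worker
        assignment[index].append(test)
        heapq.heappush(heap, (load + duration, index))
    return assignment


predicted = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}
plan = assign_tests(predicted, workers=2)
loads = [sum(predicted[test] for test in batch) for batch in plan]
```

With these durations both workers end up with 8 seconds of work. Work stealing, as listed above, would handle the cases where predictions turn out wrong.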

0.7.2 - Lazy Loading

Target: Reduce startup time for large codebases.

Lazy Module Loading

  • Don't import modules until needed

    Ref: "To speed up restart, zygotes are created lazily upon first use. Zygotes may be evicted under memory pressure" — Forklift

  • Load test modules on-demand
  • Share loaded modules between workers
  • Support preloading via config

Import Graph Analysis

  • Build module dependency graph

    Ref: "Profiling data from large-scale deployments indicates that module initialization—specifically the parsing, compiling, and executing of top-level code in dependencies—accounts for 60% to 80% of cold start duration" — Python Monorepo Zygote Tree Design

  • Identify shared dependencies
  • Optimize import order
  • Detect circular imports

Deferred Compilation

  • Compile bytecode lazily
  • Cache compiled bytecode

    Ref: "The runner maintains a content-addressable store of compiled bytecode. When a file is modified, the runner invokes a compilation step to generate the binary blob for direct injection" — Rust-CPython Execution Blueprint Research

  • Use mmap for bytecode files
  • Share bytecode between workers
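
A content-addressable bytecode cache can be sketched in a few lines with `hashlib` and `marshal`. This in-memory version is an illustration only: the `BytecodeCache` class is hypothetical, and a real cache would also key on the interpreter version, because the marshal format is specific to each Python version.

```python
import hashlib
import marshal


class BytecodeCache:
    """Cache compiled code objects keyed by a hash of filename and source."""

    def __init__(self):
        self._blobs = {}
        self.compilations = 0

    def load(self, source, filename="<module>"):
        key = hashlib.sha256(f"{filename}\0{source}".encode("utf-8")).hexdigest()
        blob = self._blobs.get(key)
        if blob is None:
            # Cache miss: compile once and store the serialized code object.
            self.compilations += 1
            blob = marshal.dumps(compile(source, filename, "exec"))
            self._blobs[key] = blob
        return marshal.loads(blob)


cache = BytecodeCache()
source = "answer = 6 * 7"
namespace = {}
exec(cache.load(source, "test_demo.py"), namespace)
exec(cache.load(source, "test_demo.py"), {})  # served from the cache
```

Because the key is derived from the content, editing a file produces a new key automatically and stale entries are never returned.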

0.7.3 - Parallel Discovery

Target: Speed up test collection for large codebases.

Rayon Integration

  • Parallelize file scanning

    Ref: "Rust, utilizing the rayon data parallelism library, can saturate all CPU cores to parse and analyze thousands of files per second" — Rust Static Analysis for Toxic Python Modules

  • Parse test files in parallel
  • Merge discovery results efficiently
  • Handle discovery errors in parallel context

Incremental Discovery

  • Cache discovery results
  • Detect file changes via mtime/hash
  • Only re-discover changed files
  • Support --cache-clear to reset

Parser Evaluation

  • Benchmark rustpython-parser vs ruff_python_parser for test discovery

    ruff_python_parser: "capable of processing gigabytes of source code per second" — Rust-CPython Execution Blueprint

  • Evaluate error recovery characteristics (important for incomplete files)
  • Consider migration if >2x performance improvement observed
  • Document parser selection rationale


0.7.4 - Advanced Snapshot Techniques (Research)

Target: Investigate next-generation snapshot approaches from fuzzing research.

Kernel Module Investigation

  • Evaluate AFL-Snapshot-LKM approach for kernel-level snapshots

    Ref: AFL-Snapshot-LKM achieves 20-360% speedup over fork-server

  • Assess kernel module licensing and distribution implications
  • Prototype kernel-assisted snapshot/restore cycle
  • Benchmark against userfaultfd approach

LibAFL Integration Patterns

  • Study LibAFL snapshot executor architecture

    Ref: LibAFL Book documents Rust fuzzing patterns

  • Evaluate executor abstraction for Tach isolation modes
  • Consider shared memory arena patterns from fuzzing

Performance Targets

| Technique | Current Overhead | Target | Speedup vs Fork | Implementation Complexity |
|---|---|---|---|---|
| Fork (baseline) | ~500-1000 μs | N/A | 1x | Low |
| Fork server | ~100-200 μs | 0.1.x ✓ | 5x | Low |
| userfaultfd | ~10-50 μs | 0.7.x | 10-50x | Medium |
| Kernel snapshot (LKM) | ~1-5 μs | Future | 100-500x | High (GPL) |

Licensing Note: AFL-Snapshot-LKM is GPL-licensed. Distributing it as a kernel module alongside Tach has licensing implications, because Tach itself is MIT-licensed. Consider:

  • Optional separate download for kernel module
  • Benchmark-only usage (non-production)
  • Alternative: Investigate kernel API stabilization for mainline support

0.8.x - CI/CD Integration

Focus: First-class CI/CD support with templates and integrations.

Research Foundation: Enables future cross-platform support per Cross-Platform Process Cloning Research.

  • "By leveraging undocumented kernel primitives—Mach virtual memory remapping on macOS and NT process cloning on Windows—it is theoretically possible to approximate the performance of Linux fork()" — Cross-Platform Process Cloning Research
  • "The cornerstone of simulating Copy-on-Write on macOS without utilizing the standard fork() system call is mach_vm_remap" — Cross-Platform Process Cloning Research

The 0.8.x series makes Tach a first-class citizen in CI/CD pipelines. Better reporting, CI platform integrations, and artifact handling.

0.8.0 - GitHub Actions

Target: Seamless GitHub Actions integration.

Workflow Templates

  • Basic workflow template
  • Matrix build template (multiple Python versions)
  • Coverage workflow template
  • Release workflow template
  • Caching workflow template

GitHub Integration

  • PR comment with test summary
  • Status check reporting
  • Annotation for test failures
  • Problem matcher for error highlighting
  • SARIF output for security findings

0.8.1 - Other CI Platforms

Target: Support for major CI platforms.

GitLab CI

  • .gitlab-ci.yml templates
  • GitLab JUnit integration
  • Coverage badge support
  • GitLab Pages for reports

Other Platforms

  • CircleCI orb
  • Jenkins pipeline library
  • Azure DevOps tasks
  • Travis CI examples
  • Buildkite plugin
  • Drone CI examples

0.8.2 - Reporting Improvements

Target: Better test result reporting.

JUnit XML Enhancements

  • Add test properties to JUnit XML
  • Support file attachments
  • Include timing information
  • Support test categories
  • Handle multi-file output

HTML Reports

  • Generate standalone HTML reports
  • Include failure details and tracebacks
  • Show test duration charts
  • Support filtering and search
  • Export as static site

Flaky Test Detection

  • Track test pass/fail history
  • Identify tests with inconsistent results

    Ref: "If the child process did not explicitly re-seed, both parent and child would generate identical sequences of 'random' numbers" — Fork Safety of Python C-Extensions

  • Report flakiness percentage
  • Suggest potential causes
  • Support auto-retry for flaky tests

0.8.3 - Coverage Reporting

Target: Complete coverage workflow.

Coverage Formats

  • Cobertura XML (default)
  • LCOV format
  • JSON format
  • HTML report
  • SonarQube format

Coverage Features

  • Coverage diff (new code only)
  • Coverage thresholds (fail if below)
  • Branch coverage

    Ref: "employs PEP 669 (Low-Impact Monitoring) to achieve observability with negligible overhead" — Rust-CPython Execution Blueprint Research

  • Missing lines report
  • Coverage trending

0.8.4 - Sub-Interpreter Workers (Experimental)

Target: Alternative worker model using PEP 684 per-interpreter GIL instead of fork.

Research Foundation: PEP 684 enables true parallel Python execution within a single process. PEP 734 (Python 3.14) exposes this via concurrent.interpreters.

  • "Each sub-interpreter can have its own GIL" — PEP 684
  • V8 isolates demonstrate 5ms startup (Cloudflare Workers model)

Sub-Interpreter Pool

  • Prototype sub-interpreter-based worker using C-API

    Py_NewInterpreterFromConfig with PyInterpreterConfig_OWN_GIL

  • Implement channel-based communication between interpreters

    No direct object sharing; use interpreters.Queue or shared memory

  • Benchmark against fork-based workers
  • Document extension module compatibility requirements

    Many C extensions don't support sub-interpreters yet

PEP 734 Integration (Python 3.14+)

  • Use concurrent.interpreters when available
  • Fall back to the C-API for Python 3.12-3.13
  • Test with free-threaded Python builds



0.9.x - Stability

Focus: Production hardening, crash recovery, and resource management.

The 0.9.x series hardens Tach for production use. Crash recovery, resource cleanup, and stress testing ensure reliability.

0.9.0 - Crash Recovery

Target: Graceful handling of crashes and errors.

Process Cleanup

  • Detect and kill orphan workers
  • Clean up shared memory on crash
  • Handle SIGKILL correctly
  • Recover from supervisor crash
  • Clean up temp files

State Recovery

  • Save test progress periodically
  • Resume from last known state
  • Report partial results on crash
  • Support --resume flag
  • Handle interrupted runs
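
Periodic progress saving is only useful if the state file survives a crash in the middle of a write. A common approach, sketched below with hypothetical helper names and a hypothetical JSON layout, is to write to a temporary file and atomically rename it into place.

```python
import json
import os
import tempfile


def save_progress(path, completed):
    """Atomically write the set of completed test IDs to path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"completed": sorted(completed)}, handle)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the data is on disk before renaming
    os.replace(temp_path, path)  # atomic on the same filesystem


def remaining_tests(path, all_tests):
    """Return the tests that still need to run, given a saved state file."""
    try:
        with open(path, encoding="utf-8") as handle:
            completed = set(json.load(handle)["completed"])
    except FileNotFoundError:
        completed = set()
    return [test for test in all_tests if test not in completed]
```

A reader sees either the previous state file or the new one, never a partial write, so a resumed run can trust whatever it finds.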

0.9.1 - Signal Handling

Target: Proper signal handling throughout.

Signal Support

  • SIGINT (Ctrl+C) - Graceful shutdown
  • SIGTERM - Clean exit
  • SIGHUP - Reload configuration
  • SIGQUIT - Dump stack traces
  • SIGUSR1 - Status dump

Child Signal Handling

  • Forward signals to workers
  • Handle worker signal death
  • Timeout on worker shutdown
  • Force kill unresponsive workers
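
The escalation from a polite shutdown request to a forced kill can be sketched with `subprocess`. The `stop_worker` helper and its grace period are hypothetical, and the worker here is an ordinary child process rather than a forked Tach worker.

```python
import subprocess
import sys


def stop_worker(process, grace_seconds=1.0):
    """Ask a worker to exit, then force-kill it if it does not comply in time."""
    process.terminate()  # SIGTERM on POSIX
    try:
        process.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        process.kill()  # SIGKILL on POSIX, cannot be ignored
        process.wait()  # reap the process so it does not linger as a zombie
        return "killed"


worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
outcome = stop_worker(worker)
```

The final `wait()` after `kill()` matters: without it the dead worker would remain in the process table until the supervisor exits.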

0.9.2 - Resource Management

Target: Prevent resource leaks.

Leak Detection

  • Track file descriptor usage
  • Detect FD leaks
  • Track memory allocations
  • Detect memory leaks
  • Track thread creation

Resource Limits

  • Enforce FD limits per worker
  • Enforce memory limits
  • Enforce CPU time limits
  • Report resource violations
  • Support cgroups integration

0.9.3 - Stress Testing

Target: Verify stability under load.

Test Scenarios

  • Large test suites (10k+ tests)
  • Long-running tests (hours)
  • High parallelism (100+ workers)
  • Memory pressure scenarios
  • Network failure scenarios

0.10.x - Beta 1

Focus: Feature freeze and stabilization.

0.10.0 - Beta 1 Release

  • Feature freeze
  • API stability review
  • Complete documentation
  • Migration guide draft
  • Public beta announcement

0.10.1 - Beta 1 Fixes

  • Bug fixes from beta 1 feedback
  • Performance regression testing
  • Compatibility testing
  • Security audit

0.11.x - Beta 2

Focus: Final polish before 1.0.

0.11.0 - Beta 2 Release

  • Address beta 1 feedback
  • Final API changes
  • Documentation updates
  • Performance optimization

0.11.1 - Release Candidate 1

  • Final bug fixes
  • Release notes
  • Upgrade path testing
  • Community feedback

0.11.2 - Release Candidate 2

  • Critical fixes only
  • Final documentation
  • Package verification
  • Release preparation

1.0.0 - Production Ready

Stable release with API guarantees.

  • Complete user documentation
  • API stability commitment (SemVer)
  • Migration guide from pytest
  • Long-term support policy
  • Performance benchmarks published
  • Security best practices documented
  • Battle-tested on real-world projects

1.1.x - Post-1.0 Maintenance

Focus: Maintenance and minor improvements.

1.1.0 - First Maintenance Release

  • Bug fixes from 1.0.0 feedback
  • Minor performance improvements
  • Documentation updates
  • Dependency updates

1.1.1 - Patch Release

  • Critical bug fixes
  • Security patches

1.2.x - Post-1.0 Features

Focus: New features that didn't make 1.0.

1.2.0 - Feature Release

  • Features deferred from 1.0
  • Community-requested features
  • Plugin ecosystem improvements
  • Additional database support

External References

Consolidated external documentation and resources referenced throughout this roadmap.

Python Standards

Linux Kernel

Rust Libraries

Related Projects

Research Papers