feat(tests): Add per-process test isolation and lean parallel runner #109994
Draft
Each pytest process now automatically gets isolated PostgreSQL databases, Redis DBs, and Kafka topics via a random hex worker ID — no configuration needed.

This started as a way to let independent pytest sessions run simultaneously without interference; we then realized the same isolation machinery could replace pytest-xdist entirely with a much leaner implementation (~280 lines, zero new dependencies). The built-in `-n` flag distributes tests across worker subprocesses upfront, with results replayed through pytest's native reporting hooks. No execnet, no dynamic load balancing, no heavy master-worker IPC — just subprocess spawning and JSONL temp files.

Key changes:
- `xdist.py`: Per-worker identity via random hex (replaces file locks)
- `parallel.py`: Lean xdist replacement with native pytest summary
- `sentry.py`: Use xdist helpers for DB names, Redis DBs, Kafka topics
- Remove `reset_snuba` fixture (no-op since ClickHouse isolation uses PostgreSQL sequence uniqueness — project IDs never collide)
- Remove all pytest-xdist artifacts (PYTEST_XDIST_WORKER, looponfailroots)

Co-Authored-By: Claude <noreply@anthropic.com>
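As a rough illustration of the random-hex approach this commit describes, each process can mint an ID once, stash it in the environment so subprocesses inherit it, and derive resource names from it. The helper names and env var caching below are illustrative sketches, not the actual `xdist.py` code:

```python
import os
import secrets


def get_worker_id() -> str:
    # Cache the ID in the environment so child processes inherit the
    # same identity instead of minting their own.
    if "SENTRY_TEST_WORKER_ID" not in os.environ:
        os.environ["SENTRY_TEST_WORKER_ID"] = secrets.token_hex(4)
    return os.environ["SENTRY_TEST_WORKER_ID"]


def db_name(base: str) -> str:
    # e.g. "sentry" -> "sentry_a1b2c3d4"
    return f"{base}_{get_worker_id()}"


def kafka_topic(base: str) -> str:
    return f"{base}-{get_worker_id()}"
```

The trade-off (addressed in a later commit) is that random IDs are not stable across sessions, which breaks `--reuse-db`.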
Master renamed Region to Cell. Our branch had a partial merge leaving both constructors in the same assignment. Use Cell with the per-worker snowflake_id. Co-Authored-By: Claude <noreply@anthropic.com>
Use a dedicated _SENTRY_PARALLEL_NODEIDS env var for the parallel runner instead of overloading SELECTED_TESTS_FILE, which is meant for selective test execution in CI. Workers now receive exact node IDs, and the partition uses round-robin to preserve collection order. Also adds tests for the partition logic. Co-Authored-By: Claude <noreply@anthropic.com>
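A round-robin partition that preserves collection order can be sketched in one line — worker `i` receives items `i, i+n, i+2n, …`, so each worker's share stays in the original order (function name is illustrative):

```python
def partition_round_robin(nodeids: list[str], n: int) -> list[list[str]]:
    # Worker i gets every n-th item starting at offset i; within each
    # partition, the original collection order is preserved.
    return [nodeids[i::n] for i in range(n)]
```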
Only backend.yml retains its pull_request trigger so that this PR can run backend tests with -n2 without noise from unrelated workflows. Co-Authored-By: Claude <noreply@anthropic.com> (cherry picked from commit 0c01225)
🚨 Warning: This pull request contains Frontend and Backend changes! It's discouraged to change Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent, they must be separated into two pull requests and made forward- or backwards-compatible, such that the Backend or Frontend can be safely deployed independently. Have questions? Please ask in the
- Rename `xdist.py` → `isolation.py` to avoid confusion with pytest-xdist
- Replace random hex worker IDs with file-lock slot allocation: fixes `--reuse-db` (stable DB names) and Redis DB collisions between sessions
- Raise slot cap from 7 to 15 (Redis DBs 1–15, DB 0 reserved for dev)
- Detect TTY via `tw.hasmarkup`: in-place `\r` progress on terminals, plain streaming dots on CI pipes (fixes GHA log spam)
- Move parallel test execution docs from AGENTS.md to tests/AGENTS.md

Co-Authored-By: Claude <noreply@anthropic.com>
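The slot-allocation idea can be sketched with `fcntl.flock`: try each slot's lock file in order with a non-blocking exclusive lock, and claim the first one that succeeds. Holding the fd open keeps the slot claimed for the session. Paths and names here are illustrative assumptions, not the real `isolation.py`:

```python
import fcntl
import os
import tempfile

MAX_SLOTS = 15  # Redis DBs 1-15; DB 0 reserved for dev


def acquire_slot(max_slots: int = MAX_SLOTS) -> tuple[int, int]:
    """Claim the lowest free slot via a non-blocking exclusive file lock.

    Returns (slot, fd). The fd must stay open for the whole session;
    closing it (or process exit) releases the lock.
    """
    lock_dir = os.path.join(tempfile.gettempdir(), "sentry_test_slots")
    os.makedirs(lock_dir, exist_ok=True)
    for slot in range(max_slots):
        fd = os.open(os.path.join(lock_dir, f"slot{slot}.lock"), os.O_CREAT | os.O_RDWR)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return slot, fd  # lock held: this slot is ours
        except BlockingIOError:
            os.close(fd)  # slot taken by another session; try the next
    raise RuntimeError("no free test slots")
```

Because the slot number is stable for the lifetime of a session (and the lowest free slot is reused by the next session), derived DB names are stable too, which is what makes `--reuse-db` work again.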
Workers now call pytest.main(nodeids) directly via a worker shim script instead of collecting the full test suite and filtering post-collection. This eliminates ~36s of redundant collection per worker subprocess.

- Add parallel_worker.py: standalone script that reads nodeids from a file and calls pytest.main() with them
- Remove _SENTRY_PARALLEL_NODEIDS filter from pytest_collection_modifyitems
- Coordinator passes non-positional args via _SENTRY_PARALLEL_ARGS env var

Co-Authored-By: Claude <noreply@anthropic.com>
Backend Test Failures
The `reset_snuba` fixture was incorrectly removed as a "no-op" — it actually drops and recreates ClickHouse tables between tests, preventing cross-test data contamination. Restore it and all its usages.

Fix slot-0 isolation to match historical defaults:
- Redis DB base: 9 (was incorrectly 1)
- DB suffix: "" for slot 0 (was "_0", breaking --reuse-db)
- Kafka topics: base name for slot 0 (was suffixed)
- Snowflake ID: 0 for slot 0 (was 1)
- Max slots: 7 (Redis DBs 9–15; 15 exceeded the Redis default DB count)

Fix ResourceWarning for unclosed lock file via atexit handler.

Co-Authored-By: Claude <noreply@anthropic.com>
…t hack

The parallel coordinator now correctly handles pytest-rerunfailures' "rerun" outcome: intermediate reruns don't inflate the progress counter, show as "R" dots (not "F"), and aren't printed in red.

Add comment explaining why HAS_PYTEST_HANDLECRASHITEM is disabled.

Co-Authored-By: Claude <noreply@anthropic.com>
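The dot rendering described above can be sketched as a pure function: reruns get their own "R" glyph and are excluded from the completed-test counter, and the TTY branch redraws the line in place with `\r` while the pipe branch just streams dots. Shapes and names are illustrative, not the coordinator's real code:

```python
# Outcome -> progress glyph; "rerun" gets its own marker instead of "F".
DOTS = {"passed": ".", "failed": "F", "error": "E", "skipped": "s", "rerun": "R"}


def render_progress(outcomes: list[str], total: int, tty: bool) -> str:
    dots = "".join(DOTS.get(o, "?") for o in outcomes)
    # Intermediate reruns are not completed tests, so they don't advance
    # the counter -- only the final outcome for a test does.
    done = sum(1 for o in outcomes if o != "rerun")
    if tty:
        # On a terminal, redraw the whole line in place.
        return f"\r{dots} [{done}/{total}]"
    # On a CI pipe, stream plain dots with no carriage returns.
    return dots
```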
The coordinator's `run()` never set `session.testsfailed`, so pytest always exited 0 regardless of worker failures. Now we propagate the failure count from collected results, and also detect worker crashes (non-zero exit without JSONL reports) as a fallback. Co-Authored-By: Claude <noreply@anthropic.com>
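The failure accounting can be sketched as follows — count failed/errored reports across workers, and treat a worker that exited non-zero without writing any JSONL report as a crash. The data shapes are illustrative assumptions:

```python
def session_failed_count(
    reports_by_worker: dict[int, list[dict]],
    exit_codes: dict[int, int],
) -> int:
    """Failure count to propagate into session.testsfailed (non-zero -> exit 1)."""
    failed = sum(
        1
        for reports in reports_by_worker.values()
        for r in reports
        if r.get("outcome") in ("failed", "error")
    )
    # Fallback: a worker that exited non-zero without emitting a single
    # report crashed before reporting; surface that as a failure too.
    for worker_id, code in exit_codes.items():
        if code != 0 and not reports_by_worker.get(worker_id):
            failed += 1
    return failed
```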
When pytest-rerunfailures retries a test, multiple call-phase reports are emitted for the same nodeid (intermediate "rerun" + final outcome). The coordinator was appending all of them to term_reporter.stats, inflating the summary counts. Now we replace the previous report for a nodeid so only the final outcome appears in the summary. Co-Authored-By: Claude <noreply@anthropic.com>
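Replacing rather than appending comes down to keying by nodeid so a later report overwrites the earlier one; since Python dicts preserve insertion order, the summary order is unchanged. A minimal sketch with assumed report shapes:

```python
def dedupe_reports(reports: list[dict]) -> list[dict]:
    """Keep only the last call-phase report per nodeid.

    Intermediate "rerun" reports are overwritten by the final outcome,
    so summary counts reflect one result per test.
    """
    final: dict[str, dict] = {}
    for report in reports:
        final[report["nodeid"]] = report  # later report replaces earlier one
    return list(final.values())
```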
Serial mode (SENTRY_PYTEST_SERIAL=1) uses the same DB, Redis DB, and Kafka topics as auto-allocated slot 0, but previously bypassed the file lock entirely. A concurrent auto-allocated session could claim slot 0 and collide. Now serial mode acquires slot 0's lock as best-effort, preventing overlap. Co-Authored-By: Claude <noreply@anthropic.com>
The old cap of 7 was constrained by Redis DBs 9-15. Now slot 0 stays at DB 9 (historical default), slots 1-8 use DBs 1-8, and slots 9-14 use DBs 10-15 — all 15 DBs except DB 0 (reserved for dev). This allows up to 14 parallel workers with `-n`. Co-Authored-By: Claude <noreply@anthropic.com>
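The slot-to-DB mapping described above is easy to get wrong off by one, so here it is as a sketch (the function name is illustrative): slot 0 keeps the historical DB 9, slots 1–8 map to DBs 1–8, and slots 9–14 shift up past DB 9 to land on DBs 10–15:

```python
def redis_db_for_slot(slot: int) -> int:
    """Map a worker slot (0-14) to a Redis DB (1-15); DB 0 is reserved for dev."""
    if not 0 <= slot <= 14:
        raise ValueError("slots range from 0 to 14")
    if slot == 0:
        return 9       # historical default for serial runs
    if slot <= 8:
        return slot    # slots 1-8 -> DBs 1-8
    return slot + 1    # slots 9-14 -> DBs 10-15 (skipping DB 9)
```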
Each `-n` invocation created a /tmp/sentry_parallel_* directory with nodeid lists and JSONL result files that was never cleaned up. Now removed after all results are collected. Co-Authored-By: Claude <noreply@anthropic.com>
…e cleanup

Replace the per-test `reset_snuba` fixture (which dropped and recreated all ClickHouse tables before every test) with a session-scoped cleanup that runs once. Within a session, test isolation relies on unique snowflake IDs from PostgreSQL — each test gets fresh org/project IDs that never collide.

Key changes:
- `reset_snuba` fixture is now session-scoped and autouse. In parallel mode the coordinator resets ClickHouse before spawning workers; worker processes skip the fixture.
- Redis `flushdb` in test teardown now preserves `snowflakeid:*` keys. Previously, flushing reset the snowflake sequence counter, and under `@freeze_time` (constant timestamp) subsequent tests regenerated identical snowflake IDs — causing ClickHouse data bleed between tests.
- Removed all per-test `reset_snuba` references from SnubaTestCase, ProfilesSnubaTestCase, UptimeCheckSnubaTestCase, MetricsAPIBaseTestCase, ReplayEAPTestCase, and individual test files.
- Fixed 15 dynamic_sampling tests that queried ClickHouse without org/project filtering (cross-org discovery queries). These now scope assertions to their own org IDs instead of expecting exact global counts.
- Updated tests/AGENTS.md and isolation.py docstring to reflect the new ClickHouse isolation model.

Co-Authored-By: Claude <noreply@anthropic.com>
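The flushdb replacement boils down to "delete every key except the preserved prefix". The filtering is shown here as a testable pure function, with the redis-py side sketched in comments (names and prefix handling are illustrative of the idea, not the real teardown code):

```python
def keys_to_delete(keys: list[bytes], preserve_prefix: bytes = b"snowflakeid:") -> list[bytes]:
    """Return the keys a selective teardown should delete.

    Keys under the preserve prefix hold snowflake sequence counters; wiping
    them under @freeze_time would make later tests regenerate identical IDs.
    """
    return [k for k in keys if not k.startswith(preserve_prefix)]


# With a redis-py client, the teardown would look roughly like:
#     doomed = keys_to_delete(list(client.scan_iter()))
#     if doomed:
#         client.delete(*doomed)   # or batch deletes through client.pipeline()
```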
With session-scoped ClickHouse cleanup, data persists across tests within a session. Fix tests that assumed a clean ClickHouse state:

- test_snowflake: Update Region→Cell after upstream rename
- test_organization_events_histogram: Use dedicated project for outlier test so other tests' events don't affect detection
- release_health tests: Scope assertions to test-owned orgs and mock discovery query to prevent IntegrityErrors from stale orgs
- test_organization_replay_index: Add project filter to viewed replay query so other tests' replays don't contaminate results

Co-Authored-By: Claude <noreply@anthropic.com>
- Histogram test: use self.populate_events with temporary project swap instead of manual reimplementation that dropped events
- Replay viewed test: assert by replay ID instead of response position to handle non-deterministic ClickHouse sort order
- Exclude slot lock file from unclosed_files fixture to prevent flakes
- Pipeline Redis commands in snowflake key preservation to reduce per-test overhead (~150 round trips → 3 per host)

Co-Authored-By: Claude <noreply@anthropic.com>
Add release_slot() to close the lock fd and unlink the lock file. Called from pytest_sessionfinish so the slot is freed promptly instead of waiting for process exit / atexit. Co-Authored-By: Claude <noreply@anthropic.com>
populate_events() uses deepcopy(self.data) which preserves the same span_id across all events. When events share the same (project_id, finish_ts, transaction_name, span_id) sorting key, ClickHouse's ReplacingMergeTree deduplicates them during background merges. This was hidden on master because reset_snuba recreated tables before each test, preventing merges. With session-scoped ClickHouse cleanup, merges happen and rows get deduplicated. Fix: generate a unique span_id per event in populate_events().
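The fix pattern — deepcopy the base payload but assign each copy a fresh span_id — can be sketched like this (the helper name and dict shape are illustrative; `populate_events()` itself lives in the test utilities):

```python
import copy
import uuid


def make_events(base: dict, count: int) -> list[dict]:
    """Copy a base event N times with a unique span_id per copy.

    Distinct span_ids keep the ClickHouse sorting key
    (project_id, finish_ts, transaction_name, span_id) unique, so
    ReplacingMergeTree background merges can't deduplicate the rows.
    """
    events = []
    for _ in range(count):
        event = copy.deepcopy(base)
        event["span_id"] = uuid.uuid4().hex[:16]  # span IDs are 16 hex chars
        events.append(event)
    return events
```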
Per-process test isolation and a lean built-in parallel runner that replaces pytest-xdist.
Motivation
This started as work to make independent `pytest` sessions safe to run simultaneously — each process needs its own PostgreSQL databases, Redis DB, and Kafka topics so tests don't stomp on each other. Once that isolation layer existed, we realized we could reuse it to reimplement `pytest -n` in-house with a fraction of the complexity.

pytest-xdist pulls in execnet, uses heavy master-worker IPC, and has a lot of machinery we don't need. Our replacement is ~280 lines with zero new dependencies — it just spawns subprocesses, writes JSONL result files, and replays results through pytest's native reporting hooks.
What changed
- `src/sentry/testutils/pytest/isolation.py` (new) — Worker identity resolution. Each pytest process gets an isolated identity via:
  - `SENTRY_PYTEST_SERIAL=1` → no isolation (old behavior)
  - `SENTRY_TEST_WORKER_ID=N` → explicit slot (used by `-n`)
  - plain `pytest` → auto-allocated slot via file lock

  File locks give stable DB names (`--reuse-db` works) and exclusive Redis DBs (no cross-session collisions). Up to 15 parallel slots (Redis DBs 1–15; DB 0 is reserved for dev).

  Provides helpers: `get_db_suffix()`, `get_redis_db()`, `get_kafka_topic()`, `get_snuba_url()`.

- `src/sentry/testutils/pytest/parallel.py` (new) — Lean xdist replacement. `pytest -n4` distributes tests across 4 workers upfront (partitioned by file, round-robin). The coordinator collects tests once, writes nodeids to temp files, and spawns worker subprocesses.

- `src/sentry/testutils/pytest/parallel_worker.py` (new) — Worker shim. Each worker reads nodeids from its temp file and calls `pytest.main(nodeids)` directly — skipping the expensive collection walk entirely (~36s saved per worker). Results stream back via JSONL files.

  Output is TTY-aware: in-place dot progress with a counter on terminals, plain streaming dots on CI pipes. Verbose (`-v`) shows per-test `[wN] nodeid OUTCOME (Xs)` lines. Failures always print full tracebacks. Crashed workers dump their captured stdout.

- `src/sentry/testutils/pytest/sentry.py` — Uses isolation helpers for DB names and Redis DBs. Removed the `pytest_xdist_setupnodes` hook and `PYTEST_XDIST_TESTRUNUID` usage.

- `reset_snuba` is no longer a per-test fixture — it is now a session-scoped cleanup that resets ClickHouse once. Within a session, isolation works via PostgreSQL sequence uniqueness: each test gets fresh `project_id`s that never collide, so rows from other tests are invisible without any truncation.

- Removed all pytest-xdist artifacts — `PYTEST_XDIST_WORKER` env var checks, `looponfailroots` config, xdist dependency.

Usage