Releases: pytorch/test-infra
v20260226-220511
[autorevert] expand signal extraction to new test failures (#7789)

## Summary

Extends the job-track signal extraction to also detect **test-caused failures** as a separate pass, fixing two blind spots in autorevert detection:

1. **New tests with no green base**: When a commit introduces a new test that immediately fails, the test-track signal has no prior SUCCESS events, so autorevert never triggers (`NO_SUCCESSES`). The job-track naturally has a green base: the job passed on older commits before the test existed.
2. **Test failures missing from `tests.all_test_runs`**: Some test failures are classified at the job level but never appear in the test results table (e.g., ROCm gfx950 shards). The job-track maps these to SUCCESS (deferred to test-track), but test-track has nothing, so both tracks miss the failure entirely.

## Approach

`_build_non_test_signals()` now takes a `test_failures` boolean and is called twice:

- `test_failures=False` (existing): only non-test failures → FAILURE
- `test_failures=True` (new): only test-caused failures → FAILURE

The relevance check is a single XOR: `test_failures != meta.has_non_test_failures`. Signals from the test-failure pass use a `[test]` key suffix to distinguish them from the non-test job signal.

No changes to `signal.py`: the new signals reuse `SignalSource.JOB`, and all existing pattern detection, infra checks, and confidence thresholds apply as-is.
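The two-pass relevance check described in the approach for #7789 can be illustrated with a minimal sketch. This is not the code from the PR: `JobMeta`, `job_status`, and `signal_key` are hypothetical names, and the exact key format is an assumption. Only `test_failures`, `has_non_test_failures`, and the `[test]` suffix come from the description.

```python
from dataclasses import dataclass


@dataclass
class JobMeta:
    """Hypothetical stand-in for the per-job metadata (assumption)."""
    is_failure: bool             # the job concluded with a failure
    has_non_test_failures: bool  # the failure was classified as not test-caused


def job_status(meta: JobMeta, test_failures: bool) -> str:
    """Map one job to a status for the pass selected by `test_failures`."""
    if not meta.is_failure:
        return "SUCCESS"
    # For booleans, `!=` is XOR: a failure is relevant to this pass only when
    # the pass type and the failure classification disagree.
    relevant = test_failures != meta.has_non_test_failures
    # Failures that belong to the other pass are mapped to SUCCESS here.
    return "FAILURE" if relevant else "SUCCESS"


def signal_key(base_key: str, test_failures: bool) -> str:
    """Add the `[test]` suffix for the test-failure pass (format is assumed)."""
    return f"{base_key} [test]" if test_failures else base_key
```

Under these assumptions, a test-caused failure yields FAILURE only in the `test_failures=True` pass, and a non-test failure yields FAILURE only in the `test_failures=False` pass.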
## Test plan

- Updated 2 existing tests to reflect new `[test]` signal emission
- Added 3 new tests:
  - Green base scenario (job passes on older commits, test failure on newer)
  - Inverse mapping (non-test failures map to SUCCESS on `[test]` track)
  - No spurious signal (infra-only failures produce no `[test]` signal)
- All 101 tests pass

---

Manual testing:

```
python -m pytorch_auto_revert --dry-run autorevert-checker trunk --hours 18 --hud-html --as-of "2026-02-24 19:08"
```

Result: [2026-02-25T00-18-17.215587-00-00.html](https://github.com/user-attachments/files/25533370/2026-02-25T00-18-17.215587-00-00.html)
v20260226-005732
[autorevert] Treat test retry success as SUCCESS in autorevert signal…
v20260224-181703
Bump bytes to 1.11.1 in log-classifier (#7781)

## Summary

- Bumps `bytes` minimum version from 1.2.1 to 1.11.1 in `aws/lambda/log-classifier/Cargo.toml`
- Fixes GHSA-434x-w66g-qw3r (severity: MODERATE)
- **Note:** `Cargo.lock` needs regeneration; run `cd aws/lambda/log-classifier && cargo update -p bytes`

## Deploy steps

- Merge PR, then rebuild and redeploy the log-classifier Lambda

## Test steps

- `cd aws/lambda/log-classifier && cargo build`: verify it compiles
- `cd aws/lambda/log-classifier && cargo test`: run existing tests
v20260224-181643
Pin time >= 0.3.47 in log-classifier (#7782)

## Summary

- Adds explicit `time = "0.3.47"` dep to force transitive resolution to >= 0.3.47
- Fixes GHSA-r6v5-fh4h-64xc (severity: MODERATE)
- **Note:** `Cargo.lock` needs regeneration; run `cd aws/lambda/log-classifier && cargo update -p time`

## Deploy steps

- Merge PR, then rebuild and redeploy the log-classifier Lambda

## Test steps

- `cd aws/lambda/log-classifier && cargo build`: verify it compiles
- `cd aws/lambda/log-classifier && cargo test`: run existing tests
v20260209-232704
Pin setuptools<82 to fix pkg_resources removal breakage (#7747)

setuptools 82.0.0 (released Feb 8, 2026) removed `pkg_resources`, which breaks transitive dependencies that import it. Pin `setuptools<82` in build-system requires and dev requirements.

---------

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
v20260204-221009
Add token usage and model tracking to Claude billing (#7729)

## Summary

- Adds token tracking fields to `misc.claude_code_usage` schema:
  - `input_tokens`, `output_tokens`
  - `cache_read_input_tokens`, `cache_creation_input_tokens`
  - `model`
- Updates `upload-claude-usage` action to extract these fields from Claude output
- Updates S3 replicator lambda to handle new fields
- Creates v2 Grafana dashboard with token metrics

## Dashboard v2

https://pytorchci.grafana.net/public-dashboards/83058a8d65d44a099eb8d9ac2916f411

New metrics:

- Total input/output tokens
- Cache hit rate
- Cost by model breakdown
- Token usage by workflow
- Daily cache performance

## Test plan

- [ ] Deploy schema changes (ALTER TABLE to add columns)
- [ ] Deploy lambda changes
- [ ] Verify new data flows with token fields populated
- [ ] Verify v2 dashboard shows token metrics

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
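The notes for #7729 do not say how the dashboard computes the cache hit rate. As a rough illustration only, the sketch below derives one plausible ratio from the new token fields. The formula is an assumption: cached reads divided by all input-side tokens. The function name `cache_hit_rate` is hypothetical; only the field names come from the schema listed above.

```python
def cache_hit_rate(
    input_tokens: int,
    cache_read_input_tokens: int,
    cache_creation_input_tokens: int,
) -> float:
    """Share of input-side tokens served from cache (assumed definition)."""
    # Assumption: the three fields are disjoint counts of input-side tokens.
    total = input_tokens + cache_read_input_tokens + cache_creation_input_tokens
    # Avoid dividing by zero for rows with no recorded input tokens.
    return cache_read_input_tokens / total if total else 0.0
```

For example, a row with 100 uncached input tokens and 300 cache-read tokens gives a rate of 0.75 under this definition.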
v20260128-210015
set up PT x vLLM regression config (#7684)
Summary:
As title, this should connect regressions to the newly created GH issue.
Using 1.20 and 0.8 as thresholds.
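To make the two thresholds concrete, here is a minimal sketch of a ratio check against a baseline. It is not the lambda's actual logic: `classify_ratio` and its labels are hypothetical, and whether a high or a low ratio counts as a regression depends on the metric, which the summary above does not specify.

```
def classify_ratio(
    current: float,
    baseline: float,
    upper: float = 1.20,
    lower: float = 0.8,
) -> str:
    """Compare current/baseline against the two thresholds (assumed scheme)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = current / baseline
    if ratio >= upper:
        return "above_upper"  # at least 20% higher than the baseline
    if ratio <= lower:
        return "below_lower"  # at least 20% lower than the baseline
    return "within"
```

With these defaults, a value of 125 against a baseline of 100 is flagged as `above_upper`, and 75 against 100 as `below_lower`.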
Test Plan:
Local run with the following:
```
python aws/lambda/benchmark_regression_summary_report/lambda_function.py --clickhouse-endpoint ${CLICKHOUSE_ENDPOINT} --clickhouse-username ${DEV_USERNAME} --clickhouse-password ${CLICKHOUSE_PASSWORD} --config-id pytorch_x_vllm_benchmark
```
Ran it both yesterday and today; today's run was sufficient to trigger
the regression thresholds, so 20% seems appropriate here.
v20260128-164443
[helion][Benchmark] Increase the speedup threshold based on request (…
v20260126-220349
[AUTOREVERT] Ask for claude bot to provide user guidance (#7692)

Known limitations:

- Does not work with forked PRs, but Anthropic [seems to be working](https://github.com/anthropics/claude-code-action/issues/821) on it.

The output [looks nice](https://github.com/pytorch/pytorch/pull/173119#issuecomment-3801829468).

Signed-off-by: Jean Schmidt <contato@jschmidt.me>
v20260122-234401
Add Claude Code usage metrics upload action and database schema (#7675)

This pull request introduces a new pipeline for collecting and ingesting Claude Code usage metrics into ClickHouse for analytics. The changes span GitHub Actions, AWS Lambda ingestion logic, and database schema additions to support this new data flow.

* Reusable action to upload Claude Code metrics to S3
* ClickHouse schema for the metrics
* Turns on ingestion