Releases: pytorch/test-infra

v20260226-220511

26 Feb 22:07
53bb4ce

[autorevert] expand signal extraction to new test failures (#7789)

## Summary

Extends the job-track signal extraction to also detect **test-caused
failures** as a separate pass, fixing two blind spots in autorevert
detection:

1. **New tests with no green base**: When a commit introduces a new test
that immediately fails, the test-track signal has no prior SUCCESS
events, so autorevert never triggers (`NO_SUCCESSES`). The job-track
naturally has a green base — the job passed on older commits before the
test existed.

2. **Test failures missing from `tests.all_test_runs`**: Some test
failures are classified at the job level but never appear in the test
results table (e.g., ROCm gfx950 shards). The job-track maps these to
SUCCESS (deferred to test-track), but test-track has nothing — both
tracks miss the failure entirely.

## Approach

`_build_non_test_signals()` now takes a `test_failures` boolean and is
called twice:
- `test_failures=False` (existing): only non-test failures → FAILURE
- `test_failures=True` (new): only test-caused failures → FAILURE

The relevance check is a single XOR: `test_failures !=
meta.has_non_test_failures`. Signals from the test-failure pass use a
`[test]` key suffix to distinguish them from the non-test job signal.

No changes to `signal.py` — the new signals reuse `SignalSource.JOB` and
all existing pattern detection, infra checks, and confidence thresholds
apply as-is.
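
The two-pass mapping above can be sketched as follows. The names (`JobMeta`, `classify`, `signal_key`, `has_non_test_failures`) are illustrative assumptions, not the actual pytorch/test-infra API; only the XOR relevance check and the `[test]` key suffix come from the summary:

```python
# Hypothetical sketch of the two-pass job-track signal extraction.
# JobMeta and the function names are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class JobMeta:
    name: str
    failed: bool
    has_non_test_failures: bool  # False => the failure was test-caused

def classify(meta: JobMeta, test_failures: bool) -> str:
    """Map one job event to a signal state for the given pass."""
    if not meta.failed:
        return "SUCCESS"
    # Single XOR relevance check: a failure counts as FAILURE only on
    # the pass that owns it; the other pass maps it to SUCCESS.
    if test_failures != meta.has_non_test_failures:
        return "FAILURE"
    return "SUCCESS"

def signal_key(meta: JobMeta, test_failures: bool) -> str:
    # The "[test]" suffix distinguishes test-failure-pass signals.
    return f"{meta.name} [test]" if test_failures else meta.name
```

On the `test_failures=True` pass, a test-caused failure becomes FAILURE while a non-test failure maps to SUCCESS (the inverse of the existing pass), which is the behavior the test plan below exercises.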

## Test plan

- Updated 2 existing tests to reflect new `[test]` signal emission
- Added 3 new tests:
  - Green base scenario (job passes on older commits, test failure on newer)
  - Inverse mapping (non-test failures map to SUCCESS on `[test]` track)
  - No spurious signal (infra-only failures produce no `[test]` signal)
- All 101 tests pass

---

manual testing:

```
python -m pytorch_auto_revert --dry-run autorevert-checker trunk --hours 18 --hud-html --as-of "2026-02-24 19:08"
```
result: 

[2026-02-25T00-18-17.215587-00-00.html](https://github.com/user-attachments/files/25533370/2026-02-25T00-18-17.215587-00-00.html)

v20260226-005732

26 Feb 00:59
4fbd836

[autorevert] Treat test retry success as SUCCESS in autorevert signal…

v20260224-181703

24 Feb 18:18
ae82210

Bump bytes to 1.11.1 in log-classifier (#7781)

## Summary
- Bumps `bytes` minimum version from 1.2.1 to 1.11.1 in
`aws/lambda/log-classifier/Cargo.toml`
- Fixes GHSA-434x-w66g-qw3r (severity: MODERATE)
- **Note:** `Cargo.lock` needs regeneration — run `cd
aws/lambda/log-classifier && cargo update -p bytes`
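
The bump amounts to a one-line change along these lines (a minimal `Cargo.toml` fragment; the surrounding dependency table in `aws/lambda/log-classifier/Cargo.toml` is assumed, not copied from the repo):

```toml
[dependencies]
# 1.11.1 floor picks up the fix for GHSA-434x-w66g-qw3r
bytes = "1.11.1"
```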

## Deploy steps
- Merge PR, then rebuild and redeploy the log-classifier Lambda

## Test steps
- `cd aws/lambda/log-classifier && cargo build` — verify it compiles
- `cd aws/lambda/log-classifier && cargo test` — run existing tests

v20260224-181643

24 Feb 18:18
ed501e4

Pin time >= 0.3.47 in log-classifier (#7782)

## Summary
- Adds an explicit `time = "0.3.47"` dependency to force transitive
resolution to >= 0.3.47
- Fixes GHSA-r6v5-fh4h-64xc (severity: MODERATE)
- **Note:** `Cargo.lock` needs regeneration — run `cd
aws/lambda/log-classifier && cargo update -p time`
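
The pin looks roughly like this (a minimal `Cargo.toml` fragment; the surrounding dependency table in `aws/lambda/log-classifier/Cargo.toml` is assumed, not copied from the repo):

```toml
[dependencies]
# explicit floor so transitive resolution lands on >= 0.3.47,
# which contains the fix for GHSA-r6v5-fh4h-64xc
time = "0.3.47"
```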

## Deploy steps
- Merge PR, then rebuild and redeploy the log-classifier Lambda

## Test steps
- `cd aws/lambda/log-classifier && cargo build` — verify it compiles
- `cd aws/lambda/log-classifier && cargo test` — run existing tests

v20260209-232704

09 Feb 23:28
a79eb7c

Pin setuptools<82 to fix pkg_resources removal breakage (#7747)

setuptools 82.0.0 (released Feb 8, 2026) removed pkg_resources, which
breaks transitive dependencies that import it. Pin setuptools<82 in
build-system requires and dev requirements.
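
A minimal `pyproject.toml` sketch of the pin (the exact file layout and other requirements in test-infra may differ):

```toml
[build-system]
# setuptools 82.0.0 removed pkg_resources; stay below 82 until
# transitive dependencies stop importing it.
requires = ["setuptools<82", "wheel"]
build-backend = "setuptools.build_meta"
```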

---------

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

v20260204-221009

04 Feb 22:11
9b2fc27

Add token usage and model tracking to Claude billing (#7729)

## Summary
- Adds token tracking fields to `misc.claude_code_usage` schema:
  - `input_tokens`, `output_tokens`
  - `cache_read_input_tokens`, `cache_creation_input_tokens`
  - `model`
- Updates `upload-claude-usage` action to extract these fields from
Claude output
- Updates S3 replicator lambda to handle new fields
- Creates v2 Grafana dashboard with token metrics

## Dashboard v2

https://pytorchci.grafana.net/public-dashboards/83058a8d65d44a099eb8d9ac2916f411

New metrics:
- Total input/output tokens
- Cache hit rate
- Cost by model breakdown
- Token usage by workflow
- Daily cache performance

## Test plan
- [ ] Deploy schema changes (ALTER TABLE to add columns)
- [ ] Deploy lambda changes
- [ ] Verify new data flows with token fields populated
- [ ] Verify v2 dashboard shows token metrics
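
The schema change in the test plan could look like the following ClickHouse DDL (column names are from the summary above; the column types are assumptions, not the actual schema):

```sql
-- Hypothetical ALTER for misc.claude_code_usage; UInt64 and
-- LowCardinality(String) types are assumed for illustration.
ALTER TABLE misc.claude_code_usage
    ADD COLUMN IF NOT EXISTS input_tokens UInt64,
    ADD COLUMN IF NOT EXISTS output_tokens UInt64,
    ADD COLUMN IF NOT EXISTS cache_read_input_tokens UInt64,
    ADD COLUMN IF NOT EXISTS cache_creation_input_tokens UInt64,
    ADD COLUMN IF NOT EXISTS model LowCardinality(String);
```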

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

v20260128-210015

28 Jan 21:02
165c8ed

set up PT x vLLM regression config (#7684)

Summary:
As titled, this connects regressions to the newly created GH issue,
using 1.20 and 0.8 as thresholds.

Test Plan:
Local run with the following:
```
python aws/lambda/benchmark_regression_summary_report/lambda_function.py --clickhouse-endpoint ${CLICKHOUSE_ENDPOINT} --clickhouse-username ${DEV_USERNAME} --clickhouse-password ${CLICKHOUSE_PASSWORD} --config-id pytorch_x_vllm_benchmark
```
Ran against both yesterday's and today's data; today's run was
sufficient to trigger the regression thresholds, so 20% seems
appropriate here.
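
The 1.20 / 0.8 thresholds amount to a ratio check roughly like the following. The ratio semantics (latest value divided by baseline, flagged on drift in either direction) are an assumption; the real check lives in `benchmark_regression_summary_report`:

```python
# Illustrative sketch of the regression threshold check; the actual
# config-driven logic in the lambda may differ.
UPPER, LOWER = 1.20, 0.8

def is_regression(value: float, baseline: float) -> bool:
    """Flag a regression when the metric drifts past either threshold."""
    ratio = value / baseline
    return ratio > UPPER or ratio < LOWER
```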

v20260128-164443

28 Jan 16:46
6232f61

[helion][Benchmark] Increase the speedup threshold based on request (…

v20260126-220349

26 Jan 22:05
86c7370

[AUTOREVERT] Ask for claude bot to provide user guidance (#7692)

Known limitation: it does not work with forked PRs, but Anthropic [seems
to be
working](https://github.com/anthropics/claude-code-action/issues/821) on
it.

The output [looks
nice](https://github.com/pytorch/pytorch/pull/173119#issuecomment-3801829468).

Signed-off-by: Jean Schmidt <contato@jschmidt.me>

v20260122-234401

22 Jan 23:45
b3cea9a

Add Claude Code usage metrics upload action and database schema (#7675)

This pull request introduces a new pipeline for collecting and ingesting
Claude Code usage metrics into ClickHouse for analytics. The changes
span GitHub Actions, AWS Lambda ingestion logic, and database schema
additions to support this new data flow.


* Reusable action to upload Claude Code metrics to S3
* ClickHouse schema for the metrics
* Turns on ingestion