
Releases: pytorch/test-infra

v20250905-153412

05 Sep 15:35
12626a2
Upgrade scale up/down lambdas to aws sdk v3 (#7061)

Upgrade to aws sdk v3

The main change is getting rid of the `.promise()` calls, since the v3 clients return promises directly instead of request objects.
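
As a minimal sketch of that calling-convention change (hand-rolled stubs, not the repo's real code or the actual AWS SDK clients): in v2 an API method returns a request-like object and you call `.promise()` on it, while in v3 `client.send()` returns a promise directly.

```typescript
// Stubbed illustration of the aws-sdk v2 -> v3 calling-convention change.
// These are stand-ins, not the real AWS SDK clients.

// v2 shape: the API method returns a request-like object; you must
// call .promise() to get a Promise out of it.
function describeInstancesV2(_params: Record<string, unknown>) {
  return { promise: () => Promise.resolve({ Reservations: [] as unknown[] }) };
}

// v3 shape: send() returns the Promise directly, so .promise() goes away.
function sendV3(_command: Record<string, unknown>): Promise<{ Reservations: unknown[] }> {
  return Promise.resolve({ Reservations: [] });
}

async function main() {
  const v2 = await describeInstancesV2({}).promise(); // v2: extra .promise()
  const v3 = await sendV3({});                        // v3: awaited directly
  console.log(v2.Reservations.length, v3.Reservations.length);
}

main();
```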

This changes a lot of mocks in the tests, so I'm not sure how much confidence running just `yarn test` gives.

Testing:
Mangled scaleDown to only run `listInstances` and `listSSMParameters`
In `terraform-aws-github-runner/modules/runners/lambdas/runners`:
```
yarn build; cd dist; node -e 'require("./index").scaleDown({}, {}, {});' > t.log
```
I also tried to terminate a runner and it worked


Deployed to pytorch-canary and ran some jobs, seems ok

v20250905-153356

05 Sep 15:35
a98113d
Upgrade webhook lambda for scale up/down to aws sdk v3 (#7077)

Similar to #7061 

Mostly just getting rid of the `.promise()` calls.

Testing:
Just `yarn test`, though I'm not sure how helpful that is since it mocks everything.
Deployed to pytorch-canary and it seems OK.

v20250905-153317

05 Sep 15:35
200cb0c
Upgrade runner-binaries-syncer to aws sdk v3 (#7078)

`yarn test` is broken on main; you can see this in
https://github.com/pytorch/test-infra/blob/a32b8f647ed2df0e93a167e518cf92f5855671ce/terraform-aws-github-runner/modules/runner-binaries-syncer/lambdas/runner-binaries-syncer/Makefile#L15


Testing:
Stole some environment variables from the lambda, changed the upload key to a dummy key, then ran `yarn build; cd dist; node -e 'require("./index").handler();' > t.log`.
Saw that it uploaded one file and skipped the others because they didn't need to be uploaded.
The file that was uploaded was the arm64 binary, which I'm thinking was originally uploaded manually since it lacks a tag on S3.

I also had to add a missing `await` to get it working, which I think is a bug in the original code.
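
A hypothetical reduction of that class of bug (the names are made up, not the syncer's actual functions): calling an async upload without `await` inside an async handler lets the handler resolve before the upload finishes, and any rejection is silently dropped.

```typescript
// Toy model of a fire-and-forget async bug; not the syncer's real code.
async function upload(key: string): Promise<string> {
  return `uploaded:${key}`;
}

// Buggy: the upload promise is never awaited, so the handler returns
// before the upload completes and rejections vanish (or surface later
// as unhandled rejections).
async function handlerBuggy(): Promise<string> {
  upload("actions-runner-arm64.tar.gz"); // missing await
  return "done";
}

// Fixed: awaiting ties the upload's completion (and its errors) to the handler.
async function handlerFixed(): Promise<string> {
  return await upload("actions-runner-arm64.tar.gz");
}

handlerFixed().then((r) => console.log(r));
```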

v20250902-213100

02 Sep 21:33
c1eb200
Bump tracing-subscriber from 0.3.18 to 0.3.20 in /aws/lambda/log-clas…

v20250902-173719

02 Sep 17:39
a32b8f6
[BE][EZ] Document what the enable_organizations_runner param does (#7…

v20250829-162418

29 Aug 16:26
eaebfc3
[autoscalers] Only use auth to download github files if needed (#7064)

This enables autoscalers to use a scale-config that's located in an
organization other than the one they're located in, as long as that
scale-config is located in a public repo (which all our scale configs
currently are).

Bug it fixes: the old code created a GitHub client to download the scale-config.yml file, but `createGitHubClientForRunnerOrg` fails if you try to create a client for an org your app doesn't have access to. Using a full-blown GitHub client for a public file also seems unnecessary.

This version uses a plain HTTP request to pull the raw file, so authentication doesn't matter. (Aside: I considered keeping the old flow as a backup path in case we ever want the scale config to live in a private repo, but if and when that day comes I'd rather we add the logic afresh than leave dead, unused code around in the script.)
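
A sketch of that approach, assuming a public repo and Node's global `fetch`; the URL pieces and function names here are illustrative, not the autoscaler's actual code:

```typescript
// Build the raw.githubusercontent.com URL for a file in a public repo.
function rawFileUrl(org: string, repo: string, ref: string, path: string): string {
  return `https://raw.githubusercontent.com/${org}/${repo}/${ref}/${path}`;
}

// Plain unauthenticated HTTP GET: no GitHub App client and no
// installation token, so it works across orgs as long as the repo
// holding the scale config is public.
async function fetchRawFile(org: string, repo: string, ref: string, path: string): Promise<string> {
  const url = rawFileUrl(org, repo, ref, path);
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`GET ${url} failed with status ${res.status}`);
  }
  return res.text();
}
```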

Testing: Verified the getRunnerTypes functionality locally to ensure it
worked end-to-end without mocks.

---------

Co-authored-by: Jean Schmidt <[email protected]>

v20250828-135156

28 Aug 13:54
29d0529
Fixing the behaviour for getRunnerTypes with scaleConfigOrg (#7062)

Currently, when scaleConfigOrg points to an organization other than the one the runners are assigned to, it is not always correctly selected for the `getRunnerTypes` call.

This triggers errors similar to:

```
ERROR [getRunnerTypes]: HttpError: Not Found
```

This is because the org is not correctly matched in all the places where `getRunnerTypes` is called.

v20250826-210603

26 Aug 21:08
8a3e81d
[autorevert] refactoring: extract Signal, decouple pattern detection …

v20250822-013402

22 Aug 01:36
c60987b
[autorevert] fix query sorting (#7043)

The current sorting uses the workflow dispatch time, which does not match the commit order. The correct approach is to sort all workflows by merge timestamp.

This was causing errors in the detection logic, since it mixed up the order of jobs for commit evaluation and detected patterns where it should not.
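
The fix can be sketched like this (the field names are illustrative, not the query's real columns): order rows by the commit's merge timestamp rather than the workflow dispatch time, so evaluation follows the commit sequence.

```typescript
// Illustrative row shape; the real query has its own columns.
interface Row {
  sha: string;
  dispatchedAt: number; // workflow dispatch time: can be out of order (retries, delays)
  mergedAt: number;     // commit merge timestamp: defines the commit sequence
}

// Sort by merge timestamp so rows line up with commit order.
function sortByCommitOrder(rows: Row[]): Row[] {
  return [...rows].sort((a, b) => a.mergedAt - b.mergedAt);
}

const rows: Row[] = [
  { sha: "bbb", dispatchedAt: 30, mergedAt: 2 }, // dispatched last, merged second
  { sha: "aaa", dispatchedAt: 10, mergedAt: 1 },
  { sha: "ccc", dispatchedAt: 20, mergedAt: 3 },
];
console.log(sortByCommitOrder(rows).map((r) => r.sha).join(","));
```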

```
==================================================
SUMMARY STATISTICS
==================================================
Workflow(s): Lint, trunk, pull, inductor, linux-binary-manywheel
Timeframe: 4380 hours
Commits checked: 33873
Auto revert patterns detected: 560
Actual reverts inside auto revert patterns detected (%): 204 (36.4%)
Total revert commits in period: 601

Revert categories:
  nosignal: 215 (35.8%)
  ghfirst: 151 (25.1%)
  uncategorized: 105 (17.5%)
  ignoredsignal: 70 (11.6%)
  weird: 46 (7.7%)
  landrace: 14 (2.3%)

Total reverts excluding ghfirst: 450
Reverts (excluding ghfirst) that dont match any auto revert pattern detected (%): (268) (59.6%)

*********************************************************************
STATS SUMMARY:
 PRECISION: 36.4%
 RECALL: 33.9%
 F1: 35.1%
*********************************************************************

Per workflow precision:
  Lint: 50 reverts out of 60 patterns (83.3%) [excluding ghfirst: 46 (76.7%)]
  trunk: 40 reverts out of 74 patterns (54.1%) [excluding ghfirst: 37 (50.0%)]
  pull: 79 reverts out of 276 patterns (28.6%) [excluding ghfirst: 74 (26.8%)]
  inductor: 34 reverts out of 144 patterns (23.6%) [excluding ghfirst: 31 (21.5%)]
  linux-binary-manywheel: 1 reverts out of 6 patterns (16.7%) [excluding ghfirst: 0 (0.0%)]
```

v20250819-162243

19 Aug 16:24
4e34c26
Bump axios from 1.7.7 to 1.8.2 in /terraform-aws-github-runner/module…