@guangy10
Contributor

What shows here (#5428 (comment)) seems to be a common issue, especially for people who are exporting the Diff to GitHub from Phabricator. This PR adds an error message for when the secret is empty.
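
To illustrate the kind of check described here, a minimal sketch in Python follows. It is not the code from this PR: the helper name `require_secret` and the secret name `EXAMPLE_SECRET` are hypothetical, and the real check may live in a workflow step instead. The idea is simply to fail early with a readable message when the secret resolves to an empty string.

```python
import os


def require_secret(name, env=None):
    """Return the secret's value, or fail early with an actionable message.

    `name` is a hypothetical secret name; `env` defaults to the process
    environment and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        # An unset secret and an empty secret are treated the same way.
        raise RuntimeError(
            f"Secret {name!r} is empty. "
            "Check that it is configured and available to this workflow run."
        )
    return value
```

Calling `require_secret("EXAMPLE_SECRET")` at the start of a job would surface the problem before any later step fails with a less obvious error.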

@guangy10 guangy10 requested a review from huydhn September 17, 2024 21:48
@pytorch-bot

pytorch-bot bot commented Sep 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5441

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5830c88 with merge base 06c0fa3 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 17, 2024
@facebook-github-bot
Contributor

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@guangy10
Contributor Author

@huydhn Interesting that we already increased the timeout to 120min but the job still timed out after 60min: https://github.com/pytorch/executorch/actions/runs/10908226772/job/30282356192.
It looks like the job can take timeout param correctly: https://github.com/pytorch/test-infra/blob/main/.github/workflows/mobile_job.yml#L99

@facebook-github-bot
Contributor

@guangy10 merged this pull request in 3e1a578.

@huydhn
Contributor

huydhn commented Sep 18, 2024

> @huydhn Interesting that we already increased the timeout to 120min but the job still timed out after 60min: https://github.com/pytorch/executorch/actions/runs/10908226772/job/30282356192. It looks like the job can take timeout param correctly: https://github.com/pytorch/test-infra/blob/main/.github/workflows/mobile_job.yml#L99

Good catch. It turns out there is another "hidden" timeout: the AWS credential expires after 1 hour. I have a fix in pytorch/test-infra#5685 to set it correctly.
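
The arithmetic behind the "hidden" timeout can be sketched as follows. This is only an illustration under the assumption that the job stops as soon as either the job timeout or the credential lifetime runs out; the function name `effective_limit_minutes` is made up for this example.

```python
def effective_limit_minutes(job_timeout_minutes, credential_lifetime_seconds):
    """Return how long the job can actually run, in minutes.

    Assumes the job ends when either limit is hit, so the smaller one wins.
    """
    return min(job_timeout_minutes, credential_lifetime_seconds / 60)


# A 120-minute job timeout with a credential that expires after 1 hour.
before_fix = effective_limit_minutes(120, 3600)

# The same job timeout with a 2-hour credential.
after_fix = effective_limit_minutes(120, 7200)
```

Under that assumption, `before_fix` is 60 minutes, which matches the observed timeout, and `after_fix` is the full 120 minutes.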

huydhn added a commit to pytorch/test-infra that referenced this pull request Sep 18, 2024
As reported by @guangy10 in
pytorch/executorch#5441 (comment),
it turns out that the job timeout can be set to a higher value (120
minutes), but the AWS credential still expires after 1 hour
https://github.com/aws-actions/configure-aws-credentials.

The fix here is to set the role duration correctly. However, the maximum
it can be set to is 18000 seconds. I believe that's plenty of room
already, but if a higher value is needed, we will need to update the
server side:
https://github.com/pytorch-labs/pytorch-gha-infra/blob/main/runners/gha_roles.tf#L1425

### Testing

The role duration is set correctly to 7200 seconds (2 hours)
https://github.com/pytorch/test-infra/actions/runs/10916800466/job/30298898748#step:4:4
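
A small sketch of how a role duration might be derived from the job timeout, using the 18000-second cap quoted in the commit message above. Deriving the duration from the timeout is an assumption made for illustration, and `role_duration_seconds` is a hypothetical helper, not code from pytorch/test-infra#5685.

```python
# Cap quoted in the commit message; raising it needs a server-side change.
MAX_ROLE_DURATION_SECONDS = 18000


def role_duration_seconds(timeout_minutes, cap=MAX_ROLE_DURATION_SECONDS):
    """Convert a job timeout in minutes to a credential duration in seconds.

    Raises ValueError when the timeout is not positive or when the requested
    duration exceeds the cap, rather than silently shortening the credential.
    """
    if timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be positive")
    requested = timeout_minutes * 60
    if requested > cap:
        raise ValueError(
            f"{requested} seconds exceeds the {cap}-second cap; "
            "the role's maximum session duration must be raised first"
        )
    return requested
```

With a 120-minute timeout this gives 7200 seconds, the value reported in the testing note. Raising an error above the cap is a design choice in this sketch: it keeps a too-short credential from reintroducing the same silent timeout.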
huydhn added a commit to huydhn/test-infra that referenced this pull request Sep 24, 2024
kit1980 pushed a commit to pytorch/test-infra that referenced this pull request Sep 24, 2024