Set the AWS credential duration in mobile job #5685
.github/workflows/mobile_job.yml (outdated)

    run: |
      set -ex
      TIMEOUT_IN_SECONDS=$(( TIMEOUT_IN_MINUTES * 60 ))
The role duration should probably be somewhat longer than the bare timeout, say 1 hour more. But I wouldn't go with a complex solution here; instead, just ask for the longest duration you can. I don't see any downside to doing this, and it is less prone to failures.
Sounds good, let me just set the max duration here to keep it simple.
As reported by @guangy10 in pytorch/executorch#5441 (comment), the job timeout can be set to a higher value (120 minutes), but the AWS credential still expires after 1 hour (see https://github.com/aws-actions/configure-aws-credentials).
The fix here is to set the role duration correctly. However, the maximum it can be set to is 18000 seconds (5 hours). I believe that's plenty of room already, but if a higher value is needed, we will need to update the server side: https://github.com/pytorch-labs/pytorch-gha-infra/blob/main/runners/gha_roles.tf#L1425
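For reference, here is a rough sketch of the alternative raised in review (timeout plus 1 hour of headroom, capped at the 18000-second maximum). This is not what the workflow does; the function name `role_duration_seconds` and its local variable names are hypothetical, and only the 18000-second cap and the "1h more" buffer come from this thread.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: derive a role duration (seconds) from a job timeout (minutes).
# Assumptions: the 18000 s cap is the server-side maximum mentioned in this PR,
# and the 3600 s buffer is the "1h more" suggested in review.
role_duration_seconds() {
  local timeout_in_minutes="$1"
  local max_seconds=18000
  local buffer_seconds=3600
  local wanted=$(( timeout_in_minutes * 60 + buffer_seconds ))

  # Never request more than the role allows; such a request would be rejected.
  if (( wanted > max_seconds )); then
    wanted=$max_seconds
  fi
  echo "$wanted"
}

role_duration_seconds 120   # 120 * 60 + 3600 = 10800
role_duration_seconds 300   # 21600 exceeds the cap, so 18000
```

Always requesting the maximum, as agreed in the review thread, avoids this arithmetic entirely, which is the argument for the simpler approach.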
### Testing
The role duration is set correctly to 7200 seconds (2 hours) https://github.com/pytorch/test-infra/actions/runs/10916800466/job/30298898748#step:4:4