Add aws-actions authentication to linux_job_v2.yml #7115
base: main
@akashveramd is attempting to deploy a commit to the Meta Open Source Team on Vercel. A member of the Team first needs to authorize it.
Unfortunately, rolling this out would be hard work because all callers of …
Looks like it fails a lot of tests, doesn't it?
This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.
Hi @malfet, it seems you approved the request. But I closed the PR thinking I had found a workaround that didn't require changing linux_job_v2.yml. However, I think I was wrong about that: adding the AWS setup to linux_job_v2.yml looks like the only option. Can you please re-approve the changes?
@akashveramd I didn't approve it; I requested changes because it breaks all existing CI workflows.
@malfet: I think those failures happen because the workflow runs from a forked PR, which is not granted permission to use gha_workflow_s3_and_ecr_read_only. I had a similar issue in my torchtitan PR, where I was trying to do AWS authentication inside the workflow, and @huydhn pointed to this issue in the Slack channel discussion. As a short-term workaround, Huy gave me write permission to the pytorch/torchtitan repo, and I moved my torchtitan PR into pytorch/torchtitan itself instead of my fork. With that, the AWS issues disappeared.
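One common way to keep such a job from failing on forked PRs, sketched here under the assumption that the remaining steps can run without AWS access, is to run the credential step only when the PR head branch lives in the base repository. The step name, the `<ACCOUNT_ID>` placeholder, the action version, and the region are illustrative assumptions, not what linux_job_v2.yml actually contains.

```yaml
# Hypothetical step: request AWS credentials only for non-PR events and same-repo PRs.
# Forked PRs generally cannot obtain the OIDC token this action relies on.
- name: Configure AWS credentials (skipped on forked PRs)
  if: ${{ github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository }}
  uses: aws-actions/configure-aws-credentials@v4
  with:
    # Placeholder account ID; only the role name comes from the discussion.
    role-to-assume: arn:aws:iam::<ACCOUNT_ID>:role/gha_workflow_s3_and_ecr_read_only
    aws-region: us-east-1  # assumed region
```

The trade-off is that fork PRs then run without AWS credentials, so any later step that reads from S3 or pulls from ECR would still need its own guard.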
@huydhn, do you mind validating that, then?
Um, no, my earlier comment on #7115 still stands. This workflow does have … You might have fixed it on torchtitan, but other users of …
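The comment above is truncated, but a likely concern (an assumption here) is GitHub's permission model for reusable workflows: a called workflow can only use the permissions its caller grants, and `id-token` is not granted by default. An aws-actions OIDC step inside linux_job_v2.yml would therefore require every caller to add an `id-token: write` grant, roughly as below. The workflow name, job name, and `script` value are illustrative.

```yaml
# Hypothetical caller workflow. Without the id-token: write grant, a reusable
# workflow that needs an OIDC token either fails validation (if it declares
# id-token: write itself) or fails when the AWS credential step runs.
name: example-caller
on:
  pull_request:
jobs:
  build:
    permissions:
      id-token: write   # required for aws-actions/configure-aws-credentials via OIDC
      contents: read    # setting any permission resets unlisted ones to none
    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
    with:
      script: echo "build and test here"   # illustrative input
```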
@huydhn: Instead of modifying this workflow, I can create a new reusable workflow for ROCm with AWS authentication, something like linux_job_v2_rocm.yml.
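A minimal skeleton of such a ROCm-specific reusable workflow might look like the following. Only the file name linux_job_v2_rocm.yml comes from the discussion; the `runner` and `script` inputs, step names, action versions, `<ACCOUNT_ID>` placeholder, and region are assumptions, and the real linux_job_v2.yml does considerably more than this.

```yaml
# Hypothetical .github/workflows/linux_job_v2_rocm.yml
name: linux_job_v2_rocm

on:
  workflow_call:
    inputs:
      runner:
        description: Runner label to run the job on
        required: true
        type: string
      script:
        description: Shell script to run after AWS authentication
        required: true
        type: string

jobs:
  job:
    runs-on: ${{ inputs.runner }}
    # Callers must grant at least these permissions; a called workflow
    # cannot elevate beyond what its caller allows.
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      # A JavaScript action, so it does not need the AWS CLI on the runner.
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::<ACCOUNT_ID>:role/gha_workflow_s3_and_ecr_read_only
          aws-region: us-east-1

      - name: Run script
        env:
          # Pass the input through env rather than interpolating it into the command.
          SCRIPT: ${{ inputs.script }}
        run: bash -c "$SCRIPT"
```

Keeping this in a separate file means only the ROCm callers that opt in need to grant `id-token: write`; existing callers of linux_job_v2.yml are unaffected.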
…s authentication. Rolled back changes made to linux_job_v2.yml.
fdc2593 to f7ee220
Yes, that sounds reasonable.
One suggestion is to add a test workflow for this new one, like https://github.com/pytorch/test-infra/blob/main/.github/workflows/test_linux_job_v2.yml. Otherwise it's hard to avoid failures.
@huydhn: Sorry, are you suggesting that I create something like .github/workflows/test_linux_job_v2_rocm.yml that runs linux_job_v2_rocm.yml, similar to how .github/workflows/test_linux_job_v2.yml runs linux_job_v2.yml?
Yup, exactly, you can call this new workflow …
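A test workflow along these lines could exercise the new reusable workflow on every PR that touches it. It assumes linux_job_v2_rocm.yml accepts hypothetical `runner` and `script` inputs; the trigger paths, runner label, and check script are also assumptions. The script only checks for the environment variables that aws-actions/configure-aws-credentials exports by default, so it does not depend on the AWS CLI being installed.

```yaml
# Hypothetical .github/workflows/test_linux_job_v2_rocm.yml
name: Test linux_job_v2_rocm

on:
  pull_request:
    paths:
      - .github/workflows/linux_job_v2_rocm.yml
      - .github/workflows/test_linux_job_v2_rocm.yml
  workflow_dispatch:

jobs:
  test-aws-auth:
    permissions:
      id-token: write
      contents: read
    # Local reference: runs the version of the reusable workflow in this commit.
    uses: ./.github/workflows/linux_job_v2_rocm.yml
    with:
      runner: linux.rocm.gpu.2   # hypothetical runner label
      script: |
        # Fail if the AWS step did not export credentials into the environment.
        test -n "${AWS_ACCESS_KEY_ID:-}" || { echo "AWS credentials missing"; exit 1; }
        echo "AWS credentials are available"
```

PRs opened from forks would still hit the permission limitation described earlier in the thread, so this test is most meaningful for branches pushed to the base repository.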
Some runners do not have the AWS CLI installed, so PyTorch relies on GitHub's aws-actions instead of the CLI for authentication. To support workflow files that use linux_job_v2.yml, this PR adds aws-actions authentication to linux_job_v2.yml.
This is needed to unblock the torchtitan PR pytorch/torchtitan#1260, which currently hits the issue shown at https://github.com/pytorch/torchtitan/actions/runs/16353043263/job/46204468985?pr=1260#step:8:221