fix: make govcloud builds possible #2621

Merged
fletcherw merged 1 commit into awslabs:main from fletcherw:build_credentials
Feb 10, 2026

@fletcherw commented Feb 10, 2026

Make various fixes so that govcloud builds work:

  • Make latest-binaries.sh handle cross-partition fetches.
  • Change the S3 copy logic to stop checking the AWS_ACCESS_KEY_ID variable, which is a fundamentally flawed check. Instead, attempt the copy with the environment's credentials and fall back to --no-sign-request if that first copy fails.
  • Update the cache-pause-container script to use the eks-distro public mirror.
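
The copy-then-fall-back behavior described in the second bullet could look roughly like the sketch below. This is not the code from the PR: the function name `s3_copy_with_fallback` is hypothetical, and it assumes the source bucket permits anonymous reads so that `--no-sign-request` has a chance of succeeding.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not taken from the PR): copy an S3 object, trying a
# signed request first and an anonymous request second.
s3_copy_with_fallback() {
  local src="$1"
  local dest="$2"

  # First attempt: use whatever credentials the environment resolves to
  # (environment variables, an instance role, a profile, and so on).
  if aws s3 cp "$src" "$dest"; then
    return 0
  fi

  # Second attempt: unsigned request. Assumption: the bucket allows
  # anonymous reads; otherwise this fails too and the error propagates.
  echo "signed copy failed, retrying with --no-sign-request" >&2
  aws s3 cp --no-sign-request "$src" "$dest"
}

# Example call (bucket and key are placeholders):
#   s3_copy_with_fallback "s3://example-bucket/path/to/binary" "/tmp/binary"
```

Trying the signed copy first avoids guessing from `AWS_ACCESS_KEY_ID` whether credentials exist, since credentials can also come from sources that never set that variable.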

The net effect of this change is that, by default, none of the provisioners rely on the caller's AWS credentials. I'm leaving in the logic that passes the AWS credentials through via environment variables, since removing it would be a breaking change, but none of those variables are needed.

In a future major release, it would be nice to remove those entirely and require people to use instance roles if they need permissions in the provisioners (e.g. for private artifact buckets).

Fixes: #527, #762, #1772

Testing:
Builds in us-gov-west-1 and us-gov-east-1 with AWS credential env vars unset.
Build in us-gov-west-1 with AWS credential env vars set.
k8s 1.29 and 1.35 builds in us-west-2 with AWS credential env vars unset.

@fletcherw
Contributor Author

/ci

@github-actions
Contributor

@fletcherw roger that! I've dispatched a workflow. 👍

REGION="${2}"

CROSS_PARTITION_ARGS=""
if [[ "${REGION}" == us-gov* || "${REGION}" == eusc* ]]; then
Contributor

I'd like this to be tied more closely to connectivity: if the region is not in an isolated partition or China, we know we can allow the cross-partition fetch.

But I recognize this is a hack script, and it's not so easy to make that determination here.

Contributor Author

I see your point but I think it's okay like this in the hack script.
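
For illustration only, the connectivity-based check suggested in this thread might be sketched as follows. The function name `can_fetch_cross_partition` is hypothetical, and the region patterns are an assumption based on common region naming (`cn-*` for China, an `-iso` infix for isolated partitions); nothing like this was merged in the PR.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed (exit 0) when a build in the given region is
# assumed to be able to reach endpoints in the commercial partition.
can_fetch_cross_partition() {
  case "$1" in
    # Assumption: China regions and isolated partitions have no route to
    # commercial-partition S3 endpoints.
    cn-* | *-iso*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example (region value is a placeholder):
#   if can_fetch_cross_partition "us-gov-west-1"; then echo "ok"; fi
```

A deny-list like this inverts the merged allow-list check on `us-gov*` and `eusc*`, so any partition added later would be allowed by default.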

MINOR_VERSION="${1}"
REGION="${2}"

CROSS_PARTITION_ARGS=""
Contributor

nit: this name seems awkward in the invocation when it's not cross-partition; maybe something more generic, like S3_CLI_EXTRA_ARGS?

Contributor Author

Done.
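
After the rename, the region check from the snippet above might feed the new variable along these lines. The excerpt does not show which arguments the script actually sets, so the values here (`--no-sign-request` plus an explicit commercial region) and the function name `s3_cli_extra_args` are assumptions for the sake of the sketch.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print extra AWS CLI arguments to use when fetching
# binaries for a build in the given region.
s3_cli_extra_args() {
  case "$1" in
    us-gov* | eusc*)
      # Assumption: the artifacts live in a commercial-partition bucket that
      # allows anonymous reads, so credentials from another partition must
      # not be used to sign the request.
      printf '%s\n' "--no-sign-request --region us-west-2"
      ;;
    *)
      # Same partition: no extra arguments needed.
      printf '\n'
      ;;
  esac
}

# Example (region value is a placeholder):
#   S3_CLI_EXTRA_ARGS="$(s3_cli_extra_args "us-gov-west-1")"
#   aws s3 ls ${S3_CLI_EXTRA_ARGS} "s3://example-bucket/"   # intentionally unquoted
```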

  "nvidia_repository_url": null,
  "nvidia_grid_runfile_bucket_name": "ec2-linux-nvidia-drivers",
- "pause_container_image": "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10",
+ "pause_container_image": "public.ecr.aws/eks-distro/kubernetes/pause:3.10",
Contributor

Also, we should fix the caching of this image: ecr-public uses a different auth token and a different subcommand.

Contributor Author

Done.
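
To make the auth difference concrete, here is a minimal sketch of picking the login subcommand from the image reference. The function name `ecr_login_password` is hypothetical and this is not the cache-pause-container script itself. It relies on two AWS CLI commands, `aws ecr get-login-password` and `aws ecr-public get-login-password`, and on ECR Public issuing auth tokens only from us-east-1.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print a registry password for the given image
# reference, choosing the AWS CLI subcommand by registry type.
ecr_login_password() {
  local image="$1"
  local registry="${image%%/*}"   # everything before the first slash
  local region

  case "$registry" in
    public.ecr.aws)
      # ECR Public: separate subcommand, and tokens come from us-east-1.
      aws ecr-public get-login-password --region us-east-1
      ;;
    *.dkr.ecr.*)
      # Private ECR host: <account>.dkr.ecr.<region>.amazonaws.com,
      # so the region is the fourth dot-separated field.
      region="$(printf '%s' "$registry" | cut -d. -f4)"
      aws ecr get-login-password --region "$region"
      ;;
    *)
      echo "unrecognized registry: $registry" >&2
      return 1
      ;;
  esac
}

# Example (image reference taken from the diff above):
#   ecr_login_password "public.ecr.aws/eks-distro/kubernetes/pause:3.10"
```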

@github-actions
Contributor

@fletcherw the workflow that you requested has completed. 🎉

| AMI variant | Build | Test |
|---|---|---|
| 1.29 / al2023 | success ✅ | success ✅ |
| 1.30 / al2023 | success ✅ | success ✅ |
| 1.31 / al2023 | success ✅ | success ✅ |
| 1.32 / al2023 | success ✅ | success ✅ |
| 1.33 / al2023 | success ✅ | failure ❌ |
| 1.34 / al2023 | success ✅ | success ✅ |
| 1.35 / al2023 | success ✅ | success ✅ |

@fletcherw
Contributor Author

The 1.33 test failure was just a timeout.

@bradwatsonaws

@fletcherw is the real MVP!

@fletcherw
Contributor Author

Thanks @bradwatsonaws! Please let me know if you have any issues with the change.

Development

Successfully merging this pull request may close these issues.

AWS_ACCESS_KEY_ID environment variable issue