@chucklever

Bring the new AWS dynamic Kconfig menu generation more in line with the Lambda Labs reference implementation:

  • Add PyTorch AMI options
  • Improve the performance of the generation scripts
  • Improve the selection of AMIs to add to the menu

This improvement ensures that:
- Terraform patterns reflect current AMI naming conventions
- Deprecated or outdated AMIs don't influence pattern generation
- The generated patterns match actively maintained, security-patched images
- The system gracefully handles distributions that release infrequently

Generated-by: Claude AI
Signed-off-by: Chuck Lever <[email protected]>
The update removes older AMIs such as RHEL 7 and Fedora 38.

Signed-off-by: Chuck Lever <[email protected]>
Generating the patterns that Terraform uses to perform AMI name
searches is probably the most challenging part of gen_kconfig_ami.
This patch improves pattern generation so that different OS release
choices no longer end up with the same search pattern.

1. Adds OS identifier detection: The function now looks for
   parenthetical content like "(Ubuntu 20.04)" or "(Amazon Linux 2)"
   that distinguishes AMI variants
2. Preserves distinguishing characteristics: When OS identifiers are
   present, they're preserved in the pattern while version numbers
   and dates are wildcarded
3. Maintains backward compatibility: Distributions without OS
   identifiers (Amazon Linux, Debian, RedHat, etc.) continue to use
   simple prefix-based patterns

Generated-by: Claude AI
Signed-off-by: Chuck Lever <[email protected]>
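The pattern rules in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `ami_name_to_pattern`, the helper `_wildcard`, and both regular expressions are hypothetical and are not taken from gen_kconfig_ami, and the "keep everything before the first digit" fallback is one guess at what a simple prefix-based pattern means.

```python
import re

# Hypothetical sketch; names and regexes are illustrative and are not
# the actual gen_kconfig_ami code.
_OS_ID = re.compile(r"\([^)]+\)")            # e.g. "(Ubuntu 20.04)"
_NUMERIC = re.compile(r"\d+(?:[.\-]\d+)*")   # version numbers and dates


def _wildcard(text: str) -> str:
    """Replace each version number or date with '*', merging adjacent stars."""
    return re.sub(r"\*+", "*", _NUMERIC.sub("*", text))


def ami_name_to_pattern(name: str) -> str:
    """Turn one AMI name into a search pattern."""
    match = _OS_ID.search(name)
    if match is None:
        # No OS identifier: assume a prefix pattern that keeps the
        # text before the first digit.
        return re.split(r"\d", name, maxsplit=1)[0] + "*"
    # Keep the parenthetical identifier verbatim; wildcard the numbers
    # on either side of it.
    head, tail = name[: match.start()], name[match.end():]
    return _wildcard(head) + match.group(0) + _wildcard(tail)
```

With made-up AMI names, `ami_name_to_pattern("Example GPU AMI 2.7 (Ubuntu 22.04) 20250601")` yields `Example GPU AMI * (Ubuntu 22.04) *`, while the "(Amazon Linux 2)" variant of the same name yields a different pattern, which is the point of preserving the identifier. A name without parentheses, such as `debian-12-amd64-20250101`, falls back to `debian-*`.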
Enable selection of AMIs that include the Deep Learning GPU
packages.

Signed-off-by: Chuck Lever <[email protected]>
Regenerated with terraform/aws/scripts/gen_kconfig_ami.

Signed-off-by: Chuck Lever <[email protected]>
Following the Lambda Labs pattern, separate AWS Kconfig files into:
- Static wrapper files tracked in git (Kconfig.ami, Kconfig.instance, Kconfig.location)
- Dynamically generated files ignored by git (*.generated)

This prevents users from having to commit generated content after running
'make cloud-config-aws' while still providing working defaults via the
static wrapper files.

The static files source their .generated counterparts, which are created
by running 'make cloud-config-aws' and contain current AWS API data.

Generated-by: Claude AI
Signed-off-by: Chuck Lever <[email protected]>
Create aws_common.py to consolidate common functionality used across
AWS Kconfig generation scripts (gen_kconfig_location, gen_kconfig_instance,
gen_kconfig_ami).

The new aws_common.py module provides:
- get_default_region(): Read AWS region from ~/.aws/config
- get_jinja2_environment(): Standardized Jinja2 template environment
- create_ec2_client(): Create boto3 EC2 clients with region override
- handle_aws_client_error(): Consistent AWS error handling
- handle_aws_credentials_error(): Consistent credentials error handling
- get_all_regions(): Retrieve all AWS regions
- get_region_availability_zones(): Get AZs for a specific region
- get_all_instance_types(): Get all instance types in a region
- get_region_kconfig_name(): Convert region names to Kconfig format

This refactoring eliminates code duplication among the scripts.

Generated-by: Claude AI
Signed-off-by: Chuck Lever <[email protected]>
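For readers who have not opened aws_common.py, here is a minimal sketch of what two of the helpers listed above might look like. Only the function names come from the commit message. The `config_path` and `fallback` parameters, the `us-east-1` fallback value, and the upper-case-with-underscores Kconfig naming are assumptions made for illustration, not the module's real behavior.

```python
import configparser
from pathlib import Path


def get_default_region(config_path=None, fallback="us-east-1"):
    """Read the default profile's region from an AWS config file.

    Assumption: both parameters are hypothetical; the commit message
    only says the region is read from ~/.aws/config.
    """
    path = Path(config_path) if config_path else Path.home() / ".aws" / "config"
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped
    # A missing section or option returns the fallback instead of raising.
    return parser.get("default", "region", fallback=fallback)


def get_region_kconfig_name(region):
    """Convert a region such as 'us-east-1' to a Kconfig-safe token.

    Assumption: upper case with underscores is a guess at the format.
    """
    return region.upper().replace("-", "_")
```

Taking the config path as a parameter is a choice made here so the helper can be exercised against a temporary file rather than the user's real AWS configuration.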
The gen_kconfig_location script was querying AWS for availability zone
information for each region (20-30 regions) sequentially. Each region
query required a separate AWS API call to describe_availability_zones,
resulting in 5-15 seconds of serial API calls.

This change parallelizes the availability zone discovery using
ThreadPoolExecutor, allowing all regions to be queried concurrently.
This reduces the total execution time from the sum of all requests to
approximately the time of the slowest single request, providing a
10-20x speedup for first-time runs.

The output order is preserved by collecting results in a dictionary and
printing them in the original region order.

Generated-by: Claude AI
Signed-off-by: Chuck Lever <[email protected]>
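The approach described in the commit above, concurrent queries with results collected in a dictionary and emitted in the original region order, might look something like the sketch below. `fetch_zones` is a hypothetical stand-in for the real describe_availability_zones call, and `discover_zones` is an illustrative name; neither is taken from gen_kconfig_location.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_zones(region):
    """Hypothetical stand-in for one describe_availability_zones call."""
    return [f"{region}{suffix}" for suffix in "abc"]


def discover_zones(regions, fetch=fetch_zones, max_workers=10):
    """Query every region concurrently, then return results in input order."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its region so completion order
        # does not matter.
        futures = {pool.submit(fetch, region): region for region in regions}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    # Re-emit in the caller's original order, not completion order.
    return [(region, results[region]) for region in regions]
```

Because `future.result()` re-raises any exception from the worker, a failed region query surfaces in the caller rather than being dropped silently.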
The gen_kconfig_ami script was querying AWS for AMI information for each
Linux distribution (Amazon, Ubuntu, RedHat, Debian, etc.) sequentially.
With 10 different distributions, this resulted in 20-30 seconds of serial
API calls.

This change parallelizes the AMI discovery using ThreadPoolExecutor,
allowing all 10 distributions to be queried concurrently. This reduces
the total execution time from the sum of all requests to approximately
the time of the slowest single request, providing an 8-10x speedup for
first-time runs.

The output order is preserved by collecting results in a dictionary and
printing them in the original owner key order.

Generated-by: Claude AI
Signed-off-by: Chuck Lever <[email protected]>
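The timing claim in the commit above (total time drops from the sum of all requests to roughly the slowest single request) can be checked with simulated latencies. This sketch replaces the AWS calls with `time.sleep`, so it illustrates the shape of the speedup only; `timed_queries` is a hypothetical name and the latencies are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_queries(latencies, parallel):
    """Return the wall-clock seconds spent 'querying' with the given latencies."""
    start = time.perf_counter()
    if parallel:
        # One worker per simulated request, so all of them overlap.
        with ThreadPoolExecutor(max_workers=len(latencies)) as pool:
            list(pool.map(time.sleep, latencies))
    else:
        for latency in latencies:
            time.sleep(latency)
    return time.perf_counter() - start
```

Calling `timed_queries([0.05] * 10, parallel=False)` and then the same with `parallel=True` should show the serial run taking about the sum of the sleeps and the parallel run taking close to a single sleep. Real API calls add throttling and network variance that this simulation ignores.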
Claude was struggling a bit with how to validate the whitespace
in .j2 files used to generate Kconfig menus. We stumbled on this
method: generate a sample menu file and check the whitespace in
that file instead.

Generated-by: Claude AI
Signed-off-by: Chuck Lever <[email protected]>
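One way to check a generated sample menu, as the commit above describes, is to scan the rendered text rather than the template. The sketch below assumes the template has already been rendered to a string. The function name `find_whitespace_problems` and the two rules it applies (no trailing whitespace, indentation must begin with a tab as in the Linux kernel's Kconfig style) are assumptions, not necessarily the checks this commit adds.

```python
def find_whitespace_problems(rendered):
    """Return (line number, message) pairs for whitespace issues in Kconfig text."""
    problems = []
    for lineno, line in enumerate(rendered.splitlines(), start=1):
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
        if not line.strip():
            continue  # blank lines have no indentation to check
        indent = line[: len(line) - len(line.lstrip())]
        # Assumed convention: attribute lines are indented with a tab,
        # and help text with a tab followed by two spaces.
        if indent.startswith(" "):
            problems.append((lineno, "indent starts with spaces, not a tab"))
    return problems
```

Checking the rendered output sidesteps Jinja2's whitespace-control rules: whatever the template's `{%- ... -%}` markers do, only the final menu text has to be clean.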
@chucklever chucklever merged commit a35c99e into main Oct 27, 2025
22 checks passed
@chucklever chucklever deleted the cel/aws-improvements branch October 27, 2025 15:08