feat: fixing instance to config auto resolution support #5251

Closed
tanvikab wants to merge 2 commits

Conversation

tanvikab commented Aug 5, 2025

The JumpStart inference-configuration auto-detection problem arose because users had to specify the config_name parameter manually when deploying models to different instance types. Deployments failed when GPU configurations were applied to Neuron instances, or vice versa.

The root cause was that four artifact-retrieval modules (image_uris.py, model_uris.py, environment_variables.py, and resource_requirements.py) lacked auto-detection logic, so the system defaulted to the first available configuration instead of selecting the one appropriate for the target instance type.

I implemented a universal auto-detection algorithm that extracts the instance-type family, discovers all available inference configurations, and uses the JumpStart ranking system to select the highest-priority configuration that supports the instance type. With this change, ml.inf2.24xlarge instances are routed automatically to the neuron configuration and ml.g5.12xlarge instances to the tgi configuration, eliminating the manual config_name requirement while maintaining backward compatibility.

To verify the implementation, I added 19 unit tests across two files (test_auto_detection.py with 13 tests and test_config_auto_detection.py with 6 tests) covering image URI selection, model artifact retrieval, environment variables, resource requirements, ranking priority, edge cases, and integration patterns. All tests pass, confirming that users can now deploy JumpStart models by specifying only the model_id and instance_type, without needing knowledge of specific inference configurations.
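A minimal sketch of the selection logic follows; the names CONFIGS, CONFIG_RANKING, and resolve_config_name are illustrative stand-ins, not the SDK's actual internals:

```python
from typing import Optional

# Illustrative sketch of the auto-detection algorithm. CONFIGS and
# CONFIG_RANKING are hypothetical stand-ins for the JumpStart model specs,
# which carry far more metadata per configuration than shown here.
CONFIGS = {
    "tgi": {"supported_families": {"g5", "p4d"}},
    "neuron": {"supported_families": {"inf2", "trn1"}},
}

# Earlier entries have higher priority, mirroring the config-ranking
# system described above.
CONFIG_RANKING = ["tgi", "neuron"]


def instance_type_family(instance_type: str) -> str:
    """Extract the family, e.g. 'ml.inf2.24xlarge' -> 'inf2'."""
    parts = instance_type.split(".")
    return parts[1] if len(parts) == 3 and parts[0] == "ml" else ""


def resolve_config_name(instance_type: str) -> Optional[str]:
    """Return the highest-priority config that supports the instance family."""
    family = instance_type_family(instance_type)
    for name in CONFIG_RANKING:
        if family in CONFIGS[name]["supported_families"]:
            return name
    return None  # caller falls back to the model's default config


assert resolve_config_name("ml.inf2.24xlarge") == "neuron"
assert resolve_config_name("ml.g5.12xlarge") == "tgi"
```

From the user's side, the intended effect is that a deployment needs only the model id and instance type (the model id below is just an example):

```python
from sagemaker.jumpstart.model import JumpStartModel

# config_name is no longer required; the appropriate inference
# configuration is resolved from the instance type.
model = JumpStartModel(
    model_id="meta-textgeneration-llama-2-7b",  # example model id
    instance_type="ml.inf2.24xlarge",
)
```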

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)
  • If adding any dependency in requirements.txt files, I have spell checked and ensured they exist in PyPI

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

tanvikab requested a review from a team as a code owner August 5, 2025 17:38
tanvikab closed this Aug 5, 2025