Conversation

@mollyheamazon
Contributor

Issue #, if available:

Description of changes:

Testing done:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)
  • If adding any dependency in requirements.txt files, I have spell checked and ensured they exist in PyPI

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

ci and others added 30 commits December 17, 2024 19:30
* Updating Inference Optimization Validations

* Linting
…4965)

* fix: security update -> use sha256 instead of md5 for file hashing

* fix: security update -> use sha256 instead of md5 for file hashing

* fix flake8

* fix: test spacing

---------

Co-authored-by: Brock Wade <[email protected]>
* add jumpstart ap-southeast-5

* add more image accounts

---------

Co-authored-by: Eli Davidson <[email protected]>
Co-authored-by: parknate@ <[email protected]>
* Disable jumpstart tests missing clean up logic

* Black format

---------

Co-authored-by: adishaa <[email protected]>
Co-authored-by: parknate@ <[email protected]>
* Fix hub model reference arn enum bug

* Add unit test for construct hub model reference arn util

* fix broken unit test

* formatting: add extra newline after unit test

* fix broken unit test

* fix formatting

* add more newlines around test

* codestyle: fix line too long

* Revert "codestyle: fix line too long"

This reverts commit 0b6867a.

* fix test

* add missing quote

---------

Co-authored-by: parknate@ <[email protected]>
…ot decoding the request again if it is not already bytes or bytestream (#4987)
* implemented multi-node distribution with @remote function

* completed unit tests

* added distributed training with CPU and torchrun

* backwards compatibility nproc_per_node

* fixing code: permissions for non-root users, integration tests

* fixed docstyle

* refactor nproc_per_node for backwards compatibility

* refactor nproc_per_node for backwards compatibility

* pylint fix, newlines

* added unit tests for bootstrap_environment remote
* Fix Flake8 Violations

* Update omegaconf version to be compatible with python 3.11
* fix: Add missing attributes to local resourceconfig

* format fix

* add missing for local processing

* format fix
* fix: skip TF tests for unsupported versions

* flake8
* feat: add pytorch-tgi-inference 2.4.0

* add tgi 3.0.1 image

* skip faulty test

* formatting

* formatting

* add hf pytorch training 4.46

* update version alias

* add py311 to training version

* update tests with pyversion 311

* formatting

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>
viclzhu and others added 26 commits March 27, 2025 12:56
Integ test failure is aligned with CI health
* fix integ test hub

* lint

* fix jumpstart curated hub bugs

* lint

* fix tests

* linting

* lint

* rm test file

* fix test

* fix

* lint

* remove test

* update for test
* Fix issue #4856 by copying environment variables
* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix:Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
* Fix deepdiff dependencies

* trigger tests
* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix:Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

* Revert the PR changes: #5122, due to issue https://t.corp.amazon.com/P223568185/overview

* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
* fix: tgi image uri unit tests

* fix: black-format and flake8 failures

* fix: parse

* fix: print statement

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>
…#5123)

* clean up

* bump maxdepth for doc/api/training to fix readthedocs

* change maxdepth for readthedocs rendering doc/api/training page

* change maxdepth for readthedocs rendering doc/api/training page

* change maxdepth for readthedocs rendering doc/api/training page
* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix:Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

* Revert the PR changes: #5122, due to issue https://t.corp.amazon.com/P223568185/overview

* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication

* Revert PR 5122 changes, due to issues with other processor codeflows

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
@mollyheamazon mollyheamazon requested a review from a team as a code owner April 22, 2025 00:09
@mollyheamazon mollyheamazon requested a review from benieric April 22, 2025 00:09