Feat: Optimize() validations across TRT, VLLM, Neuron container optimizations #4927
49bd078 to bf55587
@@ -1157,6 +1161,15 @@ def optimize(
Model: A deployable ``Model`` object.
"""

# TODO: ideally these dictionaries need to be sagemaker_core shapes
TODO: Ideally, validations should happen at the start of the flow, not after everything is set up in the wrapper. Let's fast-follow on this and gradually shift the validations into helper functions in the outer scope here.
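A minimal sketch of the ordering suggested above, with validation running before any setup. The names `_validate_inputs` and `optimize_sketch`, and the two checks themselves, are hypothetical placeholders for illustration and are not taken from this PR:

```python
from typing import Optional


def _validate_inputs(
    instance_type: str,
    quantization_config: Optional[dict],
    compilation_config: Optional[dict],
) -> None:
    """Hypothetical outer-scope helper: reject bad input before any setup."""
    if not instance_type:
        raise ValueError("instance_type is required.")
    # Illustrative rule only; the real ruleset lives in the PR's validation code.
    if quantization_config is None and compilation_config is None:
        raise ValueError("At least one optimization config must be provided.")


def optimize_sketch(instance_type, quantization_config=None, compilation_config=None):
    # Validate first, so nothing is built for a request that is going to fail.
    _validate_inputs(instance_type, quantization_config, compilation_config)
    # Placeholder for the setup work that the real flow performs afterwards.
    return {"instance_type": instance_type, "validated": True}
```

With this shape, an invalid request raises `ValueError` before the placeholder setup step is reached.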
logger = logging.getLogger(__name__)


class OptimizationContainer(Enum):
We can use these for internal organization, but we should not burden customers with knowing how our optimizations work or which containers are used in which scenarios.
Consider the following error messages:
- Optimizations that use Compilation and Speculative Decoding are not currently supported on GPU & Neuron instances.
- Optimizations that use Compilation and Quantization are not currently supported on Neuron instances.
- Optimizations that use Quantization:SmoothQuant are only supported with Compilation on GPU instances.
- Sharding is mutually exclusive and only supported on GPU instances.
The general theme we want to maintain is:
Optimizations {A,B,C} are {not|only} supported with optimizations {X,Y,Z} on instances {L,M,N}
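One way the template above could be rendered in code, as a rough sketch. The function `build_message` and its parameters are hypothetical and not part of the SDK; the wording follows the example messages in this comment:

```python
from typing import Sequence


def build_message(
    optimizations: Sequence[str],
    instances: Sequence[str],
    only: bool = False,
    with_optimizations: Sequence[str] = (),
) -> str:
    """Render: Optimizations that use {A and B} are {not|only} supported
    [with {X}] on {L & M} instances."""
    subject = " and ".join(optimizations)
    qualifier = "only supported" if only else "not currently supported"
    # The "with ..." clause is optional and omitted when no partner is given.
    with_part = f" with {' and '.join(with_optimizations)}" if with_optimizations else ""
    where = " & ".join(instances)
    return f"Optimizations that use {subject} are {qualifier}{with_part} on {where} instances."


print(build_message(["Compilation", "Speculative Decoding"], ["GPU", "Neuron"]))
print(build_message(["Quantization:SmoothQuant"], ["GPU"], only=True,
                    with_optimizations=["Compilation"]))
```

Keeping the sentence shape in one place means customers see the optimization and instance names, never the container names.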
bf55587 to 3d04384
f1d3649 to 57123c9
a4dba03 to d1074eb
ccad6cd to 76a4102
f999ace to 06ed0e3
c9487bd to daa3bc0
daa3bc0 to 3e97708
9a779fe to 7e8e237
changes for blackbird - model sharding
- add more tests
- fix sharded model flag
- add optimization validations
- fix formatting and msging
- fixing validation bugs
- add UTs
- simplify logic
- update messaging formatting
- fix UTs
- add more UTs
- fix validations
- update ruleset
- update formatting
- update validation logic
- update bug fixes
- Disable network isolation if using sharded models.
- check sharding + network iso pre optimization
- add more UTs for sharding
- add more UTs
29a2242 to 0707798
}


def _validate_optimization_configuration(
This is such a big method. Should we look to simplify it?
You should be able to view the UTs here: https://github.com/aws/sagemaker-python-sdk/pull/4927/files#diff-5f58a16d071696e26bf4081f3f01294336f6c669407a9474ed7c0747af0e629fR3479. And ack, we will look to optimize this after re:Invent.
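For the follow-up, one possible direction is a table-driven check, so that each new rule is a data entry rather than another branch in the method. This is only a sketch under assumptions: `Rule`, `UNSUPPORTED`, and `find_violations` are hypothetical names, and the two rules are copied from the error messages proposed earlier in this review, not from the PR's actual ruleset:

```python
from typing import FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    optimizations: FrozenSet[str]  # combination that triggers the rule
    instances: FrozenSet[str]      # instance families where it is unsupported


# Illustrative entries only, taken from the messages proposed in this review.
UNSUPPORTED = [
    Rule(frozenset({"Compilation", "Speculative Decoding"}), frozenset({"GPU", "Neuron"})),
    Rule(frozenset({"Compilation", "Quantization"}), frozenset({"Neuron"})),
]


def find_violations(requested: Set[str], instance_family: str) -> List[str]:
    """Return one customer-facing message per rule the request violates."""
    messages = []
    for rule in UNSUPPORTED:
        # A rule fires when every optimization it names was requested
        # and the target instance family is one it covers.
        if rule.optimizations <= requested and instance_family in rule.instances:
            names = " and ".join(sorted(rule.optimizations))
            where = " & ".join(sorted(rule.instances))
            messages.append(
                f"Optimizations that use {names} are not currently supported on {where} instances."
            )
    return messages
```

The validation method then reduces to a loop over the table, and the UTs can parametrize over the same entries.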
Issue #, if available:
Description of changes:
Testing done:
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
unique_name_from_base to create resource names in integ tests (if appropriate)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.