fix: alt configs model deployment and training issues #4833

Merged
merged 9 commits into aws:master on Aug 14, 2024

Conversation

malav-shastri (Collaborator) commented Aug 14, 2024

Issue #, if available:

Description of changes:

  • Model deployment and training failed last week for curatedHub customers using models with alt configs.
  • Reverting commit ba7e6a9 fixed the issue locally, so I investigated that commit to fix the issue.

Testing done:

  • Notebook testing
  • Unit tests

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

malav-shastri requested a review from a team as a code owner on August 14, 2024 18:38
@@ -259,6 +259,7 @@ def _add_instance_type_to_kwargs(
             sagemaker_session=kwargs.sagemaker_session,
             model_type=kwargs.model_type,
             config_name=kwargs.config_name,
+            hub_arn=kwargs.hub_arn,
Collaborator

We're going to be finding these lines forever aren't we....

Collaborator Author

This is an optional field, but it looks like integ tests could be a possible solution here.

Collaborator

Yes, we desperately need integ tests for (at least):

  1. ModelReference fine-tuning
  2. ModelReference deploy
  3. Alt config deploy

@@ -1278,7 +1278,7 @@ def from_json(self, json_obj: Dict[str, Any]) -> None:
             json_obj (Dict[str, Any]): Dictionary representation of spec.
         """
         if self._is_hub_content:
-            json_obj = walk_and_apply_json(json_obj, camel_to_snake)
+            json_obj = walk_and_apply_json(json_obj, camel_to_snake, ["metrics"])
Collaborator

nit: you can remove ["metrics"] since it is the default value
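
To illustrate the nit, here is a minimal sketch of a key-renaming walker whose `stop_keys` parameter defaults to `["metrics"]`, as in the diff. The function body, the `_walk` helper, and the sample `spec` are assumptions made for illustration and are not the SDK's implementation. With that default, passing `["metrics"]` explicitly gives the same result as omitting it:

```python
from typing import Any, Callable, Dict, List, Optional


def walk_and_apply_json(
    json_obj: Dict[str, Any],
    apply: Callable[[str], str],
    stop_keys: Optional[List[str]] = ["metrics"],  # default from the diff; never mutated here
) -> Dict[str, Any]:
    """Sketch: rename every key with `apply`, leaving children of stop keys untouched."""

    def _walk(node: Any) -> Any:
        if isinstance(node, dict):
            result = {}
            for key, value in node.items():
                new_key = apply(key)
                if stop_keys and new_key in stop_keys:
                    result[new_key] = value  # keep the subtree exactly as it was
                else:
                    result[new_key] = _walk(value)
            return result
        if isinstance(node, list):
            return [_walk(item) for item in node]
        return node

    return _walk(json_obj)


# Hypothetical input: the "Metrics" subtree must keep its original key casing.
spec = {"ModelId": "m", "Metrics": [{"Name": "train:loss", "Regex": "loss=(.*)"}]}
explicit = walk_and_apply_json(spec, str.lower, ["metrics"])
implicit = walk_and_apply_json(spec, str.lower)
```

Both calls rename the top-level keys and leave the entries under `metrics` unchanged, so the third argument in the diff is redundant as long as the default stays `["metrics"]`.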

lambda metric_definition: metric_definition["Name"]
!= instance_specific_metric_name,
default_metric_definitions,
if instance_specific_metric_definitions:
Collaborator

nit:

if not instance_specific_metric_definitions:
  return default_metric_definitions
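
A sketch of how the suggested early return might read in context. The function name `merge_metric_definitions`, its signature, and the ordering of the merged list are assumptions; only the variable names and the filtering on `"Name"` come from the diff fragment above:

```python
from typing import Dict, List, Optional


def merge_metric_definitions(
    default_metric_definitions: List[Dict[str, str]],
    instance_specific_metric_definitions: Optional[List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Hypothetical helper: instance-specific metrics override defaults by "Name"."""
    # Early return keeps the common case flat, as the review comment suggests.
    if not instance_specific_metric_definitions:
        return default_metric_definitions

    overridden = {m["Name"] for m in instance_specific_metric_definitions}
    kept_defaults = [m for m in default_metric_definitions if m["Name"] not in overridden]
    # Assumed ordering: remaining defaults first, then the instance-specific entries.
    return kept_defaults + instance_specific_metric_definitions


defaults = [{"Name": "loss", "Regex": "a"}, {"Name": "acc", "Regex": "b"}]
specific = [{"Name": "loss", "Regex": "c"}]
merged = merge_metric_definitions(defaults, specific)
```

The guard clause removes one level of nesting from the rest of the function, which is the readability gain the nit is after.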

@@ -10,16 +10,20 @@
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
# pylint: skip-file
Collaborator

nit: generally not a fan of skipping entire file, we should fix in a follow-up PR

Collaborator Author

@bencrabtree and I just discussed this; there's one apparent pylint false positive that we are not able to get rid of. Skipping it to unblock customers, but sure, I can follow up.

new[new_key] = []
for item in value:
_walk_and_apply_json(item, new=new[new_key])
if (stop_keys and new_key not in stop_keys) or stop_keys is None:
Collaborator

nit:

if stop_keys and new_key in stop_keys:
    new[new_key] = value
else:
    ...

Contributor

+1, the conditional is too complex. If you want to keep this condition, please move it into a util function and unit-test it separately.
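
One way to act on both comments is to pull the check into a small, separately testable predicate. The sketch below is an assumption about how that could look: `is_stop_key` is a hypothetical helper name, and `original_condition` reproduces the expression from the diff only for comparison:

```python
from typing import Optional, Sequence


def is_stop_key(key: str, stop_keys: Optional[Sequence[str]]) -> bool:
    """Hypothetical helper: True when `key` is listed in `stop_keys`."""
    return bool(stop_keys) and key in stop_keys


def original_condition(key: str, stop_keys: Optional[Sequence[str]]) -> bool:
    """The expression from the diff, wrapped in bool() for comparison."""
    return bool((stop_keys and key not in stop_keys) or stop_keys is None)


# For None and for non-empty lists the two forms are exact opposites.
cases = [("metrics", None), ("metrics", ["metrics"]), ("name", ["metrics"])]
agree = all(original_condition(k, s) == (not is_stop_key(k, s)) for k, s in cases)

# An empty list is the one input where they diverge: the original expression
# is False for every key, while the inverted form treats no key as a stop key.
empty_original = original_condition("name", [])
empty_inverted = not is_stop_key("name", [])
```

The empty-list divergence is worth covering with a unit test whichever form is kept, since it changes whether `stop_keys=[]` means "stop nowhere" or "stop everywhere".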

pintaoz-aws merged commit 6df69e3 into aws:master on Aug 14, 2024
10 of 12 checks passed


def camel_to_snake(camel_case_string: str) -> str:
    """Converts camelCaseString or UpperCamelCaseString to snake_case_string."""
    snake_case_string = re.sub("(.)([A-Z][a-z]+)", r"\1_\2", camel_case_string)
    if "-" in snake_case_string:
        # remove any hyphen from the string for accurate conversion.
Contributor

nit: unnecessary comment
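
For context, a runnable sketch of how the conversion could be completed. The first `re.sub` and the hyphen check come from the hunk above; the hyphen removal and the final substitution are assumptions about the lines the hunk does not show:

```python
import re


def camel_to_snake(camel_case_string: str) -> str:
    """Converts camelCaseString or UpperCamelCaseString to snake_case_string."""
    # From the diff: insert "_" before each capitalized word.
    snake_case_string = re.sub("(.)([A-Z][a-z]+)", r"\1_\2", camel_case_string)
    if "-" in snake_case_string:
        # Assumed continuation: drop hyphens before the final pass.
        snake_case_string = snake_case_string.replace("-", "")
    # Assumed final step: split remaining lower-or-digit to upper boundaries, then lowercase.
    return re.sub("([a-z0-9])([A-Z])", r"\1_\2", snake_case_string).lower()


converted = [camel_to_snake(s) for s in ("camelCaseString", "HubContentArn", "Training-Metrics")]
```

Under these assumptions the three inputs become `camel_case_string`, `hub_content_arn`, and `training_metrics`.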

def walk_and_apply_json(json_obj: Dict[Any, Any], apply) -> Dict[Any, Any]:
"""Recursively walks a json object and applies a given function to the keys."""
def walk_and_apply_json(
json_obj: Dict[Any, Any], apply, stop_keys: Optional[List[str]] = ["metrics"]
Contributor

non-blocking: apply typing please

for config_name in config_names
}
if metadata_configs
else {}
Contributor

ternary + list-comprehension is a recipe for unreadability. Consider:

if metadata_configs:
  return { ... }
else:
  return {}
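
A sketch of the suggested shape with the comprehension filled in. The helper name `build_config_map` and the value expression inside the comprehension are hypothetical, since the hunk shows only the tail of the original expression:

```python
from typing import Any, Dict, List, Optional


def build_config_map(
    metadata_configs: Optional[Dict[str, Any]], config_names: List[str]
) -> Dict[str, Any]:
    """Hypothetical helper: pick the named configs out of `metadata_configs`."""
    if not metadata_configs:
        return {}
    # Assumes every requested name is present in `metadata_configs`.
    return {config_name: metadata_configs[config_name] for config_name in config_names}


selected = build_config_map({"neuron": 1, "gpu": 2}, ["gpu"])
empty = build_config_map(None, ["gpu"])
```

Separating the emptiness check from the comprehension lets each branch be read and tested on its own.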

5 participants