AWS Batch step operator #3954
Open: SebastianScherer88 wants to merge 39 commits into zenml-io:develop from SebastianScherer88:feature/aws-step-operator
Changes from 32 commits
- ce1de79: Add version 0.84.3 to legacy docs (#3949) (github-actions[bot])
- e87d336: started creating required files and mapping out the zenml config -> a… (SebastianScherer88)
- 6b076d1: finished first draft of aws batch step operator (SebastianScherer88)
- c6f2a87: renaming modules and adding unit tests (SebastianScherer88)
- 01017cc: added support for multinode aws batch job type (SebastianScherer88)
- b22672b: added support for multinode aws batch job type (SebastianScherer88)
- 371a4ac: adding test dependency back in and fixing typo in sagemaker doc string (SebastianScherer88)
- a80f266: renaming the aws batch runtime context retrieval utility (SebastianScherer88)
- e372f85: started creating required files and mapping out the zenml config -> a… (SebastianScherer88)
- 3d8c39b: finished first draft of aws batch step operator (SebastianScherer88)
- 0543331: renaming modules and adding unit tests (SebastianScherer88)
- c9b5829: added support for multinode aws batch job type (SebastianScherer88)
- c787379: added support for multinode aws batch job type (SebastianScherer88)
- 5fd0761: adding test dependency back in and fixing typo in sagemaker doc string (SebastianScherer88)
- 5466799: renaming the aws batch runtime context retrieval utility (SebastianScherer88)
- 17de12b: bounding aws integration dependency boto3 < 2 (SebastianScherer88)
- 1fcefac: using immutable default dict factory instead of mutable empty dict value (SebastianScherer88)
- eb6c320: removing commented out default args (SebastianScherer88)
- d1c002b: removing incorrect warning stating that step level resources specific… (SebastianScherer88)
- 98e014e: increased timeout error to 1h and added batch client error handling (SebastianScherer88)
- 1be5965: replicated the sagemaker orchestrator aws authentication and session … (SebastianScherer88)
- dab9340: resolving merge conflicts (SebastianScherer88)
- 070ef62: fixes off the back initial functional testing (SebastianScherer88)
- a398139: more changes after successfully e2e testing single node (i.e. aws bat… (SebastianScherer88)
- 1a602eb: fixed step environment settings bug (SebastianScherer88)
- 69e60d1: fixed the multinode targetnode syntax (SebastianScherer88)
- 0d53bce: fixed type hints for instance type (SebastianScherer88)
- 739fdaa: stripping out multinode support as its not really needed given batch … (SebastianScherer88)
- 9665107: fixed fargate networking bug. the container spec model didnt have a n… (SebastianScherer88)
- 4e171c1: default backend is fargate bc its faster and easier to set up the infra (SebastianScherer88)
- 02f9281: fixed integration tests (SebastianScherer88)
- 778bdfd: Merge branch 'develop' into feature/aws-step-operator (SebastianScherer88)
- 8fc2959: addressed all comments except logging (SebastianScherer88)
- 835929c: Merge branch 'feature/aws-step-operator' of https://github.com/Sebast… (SebastianScherer88)
- 6232f4f: buffer of 5 chars (SebastianScherer88)
- 705c2a9: added validation of pipeline and step name before assembling full job… (SebastianScherer88)
- 971cd68: implemented name sanitization as suggested instead of raising excepti… (SebastianScherer88)
- d2ace24: added ec2 and fargate resource validation to schemas, simplified reso… (SebastianScherer88)
- d3be040: fixed bug in fargate resource memory validation range (SebastianScherer88)
src/zenml/integrations/aws/flavors/aws_batch_step_operator_flavor.py
201 changes: 201 additions & 0 deletions

              | Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,201 @@ | ||
| # Copyright (c) ZenML GmbH 2022. All Rights Reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at: | ||
| # | ||
| # https://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express | ||
| # or implied. See the License for the specific language governing | ||
| # permissions and limitations under the License. | ||
| """AWS Batch step operator flavor.""" | ||
| 
     | 
||
| from typing import TYPE_CHECKING, Dict, Optional, Type, Literal | ||
| 
     | 
||
| from pydantic import Field, PositiveInt, field_validator | ||
| from zenml.utils.secret_utils import SecretField | ||
| from zenml.config.base_settings import BaseSettings | ||
| from zenml.integrations.aws import ( | ||
| AWS_RESOURCE_TYPE, | ||
| AWS_BATCH_STEP_OPERATOR_FLAVOR, | ||
| ) | ||
| from zenml.models import ServiceConnectorRequirements | ||
| from zenml.step_operators.base_step_operator import ( | ||
| BaseStepOperatorConfig, | ||
| BaseStepOperatorFlavor, | ||
| ) | ||
| 
     | 
||
| if TYPE_CHECKING: | ||
| from zenml.integrations.aws.step_operators import AWSBatchStepOperator | ||
| 
     | 
||
| 
     | 
||
| class AWSBatchStepOperatorSettings(BaseSettings): | ||
| """Settings for the AWS Batch step operator.""" | ||
| 
     | 
||
| environment: Dict[str, str] = Field( | ||
| default_factory=dict, | ||
| description="Environment variables to pass to the container during " | ||
| "execution. Example: {'LOG_LEVEL': 'INFO', 'DEBUG_MODE': 'False'}", | ||
| ) | ||
| job_queue_name: str = Field( | ||
| default="", | ||
| description="The AWS Batch job queue to submit the step AWS Batch job " | ||
| "to. If not provided, falls back to the default job queue name " | ||
| "specified at stack registration time. Must be compatible with " | ||
| "`backend`.", | ||
| ) | ||
| backend: Literal['EC2','FARGATE'] = Field( | ||
| default="FARGATE", | ||
| description="The AWS Batch platform capability for the step AWS Batch " | ||
| "job to be orchestrated with. Must be compatible with `job_queue_name`. " | ||
| "Defaults to 'FARGATE'.", | ||
| ) | ||
| assign_public_ip: Literal['ENABLED','DISABLED'] = Field( | ||
| default="ENABLED", | ||
| description="Sets the network configuration's assignPublicIp field. " | ||
| "Only relevant for FARGATE backend.", | ||
| ) | ||
| timeout_seconds: PositiveInt = Field( | ||
| default=3600, | ||
| description="The number of seconds before AWS Batch times out the job.", | ||
| ) | ||
| 
     | 
||
| 
     | 
||
| 
     | 
||
| class AWSBatchStepOperatorConfig( | ||
| BaseStepOperatorConfig, AWSBatchStepOperatorSettings | ||
| ): | ||
| """Config for the AWS Batch step operator. | ||
| Note: We use ECS as a backend (not EKS) so that users can avoid the | ||
| complexity of setting up an EKS cluster. The compute engine (EC2 or | ||
| Fargate) is selected via the `backend` setting. AWS Batch multinode job | ||
| support, which requires EC2, could be added later. | ||
| """ | ||
| 
     | 
||
| execution_role: str = Field( | ||
| description="The IAM role ARN of the ECS execution role." | ||
| ) | ||
| job_role: str = Field( | ||
| description="The IAM role ARN of the ECS job role." | ||
| ) | ||
| default_job_queue_name: str = Field( | ||
| description="The default AWS Batch job queue to submit AWS Batch jobs to." | ||
| ) | ||
| aws_access_key_id: Optional[str] = SecretField( | ||
| default=None, | ||
| description="The AWS access key ID to use to authenticate to AWS. " | ||
| "If not provided, the value from the default AWS config will be used.", | ||
| ) | ||
| aws_secret_access_key: Optional[str] = SecretField( | ||
| default=None, | ||
| description="The AWS secret access key to use to authenticate to AWS. " | ||
| "If not provided, the value from the default AWS config will be used.", | ||
| ) | ||
| aws_profile: Optional[str] = Field( | ||
| None, | ||
| description="The AWS profile to use for authentication if not using " | ||
| "service connectors or explicit credentials. If not provided, the " | ||
| "default profile will be used.", | ||
| ) | ||
| aws_auth_role_arn: Optional[str] = Field( | ||
| None, | ||
| description="The ARN of an intermediate IAM role to assume when " | ||
| "authenticating to AWS.", | ||
| ) | ||
| region: Optional[str] = Field( | ||
| None, | ||
| description="The AWS region where the AWS Batch job will be run. " | ||
| "If not provided, the value from the default AWS config will be used.", | ||
| ) | ||
| 
     | 
||
| @property | ||
| def is_remote(self) -> bool: | ||
| """Checks if this stack component is running remotely. | ||
| This designation is used to determine if the stack component can be | ||
| used with a local ZenML database or if it requires a remote ZenML | ||
| server. | ||
| Returns: | ||
| True if this config is for a remote component, False otherwise. | ||
| """ | ||
| return True | ||
| 
     | 
||
| 
     | 
||
| class AWSBatchStepOperatorFlavor(BaseStepOperatorFlavor): | ||
| """Flavor for the AWS Batch step operator.""" | ||
| 
     | 
||
| @property | ||
| def name(self) -> str: | ||
| """Name of the flavor. | ||
| Returns: | ||
| The name of the flavor. | ||
| """ | ||
| return AWS_BATCH_STEP_OPERATOR_FLAVOR | ||
| 
     | 
||
| @property | ||
| def service_connector_requirements( | ||
| self, | ||
| ) -> Optional[ServiceConnectorRequirements]: | ||
| """Service connector resource requirements for service connectors. | ||
| Specifies resource requirements that are used to filter the available | ||
| service connector types that are compatible with this flavor. | ||
| Returns: | ||
| Requirements for compatible service connectors, if a service | ||
| connector is required for this flavor. | ||
| """ | ||
| return ServiceConnectorRequirements(resource_type=AWS_RESOURCE_TYPE) | ||
| 
     | 
||
| @property | ||
| def docs_url(self) -> Optional[str]: | ||
| """A url to point at docs explaining this flavor. | ||
| Returns: | ||
| A flavor docs url. | ||
| """ | ||
| return self.generate_default_docs_url() | ||
| 
     | 
||
| @property | ||
| def sdk_docs_url(self) -> Optional[str]: | ||
| """A url to point at SDK docs explaining this flavor. | ||
| Returns: | ||
| A flavor SDK docs url. | ||
| """ | ||
| return self.generate_default_sdk_docs_url() | ||
| 
     | 
||
| @property | ||
| def logo_url(self) -> str: | ||
| """A url to represent the flavor in the dashboard. | ||
| Returns: | ||
| The flavor logo. | ||
| """ | ||
| return "https://public-flavor-logos.s3.eu-central-1.amazonaws.com/step_operator/aws_batch.png" | ||
| 
     | 
||
| @property | ||
| def config_class(self) -> Type[AWSBatchStepOperatorConfig]: | ||
| """Returns the `AWSBatchStepOperatorConfig` config class. | ||
| Returns: | ||
| The config class. | ||
| """ | ||
| return AWSBatchStepOperatorConfig | ||
| 
     | 
||
| @property | ||
| def implementation_class(self) -> Type["AWSBatchStepOperator"]: | ||
| """Implementation class. | ||
| Returns: | ||
| The implementation class. | ||
| """ | ||
| from zenml.integrations.aws.step_operators import AWSBatchStepOperator | ||
| 
     | 
||
| return AWSBatchStepOperator | ||
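
The `job_queue_name` description in the diff says that an empty step-level value falls back to the default job queue set at stack registration. A minimal sketch of how that resolution could look, using only the standard library. The dataclasses and `resolve_job_queue` below are hypothetical stand-ins for illustration; they are not part of this PR or of ZenML, and the actual operator may resolve the queue differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSettings:
    """Hypothetical stand-in for the step-level settings in the diff."""

    job_queue_name: str = ""  # empty string means "not provided"
    backend: str = "FARGATE"
    timeout_seconds: int = 3600


@dataclass(frozen=True)
class OperatorConfig:
    """Hypothetical stand-in for the stack-level config in the diff."""

    default_job_queue_name: str


def resolve_job_queue(settings: StepSettings, config: OperatorConfig) -> str:
    """Return the step-level queue if set, else the stack-level default."""
    # An empty string is falsy, so `or` picks the default in that case.
    return settings.job_queue_name or config.default_job_queue_name


config = OperatorConfig(default_job_queue_name="default-queue")
print(resolve_job_queue(StepSettings(), config))
print(resolve_job_queue(StepSettings(job_queue_name="gpu-queue"), config))
```

Using an empty string as the "not provided" marker mirrors the `default=""` in the diff; an `Optional[str]` with `None` would express the same intent more explicitly.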
  
    