generated from amazon-archives/__template_Apache-2.0
    
        
        - 
                Notifications
    You must be signed in to change notification settings 
- Fork 70
Initial commit for RAG in py-ml #427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
          
     Open
      
      
            hmumtazz
  wants to merge
  45
  commits into
  opensearch-project:main
  
    
      
        
          
  
    
      Choose a base branch
      
     
    
      
        
      
      
        
          
          
        
        
          
            
              
              
              
  
           
        
        
          
            
              
              
           
        
       
     
  
        
          
            
          
            
          
        
       
    
      
from
hmumtazz:rag_pipeline
  
      
      
   
  
    
  
  
  
 
  
      
    base: main
Could not load branches
            
              
  
    Branch not found: {{ refName }}
  
            
                
      Loading
              
            Could not load tags
            
            
              Nothing to show
            
              
  
            
                
      Loading
              
            Are you sure you want to change the base?
            Some commits from the old base branch may be removed from the timeline,
            and old review comments may become outdated.
          
          
  
     Open
                    Changes from 42 commits
      Commits
    
    
            Show all changes
          
          
            45 commits
          
        
        Select commit
          Hold shift + click to select a range
      
      dbbd4f5
              
                Initial commit for RAG pipeline scripts
              
              
                hmumtazz dc94feb
              
                Added Licence Header and fixed .gitingore file
              
              
                hmumtazz cba567c
              
                Added comments to understand code
              
              
                hmumtazz 3481c3e
              
                Remove sensitive config file
              
              
                hmumtazz e6eee30
              
                Simplify the selection process for the ef_construction parameter by o…
              
              
                hmumtazz 8b7aa0e
              
                Allows Customer to register model via CLI, fixed embedding generation…
              
              
                hmumtazz 2635751
              
                Sign-off on all previous work
              
              
                hmumtazz 88cf68b
              
                Enhance RAG pipeline functionality and user experience
              
              
                hmumtazz ad1ea97
              
                Created Ingest Pipline for chunking.- Merge custom setup.py with the …
              
              
                hmumtazz da0ce7c
              
                Removed hard coded LLM model, allowed for Opensoure integration, user…
              
              
                hmumtazz fc2ace8
              
                Remove requirements.txt and setup.py from Git tracking
              
              
                hmumtazz fe83b25
              
                Update setup.py file
              
              
                hmumtazz bea75e0
              
                Remove .gitignore file from rag pipeline directory
              
              
                hmumtazz f5b74ed
              
                Organized Model registration into classes
              
              
                hmumtazz da834d2
              
                Remove base_model.py from rag pipeline
              
              
                hmumtazz 9c45e43
              
                Licence header
              
              
                hmumtazz a61a840
              
                Updated user setup, Added Semantic search for Managed service using M…
              
              
                hmumtazz 8db5e9f
              
                fixed failing test in AIConnector tests
              
              
                hmumtazz aa42922
              
                Update test_SecretsHelper
              
              
                hmumtazz 5984c19
              
                fixed license header test_SageMakerModel.py
              
              
                hmumtazz 4503a55
              
                fixed license header rag.py
              
              
                hmumtazz b1b98ab
              
                Added seperate class for embedding generation, removed nominee text p…
              
              
                hmumtazz d7e4713
              
                Correctly leveraged existing methods in AI connector class, without h…
              
              
                hmumtazz c02fe1b
              
                Removed duplicate method, and deleted unused method
              
              
                hmumtazz d877bca
              
                Fixed chunking pipeline, query was not generating due to mismatchd ve…
              
              
                hmumtazz f0194ec
              
                Remove config.ini from repository and add to .gitignore
              
              
                hmumtazz d354b25
              
                Update rag_setup.py to remove neural search line and serverless
              
              
                hmumtazz 1ca4221
              
                Updated query.py to reflect search changes
              
              
                hmumtazz 8b97791
              
                Missing innit file
              
              
                hmumtazz fe6f8c5
              
                missing init file
              
              
                hmumtazz 34f7fd4
              
                Fixed domain ARN/domain name error, changed directory, for IAM, Secre…
              
              
                hmumtazz 39a1883
              
                Updated AIConnectorHelper to use existing methods, like get task, and…
              
              
                hmumtazz 2dfa609
              
                Updated License Headers
              
              
                hmumtazz 2c40ce0
              
                Fixed UT, Ran Lint nox, fixed code and unused enviroment variable iss…
              
              
                hmumtazz 6a846ce
              
                Deleted serverless.py from rag functionality as we are not using it
              
              
                hmumtazz 6f17670
              
                deleted unused dependcies, and test.py for serverless
              
              
                hmumtazz 120169f
              
                Fixed failing test
              
              
                hmumtazz 91c9af0
              
                Fixed tests
              
              
                hmumtazz 82e69f5
              
                Fixed tests
              
              
                hmumtazz c0114c3
              
                Addressed comments like making code less redudent, and combining methods
              
              
                hmumtazz 197a134
              
                Addressed code comments, making code less redundent, and combining ex…
              
              
                hmumtazz 220e9ee
              
                Fixed methods at UT's, adressed comments
              
              
                hmumtazz ce56ce7
              
                Fixed IAM Roled Helper arguement Issue
              
              
                hmumtazz def3410
              
                Fixed failing lint test
              
              
                hmumtazz a500a56
              
                Fixed sign off
              
              
                hmumtazz File filter
Filter by extension
Conversations
          Failed to load comments.   
        
        
          
      Loading
        
  Jump to
        
          Jump to file
        
      
      
          Failed to load files.   
        
        
          
      Loading
        
  Diff view
Diff view
There are no files selected for viewing
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
              
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
              | Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,276 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # The OpenSearch Contributors require contributions made to | ||
| # this file be licensed under the Apache-2.0 license or a | ||
| # compatible open source license. | ||
| # Any modifications Copyright OpenSearch Contributors. See | ||
| # GitHub history for details. | ||
|  | ||
| import json | ||
| import logging | ||
| import uuid | ||
| from datetime import datetime | ||
|  | ||
| import boto3 | ||
| from botocore.exceptions import ClientError | ||
|  | ||
|  | ||
| class IAMRoleHelper: | ||
| """ | ||
| Helper class for managing IAM roles and their interactions with OpenSearch. | ||
| """ | ||
|  | ||
| def __init__( | ||
| self, | ||
| region, | ||
| opensearch_domain_url=None, | ||
| opensearch_domain_username=None, | ||
| opensearch_domain_password=None, | ||
| ): | ||
| """ | ||
| Initialize the IAMRoleHelper with AWS and OpenSearch configurations. | ||
|  | ||
| :param region: AWS region. | ||
| :param opensearch_domain_url: URL of the OpenSearch domain. | ||
| :param opensearch_domain_username: Username for OpenSearch domain authentication. | ||
| :param opensearch_domain_password: Password for OpenSearch domain authentication. | ||
| """ | ||
| self.region = region | ||
| self.opensearch_domain_url = opensearch_domain_url | ||
| self.opensearch_domain_username = opensearch_domain_username | ||
| self.opensearch_domain_password = opensearch_domain_password | ||
|  | ||
| self.iam_client = boto3.client("iam") | ||
| self.sts_client = boto3.client("sts", region_name=self.region) | ||
|  | ||
| def role_exists(self, role_name): | ||
| """ | ||
| Check if an IAM role exists. | ||
|  | ||
| :param role_name: Name of the IAM role. | ||
| :return: True if the role exists, False otherwise. | ||
| """ | ||
| try: | ||
| self.iam_client.get_role(RoleName=role_name) | ||
| return True | ||
| except ClientError as e: | ||
| if e.response["Error"]["Code"] == "NoSuchEntity": | ||
| print(f"The requested role '{role_name}' does not exist.") | ||
| else: | ||
| print(f"An error occurred: {e}") | ||
| return False | ||
|  | ||
| def delete_role(self, role_name): | ||
| """ | ||
| Delete an IAM role along with its attached policies. | ||
|  | ||
| :param role_name: Name of the IAM role to delete. | ||
| """ | ||
| try: | ||
| # Detach any managed policies from the role | ||
| policies = self.iam_client.list_attached_role_policies(RoleName=role_name)[ | ||
| "AttachedPolicies" | ||
| ] | ||
| for policy in policies: | ||
| self.iam_client.detach_role_policy( | ||
| RoleName=role_name, PolicyArn=policy["PolicyArn"] | ||
| ) | ||
| print(f"All managed policies detached from role '{role_name}'.") | ||
|  | ||
| # Delete inline policies associated with the role | ||
| inline_policies = self.iam_client.list_role_policies(RoleName=role_name)[ | ||
| "PolicyNames" | ||
| ] | ||
| for policy_name in inline_policies: | ||
| self.iam_client.delete_role_policy( | ||
| RoleName=role_name, PolicyName=policy_name | ||
| ) | ||
| print(f"All inline policies deleted from role '{role_name}'.") | ||
|  | ||
| # Finally, delete the IAM role | ||
| self.iam_client.delete_role(RoleName=role_name) | ||
| print(f"Role '{role_name}' deleted.") | ||
|  | ||
| except ClientError as e: | ||
| if e.response["Error"]["Code"] == "NoSuchEntity": | ||
| print(f"Role '{role_name}' does not exist.") | ||
| else: | ||
| print(f"An error occurred: {e}") | ||
|  | ||
| def create_iam_role( | ||
| self, | ||
| role_name, | ||
| trust_policy_json, | ||
| inline_policy_json, | ||
| policy_name=None, | ||
| ): | ||
| """ | ||
| Create a new IAM role with specified trust and inline policies. | ||
|  | ||
| :param role_name: Name of the IAM role to create. | ||
| :param trust_policy_json: Trust policy document in JSON format. | ||
| :param inline_policy_json: Inline policy document in JSON format. | ||
| :param policy_name: Optional. If not provided, a unique one will be generated. | ||
| :return: ARN of the created role or None if creation failed. | ||
| """ | ||
| try: | ||
| # Create the role with the provided trust policy | ||
| create_role_response = self.iam_client.create_role( | ||
| RoleName=role_name, | ||
| AssumeRolePolicyDocument=json.dumps(trust_policy_json), | ||
| Description="Role with custom trust and inline policies", | ||
| ) | ||
|  | ||
| # Retrieve the ARN of the newly created role | ||
| role_arn = create_role_response["Role"]["Arn"] | ||
|  | ||
| # If policy_name is not provided, generate a unique one | ||
| if not policy_name: | ||
| timestamp = datetime.utcnow().strftime("%Y%m%d%H%M%S") | ||
| policy_name = f"InlinePolicy-{role_name}-{timestamp}" | ||
|  | ||
| # Attach the inline policy to the role | ||
| self.iam_client.put_role_policy( | ||
| RoleName=role_name, | ||
| PolicyName=policy_name, | ||
| PolicyDocument=json.dumps(inline_policy_json), | ||
| ) | ||
|  | ||
| print(f"Created role: {role_name} with inline policy: {policy_name}") | ||
| return role_arn | ||
|  | ||
| except ClientError as e: | ||
| print(f"Error creating the role: {e}") | ||
| return None | ||
|  | ||
| def get_role_info(self, role_name, include_details=False): | ||
| """ | ||
| Retrieve information about an IAM role. | ||
|  | ||
| :param role_name: Name of the IAM role. | ||
| :param include_details: If False, returns only the role's ARN. | ||
| If True, returns a dictionary with full role details. | ||
| :return: ARN or dict of role details. Returns None if not found. | ||
| """ | ||
| if not role_name: | ||
| return None | ||
|  | ||
| try: | ||
| response = self.iam_client.get_role(RoleName=role_name) | ||
| role = response["Role"] | ||
| role_arn = role["Arn"] | ||
|  | ||
| if not include_details: | ||
| return role_arn | ||
|  | ||
| # Build a detailed dictionary | ||
| role_details = { | ||
| "RoleName": role["RoleName"], | ||
| "RoleId": role["RoleId"], | ||
| "Arn": role_arn, | ||
| "CreationDate": role["CreateDate"], | ||
| "AssumeRolePolicyDocument": role["AssumeRolePolicyDocument"], | ||
| "InlinePolicies": {}, | ||
| } | ||
|  | ||
| # List and retrieve any inline policies | ||
| list_role_policies_response = self.iam_client.list_role_policies( | ||
| RoleName=role_name | ||
| ) | ||
| for policy_name in list_role_policies_response["PolicyNames"]: | ||
| get_role_policy_response = self.iam_client.get_role_policy( | ||
| RoleName=role_name, PolicyName=policy_name | ||
| ) | ||
| role_details["InlinePolicies"][policy_name] = get_role_policy_response[ | ||
| "PolicyDocument" | ||
| ] | ||
|  | ||
| return role_details | ||
|  | ||
| except ClientError as e: | ||
| if e.response["Error"]["Code"] == "NoSuchEntity": | ||
| print(f"Role '{role_name}' does not exist.") | ||
| else: | ||
| print(f"An error occurred: {e}") | ||
| return None | ||
|  | ||
| def get_user_arn(self, username): | ||
| """ | ||
| Retrieve the ARN of an IAM user. | ||
|  | ||
| :param username: Name of the IAM user. | ||
| :return: ARN of the user or None if not found. | ||
| """ | ||
| if not username: | ||
| return None | ||
| try: | ||
| response = self.iam_client.get_user(UserName=username) | ||
| return response["User"]["Arn"] | ||
| except ClientError as e: | ||
| if e.response["Error"]["Code"] == "NoSuchEntity": | ||
| print(f"IAM user '{username}' not found.") | ||
| else: | ||
| print(f"An error occurred: {e}") | ||
| return None | ||
|  | ||
| def assume_role(self, role_arn, role_session_name=None, session=None): | ||
| """ | ||
| Assume an IAM role and obtain temporary security credentials. | ||
|  | ||
| :param role_arn: ARN of the IAM role to assume. | ||
| :param role_session_name: Identifier for the assumed role session. | ||
| :param session: Optional boto3 session object. Defaults to the class-level sts_client. | ||
| :return: Dictionary with temporary security credentials and metadata, or None on failure. | ||
| """ | ||
| if not role_arn: | ||
| logging.error("Role ARN is required.") | ||
| return None | ||
|  | ||
| sts_client = session.client("sts") if session else self.sts_client | ||
|  | ||
| role_session_name = role_session_name or f"session-{uuid.uuid4()}" | ||
|  | ||
| try: | ||
| assumed_role_object = sts_client.assume_role( | ||
| RoleArn=role_arn, | ||
| RoleSessionName=role_session_name, | ||
| ) | ||
|  | ||
| temp_credentials = assumed_role_object["Credentials"] | ||
| expiration = temp_credentials["Expiration"] | ||
|  | ||
| logging.info( | ||
| f"Assumed role: {role_arn}. Temporary credentials valid until: {expiration}" | ||
| ) | ||
|  | ||
| return { | ||
| "credentials": { | ||
| "AccessKeyId": temp_credentials["AccessKeyId"], | ||
| "SecretAccessKey": temp_credentials["SecretAccessKey"], | ||
| "SessionToken": temp_credentials["SessionToken"], | ||
| }, | ||
| "expiration": expiration, | ||
| "session_name": role_session_name, | ||
| } | ||
|  | ||
| except ClientError as e: | ||
| error_code = e.response["Error"]["Code"] | ||
| logging.error(f"Error assuming role {role_arn}: {error_code} - {e}") | ||
| return None | ||
|  | ||
| def get_iam_user_name_from_arn(self, iam_principal_arn): | ||
|         
                  hmumtazz marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
| """ | ||
| Extract the IAM user name from an IAM principal ARN. | ||
|  | ||
| :param iam_principal_arn: ARN of the IAM principal. Expected format: arn:aws:iam::<account-id>:user/<user-name> | ||
| :return: IAM user name if extraction is successful, None otherwise. | ||
| """ | ||
| try: | ||
| if ( | ||
| iam_principal_arn | ||
| and iam_principal_arn.startswith("arn:aws:iam::") | ||
| and ":user/" in iam_principal_arn | ||
| ): | ||
| return iam_principal_arn.split(":user/")[-1] | ||
| except Exception as e: | ||
| print(f"Error extracting IAM user name: {e}") | ||
| return None | ||
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
              | Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # The OpenSearch Contributors require contributions made to | ||
| # this file be licensed under the Apache-2.0 license or a | ||
| # compatible open source license. | ||
| # Any modifications Copyright OpenSearch Contributors. See | ||
| # GitHub history for details. | ||
|  | ||
| import json | ||
| import logging | ||
|  | ||
| import boto3 | ||
| from botocore.exceptions import ClientError | ||
|  | ||
| # Configure the logger for this module | ||
| logger = logging.getLogger(__name__) | ||
|  | ||
|  | ||
| class SecretHelper: | ||
| """ | ||
| Helper class for managing secrets in AWS Secrets Manager. | ||
| Provides methods to check existence, retrieve details, and create secrets. | ||
| """ | ||
|  | ||
| def __init__(self, region: str): | ||
| """ | ||
| Initialize the SecretHelper with the specified AWS region. | ||
| :param region: AWS region where the Secrets Manager is located. | ||
| """ | ||
| self.region = region | ||
| # Create the Secrets Manager client once at the class level | ||
| self.secretsmanager = boto3.client("secretsmanager", region_name=self.region) | ||
|  | ||
| def secret_exists(self, secret_name: str) -> bool: | ||
| """ | ||
| Check if a secret with the given name exists in AWS Secrets Manager. | ||
| :param secret_name: Name of the secret to check. | ||
| :return: True if the secret exists, False otherwise. | ||
| """ | ||
| try: | ||
| # Attempt to retrieve the secret value | ||
| self.secretsmanager.get_secret_value(SecretId=secret_name) | ||
| return True | ||
| except ClientError as e: | ||
| # If the secret does not exist, return False | ||
| if e.response["Error"]["Code"] == "ResourceNotFoundException": | ||
| return False | ||
| else: | ||
| # Log other client errors and return False | ||
| logger.error(f"An error occurred: {e}") | ||
| return False | ||
|  | ||
| def get_secret_details(self, secret_name: str, fetch_value: bool = False) -> dict: | ||
| """ | ||
| Retrieve details of a secret from AWS Secrets Manager. | ||
| Optionally fetch the secret value as well. | ||
|  | ||
| :param secret_name: Name of the secret. | ||
| :param fetch_value: Whether to also fetch the secret value (default is False). | ||
| :return: A dictionary with secret details (ARN and optionally the secret value) | ||
| or an error dictionary if something went wrong. | ||
| """ | ||
| try: | ||
| # Describe the secret to get its ARN and metadata | ||
| describe_response = self.secretsmanager.describe_secret( | ||
| SecretId=secret_name | ||
| ) | ||
|  | ||
| secret_details = { | ||
| "ARN": describe_response["ARN"], | ||
| # You can add more fields from `describe_response` if needed | ||
| } | ||
|  | ||
| # Fetch the secret value if requested | ||
| if fetch_value: | ||
| value_response = self.secretsmanager.get_secret_value( | ||
| SecretId=secret_name | ||
| ) | ||
| secret_details["SecretValue"] = value_response.get("SecretString") | ||
|  | ||
| return secret_details | ||
|  | ||
| except ClientError as e: | ||
| error_code = e.response["Error"]["Code"] | ||
| if error_code == "ResourceNotFoundException": | ||
| logger.warning(f"The requested secret '{secret_name}' was not found") | ||
| else: | ||
| logger.error( | ||
| f"An error occurred while fetching secret '{secret_name}': {e}" | ||
| ) | ||
| # Return a dictionary with error details | ||
| return {"error": str(e), "error_code": error_code} | ||
|  | ||
| def create_secret(self, secret_name: str, secret_value: dict) -> str: | ||
| """ | ||
| Create a new secret in AWS Secrets Manager. | ||
| :param secret_name: Name of the secret to create. | ||
| :param secret_value: Dictionary containing the secret data. | ||
| :return: ARN of the created secret if successful, None otherwise. | ||
| """ | ||
| try: | ||
| # Create the secret with the provided name and value | ||
| response = self.secretsmanager.create_secret( | ||
| Name=secret_name, | ||
| SecretString=json.dumps(secret_value), | ||
| ) | ||
| # Log success and return the secret's ARN | ||
| logger.info(f"Secret '{secret_name}' created successfully.") | ||
| return response["ARN"] | ||
| except ClientError as e: | ||
| # Log errors during secret creation and return None | ||
| logger.error(f"Error creating secret '{secret_name}': {e}") | ||
| return None | 
      
      Oops, something went wrong.
        
    
  
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
Uh oh!
There was an error while loading. Please reload this page.