
Commit 15b86f8

Merge from upstream main
2 parents 7ce34e7 + 3eb7af7 commit 15b86f8

39 files changed: +3038 −557 lines

BUILD.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-# Amazon SageMaker Drift Detection Pipeline
+# Amazon SageMaker Drift Detection

 This page has details on how to build a custom SageMaker MLOps template from source.
```

README.md

Lines changed: 15 additions & 15 deletions
```diff
@@ -1,4 +1,4 @@
-# Amazon SageMaker Drift Detection Pipeline
+# Amazon SageMaker Drift Detection

 This sample demonstrates how to set up an Amazon SageMaker MLOps deployment pipeline for drift detection

@@ -26,9 +26,9 @@ Following is the list of parameters.
 | PortfolioOwner | The owner of the portfolio |
 | ProductVersion | The product version to deploy |

-You can copy the required `ExecutionRoleArn` role from the Studio dashboard.
+You can copy the required `ExecutionRoleArn` role from your **User Details** in the SageMaker Studio dashboard.

-![Execution Role](docs/drift-execution-role.png)
+![Execution Role](docs/studio-execution-role.png)

 Alternatively see [BUILD.md](BUILD.md) for instructions on how to build the MLOps template from source.

@@ -39,21 +39,21 @@ Once your MLOps project template is registered in **AWS Service Catalog** you can
 1. Switch back to the Launcher
 2. Click **New Project** from the **ML tasks and components** section.

-On the Create project page, SageMaker templates is chosen by default. This option lists the built-in templates. However, you want to use the template you published for the Amazon SageMaker Drift Detection Pipeline.
+On the Create project page, SageMaker templates is chosen by default. This option lists the built-in templates. However, you want to use the template you published for Amazon SageMaker drift detection.

-6. Choose **Organization templates**.
-7. Choose **Amazon SageMaker Drift Detection Pipeline**.
-8. Choose **Select project template**.
+3. Choose **Organization templates**.
+4. Choose **Amazon SageMaker drift detection template for real-time deployment**.
+5. Choose **Select project template**.

 ![Select Template](docs/drift-select-template.png)

 `NOTE`: If you have recently updated your AWS Service Catalog Project, you may need to refresh SageMaker Studio to ensure it picks up the latest version of your template.

-9. In the **Project details** section, for **Name**, enter **drift-pipeline**.
+6. In the **Project details** section, for **Name**, enter **drift-pipeline**.
    - The project name must have 32 characters or fewer.
-10. In the Project template parameters
-    - For **RetrainSchedule**, input a valid [Cron Schedule](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-schedule-expression.html) which defaults to `cron(0 12 1 * ? *)` - the first day of every month.
-11. Choose **Create project**.
+7. In the Project template parameters, for **RetrainSchedule**, input a valid [Cron Schedule](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-schedule-expression.html).
+    - This defaults to `cron(0 12 1 * ? *)`, which is the first day of every month.
+8. Choose **Create project**.

 ![Create Project](docs/drift-create-project.png)

@@ -66,10 +66,10 @@ The MLOps Drift Detection template will create the following AWS services and resources:
 1. An [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3) bucket is created for output model artifacts generated from the pipeline.

 2. Two repositories are added to [AWS CodeCommit](https://aws.amazon.com/codecommit/):
-   - The first repository provides code to create a multi-step model building pipeline using [AWS CloudFormation](https://aws.amazon.com/cloudformation/). The pipeline includes the following steps: data processing, model baseline, model training, model evaluation, and conditional model registration based on accuracy. The pipeline trains a regression model using the XGBoost algorithm on trip data from the [NYC Taxi Dataset](https://registry.opendata.aws/nyc-tlc-trip-records-pds/). This repository also includes the [drift-detection.ipynb](build_pipeline/drift-detection.ipynb) notebook to [Run the Pipeline](#run-the-pipeline) (see below)
+   - The first repository provides code to create a multi-step model building pipeline using [AWS CloudFormation](https://aws.amazon.com/cloudformation/). The pipeline includes the following steps: data processing, model baseline, model training, model evaluation, and conditional model registration based on accuracy. The pipeline trains a regression model using the XGBoost algorithm on trip data from the [NYC Taxi Dataset](https://registry.opendata.aws/nyc-tlc-trip-records-pds/). This repository also includes the [build-pipeline.ipynb](build_pipeline/build-pipeline.ipynb) notebook to [Run the Pipeline](#run-the-pipeline) (see below)
    - The second repository contains code and configuration files for model deployment and monitoring. This repo also uses [AWS CodePipeline](https://aws.amazon.com/codepipeline/) and [CodeBuild](https://aws.amazon.com/codebuild/), which run an [AWS CloudFormation](https://aws.amazon.com/cloudformation/) template to create model endpoints for staging and production. This repository includes the [prod-config.json](deployment_pipeline/prod-config.json) configuration used to set metrics and thresholds for drift detection.

-3. Two CodePipeline pipelines:
+3. Two AWS CodePipeline pipelines:
    - The [model build pipeline](build_pipeline) creates or updates the pipeline definition and then starts a new execution with a custom [AWS Lambda](https://aws.amazon.com/lambda/) function whenever a new commit is made to the ModelBuild CodeCommit repository. The first time the CodePipeline is started, it will fail to complete because it expects input data to be uploaded to the Amazon S3 artifact bucket.
    - The [deployment pipeline](deployment_pipeline/README.md) automatically triggers whenever a new model version is added to the model registry and the status is marked as Approved. Models that are registered with Pending or Rejected statuses aren't deployed.

@@ -96,7 +96,7 @@ Once your project is created, follow the instructions to [Clone the Code Repo
 1. Choose **Repositories**, and in the **Local path** column for the repository that ends with *build*, choose **clone repo....**
 2. In the dialog box that appears, accept the defaults and choose **Clone repository**
 3. When the clone of the repository is complete, the local path appears in the **Local path** column. Click on the path to open the local folder that contains the repository code in SageMaker Studio.
-4. Click on the [drift-detection.ipynb](build_pipeline/drift-detection.ipynb) file to open the notebook.
+4. Click on the [build-pipeline.ipynb](build_pipeline/build-pipeline.ipynb) file to open the notebook.

 In the notebook, provide the **Project Name** in the first cell to get started:

@@ -141,7 +141,7 @@ This section outlines cost considerations for running the Drift Detection Pipeline

 ## Cleaning Up

-The [drift-detection.ipynb](build_pipeline/drift-detection.ipynb) notebook includes cells that you can run to clean up the resources.
+The [build-pipeline.ipynb](build_pipeline/build-pipeline.ipynb) notebook includes cells that you can run to clean up the resources.

 1. SageMaker prod endpoint
 2. SageMaker staging endpoint
```
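
As the README notes, the deployment pipeline fires when a model version is marked Approved in the registry. A minimal sketch of doing that programmatically (not part of this commit), assuming boto3 credentials and that the model package group shares the example project name **drift-pipeline**:

```python
import boto3

sm = boto3.client("sagemaker")

# Look up the most recent model package in the project's group.
# "drift-pipeline" is the example project name from this README.
packages = sm.list_model_packages(
    ModelPackageGroupName="drift-pipeline",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)["ModelPackageSummaryList"]

# Approving the package is what triggers the deployment pipeline.
sm.update_model_package(
    ModelPackageArn=packages[0]["ModelPackageArn"],
    ModelApprovalStatus="Approved",
)
```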

app.py

Lines changed: 21 additions & 7 deletions
```diff
@@ -3,7 +3,7 @@
 import logging

 from aws_cdk import core
-from infra.pipeline_stack import PipelineStack
+from infra.pipeline_stack import BatchPipelineStack, DeployPipelineStack
 from infra.service_catalog_stack import ServiceCatalogStack

 # Configure the logger
@@ -17,13 +17,27 @@
 artifact_bucket = app.node.try_get_context("drift:ArtifactBucket")
 artifact_bucket_prefix = app.node.try_get_context("drift:ArtifactBucketPrefix")

-# Create the pipeline stack
-synth = core.DefaultStackSynthesizer(
-    file_assets_bucket_name=artifact_bucket,
-    generate_bootstrap_version_rule=False,
-    bucket_prefix=artifact_bucket_prefix,
+# Create the batch pipeline stack
+BatchPipelineStack(
+    app,
+    "drift-batch-pipeline",
+    synthesizer=core.DefaultStackSynthesizer(
+        file_assets_bucket_name=artifact_bucket,
+        bucket_prefix=artifact_bucket_prefix,
+        generate_bootstrap_version_rule=False,
+    ),
+)
+
+# Create the real-time deploy stack
+DeployPipelineStack(
+    app,
+    "drift-deploy-pipeline",
+    synthesizer=core.DefaultStackSynthesizer(
+        file_assets_bucket_name=artifact_bucket,
+        bucket_prefix=artifact_bucket_prefix,
+        generate_bootstrap_version_rule=False,
+    ),
 )
-PipelineStack(app, "drift-pipeline", synthesizer=synth)

 # Create the SC stack
 synth = core.DefaultStackSynthesizer(
```
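
A usage sketch for the refactored CDK app (not part of the diff): the stack names and `drift:*` context keys come from the code above, while the bucket name and prefix values below are placeholder assumptions.

```
cdk synth drift-batch-pipeline drift-deploy-pipeline \
  --context drift:ArtifactBucket=my-artifact-bucket \
  --context drift:ArtifactBucketPrefix=drift-pipeline/
```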

batch_pipeline/README.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
# Amazon SageMaker Drift Detection

This folder contains the code to create a batch pipeline that includes a SageMaker Transform Job and a [Model Monitor](https://aws.amazon.com/sagemaker/model-monitor/) Processing Job.

## Build Pipeline

The model build pipeline contains four stages:
1. Source: This stage pulls the latest code from the **AWS CodeCommit** repository.
2. Build: The **AWS CodeBuild** action creates an Amazon SageMaker Pipeline definition and stores this definition as JSON on S3. Take a look at the pipeline definition in the CodeCommit repository: `pipelines/pipeline.py`. The build also creates an **AWS CloudFormation** template using the AWS CDK - take a look at the respective CDK app: `app.py`.
3. BatchStaging: This stage executes the staging CloudFormation template to create/update a **SageMaker Pipeline** based on the latest approved model. The stage includes a manual approval gate, which triggers the deployment of the model to production.
4. BatchProd: This stage creates or updates a **SageMaker Pipeline** that includes a **SageMaker Model Monitor** job and an **Evaluate Drift Lambda**. The Lambda emits [CloudWatch Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-cloudwatch.html) (see below) that can trigger a **CloudWatch Alarm** when drift is detected against the data quality baseline captured when the model was trained.

![Batch Pipeline](../docs/drift-batch-pipeline.png)

### Metrics Published

CloudWatch Metrics are emitted with the following:
* Namespace `aws/sagemaker/ModelBuildingPipeline/data-metrics`
* MetricName `feature_baseline_drift_<<feature_name>>`
* MetricValue `distance` from the baseline

### Starting the Batch Pipeline

The batch pipeline outlined above is started when code is committed to the **AWS CodeCommit** repository or when a model is approved in the **SageMaker Model Registry**.

## Testing

Once you have created a SageMaker Project, you can test the **Build** stage.

### Build Stage

Export the environment variables for the `SAGEMAKER_PROJECT_NAME` and `SAGEMAKER_PROJECT_ID` created by your SageMaker project CloudFormation stack.

Then run the `cdk synth` command:

```
export SAGEMAKER_PROJECT_NAME="<<project_name>>"
export SAGEMAKER_PROJECT_ID="<<project_id>>"
export AWS_REGION="<<region>>"
export ARTIFACT_BUCKET="sagemaker-project-<<project_id>>-build-<<region>>"
export SAGEMAKER_PIPELINE_ROLE_ARN="<<service_catalog_product_use_role>>"
export EVALUATE_DRIFT_FUNCTION_ARN="sagemaker-<<project_name>>-evaluate-drift"
cdk synth
```
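
Given the namespace and metric name documented in **Metrics Published** above, a boto3 sketch for pulling the drift distance of one feature follows; the feature name `passenger_count` is a hypothetical example, not something this README defines.

```python
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")

# Window over the last week of batch monitoring runs.
now = datetime.datetime.utcnow()
response = cloudwatch.get_metric_statistics(
    Namespace="aws/sagemaker/ModelBuildingPipeline/data-metrics",
    MetricName="feature_baseline_drift_passenger_count",  # hypothetical feature name
    StartTime=now - datetime.timedelta(days=7),
    EndTime=now,
    Period=3600,  # one datapoint per hour
    Statistics=["Maximum"],
)

# Print the drift distance over time; an alarm threshold would sit on top of this.
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])
```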

batch_pipeline/app.py

Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
```python
#!/usr/bin/env python3
import argparse
import json
import logging
import os

# Import the pipeline
from pipelines.pipeline import get_pipeline, upload_pipeline

from aws_cdk import core
from infra.batch_config import BatchConfig
from infra.sagemaker_pipeline_stack import SageMakerPipelineStack
from infra.model_registry import ModelRegistry


# Configure the logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=os.environ.get("LOG_LEVEL", "INFO"))


registry = ModelRegistry()


def create_pipeline(
    app: core.App,
    project_name: str,
    project_id: str,
    region: str,
    sagemaker_pipeline_role_arn: str,
    artifact_bucket: str,
    evaluate_drift_function_arn: str,
    stage_name: str,
):
    # Get the stage specific deployment config for sagemaker
    with open(f"{stage_name}-config.json", "r") as f:
        j = json.load(f)
        batch_config = BatchConfig(**j)

    # Set the model package group to project name
    package_group_name = project_name

    # If we don't have a specific champion variant defined, get the latest approved
    if batch_config.model_package_version is None:
        logger.info("Selecting latest approved")
        p = registry.get_latest_approved_packages(package_group_name, max_results=1)[0]
        batch_config.model_package_version = p["ModelPackageVersion"]
        batch_config.model_package_arn = p["ModelPackageArn"]
    else:
        # Get the versioned package and update ARN
        logger.info(f"Selecting variant version {batch_config.model_package_version}")
        p = registry.get_versioned_approved_packages(
            package_group_name,
            model_package_versions=[batch_config.model_package_version],
        )[0]
        batch_config.model_package_arn = p["ModelPackageArn"]

    # Set the default input data uri
    data_uri = f"s3://{artifact_bucket}/{project_id}/batch/{stage_name}"

    # Set the output transform uri
    transform_uri = f"s3://{artifact_bucket}/{project_id}/transform/{stage_name}"

    # Get the pipeline execution to get the baseline uri
    pipeline_execution_arn = registry.get_pipeline_execution_arn(
        batch_config.model_package_arn
    )
    logger.info(f"Got pipeline execution arn: {pipeline_execution_arn}")
    model_uri = registry.get_model_artifact(pipeline_execution_arn)
    logger.info(f"Got model uri: {model_uri}")

    # Set the sagemaker pipeline name and description with model version
    sagemaker_pipeline_name = f"{project_name}-batch-{stage_name}"
    sagemaker_pipeline_description = f"Batch Pipeline for {stage_name} model version: {batch_config.model_package_version}"

    # If we have drift configuration then get the baseline uri
    baseline_uri = None
    if batch_config.drift_config is not None:
        baseline_uri = registry.get_processing_output(pipeline_execution_arn)
        logger.info(f"Got baseline uri: {baseline_uri}")

    # Create batch pipeline
    pipeline = get_pipeline(
        region=region,
        role=sagemaker_pipeline_role_arn,
        pipeline_name=sagemaker_pipeline_name,
        default_bucket=artifact_bucket,
        base_job_prefix=project_id,
        evaluate_drift_function_arn=evaluate_drift_function_arn,
        data_uri=data_uri,
        model_uri=model_uri,
        transform_uri=transform_uri,
        baseline_uri=baseline_uri,
    )

    # Create the pipeline definition
    logger.info("Creating/updating a SageMaker Pipeline for batch transform")
    pipeline_definition_body = pipeline.definition()
    parsed = json.loads(pipeline_definition_body)
    logger.info(json.dumps(parsed, indent=2, sort_keys=True))

    # Upload the pipeline to S3 bucket/key and return JSON with key/value for Cfn Stack parameters.
    # see: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-pipeline.html
    logger.info(f"Uploading {stage_name} pipeline to {artifact_bucket}")
    pipeline_definition_key = upload_pipeline(
        pipeline,
        default_bucket=artifact_bucket,
        base_job_prefix=f"{project_id}/batch-{stage_name}",
    )

    tags = [
        core.CfnTag(key="sagemaker:deployment-stage", value=stage_name),
        core.CfnTag(key="sagemaker:project-id", value=project_id),
        core.CfnTag(key="sagemaker:project-name", value=project_name),
    ]

    SageMakerPipelineStack(
        app,
        f"drift-batch-{stage_name}",
        pipeline_name=sagemaker_pipeline_name,
        pipeline_description=sagemaker_pipeline_description,
        pipeline_definition_bucket=artifact_bucket,
        pipeline_definition_key=pipeline_definition_key,
        sagemaker_role_arn=sagemaker_pipeline_role_arn,
        tags=tags,
        drift_config=batch_config.drift_config,
    )


def main(
    project_name: str,
    project_id: str,
    region: str,
    sagemaker_pipeline_role_arn: str,
    artifact_bucket: str,
    evaluate_drift_function_arn: str,
):
    # Create App and stacks
    app = core.App()

    create_pipeline(
        app=app,
        project_name=project_name,
        project_id=project_id,
        region=region,
        sagemaker_pipeline_role_arn=sagemaker_pipeline_role_arn,
        artifact_bucket=artifact_bucket,
        evaluate_drift_function_arn=evaluate_drift_function_arn,
        stage_name="staging",
    )

    create_pipeline(
        app=app,
        project_name=project_name,
        project_id=project_id,
        region=region,
        sagemaker_pipeline_role_arn=sagemaker_pipeline_role_arn,
        artifact_bucket=artifact_bucket,
        evaluate_drift_function_arn=evaluate_drift_function_arn,
        stage_name="prod",
    )

    app.synth()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Load parameters")
    parser.add_argument("--region", default=os.environ.get("AWS_REGION"))
    parser.add_argument(
        "--project-name",
        default=os.environ.get("SAGEMAKER_PROJECT_NAME"),
    )
    parser.add_argument("--project-id", default=os.environ.get("SAGEMAKER_PROJECT_ID"))
    parser.add_argument(
        "--sagemaker-pipeline-role-arn",
        default=os.environ.get("SAGEMAKER_PIPELINE_ROLE_ARN"),
    )
    parser.add_argument(
        "--evaluate-drift-function-arn",
        default=os.environ.get("EVALUATE_DRIFT_FUNCTION_ARN"),
    )
    parser.add_argument(
        "--artifact-bucket",
        default=os.environ.get("ARTIFACT_BUCKET"),
    )
    args = vars(parser.parse_args())
    logger.info("args: {}".format(args))
    main(**args)
```
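
This app is normally driven by `cdk synth` via `cdk.json` (see below), with the environment variables from the README supplying the argparse defaults. Because every argument also accepts a flag, a direct invocation sketch looks like the following; all values are placeholders, and the stage config files plus an approved model package must already exist for the calls against the registry to succeed.

```
python3 app.py \
  --region us-east-1 \
  --project-name drift-pipeline \
  --project-id p-exampleid123 \
  --sagemaker-pipeline-role-arn arn:aws:iam::111122223333:role/example-use-role \
  --evaluate-drift-function-arn sagemaker-drift-pipeline-evaluate-drift \
  --artifact-bucket sagemaker-project-p-exampleid123-build-us-east-1
```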

batch_pipeline/cdk.json

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
```json
{
  "app": "python3 app.py",
  "context": {
    "@aws-cdk/core:enableStackNameDuplicates": "true",
    "aws-cdk:enableDiffNoFail": "true",
    "@aws-cdk/core:stackRelativeExports": "true",
    "@aws-cdk/aws-ecr-assets:dockerIgnoreSupport": true,
    "@aws-cdk/aws-secretsmanager:parseOwnedSecretName": true,
    "@aws-cdk/aws-kms:defaultKeyPolicies": true,
    "@aws-cdk/aws-s3:grantWriteWithoutAcl": true
  }
}
```
