remove alpha from pipeline parameters (#169)

sbaidachni · web-flow · commit 312dfdf416ea · 2020-01-31T23:09:14.000-08:00
diff --git a/.pipelines/diabetes_regression-ci-build-train.yml b/.pipelines/diabetes_regression-ci-build-train.yml
@@ -62,30 +62,20 @@ stages:
           echo "##vso[task.setvariable variable=AMLPIPELINEID;isOutput=true]$AMLPIPELINEID"
       name: 'getpipelineid'
       displayName: 'Get Pipeline ID'
-    - bash: |
-          # Generate a hyperparameter value as a random number between 0 and 1.
-          # A random value is used here to make the Azure ML dashboards "interesting" when testing
-          # the solution sample.
-          alpha=$(printf "0.%03d\n" $((($RANDOM*1000)/32767)))
-          echo "Alpha: $alpha"
-          echo "##vso[task.setvariable variable=ALPHA;isOutput=true]$alpha"
-      name: 'getalpha'
-      displayName: 'Generate random value for hyperparameter alpha'
   - job: "Run_ML_Pipeline"
     dependsOn: "Get_Pipeline_ID"
     displayName: "Trigger ML Training Pipeline"
     pool: server
     variables:
       AMLPIPELINE_ID: $[ dependencies.Get_Pipeline_ID.outputs['getpipelineid.AMLPIPELINEID'] ]
-      ALPHA: $[ dependencies.Get_Pipeline_ID.outputs['getalpha.ALPHA'] ]
     steps:
     - task: ms-air-aiagility.vss-services-azureml.azureml-restApi-task.MLPublishedPipelineRestAPITask@0
       displayName: 'Invoke ML pipeline'
       inputs:
         azureSubscription: '$(WORKSPACE_SVC_CONNECTION)'
         PipelineId: '$(AMLPIPELINE_ID)'
         ExperimentName: '$(EXPERIMENT_NAME)'
-        PipelineParameters: '"ParameterAssignments": {"model_name": "$(MODEL_NAME)", "hyperparameter_alpha": "$(ALPHA)"}'
+        PipelineParameters: '"ParameterAssignments": {"model_name": "$(MODEL_NAME)"}'
   - job: "Training_Run_Report"
     dependsOn: "Run_ML_Pipeline"
     condition: always()
diff --git a/diabetes_regression/config.json b/diabetes_regression/config.json
@@ -0,0 +1,14 @@
+{
+    "training":
+    {
+        "alpha": 0.4
+    },
+    "evaluation":
+    {
+
+    },
+    "scoring":
+    {
+        
+    }
+}
diff --git a/diabetes_regression/training/train.py b/diabetes_regression/training/train.py
@@ -32,6 +32,7 @@
 from sklearn.metrics import mean_squared_error
 from sklearn.model_selection import train_test_split
 from sklearn.externals import joblib
+import json
 
 
 def train_model(run, data, alpha):
@@ -62,13 +63,6 @@ def main():
         help="Name of the Model",
         default="sklearn_regression_model.pkl",
     )
-    parser.add_argument(
-        "--alpha",
-        type=float,
-        default=0.5,
-        help=("Ridge regression regularization strength hyperparameter; "
-              "must be a positive float.")
-    )
 
     parser.add_argument(
         "--dataset_name",
@@ -79,14 +73,23 @@ def main():
 
     print("Argument [build_id]: %s" % args.build_id)
     print("Argument [model_name]: %s" % args.model_name)
-    print("Argument [alpha]: %s" % args.alpha)
     print("Argument [dataset_name]: %s" % args.dataset_name)
 
     model_name = args.model_name
     build_id = args.build_id
-    alpha = args.alpha
     dataset_name = args.dataset_name
 
+    print("Getting training parameters")
+
+    with open("config.json") as f:
+        pars = json.load(f)
+    try:
+        alpha = pars["training"]["alpha"]
+    except KeyError:
+        alpha = 0.5
+
+    print("Parameter alpha: %s" % alpha)
+
     run = Run.get_context()
     ws = run.experiment.workspace
 
diff --git a/docs/getting_started.md b/docs/getting_started.md
@@ -86,6 +86,8 @@ For instructions on how to set up a local development environment, refer to the
 
 For using Azure DevOps Pipelines all other variables are stored in the file `.pipelines/diabetes_regression-variables.yml`. Using the default values as a starting point, adjust the variables to suit your requirements.
 
+**Note:** In `diabetes_regression` folder you can find `config.json` file that we would recommend to use in order to provide parameters for training, evaluation and scoring scripts. An example of a such parameter is a hyperparameter of a training algorithm: in our case it's the ridge regression [*alpha* hyperparameter](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html). We don't provide any special serializers for this config file. So, it's up to you which template to support there.
+
 Up until now you should have:
 
 * Forked (or cloned) the repo
@@ -120,7 +122,7 @@ Check out the newly created resources in the [Azure Portal](portal.azure.com):
 (Optional) To remove the resources created for this project you can use the [/environment_setup/iac-remove-environment.yml](../environment_setup/iac-remove-environment.yml) definition or you can just delete the resource group in the [Azure Portal](portal.azure.com).
 
 **Note:** The training ML pipeline uses a [sample diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) as training data. If you want to use your own dataset, you need to [create and register a datastore](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data#azure-machine-learning-studio) in your ML workspace and upload the datafile (e.g. [diabetes.csv](./data/diabetes.csv)) to the corresponding blob container. You can also define a datastore in the ML Workspace with [az cli](https://docs.microsoft.com/en-us/cli/azure/ext/azure-cli-ml/ml/datastore?view=azure-cli-latest#ext-azure-cli-ml-az-ml-datastore-attach-blob). 
-You'll also need to configure DATASTORE_NAME and DATAFILE_NAME variables in ***devopsforai-aml-vg*** variable group. 
+You'll also need to configure DATASTORE_NAME and DATAFILE_NAME variables in ***devopsforai-aml-vg*** variable group.
 
 
 ## Create an Azure DevOps Azure ML Workspace Service Connection
@@ -187,7 +189,7 @@ specified).
 **Note:** If the model evaluation determines that the new model does not perform better than the previous one then the new model will not be registered and the pipeline will be cancelled.
 
 * The third stage of the pipeline, **Deploy to ACI**, deploys the model to the QA environment in [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/). It then runs a *smoke test* to validate the deployment, i.e. sends a sample query to the scoring web service and verifies that it returns a response in the expected format.
- 
+
 Wait until the pipeline finishes and verify that there is a new model in the **ML Workspace**:
 
 ![trained model](./images/trained-model.png)
@@ -247,7 +249,6 @@ Make sure your webapp has the credentials to pull the image from the Azure Conta
 
 * The provided pipeline definition YAML file is a sample starting point, which you should tailor to your processes and environment.
 * You should edit the pipeline definition to remove unused stages. For example, if you are deploying to ACI and AKS, you should delete the unused `Deploy_Webapp` stage.
-* The sample pipeline generates a random value for a model hyperparameter (ridge regression [*alpha*](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)) to generate 'interesting' charts when testing the sample. In a real application you should use fixed hyperparameter values. You can [tune hyperparameter values using Azure ML](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters), and manage their values in Azure DevOps Variable Groups.
 * You may wish to enable [manual approvals](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/approvals) before the deployment stages.
 * You can install additional Conda or pip packages by modifying the YAML environment configurations under the `diabetes_regression` directory. Make sure to use fixed version numbers for all packages to ensure reproducibility, and use the same versions across environments.
 * You can explore aspects of model observability in the solution, such as:
diff --git a/ml_service/pipelines/diabetes_regression_build_train_pipeline.py b/ml_service/pipelines/diabetes_regression_build_train_pipeline.py
@@ -44,8 +44,6 @@ def main():
         name="model_name", default_value=e.model_name)
     build_id_param = PipelineParameter(
         name="build_id", default_value=e.build_id)
-    hyperparameter_alpha_param = PipelineParameter(
-        name="hyperparameter_alpha", default_value=0.5)
 
     dataset_name = ""
     if (e.datastore_name is not None and e.datafile_name is not None):
@@ -66,7 +64,6 @@ def main():
         arguments=[
             "--build_id", build_id_param,
             "--model_name", model_name_param,
-            "--alpha", hyperparameter_alpha_param,
             "--dataset_name", dataset_name,
         ],
         runconfig=run_config,