Merge pull request #223263 from santiagxf/santiagxf/mlflow-projects

ttorble · web-flow · commit b8ece8b0954e · 2023-01-08T09:29:31.000Z
Update how-to-train-mlflow-projects.md
diff --git a/articles/machine-learning/how-to-train-mlflow-projects.md b/articles/machine-learning/how-to-train-mlflow-projects.md
@@ -13,128 +13,139 @@ ms.topic: conceptual
 ms.custom: how-to, devx-track-python, sdkv2, event-tier1-build-2022
 ---
 
-# Train ML models with MLflow Projects and Azure Machine Learning (Preview)
+# Train with MLflow Projects in Azure Machine Learning (Preview)
 
-In this article, learn how to enable MLflow's tracking URI and logging API, collectively known as [MLflow Tracking](https://mlflow.org/docs/latest/quickstart.html#using-the-tracking-api), to submit training jobs with [MLflow Projects](https://www.mlflow.org/docs/latest/projects.html) and Azure Machine Learning backend support. You can submit jobs locally with Azure Machine Learning tracking or migrate your runs to the cloud like via an [Azure Machine Learning Compute](./how-to-create-attach-compute-cluster.md).
+In this article, learn how to submit training jobs with [MLflow Projects](https://www.mlflow.org/docs/latest/projects.html) that use Azure Machine Learning workspaces for tracking. You can either submit jobs and only track them in Azure Machine Learning, or migrate your runs to the cloud to run entirely on [Azure Machine Learning Compute](./how-to-create-attach-compute-cluster.md).
 
 [MLflow Projects](https://mlflow.org/docs/latest/projects.html) let you organize and describe your code so that other data scientists (or automated tools) can run it. MLflow Projects with Azure Machine Learning enable you to track and manage your training runs in your workspace.
 
-[MLflow](https://www.mlflow.org) is an open-source library for managing the life cycle of your machine learning experiments. MLFlow Tracking is a component of MLflow that logs and tracks your training run metrics and model artifacts, no matter your experiment's environment--locally on your computer, on a remote compute target, a virtual machine, or an [Azure Databricks cluster](how-to-use-mlflow-azure-databricks.md).
-
-[Learn more about the MLflow and Azure Machine Learning integration.](how-to-use-mlflow.md).
-
-> [!TIP]
-> The information in this document is primarily for data scientists and developers who want to monitor the model training process. If you are an administrator interested in monitoring resource usage and events from Azure Machine Learning, such as quotas, completed training runs, or completed model deployments, see [Monitoring Azure Machine Learning](monitor-azure-machine-learning.md).
+[Learn more about the MLflow and Azure Machine Learning integration.](concept-mlflow.md)
 
 ## Prerequisites
 
 [!INCLUDE [mlflow-prereqs](../../includes/machine-learning-mlflow-prereqs.md)]
 
-### Connect to your workspace
-
-First, let's connect MLflow to your Azure Machine Learning workspace.
-
-# [Azure Machine Learning compute](#tab/aml)
-
-Tracking is already configured for you. Your default credentials will also be used when working with MLflow.
-
-# [Remote compute](#tab/remote)
-
-**Configure tracking URI**
-
-[!INCLUDE [configure-mlflow-tracking](../../includes/machine-learning-mlflow-configure-tracking.md)]
-
-**Configure authentication**
-
-Once the tracking is configured, you'll also need to configure how the authentication needs to happen to the associated workspace. By default, the Azure Machine Learning plugin for MLflow will perform interactive authentication by opening the default browser to prompt for credentials. Refer to [Configure MLflow for Azure Machine Learning: Configure authentication](how-to-use-mlflow-configure-tracking.md#configure-authentication) to additional ways to configure authentication for MLflow in Azure Machine Learning workspaces.
-
-[!INCLUDE [configure-mlflow-auth](../../includes/machine-learning-mlflow-configure-auth.md)]
-
----
-
-## Train MLflow Projects on local compute
-
-This example shows how to submit MLflow projects locally with Azure Machine Learning.
-
-Create the backend configuration object to store necessary information for the integration such as, the compute target and which type of managed environment to use.
+* Using Azure Machine Learning as the backend for MLflow projects requires the `azureml-core` package:
 
-```python
-backend_config = {"USE_CONDA": False}
-```
+  ```bash
+  pip install azureml-core
+  ```
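To confirm that the packages this article relies on are importable in your current Python environment, you can run a quick check such as the following. This is a minimal sketch: the helper name `missing_packages` is hypothetical, it only checks importability (not versions), and it assumes the import names `mlflow`, `azureml.core` (from `azureml-core`), and `azureml.mlflow` (from `azureml-mlflow`).

```python
import importlib.util


def missing_packages(module_names):
    """Return the names in `module_names` that can't be found.

    Hypothetical helper: it checks that each module can be located,
    without importing it fully or checking its version.
    """
    missing = []
    for name in module_names:
        try:
            # For dotted names, find_spec imports the parent package first,
            # which raises ModuleNotFoundError if the parent is missing.
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names assumed for the packages used in this article.
    required = ["mlflow", "azureml.core", "azureml.mlflow"]
    print("Missing:", missing_packages(required) or "none")
```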
 
-Add the `azureml-mlflow` package as a pip dependency to your environment configuration file in order to track metrics and key artifacts in your workspace. 
-
-``` shell
-name: mlflow-example
-channels:
-  - defaults
-  - anaconda
-  - conda-forge
-dependencies:
-  - python=3.6
-  - scikit-learn=0.19.1
-  - pip
-  - pip:
-    - mlflow
-    - azureml-mlflow
-```
-
-Submit the local run and ensure you set the parameter `backend = "azureml" `. With this setting, you can submit runs locally and get the added support of automatic output tracking, log files, snapshots, and printed errors in your workspace.
-
-View your runs and metrics in the [Azure Machine Learning studio](https://ml.azure.com).
-
-```python
-local_env_run = mlflow.projects.run(uri=".", 
-                                    parameters={"alpha":0.3},
-                                    backend = "azureml",
-                                    use_conda=False,
-                                    backend_config = backend_config, 
-                                    )
-
-```
-
-## Train MLflow projects with remote compute
-
-This example shows how to submit MLflow projects on a remote compute with Azure Machine Learning tracking.
-
-Create the backend configuration object to store necessary information for the integration such as, the compute target and which type of managed environment to use.
-
-The integration accepts "COMPUTE" and "USE_CONDA" as parameters where "COMPUTE" is set to the name of your remote compute cluster and "USE_CONDA" which creates a new environment for the project from the environment configuration file. If "COMPUTE" is present in the object, the project will be automatically submitted to the remote compute and ignore "USE_CONDA". MLflow accepts a dictionary object or a JSON file.
-
-```python
-# dictionary
-backend_config = {"COMPUTE": "cpu-cluster", "USE_CONDA": False}
-```
-
-Add the `azureml-mlflow` package as a pip dependency to your environment configuration file in order to track metrics and key artifacts in your workspace. 
-
-``` shell
-name: mlflow-example
-channels:
-  - defaults
-  - anaconda
-  - conda-forge
-dependencies:
-  - python=3.6
-  - scikit-learn=0.19.1
-  - pip
-  - pip:
-    - mlflow
-    - azureml-mlflow
-```
-
-Submit the mlflow project run and ensure you set the parameter `backend = "azureml" `. With this setting, you can submit your run to your remote compute and get the added support of automatic output tracking, log files, snapshots, and printed errors in your workspace.
-
-View your runs and metrics in the [Azure Machine Learning studio](https://ml.azure.com).
-
-```python
-remote_mlflow_run = mlflow.projects.run(uri=".", 
-                                    parameters={"alpha":0.3},
-                                    backend = "azureml",
-                                    backend_config = backend_config, 
-                                    )
+### Connect to your workspace
 
-```
+If you're working outside Azure Machine Learning, you need to configure MLflow to point to your Azure Machine Learning workspace's tracking URI. You can find the instructions at [Configure MLflow for Azure Machine Learning](how-to-use-mlflow-configure-tracking.md).
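If you prefer to configure tracking in code, the following sketch builds the workspace tracking URI from its components and passes it to `mlflow.set_tracking_uri`. It assumes the `azureml://` URI format described in [Configure MLflow for Azure Machine Learning](how-to-use-mlflow-configure-tracking.md); MLflow can only resolve that scheme when `azureml-mlflow` is installed. The helper names are hypothetical and the workspace values are placeholders.

```python
def azureml_tracking_uri(region, subscription_id, resource_group, workspace_name):
    """Build an MLflow tracking URI for an Azure Machine Learning workspace.

    Assumes the azureml:// format handled by the azureml-mlflow plugin.
    """
    return (
        f"azureml://{region}.api.azureml.ms/mlflow/v1.0"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace_name}"
    )


def configure_tracking(tracking_uri):
    """Point MLflow at the workspace (requires mlflow and azureml-mlflow)."""
    import mlflow  # imported lazily so the URI helper works without mlflow

    mlflow.set_tracking_uri(tracking_uri)


if __name__ == "__main__":
    # Placeholder values: replace them with your workspace's details.
    uri = azureml_tracking_uri(
        region="<region>",
        subscription_id="<subscription-id>",
        resource_group="<resource-group>",
        workspace_name="<workspace-name>",
    )
    print(uri)
```

Call `configure_tracking(uri)` before submitting a project. Authentication to the workspace is configured separately, as described in the linked article.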
+
+
+## Track MLflow Projects in Azure Machine Learning workspaces
+
+This example shows how to submit MLflow projects and track them in Azure Machine Learning.
+
+1. Add the `azureml-mlflow` package as a pip dependency to your environment configuration file to track metrics and key artifacts in your workspace.
+
+    __conda.yaml__
+
+    ```yaml
+    name: mlflow-example
+    channels:
+      - defaults
+    dependencies:
+      - numpy>=1.14.3
+      - pandas>=1.0.0
+      - scikit-learn
+      - pip
+      - pip:
+        - mlflow
+        - azureml-mlflow
+    ```
+
+1. Submit the local run and make sure you set the parameter `backend="azureml"`, which adds support for automatic tracking, model capture, log files, snapshots, and printed errors in your workspace. This example assumes that the MLflow project you want to run is in the current directory (`uri="."`).
+  
+    # [MLflow CLI](#tab/cli)
+    
+    ```bash
+    mlflow run . --experiment-name <experiment-name> --backend azureml --env-manager=local -P alpha=0.3
+    ```
+  
+    # [Python](#tab/sdk)
+
+    ```python
+    import mlflow
+
+    local_env_run = mlflow.projects.run(
+        uri=".",
+        parameters={"alpha": 0.3},
+        backend="azureml",
+        env_manager="local",
+    )
+    ```
+    
+    ---
+  
+    View your runs and metrics in the [Azure Machine Learning studio](https://ml.azure.com).
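The CLI and Python tabs above describe the same submission: each entry in `parameters` becomes a `-P name=value` option, and `backend` and `env_manager` map to `--backend` and `--env-manager`. The following sketch makes that mapping explicit by building the `mlflow run` command in Python. The helper names `build_mlflow_run_command` and `submit` are hypothetical, and actually running the command assumes the `mlflow` CLI is on your `PATH`.

```python
import subprocess


def build_mlflow_run_command(uri, parameters, backend="azureml",
                             env_manager=None, experiment_name=None,
                             backend_config_path=None):
    """Build the argument list for `mlflow run` (hypothetical helper)."""
    cmd = ["mlflow", "run", uri, "--backend", backend]
    if experiment_name is not None:
        cmd += ["--experiment-name", experiment_name]
    if env_manager is not None:
        cmd += ["--env-manager", env_manager]
    if backend_config_path is not None:
        cmd += ["--backend-config", backend_config_path]
    # Each project parameter becomes its own -P name=value option.
    for name, value in parameters.items():
        cmd += ["-P", f"{name}={value}"]
    return cmd


def submit(cmd):
    """Run the command; raises CalledProcessError if mlflow exits non-zero."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_mlflow_run_command(".", {"alpha": 0.3}, env_manager="local")
    print(" ".join(cmd))
```

Passing a list to `subprocess.run` avoids shell quoting issues when parameter values contain spaces.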
+
+## Train MLflow projects in Azure Machine Learning jobs
+
+This example shows how to submit MLflow projects as a job running on Azure Machine Learning compute.
+
+1. Create the backend configuration object and set `COMPUTE` to the name of the remote compute cluster you want to use to run your project. If `COMPUTE` is present, the project is automatically submitted as an Azure Machine Learning job to that compute.
+
+    # [MLflow CLI](#tab/cli)
+  
+    __backend_config.json__
+  
+    ```json
+    {
+        "COMPUTE": "cpu-cluster"
+    }
+    ```
+  
+    # [Python](#tab/sdk)
+  
+    ```python
+    backend_config = {"COMPUTE": "cpu-cluster"}
+    ```
+
+    ---
+
+1. Add the `azureml-mlflow` package as a pip dependency to your environment configuration file to track metrics and key artifacts in your workspace.
+
+    __conda.yaml__
+
+    ```yaml
+    name: mlflow-example
+    channels:
+      - defaults
+    dependencies:
+      - numpy>=1.14.3
+      - pandas>=1.0.0
+      - scikit-learn
+      - pip
+      - pip:
+        - mlflow
+        - azureml-mlflow
+    ```
+
+1. Submit the project as a job and make sure you set the parameter `backend="azureml"`, which adds support for automatic tracking, model capture, log files, snapshots, and printed errors in your workspace. This example assumes that the MLflow project you want to run is in the current directory (`uri="."`).
+
+    # [MLflow CLI](#tab/cli)
+ 
+    ```bash
+    mlflow run . --backend azureml --backend-config backend_config.json -P alpha=0.3
+    ```
+  
+    # [Python](#tab/sdk)
+  
+    ```python
+    import mlflow
+
+    remote_mlflow_run = mlflow.projects.run(
+        uri=".",
+        parameters={"alpha": 0.3},
+        backend="azureml",
+        backend_config=backend_config,
+    )
+    ```
+    
+    ---
+  
+    > [!NOTE]
+    > Since Azure Machine Learning jobs always run in the context of environments, the parameter `env_manager` is ignored.
+  
+    View your runs and metrics in the [Azure Machine Learning studio](https://ml.azure.com).
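To keep the CLI and Python tabs in sync, you can generate `backend_config.json` from the same dictionary you pass as `backend_config`. The following sketch does that with the standard `json` module and wraps the job submission in a function. The helper names are hypothetical, and `cpu-cluster` is the example compute name used above.

```python
import json
from pathlib import Path


def write_backend_config(config, path="backend_config.json"):
    """Save the configuration for use with `mlflow run --backend-config`."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=4) + "\n")
    return path


def load_backend_config(path="backend_config.json"):
    """Read the configuration back as a dictionary."""
    return json.loads(Path(path).read_text())


def submit_as_job(backend_config, alpha=0.3):
    """Submit the project in the current directory as an Azure Machine Learning job."""
    import mlflow  # requires mlflow and azureml-mlflow

    submitted_run = mlflow.projects.run(
        uri=".",
        parameters={"alpha": alpha},
        backend="azureml",
        backend_config=backend_config,
    )
    return submitted_run.run_id


if __name__ == "__main__":
    backend_config = {"COMPUTE": "cpu-cluster"}
    path = write_backend_config(backend_config)
    print(path.read_text())
```

With its default `synchronous=True`, `mlflow.projects.run` waits for the run to finish before returning, so `submit_as_job` returns only after the job completes.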
 
 
 ## Clean up resources
@@ -163,7 +174,8 @@ The [MLflow with Azure ML notebooks](https://github.com/Azure/MachineLearningNot
 
 ## Next steps
 
-* [Deploy models with MLflow](how-to-deploy-mlflow-models.md).
-* Monitor your production models for [data drift](v1/how-to-enable-data-collection.md).
 * [Track Azure Databricks runs with MLflow](how-to-use-mlflow-azure-databricks.md).
-* [Manage your models](concept-model-management-and-deployment.md).
+* [Query & compare experiments and runs with MLflow](how-to-track-experiments-mlflow.md).
+* [Manage model registries in Azure Machine Learning with MLflow](how-to-manage-models-mlflow.md).
+* [Guidelines for deploying MLflow models](how-to-deploy-mlflow-models.md).
+