# Query & compare experiments and runs with MLflow

Tracking information for experiments and runs in Azure Machine Learning can be queried using MLflow. You don't need to install any Azure Machine Learning-specific SDK to manage what happens inside of a training job, which creates a more seamless transition between local runs and the cloud by removing cloud-specific dependencies.

> [!NOTE]
> The Azure Machine Learning Python SDK v2 does not provide native logging or tracking capabilities. This applies not only to logging but also to querying the metrics logged. Instead, we recommend using MLflow to manage experiments and runs. This article explains how to use MLflow to manage experiments and runs in Azure ML.

Use MLflow to query and manage all the experiments in Azure Machine Learning.

You can get all the active experiments in the workspace using MLflow:

```python
experiments = mlflow.search_experiments()

for exp in experiments:
    print(exp.name)
```

> [!NOTE]
> __MLflow 2.0 advisory:__ In legacy versions of MLflow (<2.0), use the method `list_experiments` instead.
If you want to retrieve archived experiments too, then include the option `ViewType.ALL` in the `view_type` argument. The following sample shows how:
> Notice that `experiment_ids` accepts an array of experiments, so you can search runs across multiple experiments if required. This can be useful when you want to compare runs of the same model logged in different experiments (by different people, in different project iterations, and so on). You can also use `search_all_experiments=True` if you want to search across all the experiments in the workspace.
By default, runs are ordered descending by `start_time`, which is the time the run was queued in Azure ML. However, you can change this default by using the parameter `order_by`.

Use the argument `max_results` from `search_runs` to limit the number of runs returned. For instance, the following example returns the last run of the experiment:
> Using expressions containing `metrics.*` in the parameter `order_by` is not supported at the moment. Please use the `sort_values` method from Pandas as shown in the next example.
You can also order by metrics to know which run generated the best results:

You can also look for a run with a specific combination of hyperparameters using the parameter `filter_string`. Use `params` to access a run's parameters and `metrics` to access metrics logged in the run. MLflow supports expressions joined by the `AND` keyword (the syntax does not support `OR`):

You can also filter runs by status. It becomes useful to find runs that are in a particular state, such as completed or failed.

> [!WARNING]
> Expressions containing `attributes.status` in the parameter `filter_string` are not supported at the moment. Please use Pandas filtering expressions as shown in the next example.
The following example shows all the completed runs:
## Getting metrics, parameters, artifacts and models

The method `search_runs` returns a Pandas `DataFrame` containing a limited amount of information by default. If needed, you can get Python objects instead, which may be useful to inspect run details. Use the `output_format` parameter to control how output is returned:

```python
runs = mlflow.search_runs(
    experiment_ids=["1234-5678-90AB-CDEFG"],
    filter_string="params.num_boost_round='100'",
    output_format="list",
)
```

Details can then be accessed from the `info` member. The following sample shows how to get the `run_id`:

```python
last_run = runs[-1]
print("Last run ID:", last_run.info.run_id)
```

### Getting params and metrics from a run
When runs are returned using `output_format="list"`, you can easily access parameters using the key `data`:

```python
last_run.data.params
```

In the same way, you can query metrics:

```python
last_run.data.metrics
```

For metrics that contain multiple values (for instance, a loss curve or a PR curve), only the last logged value of the metric is returned. If you want to retrieve all the values of a given metric, use the `get_metric_history` method. This method requires you to use the `MlflowClient`:


Any artifact logged by a run can be queried by MLflow. Artifacts can't be accessed using the run object itself; use the MLflow client instead:

```python
client = mlflow.tracking.MlflowClient()
client.list_artifacts("1234-5678-90AB-CDEFG")
```

The method above lists all the artifacts logged in the run, but they remain stored in the artifact store (Azure ML storage). To download any of them, use the method `mlflow.artifacts.download_artifacts`:
> __MLflow 2.0 advisory:__ In legacy versions of MLflow (<2.0), use the method `MlflowClient.download_artifacts()` instead.
### Getting models from a run
Models can also be logged in the run and then retrieved directly from it. To retrieve a model, you need to know the path to the artifact where it's stored. The method `list_artifacts` can be used to find artifacts that represent a model, since MLflow models are always folders. You can download a model by indicating the path where the model is stored using the `mlflow.artifacts.download_artifacts` method:

You can then load the model back from the downloaded artifacts using the typical function `load_model`:

```python
model = mlflow.xgboost.load_model(model_local_path)
```

> [!NOTE]
> The previous example assumes the model was created using `xgboost`. Change it to the flavor that applies to your case.

MLflow also allows you to perform both operations at once, downloading and loading the model in a single instruction. MLflow downloads the model to a temporary folder and loads it from there. The method `load_model` uses a URI format to indicate where the model has to be retrieved from. In the case of loading a model from a run, the URI structure is as follows:

```python
model = mlflow.xgboost.load_model(f"runs:/{last_run.info.run_id}/{artifact_path}")
```

> [!TIP]
> You can also load models from the registry using MLflow. View [loading MLflow models with MLflow](how-to-manage-models-mlflow.md#loading-models-from-registry) for details.
## Getting child (nested) runs

MLflow supports the concept of child (nested) runs. They are useful when you need to spin off training routines that must be tracked independently from the main training process. Hyper-parameter tuning processes or Azure Machine Learning pipelines are typical examples of jobs that generate multiple child runs. You can query all the child runs of a specific run using the property tag `mlflow.parentRunId`, which contains the run ID of the parent run.

## Compare jobs and models in AzureML studio (preview)
To compare and evaluate the quality of your jobs and models in AzureML studio, use the [preview panel](./how-to-enable-preview-features.md) to enable the feature. Once enabled, you can compare the parameters, metrics, and tags between the jobs and/or models you selected.