# Query & compare experiments and runs with MLflow

Tracking information for experiments and runs in Azure Machine Learning can be queried using MLflow. You don't need to install any Azure Machine Learning-specific SDK to manage what happens inside of a training job, which creates a more seamless transition between local runs and the cloud by removing cloud-specific dependencies.

> [!NOTE]
> The Azure Machine Learning Python SDK v2 does not provide native logging or tracking capabilities. This applies not only to logging but also to querying the metrics logged. Instead, we recommend using MLflow to manage experiments and runs. This article explains how to use MLflow to manage experiments and runs in Azure ML.

Use MLflow to query and manage all the experiments in Azure Machine Learning.

You can get all the active experiments in the workspace using MLflow:

```python
experiments = mlflow.search_experiments()

for exp in experiments:
    print(exp.name)
```

> [!NOTE]
> __MLflow 2.0 advisory:__ In legacy versions of MLflow (<2.0), use the method `list_experiments` instead.
If you want to retrieve archived experiments too, then include the option `ViewType.ALL` in the `view_type` argument. The following sample shows how:
> Notice that `experiment_ids` accepts an array of experiments, so you can search runs across multiple experiments if required. This can be useful when you want to compare runs of the same model logged in different experiments (by different people, in different project iterations, and so on). You can also use `search_all_experiments=True` if you want to search across all the experiments in the workspace.
By default, runs are ordered descending by `start_time`, which is the time the run was queued in Azure ML. However, you can change this default by using the parameter `order_by`.

Use the argument `max_results` from `search_runs` to limit the number of runs returned. For instance, the following example returns the last run of the experiment:
> Using expressions containing `metrics.*` in the parameter `order_by` is not supported at the moment. Please use the `sort_values` method from Pandas as shown in the next example.
You can also order by metrics to know which run generated the best results:

You can also look for a run with a specific combination of hyperparameters using the parameter `filter_string`. Use `params` to access a run's parameters and `metrics` to access metrics logged in the run. MLflow supports expressions joined by the `AND` keyword (the syntax does not support `OR`):

You can also filter runs by status. It becomes useful to find runs that are in a particular state, such as completed or failed.

> [!WARNING]
> Expressions containing `attributes.status` in the parameter `filter_string` are not supported at the moment. Please use Pandas filtering expressions as shown in the next example.
The following example shows all the completed runs:
## Getting metrics, parameters, artifacts and models

The method `search_runs` returns a Pandas `DataFrame` containing a limited amount of information by default. If needed, you can get Python objects instead, which may be useful to inspect run details. Use the `output_format` parameter to control how output is returned:

```python
runs = mlflow.search_runs(
    experiment_ids=["1234-5678-90AB-CDEFG"],
    filter_string="params.num_boost_round='100'",
    output_format="list",
)
```

Details can then be accessed from the `info` member. The following sample shows how to get the `run_id`:

```python
last_run = runs[-1]
print("Last run ID:", last_run.info.run_id)
```

### Getting params and metrics from a run
When runs are returned using `output_format="list"`, you can easily access parameters using the key `data`:

```python
last_run.data.params
```

In the same way, you can query metrics:

```python
last_run.data.metrics
```

For metrics that contain multiple values (for instance, a loss curve or a PR curve), only the last logged value of the metric is returned. If you want to retrieve all the values of a given metric, use the `get_metric_history` method. This method requires you to use the `MlflowClient`:


Any artifact logged by a run can be queried by MLflow. Artifacts can't be accessed using the run object itself; use the MLflow client instead:

```python
client = mlflow.tracking.MlflowClient()
client.list_artifacts("1234-5678-90AB-CDEFG")
```

The method above lists all the artifacts logged in the run, but they remain stored in the artifact store (Azure ML storage). To download any of them, use the method `mlflow.artifacts.download_artifacts`:
> __MLflow 2.0 advisory:__ In legacy versions of MLflow (<2.0), use the method `MlflowClient.download_artifacts()` instead.
### Getting models from a run
Models can also be logged in the run and then retrieved directly from it. To retrieve a model, you need to know the path to the artifact where it's stored. The method `list_artifacts` can be used to find artifacts that represent a model, since MLflow models are always folders. You can download a model by indicating the path where the model is stored using the `mlflow.artifacts.download_artifacts` method:

You can then load the model back from the downloaded artifacts using the typical function `load_model`:

```python
model = mlflow.xgboost.load_model(model_local_path)
```

> [!NOTE]
> The previous example assumes the model was created using `xgboost`. Change it to the flavor that applies to your case.

MLflow also allows you to perform both operations at once, downloading and loading the model in a single instruction. MLflow downloads the model to a temporary folder and loads it from there. The method `load_model` uses a URI format to indicate where the model has to be retrieved from. In the case of loading a model from a run, the URI structure is as follows:

```python
model = mlflow.xgboost.load_model(f"runs:/{last_run.info.run_id}/{artifact_path}")
```

> [!TIP]
> You can also load models from the registry using MLflow. View [loading MLflow models with MLflow](how-to-manage-models-mlflow.md#loading-models-from-registry) for details.
## Getting child (nested) runs

MLflow supports the concept of child (nested) runs. They are useful when you need to spin off training routines that must be tracked independently from the main training process. Hyper-parameter tuning processes or Azure Machine Learning pipelines are typical examples of jobs that generate multiple child runs. You can query all the child runs of a specific run using the property tag `mlflow.parentRunId`, which contains the run ID of the parent run.

## Compare jobs and models in AzureML studio (preview)
To compare and evaluate the quality of your jobs and models in AzureML studio, use the [preview panel](./how-to-enable-preview-features.md) to enable the feature. Once enabled, you can compare the parameters, metrics, and tags between the jobs and/or models you selected.