Commit b25b057

Merge pull request #271966 from msakande/MLflow-async-logging
updates for MLflow async logging
2 parents 316af01 + 225c4fb commit b25b057

3 files changed

+96
-12
lines changed

articles/machine-learning/how-to-automl-forecasting-faq.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -144,7 +144,7 @@ For examples and details, see the [notebook for advanced forecasting scenarios](
144144

145145
## How do I view metrics from forecasting training jobs?
146146

147-
To find training and validation metric values, see [View jobs/runs information in the studio](how-to-log-view-metrics.md#view-jobsruns-information-in-the-studio). You can view metrics for any forecasting model trained in AutoML by going to a model from the AutoML job UI in the studio and selecting the **Metrics** tab.
147+
To find training and validation metric values, see [View information about jobs or runs in the studio](how-to-log-view-metrics.md#view-information-about-jobs-or-runs-in-the-studio). You can view metrics for any forecasting model trained in AutoML by going to a model from the AutoML job UI in the studio and selecting the **Metrics** tab.
148148

149149
:::image type="content" source="media/how-to-automl-forecasting-faq/metrics_UI.png" alt-text="Screenshot that shows the metric interface for an AutoML forecasting model.":::
150150

articles/machine-learning/how-to-log-mlflow-models.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ description: Logging MLflow models, instead of artifacts, with MLflow SDK in Azu
55
services: machine-learning
66
author: santiagxf
77
ms.author: fasantia
8-
ms.reviewer: franksolomon
8+
ms.reviewer: mopeakande
99
ms.service: machine-learning
1010
ms.subservice: mlops
1111
ms.date: 02/16/2024

articles/machine-learning/how-to-log-view-metrics.md

Lines changed: 94 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -3,12 +3,12 @@ title: Log metrics, parameters, and files with MLflow
33
titleSuffix: Azure Machine Learning
44
description: Enable logging on your ML training runs to monitor real-time run metrics with MLflow, and to help diagnose errors and warnings.
55
services: machine-learning
6-
ms.author: amipatel
7-
author: amibp
8-
ms.reviewer: sgilley
6+
ms.author: fasantia
7+
author: santiagxf
8+
ms.reviewer: mopeakande
99
ms.service: machine-learning
10-
ms.subservice: core
11-
ms.date: 01/30/2024
10+
ms.subservice: mlops
11+
ms.date: 04/26/2024
1212
ms.topic: how-to
1313
ms.custom: sdkv2
1414
---
@@ -27,6 +27,7 @@ Logs can help you diagnose errors and warnings, or track performance metrics lik
2727
> [!div class="checklist"]
2828
> * Log metrics, parameters, and models when submitting jobs.
2929
> * Track runs when training interactively.
30+
> * Log metrics asynchronously.
3031
> * View diagnostic information about training.
3132
3233
> [!TIP]
@@ -41,6 +42,9 @@ Logs can help you diagnose errors and warnings, or track performance metrics lik
4142
pip install mlflow azureml-mlflow
4243
```
4344

45+
> [!NOTE]
46+
> For asynchronous logging of metrics, you need to have `MLflow` version 2.8.0+ and `azureml-mlflow` version 1.55+.
47+
4448
* If you're doing remote tracking (tracking experiments that run outside Azure Machine Learning), configure MLflow to track experiments. For more information, see [Configure MLflow for Azure Machine Learning](how-to-use-mlflow-configure-tracking.md).
4549
4650
* To log metrics, parameters, artifacts, and models in your experiments in Azure Machine Learning using MLflow, just import MLflow into your script:
@@ -144,7 +148,7 @@ mlflow.log_params(params)
144148
145149
## Log metrics
146150
147-
Metrics, as opposed to parameters, are always numeric. The following table describes how to log specific numeric types:
151+
Metrics, as opposed to parameters, are always numeric, and they can be logged either synchronously or asynchronously. When metrics are logged synchronously, they are immediately available for consumption upon call return. The following table describes how to log specific numeric types:
148152
149153
|Logged value|Example code| Notes|
150154
|----|----|----|
@@ -153,7 +157,86 @@ Metrics, as opposite to parameters, are always numeric. The following table desc
153157
|Log a boolean value | `mlflow.log_metric("my_metric", 0)`| 0 = True, 1 = False|
154158
155159
> [!IMPORTANT]
156-
> **Performance considerations:** If you need to log multiple metrics (or multiple values for the same metric), avoid making calls to `mlflow.log_metric` in loops. Better performance can be achieved by logging a batch of metrics. Use the method `mlflow.log_metrics`, which accepts a dictionary with all the metrics you want to log at once, or use `MLflowClient.log_batch`, which accepts multiple types of elements for logging. See [Log curves or list of values](#log-curves-or-list-of-values) for an example.
160+
> **Performance considerations:** If you need to log multiple metrics (or multiple values for the same metric), avoid making calls to `mlflow.log_metric` in loops. Better performance can be achieved by using [asynchronous logging](#log-metrics-asynchronously) with `mlflow.log_metric("metric1", 9.42, synchronous=False)` or by [logging a batch of metrics](#log-curves-or-list-of-values).
161+
162+
### Log metrics asynchronously
163+
164+
MLflow also supports logging metrics asynchronously. Asynchronous metric logging is particularly useful in high-throughput scenarios, where large training jobs with hundreds of compute nodes might try to log metrics concurrently.
165+
166+
With asynchronous metric logging, a logging call returns without waiting for ingestion, and you can explicitly wait for the metrics to be ingested before you try to read them back. This approach scales to large training routines that log hundreds of thousands of metric values.
167+
168+
MLflow logs metrics synchronously by default; however, you can change this behavior at any time:
169+
170+
```python
171+
import mlflow
172+
173+
mlflow.config.enable_async_logging()
174+
```
175+
176+
The same property can be set by using an environment variable:
177+
178+
```bash
179+
export MLFLOW_ENABLE_ASYNC_LOGGING=True
180+
```
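The same variable can also be set from inside a Python process rather than from the shell. The following is a minimal sketch, assuming that MLflow reads `MLFLOW_ENABLE_ASYNC_LOGGING` when a logging call is made, so the assignment has to happen before the first metric is logged. Only the variable name comes from the text above; the placement is an assumption.

```python
import os

# Assumption: MLflow consults this variable at logging time, so set it
# before the first call that logs a metric.
os.environ["MLFLOW_ENABLE_ASYNC_LOGGING"] = "True"

# Environment variable values are always strings, not booleans.
print(os.environ["MLFLOW_ENABLE_ASYNC_LOGGING"])
```

Setting the variable this way affects only the current process and any child processes it starts.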
181+
182+
To log specific metrics asynchronously, use the MLflow logging API as you typically would, but add the extra parameter `synchronous=False`.
183+
184+
```python
185+
import mlflow
186+
187+
with mlflow.start_run():
188+
    # (...)
189+
    mlflow.log_metric("metric1", 9.42, synchronous=False)
190+
    # (...)
191+
```
192+
193+
When you use `log_metric(synchronous=False)`, control is automatically returned to the caller once the operation is accepted; however, there is no guarantee at that moment that the metric value has been persisted.
194+
195+
> [!IMPORTANT]
196+
> Even with `synchronous=False`, Azure Machine Learning guarantees the ordering of metrics.
197+
198+
If you need to wait for a particular value to be persisted in the backend, you can wait on the operation object that the logging call returns, as shown in the following example:
199+
200+
```python
201+
import mlflow
202+
203+
with mlflow.start_run():
204+
    # (...)
205+
    run_operation = mlflow.log_metric("metric1", 9.42, synchronous=False)
206+
    # (...)
207+
    run_operation.wait()
208+
    # (...)
209+
```
210+
211+
You can asynchronously log one metric at a time or log a batch of metrics, as shown in the following example:
212+
213+
```python
214+
import mlflow
215+
import time
216+
from mlflow.entities import Metric
217+
218+
with mlflow.start_run() as current_run:
219+
    mlflow_client = mlflow.tracking.MlflowClient()
220+
221+
    metrics = {"metric-0": 3.14, "metric-1": 6.28}
222+
    timestamp = int(time.time() * 1000)
223+
    metrics_arr = [Metric(key, value, timestamp, 0) for key, value in metrics.items()]
224+
225+
    run_operation = mlflow_client.log_batch(
226+
        run_id=current_run.info.run_id,
227+
        metrics=metrics_arr,
228+
        synchronous=False,
229+
    )
230+
```
231+
232+
The `wait()` operation is also available when logging a batch of metrics:
233+
234+
```python
235+
run_operation.wait()
236+
```
237+
238+
You don't have to call `wait()` in your routines if you don't need immediate access to the metric values. When a job is about to finish, Azure Machine Learning automatically waits for any pending metrics to be persisted. By the time a job is completed in Azure Machine Learning, all metrics are guaranteed to be persisted.
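The semantics described in this section (a call that returns before the value is stored, preserved ordering, an optional `wait()`, and a final flush) can be pictured with a toy model built only on the Python standard library. This is a conceptual sketch, not how MLflow or Azure Machine Learning implement asynchronous logging; `ToyAsyncLogger`, `ToyOperation`, and `close()` are hypothetical names invented for illustration.

```python
import queue
import threading


class ToyOperation:
    """Hypothetical handle returned by an asynchronous log call."""

    def __init__(self):
        self._done = threading.Event()

    def mark_done(self):
        self._done.set()

    def wait(self):
        # Block until the worker thread has stored the value.
        self._done.wait()


class ToyAsyncLogger:
    """Toy model of asynchronous logging; not MLflow's implementation."""

    def __init__(self):
        self.persisted = []          # stands in for the backend store
        self._queue = queue.Queue()  # FIFO queue, so ordering is preserved
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:         # sentinel: nothing more to store
                break
            key, value, operation = item
            self.persisted.append((key, value))
            operation.mark_done()

    def log_metric(self, key, value):
        # Returns immediately; the value might not be stored yet.
        operation = ToyOperation()
        self._queue.put((key, value, operation))
        return operation

    def close(self):
        # Store everything still pending, as happens when a job finishes.
        self._queue.put(None)
        self._worker.join()


logger = ToyAsyncLogger()
operations = [logger.log_metric("loss", 1.0 / (step + 1)) for step in range(5)]
operations[-1].wait()  # wait only where the value is needed right away
logger.close()
print(logger.persisted)
```

Because a single worker drains a first-in, first-out queue, waiting on the last operation implies that every earlier value has been stored as well.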
239+
157240
158241
### Log curves or list of values
159242
@@ -170,6 +253,7 @@ client.log_batch(mlflow.active_run().info.run_id,
170253
metrics=[Metric(key="sample_list", value=val, timestamp=int(time.time() * 1000), step=0) for val in list_to_log])
171254
```
172255
256+
173257
## Log images
174258
175259
MLflow supports two ways of logging images. Both ways persist the given image as an artifact inside of the run.
@@ -215,7 +299,7 @@ mlflow.autolog()
215299
> [!TIP]
216300
> You can control what gets automatically logged with autolog. For instance, if you indicate `mlflow.autolog(log_models=False)`, MLflow logs everything but models for you. Such control is useful in cases where you want to log models manually but still enjoy automatic logging of metrics and parameters. Also notice that some frameworks might disable automatic logging of models if the trained model goes beyond specific boundaries. Such behavior depends on the flavor used and we recommend that you view the documentation if this is your case.
217301
218-
## View jobs/runs information with MLflow
302+
## View information about jobs or runs with MLflow
219303
220304
You can view the logged information using MLflow through the [MLflow.entities.Run](https://mlflow.org/docs/latest/python_api/mlflow.entities.html#mlflow.entities.Run) object:
221305
@@ -258,7 +342,7 @@ file_path = client.download_artifacts("<RUN_ID>", path="feature_importance_weigh
258342
259343
For more information, please refer to [Getting metrics, parameters, artifacts and models](how-to-track-experiments-mlflow.md#get-metrics-parameters-artifacts-and-models).
260344
261-
## View jobs/runs information in the studio
345+
## View information about jobs or runs in the studio
262346
263347
You can browse completed job records, including logged metrics, in the [Azure Machine Learning studio](https://ml.azure.com).
264348
@@ -294,7 +378,7 @@ For jobs training on multi-compute clusters, logs are present for each IP node.
294378
295379
Azure Machine Learning logs information from various sources during training, such as AutoML or the Docker container that runs the training job. Many of these logs aren't documented. If you encounter problems and contact Microsoft support, they might be able to use these logs during troubleshooting.
296380
297-
## Next steps
381+
## Related content
298382
299383
* [Train ML models with MLflow and Azure Machine Learning](how-to-train-mlflow-projects.md)
300384
* [Migrate from SDK v1 logging to MLflow tracking](reference-migrate-sdk-v1-mlflow-tracking.md)
