
Commit 58d3248

Merge pull request #538 from v-thepet/mflow-artifacts
Freshness 7 - 180 days freshness updates
2 parents aee09d8 + a411c38 commit 58d3248


2 files changed: 54 additions and 56 deletions

@@ -1,31 +1,32 @@
11
---
2-
title: From artifacts to models in MLflow
2+
title: Artifacts and models in MLflow
33
titleSuffix: Azure Machine Learning
4-
description: Learn how MLflow uses the concept of models instead of artifacts to represent your trained models and enable a streamlined path to deployment.
4+
description: Learn how MLflow uses the concept of models instead of artifacts to represent trained models and enable a streamlined path to deployment.
55
services: machine-learning
66
author: msakande
77
ms.author: mopeakande
8-
ms.reviewer: fasantia
8+
ms.reviewer: cacrest
99
ms.service: azure-machine-learning
1010
ms.subservice: mlops
11-
ms.date: 12/20/2023
11+
ms.date: 09/30/2024
1212
ms.topic: conceptual
13-
ms.custom: cliv2, sdkv2
13+
ms.custom: cliv2, sdkv2, FY25Q1-Linter
14+
#Customer intent: As a data scientist, I want to understand MLflow artifacts and models so I can use MLflow models to enable streamlined deployment workflows.
1415
---
1516

16-
# From artifacts to models in MLflow
17+
# Artifacts and models in MLflow
1718

18-
The following article explains the differences between an MLflow artifact and an MLflow model, and how to transition from one to the other. It also explains how Azure Machine Learning uses the concept of an MLflow model to enable streamlined deployment workflows.
19+
This article explains MLflow artifacts and MLflow models, and how MLflow models differ from other artifacts. The article also explains how Azure Machine Learning uses the characteristics of an MLflow model to enable streamlined deployment workflows.
1920

20-
## What's the difference between an artifact and a model?
21+
## Artifacts and models
2122

22-
If you're not familiar with MLflow, you might not be aware of the difference between logging artifacts or files vs. logging MLflow models. There are some fundamental differences between the two:
23+
In MLflow, there are some fundamental differences between logging simple file artifacts and logging MLflow models.
2324

2425
### Artifact
2526

26-
An _artifact_ is any file that's generated (and captured) from an experiment's run or job. An artifact could represent a model serialized as a pickle file, the weights of a PyTorch or TensorFlow model, or even a text file containing the coefficients of a linear regression. Some artifacts could also have nothing to do with the model itself; rather, they could contain configurations to run the model, or preprocessing information, or sample data, and so on. Artifacts can come in various formats.
27+
An artifact is any file generated and captured from an experiment's run or job. An artifact could be a model serialized as a pickle file, the weights of a PyTorch or TensorFlow model, or a text file containing the coefficients of a linear regression. Some artifacts have nothing to do with the model itself but contain run configurations, preprocessing information, or sample data. Artifacts can have various formats.
2728

28-
You might have been logging artifacts already:
29+
The following example logs a file artifact.
2930

3031
```python
3132
filename = 'model.pkl'
@@ -37,34 +38,33 @@ mlflow.log_artifact(filename)
3738

3839
### Model
3940

40-
A _model_ in MLflow is also an artifact. However, we make stronger assumptions about this type of artifact. Such assumptions provide a clear contract between the saved files and what they mean. When you log your models as artifacts (simple files), you need to know what the model builder meant for each of those files so as to know how to load the model for inference. On the contrary, MLflow models can be loaded using the contract specified in the [The MLmodel format](concept-mlflow-models.md#the-mlmodel-format).
41+
An MLflow model is an artifact for which you make stronger assumptions that provide a clear contract between the saved files and what they mean. If, however, you log your model's files simply as artifacts, you need to know what each of the files means and how to load them for inference.
4142

42-
In Azure Machine Learning, logging models has the following advantages:
43-
44-
* You can deploy them to real-time or batch endpoints without providing a scoring script or an environment.
45-
* When you deploy models, the deployments automatically have a swagger generated, and the __Test__ feature can be used in Azure Machine Learning studio.
46-
* You can use the models directly as pipeline inputs.
47-
* You can use the [Responsible AI dashboard](how-to-responsible-ai-dashboard.md) with your models.
48-
49-
You can log models by using the MLflow SDK:
43+
You can log MLflow models by using the MLflow SDK, for example:
5044

5145
```python
5246
import mlflow
5347
mlflow.sklearn.log_model(sklearn_estimator, "classifier")
5448
```
5549

50+
Logging MLflow models in Azure Machine Learning has the following advantages:
51+
52+
- You can deploy MLflow models to real-time or batch endpoints without providing a scoring script or an environment.
53+
- When you deploy MLflow models, the deployments automatically generate a swagger file, so you can use the **Test** feature in Azure Machine Learning studio.
54+
- You can use MLflow models directly as pipeline inputs.
55+
- You can use the [Responsible AI dashboard](how-to-responsible-ai-dashboard.md) with MLflow models.
5656

5757
## The MLmodel format
5858

59-
MLflow adopts the MLmodel format as a way to create a contract between the artifacts and what they represent. The MLmodel format stores assets in a folder. Among these assets, there's a file named `MLmodel`. This file is the single source of truth about how a model can be loaded and used.
59+
For models logged as simple artifact files, you need to know what the model builder intended for each file before you can load the model for inference. MLflow models instead use the *MLmodel format* to specify the contract between the artifacts and what they represent, so you can load them without that prior knowledge.
6060

61-
The following screenshot shows a sample MLflow model's folder in the Azure Machine Learning studio. The model is placed in a folder called `credit_defaults_model`. There is no specific requirement on the naming of this folder. The folder contains the `MLmodel` file among other model artifacts.
61+
The MLmodel format stores assets in a folder that has no specific naming requirement. Among the assets is a file named *MLmodel* that's the single source of truth for how to load and use the model.
6262

63-
:::image type="content" source="media/concept-mlflow-models/mlflow-mlmodel.png" alt-text="A screenshot showing assets of a sample MLflow model, including the MLmodel file." lightbox="media/concept-mlflow-models/mlflow-mlmodel.png":::
63+
The following image shows an MLflow model folder called *credit_defaults_model* in Azure Machine Learning studio. The folder contains the *MLmodel* file and other model artifacts.
6464

65-
The following code is an example of what the `MLmodel` file for a computer vision model trained with `fastai` might look like:
65+
:::image type="content" source="media/concept-mlflow-models/mlflow-mlmodel.png" alt-text="A screenshot showing assets of a sample MLflow model, including the MLmodel file." lightbox="media/concept-mlflow-models/mlflow-mlmodel.png":::
6666

67-
__MLmodel__
67+
The following example shows an *MLmodel* file for a computer vision model trained with `fastai`:
6868

6969
```yaml
7070
artifact_path: classifier
@@ -92,9 +92,11 @@ signature:
9292
9393
### Model flavors
9494
95-
Considering the large number of machine learning frameworks available to use, MLflow introduced the concept of _flavor_ as a way to provide a unique contract to work across all machine learning frameworks. A flavor indicates what to expect for a given model that's created with a specific framework. For instance, TensorFlow has its own flavor, which specifies how a TensorFlow model should be persisted and loaded. Because each model flavor indicates how to persist and load the model for a given framework, the MLmodel format doesn't enforce a single serialization mechanism that all models must support. This decision allows each flavor to use the methods that provide the best performance or best support according to their best practices—without compromising compatibility with the MLmodel standard.
95+
Considering the large number of machine learning frameworks available, MLflow introduced the concept of *flavor* as a way to provide a single contract that works across all machine learning frameworks. A flavor indicates what to expect for a given model created with a specific framework. For instance, TensorFlow has its own flavor, which specifies how to persist and load a TensorFlow model.
9696
97-
The following code is an example of the `flavors` section for an `fastai` model.
97+
Because each model flavor indicates how to persist and load the model for a given framework, the MLmodel format doesn't enforce a single serialization mechanism that all models must support. Therefore, each flavor can use the methods that provide the best performance or best support according to their best practices, without compromising compatibility with the MLmodel standard.
98+
99+
The following example shows the `flavors` section for a `fastai` model.
98100

99101
```yaml
100102
flavors:
@@ -110,18 +112,16 @@ flavors:
110112

111113
### Model signature
112114

113-
A [model signature in MLflow](https://www.mlflow.org/docs/latest/models.html#model-signature) is an important part of the model's specification, as it serves as a data contract between the model and the server running the model. A model signature is also important for parsing and enforcing a model's input types at deployment time. If a signature is available, MLflow enforces input types when data is submitted to your model. For more information, see [MLflow signature enforcement](https://www.mlflow.org/docs/latest/models.html#signature-enforcement).
115+
An MLflow [model signature](https://www.mlflow.org/docs/latest/models.html#model-signature) is an important part of the model specification, because it serves as a data contract between the model and the server running the model. A model signature is also important for parsing and enforcing a model's input types at deployment time. If a signature is available, MLflow enforces the input types when data is submitted to your model. For more information, see [MLflow signature enforcement](https://www.mlflow.org/docs/latest/models.html#signature-enforcement).
114116

115-
Signatures are indicated when models get logged, and they're persisted in the `signature` section of the `MLmodel` file. The **Autolog** feature in MLflow automatically infers signatures in a best effort way. However, you might have to log the models manually if the inferred signatures aren't the ones you need. For more information, see [How to log models with signatures](https://www.mlflow.org/docs/latest/models.html#how-to-log-models-with-signatures).
117+
Signatures are specified when models are logged, and are persisted in the `signature` section of the *MLmodel* file. The **Autolog** feature in MLflow makes a best effort to infer signatures automatically. However, if the inferred signatures aren't the ones you need, you can log the models manually and provide the signatures yourself. For more information, see [How to log models with signatures](https://www.mlflow.org/docs/latest/models.html#how-to-log-models-with-signatures).
116118

117119
There are two types of signatures:
118120

119-
* **Column-based signature**: This signature operates on tabular data. For models with this type of signature, MLflow supplies `pandas.DataFrame` objects as inputs.
120-
* **Tensor-based signature**: This signature operates with n-dimensional arrays or tensors. For models with this signature, MLflow supplies `numpy.ndarray` as inputs (or a dictionary of `numpy.ndarray` in the case of named-tensors).
121-
122-
The following example corresponds to a computer vision model trained with `fastai`. This model receives a batch of images represented as tensors of shape `(300, 300, 3)` with the RGB representation of them (unsigned integers). The model outputs batches of predictions (probabilities) for two classes.
121+
- **Column-based signatures** operate on tabular data. For models with this type of signature, MLflow supplies `pandas.DataFrame` objects as inputs.
122+
- **Tensor-based signatures** operate with n-dimensional arrays or tensors. For models with this signature, MLflow supplies `numpy.ndarray` as inputs, or a dictionary of `numpy.ndarray` for named tensors.
123123

124-
__MLmodel__
124+
The following example shows the `signature` section for a computer vision model trained with `fastai`. This model receives a batch of images represented as tensors of shape `(300, 300, 3)` with their RGB representation as unsigned integers. The model outputs batches of predictions as probabilities for two classes.
125125

126126
```yaml
127127
signature:
@@ -136,15 +136,13 @@ signature:
136136
```
137137

138138
> [!TIP]
139-
> Azure Machine Learning generates a swagger file for a deployment of an MLflow model with a signature available. This makes it easier to test deployments using the Azure Machine Learning studio.
139+
> Azure Machine Learning generates a swagger file for a deployment of an MLflow model that has an available signature. This file makes it easier to test deployments using Azure Machine Learning studio.
140140

141141
### Model environment
142142

143-
Requirements for the model to run are specified in the `conda.yaml` file. MLflow can automatically detect dependencies or you can manually indicate them by calling the `mlflow.<flavor>.log_model()` method. The latter can be useful if the libraries included in your environment aren't the ones you intended to use.
143+
Requirements for the model to run are specified in the *conda.yaml* file. MLflow can automatically detect dependencies, or you can indicate them manually through the `conda_env`, `pip_requirements`, or `extra_pip_requirements` parameters of the `mlflow.<flavor>.log_model()` method. Indicating dependencies manually can be useful if the libraries that MLflow included in your environment aren't the ones you intended to use.
144144

145-
The following code is an example of an environment used for a model created with the `fastai` framework:
146-
147-
__conda.yaml__
145+
The following *conda.yaml* example shows an environment for a model created with the `fastai` framework:
148146

149147
```yaml
150148
channels:
@@ -165,35 +163,35 @@ dependencies:
165163
name: mlflow-env
166164
```
167165

168-
> [!NOTE]
169-
> __What's the difference between an MLflow environment and an Azure Machine Learning environment?__
170-
>
171-
> While an _MLflow environment_ operates at the level of the model, an _Azure Machine Learning environment_ operates at the level of the workspace (for registered environments) or jobs/deployments (for anonymous environments). When you deploy MLflow models in Azure Machine Learning, the model's environment is built and used for deployment. Alternatively, you can override this behavior with the [Azure Machine Learning CLI v2](concept-v2.md) and deploy MLflow models using a specific Azure Machine Learning environment.
166+
> [!NOTE]
167+
> An MLflow environment operates at the level of the model, but an Azure Machine Learning environment operates at the workspace level for registered environments or the jobs/deployments level for anonymous environments. When you deploy MLflow models, Azure Machine Learning builds the model environment and uses it for deployment. You can use the [Azure Machine Learning CLI](concept-v2.md) to override this behavior and deploy MLflow models to a specific Azure Machine Learning environment.
172168

173169
### Predict function
174170

175-
All MLflow models contain a `predict` function. **This function is called when a model is deployed using a no-code-deployment experience**. What the `predict` function returns (for example, classes, probabilities, or a forecast) depend on the framework (that is, the flavor) used for training. Read the documentation of each flavor to know what they return.
171+
All MLflow models contain a `predict` function, which is called when the model is deployed through a no-code deployment. What the `predict` function returns, for example classes, probabilities, or a forecast, depends on the framework, or flavor, used for training. The documentation for each flavor describes what it returns.
176172

177-
In same cases, you might need to customize this `predict` function to change the way inference is executed. In such cases, you need to [log models with a different behavior in the predict method](how-to-log-mlflow-models.md#logging-models-with-a-different-behavior-in-the-predict-method) or [log a custom model's flavor](how-to-log-mlflow-models.md#logging-custom-models).
173+
You can customize the `predict` function to change the way inference is executed. You can either [log models with a different behavior](how-to-log-mlflow-models.md#logging-models-with-a-different-behavior-in-the-predict-method), or [log a custom model flavor](how-to-log-mlflow-models.md#logging-custom-models).
178174

179175
## Workflows for loading MLflow models
180176

181-
You can load models that were created as MLflow models from several locations, including:
177+
You can load MLflow models from the following locations:
178+
179+
- Directly from the run where the models were logged
180+
- From the file system where the models are saved
181+
- From the model registry where the models are registered
182182

183-
- directly from the run where the models were logged
184-
- from the file system where they models are saved
185-
- from the model registry where the models are registered.
183+
MLflow provides a consistent way to load these models regardless of location.
186184

187-
MLflow provides a consistent way to load these models regardless of the location.
185+
There are two workflows for loading models:
188186

189-
There are two workflows available for loading models:
187+
- **Load back the same object and types that were logged.** You can load models using the MLflow SDK and obtain an instance of the model with types belonging to the training library. For example, an Open Neural Network Exchange (ONNX) model returns a `ModelProto`, while a decision tree model trained with `scikit-learn` returns a `DecisionTreeClassifier` object. Use `mlflow.<flavor>.load_model()` to load back the same model object and types that were logged.
190188

191-
* **Load back the same object and types that were logged:** You can load models using the MLflow SDK and obtain an instance of the model with types belonging to the training library. For example, an ONNX model returns a `ModelProto` while a decision tree model trained with scikit-learn returns a `DecisionTreeClassifier` object. Use `mlflow.<flavor>.load_model()` to load back the same model object and types that were logged.
189+
- **Load back a model for running inference.** You can load models using the MLflow SDK and get a wrapper that has a guaranteed `predict` function. It doesn't matter which flavor you use, because every MLflow model has a `predict` function.
192190

193-
* **Load back a model for running inference:** You can load models using the MLflow SDK and obtain a wrapper where MLflow guarantees that there will be a `predict` function. It doesn't matter which flavor you're using, every MLflow model has a `predict` function. Furthermore, MLflow guarantees that this function can be called by using arguments of type `pandas.DataFrame`, `numpy.ndarray`, or `dict[string, numpyndarray]` (depending on the signature of the model). MLflow handles the type conversion to the input type that the model expects. Use `mlflow.pyfunc.load_model()` to load back a model for running inference.
191+
MLflow guarantees that you can call this function by using arguments of type `pandas.DataFrame`, `numpy.ndarray`, or `dict[str, numpy.ndarray]`, depending on the model signature. MLflow handles the type conversion to the input type that the model expects. Use `mlflow.pyfunc.load_model()` to load back a model for running inference.
194192

195193
## Related content
196194

197-
* [Configure MLflow for Azure Machine Learning](how-to-use-mlflow-configure-tracking.md)
198-
* [How to log MLFlow models](how-to-log-mlflow-models.md)
199-
* [Guidelines for deploying MLflow models](how-to-deploy-mlflow-models.md)
195+
- [Configure MLflow for Azure Machine Learning](how-to-use-mlflow-configure-tracking.md)
196+
- [How to log MLflow models](how-to-log-mlflow-models.md)
197+
- [Guidelines for deploying MLflow models](how-to-deploy-mlflow-models.md)
-17.1 KB