
Commit 89010a4

Init Model page
1 parent 22d5438 commit 89010a4

2 files changed: 64 additions, 1 deletion


md-docs/user_guide/model.md

Lines changed: 63 additions & 0 deletions
@@ -1,7 +1,70 @@
# Model

In the ML cube Platform, a Model is a representation of the actual machine learning model used for making predictions. The data used
for its training usually represent the reference data distribution, while production data comprise the data on which the model
performs inference.

A Model is uniquely associated with a [Task] and can be created both through the WebApp and the Python SDK. Currently, we support only one model
per Task.

A Model is defined by a name and a version. The version is updated whenever the model is retrained, making it possible to
track the latest version of the model and the data used for its training. When predictions are uploaded to the platform,
the model version must be specified appropriately, following the guidelines in the [Data Schema] page, to ensure that the
predictions are associated with the correct model version.

!!! note
    You don't need to upload the **real** model to the Platform. We only require its training data and predictions.
    The entity you create on the Platform serves more as a placeholder for the model. For this reason,
    the ML cube Platform is considered *model agnostic*.

### RAG Model

RAG Tasks are an exception to the model framework presented above. In this type of Task, the model
is a Large Language Model (LLM) used to generate responses to user queries. The model is not trained on a specific dataset
but is rather a pre-trained model that is fine-tuned on the user's data, which means that the classic process of training and
retraining does not apply.

To maintain a coherent Model definition across task types, the RAG model is also represented as a Model,
but an update of its version represents an update of the reference data distribution and not necessarily
an update of the model itself. Moreover, most of the attributes described in the following sections
are not applicable, as they relate to the retraining module, which is not available in RAG Tasks.

### Probabilistic output

When creating a model, you can specify whether you also want to provide the probabilistic output of the model along with the predictions.
The probabilistic output represents the probability or confidence score associated with the model's predictions. If provided,
the ML cube Platform will use this information to compute additional metrics and insights.

It is optional and currently supported only for Classification and RAG tasks. If specified, the probabilistic output must be provided
as a new column in the predictions file, following the guidelines in the [Data Schema] page.
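
As a purely illustrative example, a predictions file for a binary classification task with probabilistic output could be assembled as follows. The column names are assumptions made for this sketch; the actual names and roles are dictated by your [Data Schema].

```python
import pandas as pd

# Illustrative predictions file; column names are hypothetical and must
# follow the roles defined in your Data Schema.
predictions = pd.DataFrame(
    {
        "sample_id": ["s-001", "s-002", "s-003"],
        "model_version": ["v2", "v2", "v2"],
        "prediction": [1, 0, 1],
        # Probabilistic output: confidence score of the predicted class.
        "probability": [0.91, 0.73, 0.58],
    }
)
predictions.to_csv("predictions.csv", index=False)
```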

### Metric

A Model Metric represents the evaluation metric used to assess the performance of the model.
It can represent either a performance measure or an error measure. The chosen metric will be used in the various views of the platform to
provide insights on the model's performance. The available options are:

- `Accuracy`, for Classification tasks
- `RMSE`, for Regression tasks
- `R2`, for Regression tasks
- `Average Precision`, for Object Detection tasks

RAG tasks have no metric, as in that case the model is an LLM for which classic definitions of metrics are not applicable.
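
To make the distinction between performance and error metrics concrete, the short snippet below computes two of the options above with scikit-learn, used here purely for illustration: `Accuracy` is a performance metric (higher is better), while `RMSE` is an error metric (lower is better).

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Classification: Accuracy is a performance metric (higher is better).
y_true_cls = [1, 0, 1, 1]
y_pred_cls = [1, 0, 0, 1]
print(accuracy_score(y_true_cls, y_pred_cls))  # 0.75

# Regression: RMSE is an error metric (lower is better).
y_true_reg = [2.0, 3.5, 5.0]
y_pred_reg = [2.5, 3.0, 4.0]
print(np.sqrt(mean_squared_error(y_true_reg, y_pred_reg)))  # ~0.707
```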

### Suggestion Type

The Suggestion Type represents the type of suggestion that the ML cube Platform should provide when computing the
[retraining dataset](modules/retraining.md#retraining-dataset). The available options are the following (a usage sketch follows the list):

- `Sample Weights`: each sample uploaded to the ML cube Platform is assigned a weight that can be used as a sample weight in a weighted loss function.
The higher the weight, the greater the importance of the sample for the new retraining.
- `Resampled Dataset`: a list of sample ids (using the data schema column with role ID) is provided, indicating which data form the retraining dataset.
This format can be used when the training procedure does not support a weighted loss or when a fixed-size retraining dataset is preferred.
Note that sample ids can appear more than once: this happens when a sample is particularly important for the new retraining.
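
As a sketch of how the two formats could be consumed on the user's side (file names and column names here are assumptions, not a prescribed layout), sample weights plug directly into estimators that accept a weighted loss, while a resampled id list can be materialized into a retraining dataset with pandas:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical file layout: features f1 and f2, a label column, and a
# sample_id column corresponding to the data schema column with role ID.
train = pd.read_csv("training_data.csv")
X, y = train[["f1", "f2"]], train["label"]

# Sample Weights: one weight per sample, assumed row-aligned with the
# training data, passed to a weighted loss via sample_weight.
weights = pd.read_csv("sample_weights.csv")["weight"]
LogisticRegression().fit(X, y, sample_weight=weights)

# Resampled Dataset: a list of sample ids, possibly with repetitions,
# selecting (and duplicating) the rows that form the retraining dataset.
ids = pd.read_csv("resampled_ids.csv")["sample_id"]
retraining_dataset = train.set_index("sample_id").loc[ids].reset_index()
```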

[Task]: task.md
[Data Schema]: data_schema.md

[//]: # ()
[//]: # ()

md-docs/user_guide/monitoring/index.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ stateDiagram-v2
 ```


-Notice that a Drift OFF event can either bring the entity back to the `OK` status or to the `WARNING` status,
+Notice that a Drift Off event can either bring the entity back to the `OK` status or to the `WARNING` status,
 depending on the velocity of the change and the monitoring algorithm's sensitivity. The same applies
 to the Drift ON events, which can both happen when the entity is in the `WARNING` status or in the `OK` status.
