# Model

In the ML Cube Platform, a Model is a representation of the actual machine learning model used for making predictions. The data used
for its training usually represents the reference data distribution, while production data comprises the data on which the model
performs inference.

A Model is uniquely associated with a [Task] and can be created through both the WebApp and the Python SDK. Currently, we only support one model
per Task.

A Model is defined by a name and a version. The version is updated whenever the model is retrained, allowing you to
track the latest version of the model and the data used for its training. When predictions are uploaded to the platform,
the model version must be specified following the guidelines in the [Data Schema] page, to ensure that the
predictions are associated with the correct model version.

!!! note
    You don't need to upload the **real** model to the Platform. We only require its training data and predictions.
    The entity you create on the Platform serves more as a placeholder for the model. For this reason,
    the ML cube Platform is considered *model agnostic*.
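
Conceptually, the Model entity you register is little more than a named, versioned reference tied to a Task. The plain-Python sketch below is only meant to convey that idea: the class and field names are illustrative, not the SDK's actual interface, which is documented in the Python SDK reference.

```python
from dataclasses import dataclass


@dataclass
class ModelPlaceholder:
    """Illustrative stand-in for the Model entity: no weights, just identity.

    Field names are hypothetical; the real entity is created via the WebApp
    or the Python SDK.
    """

    task_id: str  # the Task the Model is uniquely associated with
    name: str
    version: str  # bumped at every retraining

# Version "v2" tells the platform which training data and predictions belong together.
model = ModelPlaceholder(task_id="my-task-id", name="demand-forecaster", version="v2")
print(model)
```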

### RAG Model

RAG Tasks are an exception to the model framework presented above. In this type of Task, the model
is a Large Language Model (LLM) used to generate responses to user queries. The model is not trained on a specific dataset
but is rather a pre-trained model that is fine-tuned on the user's data, which means that the classic process of training and
retraining does not apply.

To maintain a coherent Model definition across task types, the RAG model is also represented as a Model,
but an update of its version represents an update of the reference data distribution and not necessarily
an update of the model itself. Moreover, most of the attributes described in the following sections
are not applicable, as they relate to the retraining module, which is not usable in RAG tasks.

### Probabilistic output

When creating a model, you can specify whether you also want to provide the probabilistic output of the model along with the predictions.
The probabilistic output represents the probability or confidence score associated with the model's predictions. If provided,
the ML cube Platform will use this information to compute additional metrics and insights.

It is optional and currently supported only for Classification and RAG tasks. If specified, the probabilistic output must be provided
as a new column in the predictions file, following the guidelines in the [Data Schema] page.
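
As a minimal sketch, assuming a classification task and hypothetical column names (`sample_id`, `prediction`, and `probability` are placeholders; the actual names and roles are defined by your Data Schema), a predictions file with a probabilistic output column could be prepared like this:

```python
import pandas as pd

# Hypothetical column names: the real ones must match your Data Schema.
predictions = pd.DataFrame(
    {
        "sample_id": [101, 102, 103],
        "prediction": [1, 0, 1],           # predicted class labels
        "probability": [0.92, 0.13, 0.78], # confidence score for each prediction
    }
)

# Saved as a CSV ready to be uploaded alongside the model version it refers to.
predictions.to_csv("predictions_v2.csv", index=False)
```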

### Metric

A Model Metric represents the evaluation metric used to assess the performance of the model.
It can represent either a performance score or an error measure. The chosen metric will be used in the various views of the platform to
provide insights on the model's performance. The available options are:

- `Accuracy`, for classification tasks
- `RMSE`, for regression tasks
- `R2`, for regression tasks
- `Average Precision`, for Object Detection tasks

RAG tasks have no metric, as in that case the model is an LLM for which the classic definitions of metrics are not applicable.
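
To make the options above concrete, the snippet below computes each metric with scikit-learn on toy data. It is only an offline illustration of the definitions, not something you need to run for the platform, and the Average Precision example uses binary detection scores for brevity rather than a full object-detection matching pipeline.

```python
from sklearn.metrics import (
    accuracy_score,
    mean_squared_error,
    r2_score,
    average_precision_score,
)

# Classification: Accuracy
y_true_cls, y_pred_cls = [1, 0, 1, 1], [1, 0, 0, 1]
print("Accuracy:", accuracy_score(y_true_cls, y_pred_cls))

# Regression: RMSE and R2
y_true_reg, y_pred_reg = [3.0, 5.0, 2.5], [2.8, 5.4, 2.9]
print("RMSE:", mean_squared_error(y_true_reg, y_pred_reg) ** 0.5)
print("R2:", r2_score(y_true_reg, y_pred_reg))

# Object Detection: Average Precision (simplified to binary scores here)
y_true_det, y_score_det = [1, 0, 1, 1], [0.9, 0.2, 0.4, 0.8]
print("Average Precision:", average_precision_score(y_true_det, y_score_det))
```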

### Suggestion Type

The Suggestion Type represents the type of suggestion that the ML cube Platform should provide when computing the
[retraining dataset](modules/retraining.md#retraining-dataset). The available options, illustrated in the sketch after this list, are:

- `Sample Weights`: each sample uploaded to the ML cube Platform is assigned a weight that can be used as a sample weight in a weighted loss function.
  The higher the weight, the greater the importance of the sample for the new retraining.
- `Resampled Dataset`: a list of sample ids (taken from the data schema column with role ID) is provided, indicating which samples form the retraining dataset.
  This format can be used when the training procedure does not support a weighted loss or when a fixed-size retraining dataset is preferred.
  Note that sample ids can appear more than once: this happens when a sample is particularly important for the new retraining.
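
As a minimal sketch, assuming a tabular classification task and a suggestion already available as plain Python objects (the column names, toy data, and the use of scikit-learn are illustrative, not part of the platform), the two suggestion types could be consumed like this:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy training data indexed by the ID column defined in the data schema
# (column names here are hypothetical).
data = pd.DataFrame(
    {
        "sample_id": [101, 102, 103, 104],
        "feature_a": [0.1, 0.9, 0.4, 0.7],
        "feature_b": [1.2, 0.3, 0.8, 0.5],
        "target": [0, 1, 0, 1],
    }
).set_index("sample_id")
X, y = data[["feature_a", "feature_b"]], data["target"]

# 1) Sample Weights: one weight per sample id, passed to a weighted loss.
sample_weights = pd.Series({101: 0.2, 102: 1.7, 103: 0.9, 104: 1.0})
model = LogisticRegression().fit(X, y, sample_weight=sample_weights.loc[X.index])

# 2) Resampled Dataset: a list of sample ids, possibly with repetitions,
#    selecting (and implicitly re-weighting by duplication) the rows to retrain on.
resampled_ids = [102, 102, 104, 101]
X_res, y_res = X.loc[resampled_ids], y.loc[resampled_ids]
model = LogisticRegression().fit(X_res, y_res)
```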

[Task]: task.md
[Data Schema]: data_schema.md