Commit fa4e011

committed
docs
1 parent 76b5ad5 commit fa4e011

2 files changed

+37
-32
lines changed

md-docs/user_guide/modules/monitoring.md

Lines changed: 36 additions & 32 deletions
@@ -2,31 +2,35 @@
22

33
Data and model monitoring is fundamental to guarantee performance of your ML models.
44
With ML cube Platform you can log and monitor different aspects of your ML Task by uploading data batches.
5+
Before going into detail about what ML cube Platform monitors and analyses, it is worth mentioning how data are represented.
6+
Data are shared as _batches_.
57

68
## Data Taxonomy
7-
A Batch of data is composed of four types of data:
89

9-
- **input:** set of input features the AI model uses to predict the output.
10-
ML cube Platform uses the input data that comes at the end of the processing data pipeline and not the raw data.
11-
This is due to the fact that ML cube Platform detects drifts in what the AI model uses and not in the general data the customer has.
10+
A Batch of data is composed of four categories:
11+
12+
- **input:** set of input features the AI model uses to predict the output.
13+
ML cube Platform uses the input data that comes at the end of the processing data pipeline and not the raw data.
14+
This is due to the fact that ML cube Platform detects drifts in what the AI model uses and not in the general data the customer has.
1215
- **target:** target quantity predicted by the AI models.
13-
It is present in the training data but can be not available for production data.
16+
It is present in the training data but may not be available for production data.
1417
- **models' predictions:** predicted target for each AI model in the AI Task.
1518
- **metadata:** additional information that AI models do not use as input but that is important to define the data or the samples.
16-
Mandatory for this category are the `sample-id`, a unique identifier for each sample used to avoid confusion and misinterpretation; and the
17-
`sample-timestamp`, a timestamp associated with each sample used for ordering.
18-
Moreover, the User can provide additional data used to segment the data space.
19-
For instance, sensitive information like zip code or country are not used by AI models to prevent bias, however, ML cube Platform can use them to
20-
check and prevent bias in the suggested retraining dataset or to perform segmented drift detection.
19+
Mandatory for this category are the `sample-id`, a unique identifier for each sample used to avoid confusion and misinterpretation; and the
20+
`sample-timestamp`, a timestamp associated with each sample used for ordering.
21+
Moreover, the User can provide additional data used to segment the data space.
22+
For instance, sensitive information like zip code or country is not used by AI models to prevent bias; however, ML cube Platform can use it to
23+
check and prevent bias in the suggested retraining dataset or to perform segmented drift detection.
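
The four categories and the two mandatory metadata fields can be illustrated with a minimal sketch. The `Batch` class and the `validate_metadata` helper below are hypothetical names introduced only for this example; they are not part of the ML cube Platform API, and the in-memory layout is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Batch:
    """Hypothetical container mirroring the four categories of a batch."""
    inputs: list[dict[str, Any]]                  # processed features the model consumes
    metadata: list[dict[str, Any]]                # must carry sample-id and sample-timestamp
    target: Optional[list[Any]] = None            # may be missing for production data
    predictions: dict[str, list[Any]] = field(default_factory=dict)  # one list per model


def validate_metadata(batch: Batch) -> list[str]:
    """Return human-readable problems found in the batch metadata."""
    errors = []
    if len(batch.metadata) != len(batch.inputs):
        errors.append("metadata and inputs must have the same number of samples")
    seen_ids = set()
    for i, row in enumerate(batch.metadata):
        # Both fields are mandatory according to the text above.
        for key in ("sample-id", "sample-timestamp"):
            if key not in row:
                errors.append(f"row {i}: missing mandatory field '{key}'")
        sample_id = row.get("sample-id")
        if sample_id is not None:
            # sample-id must be unique for each sample.
            if sample_id in seen_ids:
                errors.append(f"row {i}: duplicate sample-id {sample_id!r}")
            seen_ids.add(sample_id)
    return errors
```

Running a check like this before an upload would surface missing or duplicated identifiers early; how the Platform itself validates batches is not described here.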
2124

2225
## Data Categories
26+
2327
ML cube Platform distinguishes three categories of data:
2428

2529
- **Reference:** represents the dataset used to train the model.
26-
Each model version has a reference dataset.
27-
Detection algorithms use reference data during their initialization.
30+
Each model version has a reference dataset.
31+
Detection algorithms use reference data during their initialization.
2832
- **Production:** represents data that comes from the production environment in which the AI model is operating.
29-
Detection algorithms analyze production data to detect the presence of drifts.
33+
Detection algorithms analyze production data to detect the presence of drifts.
3034
- **Historical:** represents additional data that ML cube Platform can use to define the retraining dataset after a drift.
3135

3236
Each data category is uploaded to the application with its own API call; however, they all share the same structure.
@@ -44,35 +48,35 @@ That's why during production data arrive at different times, usually input and p
4448

4549
Delta Energy company trained its models on data from the year 2022 and has been using the algorithms since 2023. This means that the data from 2022 are the reference data, and all data from January 1, 2023 onwards are considered production data. Data from before 2022 are historical data.
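
As a rough illustration of the Delta Energy example, the sketch below assigns a data category to a sample from its timestamp. The function name `data_category` and the two boundary constants are assumptions made for this example only; in practice the category is determined by which API call is used to upload the data.

```python
from datetime import datetime

# Boundaries taken from the Delta Energy example above.
REFERENCE_START = datetime(2022, 1, 1)
PRODUCTION_START = datetime(2023, 1, 1)


def data_category(sample_timestamp: datetime) -> str:
    """Map a sample timestamp to historical, reference, or production."""
    if sample_timestamp >= PRODUCTION_START:
        return "production"
    if sample_timestamp >= REFERENCE_START:
        return "reference"
    return "historical"
```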
4650

47-
4851
## Drift Detection
52+
4953
ML cube Platform provides a set of Detectors for each AI Task.
5054
These detectors are used to monitor the task at different levels.
51-
The choice over the types of detectors to be instantiated depends on the type of task (classification or regression) and on the type of data available for that task (input, output, model predictions).
55+
The choice of which detectors to instantiate depends on the type of task (classification or regression) and on the type of data available for that task (input, output, model predictions).
5256

53-
There are mainly two classes of detectors:
57+
There are two main classes of detectors:
5458

55-
- **Data Detectors:** they take into account data associated with the task. They may be *input only*
56-
data or *input and ground truth* data. These detectors are independent from the models trained on the
57-
task as they do not either consider their predictions or performances. These detectors are responsible for the identification of
58-
input and concept drifts. According to the type of the used detector, changes in data are either monitored at feature
59-
level or using a multivariate monitoring strategy.
59+
- **Data Detectors:** they take into account data associated with the task. They may be _input only_
60+
data or _input and ground truth_ data. These detectors are independent from the models trained on the
61+
task, as they consider neither the models' predictions nor their performance. These detectors are responsible for the identification of
62+
input and concept drifts. Depending on the type of detector used, changes in data are monitored either at feature
63+
level or using a multivariate monitoring strategy.
6064
- **Model Detectors:** they monitor the performances associated with the models related to the task.
61-
In cases where the user has multiple models trained for a single task, a single detector is created for each model.
65+
In cases where the user has multiple models trained for a single task, a single detector is created for each model.
6266

63-
Each detector is initially created using **Reference data** provided by the user. Every time a new batch of data
67+
Each detector is initially created using **Reference data** provided by the user. Every time a new batch of data
6468
is uploaded, the detectors observe the batch and update their statistics.
65-
Each detector updates its statistics independently from the others and each of them presents a double-level alarm scheme in
69+
Each detector updates its statistics independently from the others and each of them presents a double-level alarm scheme in
6670
order to either signal a **Warning** or a **Drift** for the monitored task.
6771

68-
The detectors may be in three different states:
72+
The detectors may be in three different states:
6973

70-
- **Regular:** the detector is monitoring data that are similar to the reference data,
71-
- **Warning:** the detector has fired a Warning alarm since the data has started to change. From this zone, it is possible
72-
to either go into the Drift status or to go back to the Regular one, depending on the monitored data.
73-
- **Drift:** the detector has fired a Drift alarm and a change has been established between the reference data and the last
74-
ones. After a drift, the detector is usually reset by defining a new set of reference data. The reset process is different
75-
according to what has been monitored by the detector.
74+
- **Regular:** the detector is monitoring data that are similar to the reference data.
75+
- **Warning:** the detector has fired a Warning alarm since the data has started to change. From this zone, it is possible
76+
to either go into the Drift status or to go back to the Regular one, depending on the monitored data.
77+
- **Drift:** the detector has fired a Drift alarm and a change has been established between the reference data and the last
78+
ones. After a drift, the detector is usually reset by defining a new set of reference data. The reset process is different
79+
according to what has been monitored by the detector.
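
The three states and the double-level alarm scheme can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the `TwoLevelDetector` class, its thresholds, and the drift score (distance of the batch mean from the reference mean, in reference standard deviations) are all hypothetical and are not the detection algorithms used by ML cube Platform.

```python
from enum import Enum
from statistics import fmean, stdev


class DetectorState(Enum):
    REGULAR = "regular"
    WARNING = "warning"
    DRIFT = "drift"


class TwoLevelDetector:
    """Toy univariate detector with a Warning and a Drift threshold."""

    def __init__(self, reference, warning_threshold=2.0, drift_threshold=3.0):
        self.warning_threshold = warning_threshold
        self.drift_threshold = drift_threshold
        self.reset(reference)

    def reset(self, reference):
        """Re-initialise the detector on a new set of reference data."""
        self.ref_mean = fmean(reference)
        self.ref_std = stdev(reference)  # needs at least two reference values
        if self.ref_std == 0:
            raise ValueError("reference data must not be constant")
        self.state = DetectorState.REGULAR

    def update(self, batch):
        """Observe a new batch and return the resulting state."""
        if self.state is DetectorState.DRIFT:
            # After a Drift alarm the detector waits for an explicit reset.
            return self.state
        score = abs(fmean(batch) - self.ref_mean) / self.ref_std
        if score >= self.drift_threshold:
            self.state = DetectorState.DRIFT
        elif score >= self.warning_threshold:
            self.state = DetectorState.WARNING
        else:
            # From Warning it is possible to go back to Regular.
            self.state = DetectorState.REGULAR
        return self.state
```

In this sketch a detector in the Warning state can return to Regular or escalate to Drift depending on the next batch, while a detector in the Drift state stays there until `reset` is called with new reference data.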
7680

7781
All the alarms generated during this process are shown in the application as **Detection Events**, available in the Task homepage or in the Detection page.
78-
You can create automation rules based on those events to be notified on specific channels or start retraining, see [Detection automation rules](../detection_event_rules.md) for more details.
82+
You can create automation rules based on those events to be notified on specific channels or to start retraining; see [Detection automation rules](../detection_event_rules.md) for more details.

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -135,3 +135,4 @@ nav:
135135
- api/rest/index.md
136136
- Examples: api/examples.md
137137
- Web App: https://app.platform.mlcube.com/
138+
- Support: https://support.platform.mlcube.com/en/customer/login
