Commit 9f4b535

first version of task page
1 parent 9e54097 commit 9f4b535

2 files changed

+111
-1
lines changed

md-docs/user_guide/task.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
1+
# Task
2+
3+
A Task is the third and last organizational entity in ML cube Platform.
4+
A Task represents an ordinary artificial intelligence task like regression, classification, text generation or object detection.
5+
6+
A Task is associated with a [Model] that provides an output from input data and a [Data schema] that describes all the information about the data.
7+
8+
A Task is described by a set of attributes that vary according to its type.
9+
Common attributes for every Task are:
10+
11+
| Attribute | Description |
12+
|--|--|
13+
| Name | Name of the Task, unique for the Project. |
14+
| Tags | Optional customizable list of tags. It is used to better describe the Task and to improve search. |
15+
| Task type | Artificial intelligence type of Task. Possible values are:<br><ul><li>[Regression](task.md#regression)</li><li>[Binary classification](task.md#classification)</li><li>[Multiclass classification](task.md#classification)</li><li>[Multilabel classification](task.md#classification)</li><li>[Retrieval Augmented Generation](task.md#retrieval-augmented-generation)</li><li>[Object Detection](task.md#object-detection)</li></ul>|
16+
| Data structure | Type of input data the Task uses. Possible values are:<br><ul><li>Tabular</li><li>Image</li><li>Text: when the data structure is Text, the attribute *Text Language* is required.</li><li>Embeddings: input data are arrays that represent embeddings of either image or text data. This data structure is used when raw data are not shared with ML cube Platform.</li></ul> |
17+
| Optional target | Boolean value that specifies whether the ground truth is always available. In some Tasks, the actual value is not present until explicit labeling is done. In these cases, the Task is marked as having an optional target so that ML cube Platform works accordingly. |
18+
| Cost info | Optional information about costs that depend on Task Type. |
19+
20+
21+
## Data structure support
22+
| Task type | Tabular | Image | Text | Embeddings |
23+
| -- | -- | -- | -- | -- |
24+
| Regression | :material-check: | :material-check: | :material-check: | :material-check: |
25+
| Classification | :material-check: | :material-check: | :material-check: | :material-check: |
26+
| RAG | :material-close: | :material-close: | :material-check: | :material-check: |
27+
| Object Detection | :material-close: | :material-check: | :material-close: | :material-check: |
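
The support table above can be read as a simple lookup. The following is a minimal sketch, assuming the task types and data structures are identified by the plain strings used in the table; the names `SUPPORTED_DATA_STRUCTURES` and `is_supported` are hypothetical and not part of any ML cube Platform API.

```python
# Hypothetical lookup mirroring the "Data structure support" table.
# The string keys are assumptions taken from the table headers and rows.
SUPPORTED_DATA_STRUCTURES = {
    "Regression": {"Tabular", "Image", "Text", "Embeddings"},
    "Classification": {"Tabular", "Image", "Text", "Embeddings"},
    "RAG": {"Text", "Embeddings"},
    "Object Detection": {"Image", "Embeddings"},
}


def is_supported(task_type: str, data_structure: str) -> bool:
    """Return True if the table marks the combination as supported."""
    return data_structure in SUPPORTED_DATA_STRUCTURES.get(task_type, set())
```

For example, `is_supported("RAG", "Tabular")` is `False`, matching the RAG row of the table.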
28+
29+
30+
## Regression
31+
32+
Supervised regression Task with continuous target.
33+
34+
### Cost information
35+
Cost information is expressed by two proportional coefficients $c_{o}$ and $c_{u}$:
36+
37+
- $c_{o}$ is the cost of overestimating the target value, i.e., when $\hat{y} > y$
38+
- $c_{u}$ is the cost of underestimating the target value, i.e., when $\hat{y} < y$
39+
40+
Given a data batch, the mean cost $\bar{C}$ is expressed as
41+
$$
42+
\bar{C} = \frac{\sum_{i | \delta_i < 0} |\delta_i| \times c_{o} + \sum_{i | \delta_i > 0} |\delta_i| \times c_{u}}{N}
43+
$$
44+
where $\delta_i = y_i - \hat{y}_i$ is the difference between the target and the estimated value, and $N$ is the number of samples in the batch.
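
As an illustration of the mean regression cost, here is a minimal sketch in plain Python. It assumes that each error contributes its magnitude $|\delta_i|$, so that overestimates ($\delta_i < 0$) add a positive cost; the function name `mean_regression_cost` and its parameters are hypothetical and not part of the platform.

```python
from typing import Sequence


def mean_regression_cost(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    c_over: float,
    c_under: float,
) -> float:
    """Mean asymmetric cost over a batch (illustrative sketch only)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("the batch must contain at least one sample")

    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        delta = y - y_hat
        if delta < 0:
            # Overestimate: prediction above the target.
            total += -delta * c_over
        elif delta > 0:
            # Underestimate: prediction below the target.
            total += delta * c_under
    return total / len(y_true)
```

With targets `[10, 10, 10, 10]`, predictions `[12, 7, 10, 10]`, `c_over=1` and `c_under=2`, the overestimate of 2 costs 2 and the underestimate of 3 costs 6, so the mean cost over four samples is 2.0.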
45+
46+
47+
## Classification
48+
49+
Supervised classification Task with discrete target.
50+
Classification Tasks are divided into:
51+
52+
- **Binary:** when the target is a binary variable. For binary classification Tasks, an additional *positive class* attribute must be specified, indicating which value is considered the positive one. For instance, in a fraud detection Task, "1" can indicate that the sample is a fraud and "0" that it is not. In that case, the positive class attribute is "1".
53+
- **Multiclass:** when the target is a categorical variable with more than two possible values but only one value can be assigned.
54+
- **Multilabel:** when the target is an array indicating which of the possible categories are present. In this case, each element can be either 0 or 1, and more than one element of the array can be 1.
55+
56+
### Cost information
57+
Cost information differs across the three classification types; however, the concept is similar.
58+
A cost is associated with every misclassification possibility:
59+
60+
- **Binary:**
61+
- $c_{FP}$ is the cost of classifying a negative sample as positive
62+
- $c_{FN}$ is the cost of classifying a positive sample as negative
63+
64+
Given a data batch, the mean cost $\bar{C}$ is expressed as
65+
$$
66+
\bar{C} = \frac{N_{FP} \times c_{FP} + N_{FN} \times c_{FN}}{N}
67+
$$
68+
where $N_{FP}$ and $N_{FN}$ are the number of false positives and false negatives respectively.
69+
70+
- **Multiclass:**
71+
- $c_{k}$ is the cost of misclassifying a sample whose actual class is $k$ as another class
72+
73+
Given a data batch, the mean cost $\bar{C}$ is expressed as
74+
$$
75+
\bar{C} = \frac{\sum_{k} N_{k} \times c_{k} }{N}
76+
$$
77+
where $N_{k}$ is the number of misclassified samples of class $k$.
78+
79+
80+
- **Multilabel:**
81+
- $c_{FP}^{k}$ is the cost of classifying a sample as class $k$ when the actual class $k$ is not present
82+
- $c_{FN}^{k}$ is the cost of not classifying a sample as class $k$ when the actual class $k$ is present
83+
84+
Given a data batch, the mean cost $\bar{C}$ is expressed as
85+
$$
86+
\bar{C} = \frac{\sum_{k} \left( N_{FP}^{k} \times c_{FP}^{k} + N_{FN}^{k} \times c_{FN}^{k} \right)}{N}
87+
$$
88+
where $N_{FP}^{k}$ and $N_{FN}^{k}$ are the number of false positives and false negatives of class $k$ respectively.
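
The three classification cost formulas can be sketched in plain Python as follows. This is an illustrative sketch, not the platform's implementation: the function names and argument layouts are assumptions, multilabel targets are assumed to be 0/1 arrays as described earlier, and the multilabel sum is taken over both the false positive and false negative terms of every class.

```python
from collections import Counter


def binary_mean_cost(y_true, y_pred, positive_class, c_fp, c_fn):
    """Mean cost from false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    n_fp = sum(1 for t, p in pairs if p == positive_class and t != positive_class)
    n_fn = sum(1 for t, p in pairs if p != positive_class and t == positive_class)
    return (n_fp * c_fp + n_fn * c_fn) / len(pairs)


def multiclass_mean_cost(y_true, y_pred, class_costs):
    """Mean cost where class_costs[k] is paid per misclassified sample of class k."""
    misclassified = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return sum(n * class_costs[k] for k, n in misclassified.items()) / len(y_true)


def multilabel_mean_cost(y_true, y_pred, c_fp, c_fn):
    """Mean cost with per-class false positive and false negative costs."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        for k, (t, p) in enumerate(zip(true_row, pred_row)):
            if p == 1 and t == 0:
                total += c_fp[k]  # class k predicted but not present
            elif p == 0 and t == 1:
                total += c_fn[k]  # class k present but not predicted
    return total / len(y_true)
```

In the fraud detection example, setting `c_fn` higher than `c_fp` expresses that missing a fraud is more expensive than raising a false alarm.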
89+
90+
91+
## Retrieval Augmented Generation
92+
93+
Retrieval Augmented Generation (RAG) is an AI task for Text data in which a Large Language Model generates a response to a user query, using a set of retrieved documents as context to make the response more precise and focused.
94+
95+
RAG Tasks do not have a Target; therefore, the attribute *optional target* is always set to True.
96+
Moreover, in this Task, the Target is text as well, and the input is composed of two entities:
97+
98+
- User Input: the user query that the model needs to answer
99+
- Retrieved Context: the set of documents the retrieval engine selected to help the model
100+
101+
RAG Tasks have an additional attribute, *context separator*, which is a string used to separate different retrieved contexts into chunks. Context data is sent as a single string; however, in RAG settings multiple documents can be retrieved, and the context separator is used to distinguish them. It is optional, since a single context can be provided.
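
To make the role of the context separator concrete, here is a minimal sketch of how a single context string could be split back into documents. The function `split_contexts` and the separator value `"<sep>"` are hypothetical; the platform's actual handling may differ.

```python
from typing import List, Optional


def split_contexts(context: str, separator: Optional[str] = None) -> List[str]:
    """Split one context string into retrieved documents (illustrative only)."""
    if not separator:
        # No separator configured: treat the whole string as a single context.
        return [context]
    return context.split(separator)
```

For example, `split_contexts("doc A<sep>doc B", "<sep>")` yields two documents, while `split_contexts("single document")` returns the string unchanged as a one-element list.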
102+
103+
## Object Detection
104+
105+
An Object Detection Task processes images and outputs a list of bounding boxes, each with an associated label indicating the type of the identified entity.
106+
Therefore, the target is a list of four-element tuples, each indicating the vertices of a box, together with a string label for the entity type.
107+
108+
[Model]: model.md
109+
[Data schema]: data_schema.md

mkdocs.yml

Lines changed: 2 additions & 1 deletion
@@ -17,7 +17,7 @@ theme:
1717
- navigation.expand
1818
- navigation.footer
1919
- navigation.indexes
20-
- navigation.instant
20+
#- navigation.instant
2121
# - navigation.prune
2222
- navigation.sections
2323
- navigation.expand
@@ -108,6 +108,7 @@ nav:
108108
- user_guide/index.md
109109
- user_guide/company.md
110110
- user_guide/project.md
111+
- user_guide/task.md
111112

112113
- Modules:
113114
- user_guide/modules/index.md
