md-docs/user_guide/segment.md: 21 additions & 10 deletions
```diff
@@ -1,11 +1,16 @@
 
 # Segment
 
-A Segment is a subset of the population, created according to a specific set of rules. A [Task] can include several Segments, each defined by its own rule and monitored in parallel alongside the whole population. The objective of a Segment is to allow the analysis of specific groups of data, whose variations might go unnoticed if only the whole population is monitored.
+A Segment is a subset of the data distribution that identifies a sub-domain inside the data.
+It is defined by a set of rules over data dimensions and metadata.
+A [Task] can include several Segments, and there are no constraints on how they are specified.
+In particular, two Segments can overlap, sharing part of the data space.
 
+When Segments are specified for a [Task], monitoring is performed both on the whole data, called _all population_, and on each Segment.
+The objective of a Segment is to allow the analysis of specific groups of data, whose variations might go unnoticed if only the whole population is monitored.
 
-Segments, similarly to the [Data schema], must be defined before sending any data to the Platform. They must to be created all at once, as they can't be modified upon creation. Additionally, their definition needs to happen
-after the creation of the Data Schema, as the rules for the Segment are based on the columns defined there.
+Segments, similarly to the [Data schema], must be defined before sending any data to the Platform.
+They must be created all at once, as they can't be modified after creation. Additionally, they must be defined after the Data Schema is created, as Segment rules are based on the columns defined there.
 
 
 
 ## Segment Structure
```
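The following rough illustration shows how Segments relate to the _all population_. Each Segment is modeled as a membership predicate, and samples are grouped the way the monitoring description suggests. The `Segment` class, the `group_for_monitoring` helper, the `country` column, and the sample values are all hypothetical and not part of the Platform SDK. The sketch only shows that a sample always belongs to the all population and can also fall into several overlapping Segments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sample representation: column name -> value.
Sample = Dict[str, object]


@dataclass
class Segment:
    # Hypothetical stand-in for a Segment: a name and a membership test.
    name: str
    contains: Callable[[Sample], bool]


def group_for_monitoring(
    samples: List[Sample], segments: List[Segment]
) -> Dict[str, List[Sample]]:
    # Every sample belongs to the "all population" group; each Segment gets
    # its own group. Segments may overlap, so a sample can appear in several.
    groups: Dict[str, List[Sample]] = {"all population": list(samples)}
    for segment in segments:
        groups[segment.name] = [s for s in samples if segment.contains(s)]
    return groups


samples: List[Sample] = [
    {"X_0": 10.0, "country": "IT"},
    {"X_0": 15.0, "country": "IT"},
    {"X_0": 20.0, "country": "FR"},
]
segments = [
    Segment("high_X_0", lambda s: s["X_0"] >= 13),
    Segment("italy", lambda s: s["country"] == "IT"),
]

groups = group_for_monitoring(samples, segments)
for name, members in groups.items():
    print(name, len(members))
```

Here the sample `{"X_0": 15.0, "country": "IT"}` is counted in both `high_X_0` and `italy`, which is the kind of intersection between Segments that the text allows.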
```diff
@@ -21,14 +26,15 @@ Segments can be created both through the Web App and the SDK.
 
 ## Segment Rules
 
-A rule is a condition that a sample must satisfy to be part of a Segment. Each Segment can have multiple roles, which are applied in AND between them.
+A rule is a condition over a single data dimension that a sample must match to be considered part of the Segment.
+Each Segment has one or more rules, which are combined in AND.
 A rule is defined by the following fields:
 
 | Field | Description |
 | --------- | ------- |
 | Column name | The name of the column in the Data Schema that the rule is applied to. A rule can be applied only on columns of role INPUT, TARGET and METADATA|
 | Operator | The operator defining the rule. It can be either `IN` or `OUT`|
-| Values | This field can have 2 possible meaning, according to the data type of the column of the rule: <br><ul><li>The data type is float: values is a list of intervals that defines the ranges over which the operator is applied. The values defining the interval are always included in the interval.</li><li>The data type is categorical or string: values is a list containing the exact values over which the operator is applied</li></ul> |
+| Values | This field has two possible meanings, depending on the data type of the column specified in the rule: <br><ul><li>The data type is float: Values is a list of ranges [a, b] that define the numeric intervals over which the operator is applied. Ranges are closed, meaning that both extremes are included. When the operator is `IN`, the ranges are combined in OR; when the operator is `OUT`, they are combined in AND, so the value must fall outside every range.</li><li>The data type is categorical or string: Values is a list whose elements are compared with the content of the column. When the operator is `IN`, the column value must be one of the listed elements; when the operator is `OUT`, it must not be any of them.</li></ul> |
 
 ## Examples
 
```
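The rule semantics in the new Values row can be summarized in plain Python. This is a minimal sketch. It assumes rules are evaluated per sample exactly as the table describes: closed ranges, ranges in OR for `IN` and in AND for `OUT`, exact matching for categorical or string values, and rules in AND within a Segment. The names `NumericRange`, `Rule`, `rule_matches`, and `segment_matches` are hypothetical. They are not the SDK classes `NumericSegmentRule` or `SegmentRuleNumericRange`.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class NumericRange:
    # Hypothetical closed range [start, end]; None means unbounded on that side.
    start: Optional[float] = None
    end: Optional[float] = None

    def contains(self, x: float) -> bool:
        # Both extremes are included, as the Values description states.
        return (self.start is None or x >= self.start) and (
            self.end is None or x <= self.end
        )


@dataclass
class Rule:
    # Hypothetical rule: one column, an operator ("IN" or "OUT"), and values.
    column_name: str
    operator: str
    values: Sequence[Union[NumericRange, str]]


def rule_matches(rule: Rule, sample: dict) -> bool:
    x = sample[rule.column_name]
    if rule.values and all(isinstance(v, NumericRange) for v in rule.values):
        inside = any(r.contains(x) for r in rule.values)
        # IN: ranges in OR. OUT: ranges in AND, i.e. x lies outside every range.
        return inside if rule.operator == "IN" else not inside
    # Categorical or string column: exact match against the listed elements.
    found = x in rule.values
    return found if rule.operator == "IN" else not found


def segment_matches(rules: List[Rule], sample: dict) -> bool:
    # The rules of a Segment are combined in AND.
    return all(rule_matches(rule, sample) for rule in rules)


rules = [
    Rule("X_0", "IN", [NumericRange(start=13)]),
    Rule("X_1", "OUT", [NumericRange(0, 5), NumericRange(10, 20)]),
    Rule("Sample ID", "IN", ["id_0", "id_1"]),
]
print(segment_matches(rules, {"X_0": 13, "X_1": 7, "Sample ID": "id_0"}))   # True
print(segment_matches(rules, {"X_0": 13, "X_1": 20, "Sample ID": "id_0"}))  # False
```

The second sample fails only because `X_1 = 20` lies on the closed upper extreme of `[10, 20]`, so the `OUT` rule rejects it.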
````diff
@@ -76,7 +82,7 @@ This segment would include the samples with `Sample ID` equal to `id_0`, `id_1`
 )
 ```
 
-- A Segment that includes all samples where the value of the column `X_0` is between greater or equal than 13 and the value of the column `X_1` is strictly less than 24:
+- A Segment that includes all samples where the value of the column `X_0` is greater than or equal to 13 and the value of the column `X_1` is strictly less than 24:
 
 | Field | Value |
 | --------- | ------- |
````
```diff
@@ -103,7 +109,7 @@ This segment would include the sample with `Sample ID` equal to `id_3`.
 NumericSegmentRule(
 column_name='X_1',
 operator=SegmentOperator.IN,
-values=[SegmentRuleNumericRange(end_value=22)]
+values=[SegmentRuleNumericRange(end_value=23)]
 )
 ]
 )
```
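The example changes `end_value` from 22 to 23 so that a closed range expresses "`X_1` strictly less than 24". The sketch below checks that reading under two assumptions. First, a `SegmentRuleNumericRange` without a start value is unbounded below. Second, the `X_0` condition is a range starting at 13 with no end. Both assumptions are not confirmed by the visible diff. `in_closed_range` and `example_segment` are hypothetical helpers. The check shows that `x_1 <= 23` is equivalent to `x_1 < 24` only for integer values. For a float column, a value such as 23.5 satisfies the strict inequality but lies outside the closed range.

```python
from typing import Optional


def in_closed_range(
    x: float, start: Optional[float] = None, end: Optional[float] = None
) -> bool:
    # Closed range [start, end]; a missing bound is treated as unbounded.
    return (start is None or x >= start) and (end is None or x <= end)


def example_segment(x_0: float, x_1: float) -> bool:
    # Assumed encoding of the example: X_0 IN [13, +inf) AND X_1 IN (-inf, 23].
    return in_closed_range(x_0, start=13) and in_closed_range(x_1, end=23)


# For integer values, x_1 <= 23 selects exactly the same values as x_1 < 24.
assert all(in_closed_range(v, end=23) == (v < 24) for v in range(-50, 50))

# For non-integer values the two conditions differ.
print(in_closed_range(23.5, end=23), 23.5 < 24)  # False True
print(example_segment(13, 23), example_segment(13, 23.5))  # True False
```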
```diff
@@ -137,7 +143,7 @@ This segment would include the samples with `Sample ID` equal to `id_0` and `id_
```