Commit c4b8438

Continuize: Update documentation
1 parent 63f4b63 commit c4b8438

4 files changed

+29
-15
lines changed

doc/visual-programming/source/widgets/data/continuize.md

Lines changed: 29 additions & 15 deletions
@@ -11,45 +11,59 @@ Turns discrete variables (attributes) into numeric ("continuous") dummy variable
1111

1212
- Data: transformed data set
1313

14-
The **Continuize** widget receives a data set in the input and outputs the same data set in which the discrete variables (including binary variables) are replaced with continuous ones.
14+
The **Continuize** widget receives a data set in the input and outputs the same data set in which some or all categorical variables are replaced with continuous ones and numeric variables are scaled.
1515

1616
![](images/Continuize-stamped.png)
1717

18-
1. Define the treatment of non-binary categorical variables.
18+
1. Select a categorical attribute to define its specific treatment, or click the "Default" option above to set the default treatment for all categorical attributes without specific settings.
1919

20-
Examples in this section will assume that we have a discrete attribute status with the values low, middle and high, listed in that order. Options for their transformation are:
20+
Multiple attributes can be chosen.
21+
22+
2. Define the treatment of categorical variables.
23+
24+
Examples in this section will assume that we have a categorical attribute *status* with values *low*, *middle* and *high*, listed in that order. Options for their transformation are:
25+
26+
- **Use default setting**: use the default treatment.
27+
28+
- **Leave categorical**: leave the attribute as it is.
2129

2230
- **First value as base**: an N-valued categorical variable will be transformed into N-1 numeric variables, each serving as an indicator for one of the original values except for the base value. The base value is the first value in the list. By default, the values are ordered alphabetically; their order can be changed in [Edit Domain](../data/editdomain).
2331

2432
In the above case, the three-valued variable *status* is transformed into two numeric variables, *status=middle* with values 0 or 1 indicating whether the original variable had value *middle* on a particular example, and similarly, *status=high*.
2533

2634
- **Most frequent value as base**: similar to the above, except that the most frequent value is used as a base. So, if the most frequent value in the above example is *middle*, then *middle* is considered as the base and the two newly constructed variables are *status=low* and *status=high*.
2735

28-
- **One attribute per value**: this option constructs one numeric variable for each value of the original variable. In the above case, we would get variables *status=low*, *status=middle* and *status=high*.
36+
- **One-hot encoding**: this option constructs one numeric variable for each value of the original variable. In the above case, we would get variables *status=low*, *status=middle* and *status=high*.
2937

30-
- **Ignore multinomial attributes**: removes non-binary categorical variables from the data.
38+
- **Remove if more than 3 values**: removes non-binary categorical variables from the data.
3139

32-
- **Treat as ordinal**: converts the variable into a single numeric variable enumerating the original values. In the above case, the new variable would have the value of 0 for *low*, 1 for *middle* and 2 for *high*. Again note that the order of values can be set in [Edit Domain](../data/editdomain).
33-
34-
- **Divide by number of values**: same as above, except that values are normalized into range 0-1. In our example, the values of the new variable would be 0, 0.5 and 1.
40+
- **Remove**: removes the attribute.
3541

36-
2. Define the treatment of continuous attributes. Besides the option to *Leave them as they are*, we can *Normalize by span*, which will subtract the lowest value found in the data and divide by the span, so all values will fit into [0, 1]. Option *Normalize by standard deviation* subtracts the average and divides by the standard deviation.
42+
- **Treat as ordinal**: converts the variable into a single numeric variable enumerating the original values. In the above case, the new variable would have the value of 0 for *low*, 1 for *middle* and 2 for *high*. Again note that the order of values can be set in [Edit Domain](../data/editdomain).
3743

38-
3. Define the treatment of class attributes (outcomes, targets). Besides leaving it as it is, the available options mirror those for multinomial attributes, except for those that would split the outcome into multiple outcome variables.
44+
- **Treat as normalized ordinal**: same as above, except that values are normalized into range 0-1. In our example, the values of the new variable would be 0, 0.5 and 1.
3945

40-
4. This option defines the ranges of new variables. In the above text, we supposed the range *from 0 to 1*.
46+
3. Select attributes to set individual treatments or click "Default" to set the default treatment for numeric attributes.
4147

42-
5. Produce a report.
48+
4. Define the treatment of numeric attributes.
4349

44-
6. If *Apply automatically* is ticked, changes are committed automatically. Otherwise, you have to press *Apply* after each change.
50+
- **Use default setting**: use the general default.
51+
- **Leave as it is**: do not change anything.
52+
- **Standardize**: subtract the mean and divide by the standard deviation (not available for sparse data).
53+
- **Center**: subtract the mean (not available for sparse data).
54+
- **Scale**: divide by standard deviation.
55+
- **Normalize to interval [-1, 1]**: linearly scale the values into interval [-1, 1] (not available for sparse data).
56+
- **Normalize to interval [0, 1]**: linearly scale the values into interval [0, 1] (not available for sparse data).
57+
58+
5. If checked, the class attribute is converted in the same fashion as categorical attributes that are treated as ordinal (see above).
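
The treatments listed above can be illustrated with a minimal plain-Python sketch. It does not use Orange's API: every function name here (`indicators`, `most_frequent`, `as_ordinal`, `normalize`, `center`, `scale`, `standardize`) is hypothetical and only mirrors the arithmetic described in the text. Using the population standard deviation is an assumption; the text does not say which variant the widget uses.

```python
from collections import Counter
from statistics import mean, pstdev

# Value order of the example attribute "status", as assumed in the text.
VALUES = ["low", "middle", "high"]


def indicators(value, values=VALUES, name="status", base=None):
    """One 0/1 indicator per value; the base value (if any) gets no column.

    base=None corresponds to one-hot encoding, base=values[0] to
    "First value as base".
    """
    return {f"{name}={v}": int(value == v) for v in values if v != base}


def most_frequent(column):
    """Pick the base for "Most frequent value as base"."""
    return Counter(column).most_common(1)[0][0]


def as_ordinal(value, values=VALUES, normalized=False):
    """Enumerate values as 0, 1, 2, ...; optionally rescale into [0, 1]."""
    index = values.index(value)
    return index / (len(values) - 1) if normalized else index


def normalize(column, low=0.0, high=1.0):
    """Linearly scale a non-constant numeric column into [low, high]."""
    lo, hi = min(column), max(column)
    return [low + (x - lo) * (high - low) / (hi - lo) for x in column]


def center(column):
    """Subtract the mean."""
    mu = mean(column)
    return [x - mu for x in column]


def scale(column):
    """Divide by the standard deviation (population variant: an assumption)."""
    sigma = pstdev(column)
    return [x / sigma for x in column]


def standardize(column):
    """Subtract the mean, then divide by the standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]
```

For example, `indicators("middle", base="low")` yields only the *status=middle* and *status=high* columns, and `normalize([10, 20, 30], -1.0, 1.0)` maps the column onto the interval [-1, 1].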
4559

4660
Examples
4761
--------
4862

49-
First, let's see what the output of the **Continuize** widget is. We feed the original data (the *Heart disease* data set) into the [Data Table](../data/datatable) and see what they look like. Then we continuize the discrete values and observe them in another [Data Table](../data/datatable).
63+
First, let's see what the output of the **Continuize** widget is. We feed the original data (the *Heart disease* data set) into the [Data Table](../data/datatable) and see what they look like. Then we continuize the discrete values using various options and observe them in another [Data Table](../data/datatable).
5064

5165
![](images/Continuize-Example1.png)
5266

53-
In the second example, we show a typical use of this widget - in order to properly plot the linear projection of the data, discrete attributes need to be converted to continuous ones and that is why we put the data through the **Continuize** widget before drawing it. The attribute "*chest pain*" originally had four values and was transformed into three continuous attributes; similar happened to gender, which was transformed into a single attribute "*gender=female*".
67+
In the second example, we show a typical use of this widget: in order to properly plot the linear projection of the data, discrete attributes need to be converted to continuous ones, which is why we put the data through the **Continuize** widget before drawing it. Gender, for instance, is transformed into two attributes, "*gender=female*" and "*gender=male*".
5468

5569
![](images/Continuize-Example2.png)