Commit a89554b

Improve manual for data transformations (#648)
`standardize` allows both Z-score normalization (a.k.a. standardization) and unit range normalization. This can be confusing, so avoid saying "standardization" without more explicit terms.
1 parent 11ac5b5 commit a89554b
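The commit message points out that `standardize` covers two different transforms. The following is a minimal sketch of the two workflows the updated manual describes: fit-then-transform, and the `standardize` shorthand. It assumes a StatsBase release whose `fit` methods accept a `dims` keyword. The matrix `X` is hypothetical data, with observations in rows and features in columns.

```julia
using StatsBase   # fit, ZScoreTransform, UnitRangeTransform, standardize
using Statistics  # mean, std (used only to inspect the results)

# Hypothetical data: 4 observations (rows) x 2 features (columns).
X = [1.0 10.0;
     2.0 20.0;
     3.0 30.0;
     4.0 40.0]

# Z-score normalization: each column becomes (x - mean) / std.
# dims=1 (assumed keyword) computes the statistics along rows, i.e. per column.
zt = fit(ZScoreTransform, X; dims=1)
Z = StatsBase.transform(zt, X)

# The shorthand fits and transforms in a single call.
Zs = standardize(ZScoreTransform, X; dims=1)

# In-place variant: overwrites its argument, so apply it to a copy.
Xz = copy(X)
StatsBase.transform!(zt, Xz)

# Unit range normalization (min-max scaling): each column becomes
# (x - minimum) / (maximum - minimum), so values lie in [0, 1].
ut = fit(UnitRangeTransform, X; dims=1)
U = StatsBase.transform(ut, X)
Us = standardize(UnitRangeTransform, X; dims=1)

println("column means of Z: ", mean(Z; dims=1))
println("column stds of Z:  ", std(Z; dims=1))
println("U = ", U)
```

Both `standardize` calls use the same function name but produce different matrices. This is why the manual now names the transform explicitly instead of saying only "standardization".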

1 file changed: 21 additions, 14 deletions

docs/src/transformations.md

@@ -3,37 +3,44 @@
 In general, data transformations change raw feature vectors into
 a representation that is more suitable for various estimators.

-## Standardization
+## Standardization a.k.a Z-score Normalization

-**Standardization** of dataset is a common requirement for many machine
-learning techniques. These techniques might perform poorly if the individual
-features do not more or less look like standard normally distributed data.
+**Standardization**, also known as Z-score normalization, is a common requirement
+for many machine learning techniques. These techniques might perform poorly
+if the individual features do not more or less look like standard normally
+distributed data.

 Standardization transforms data points into corresponding standard scores
-by removing mean and scaling to unit variance.
+by subtracting mean and scaling to unit variance.

-The **standard score** is the signed number of standard deviations by which
-the value of an observation or data point is above the mean value of what
-is being observed or measured.
+The **standard score**, also known as Z-score, is the signed number of
+standard deviations by which the value of an observation or data point
+is above the mean value of what is being observed or measured.

-Standardization can be performed using `fit(ZScoreTransform, ...)`.
+Standardization can be performed using `t = fit(ZScoreTransform, ...)`
+followed by `StatsBase.transform(t, ...)` or `StatsBase.transform!(t, ...)`.
+`standardize(ZScoreTransform, ...)` is a shorthand to perform both operations
+in a single call.

 ```@docs
 fit(::Type{ZScoreTransform}, X::AbstractArray{<:Real,2}; center::Bool=true, scale::Bool=true)
 ```

-## Unit range normalization
+## Unit Range Normalization

-**Unit range normalization** is an alternative data transformation which scales features
-to lie in the interval `[0; 1]`.
+**Unit range normalization**, also known as min-max scaling, is an alternative
+data transformation which scales features to lie in the interval `[0; 1]`.

-Unit range normalization can be performed using `fit(UnitRangeTransform, ...)`.
+Unit range normalization can be performed using `t = fit(UnitRangeTransform, ...)`
+followed by `StatsBase.transform(t, ...)` or `StatsBase.transform!(t, ...)`.
+`standardize(UnitRangeTransform, ...)` is a shorthand to perform both operations
+in a single call.

 ```@docs
 fit(::Type{UnitRangeTransform}, X::AbstractArray{<:Real,2}; unit::Bool=true)
 ```

-## Additional methods
+## Additional Methods
 ```@docs
 StatsBase.transform
 StatsBase.transform!
