Commit 027f620

✨ Better structure and definitions
1 parent 28c1cb9 commit 027f620

6 files changed, +103 -19 lines changed

docs/make.jl

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -31,6 +31,7 @@ makedocs(
3131
"Contrast Encoders"=>"transformers/contrast.md",
3232
"Utility Encoders"=>"transformers/utility.md",
3333
"Other Transformers"=>"transformers/others.md",
34+
"API Index" => "transformers/all_transformers.md",
3435
],
3536
"Extended Examples" => Any[
3637
"Tutorial A" => "tutorials/T1.md",

docs/src/index.md

Lines changed: 15 additions & 12 deletions
@@ -1,6 +1,6 @@
11
# MLJTransforms.jl
22

3-
A Julia package providing a wide range of categorical encoders and transformers to be used with the [MLJ](https://juliaai.github.io/MLJ.jl/dev/) package.
3+
A Julia package providing a wide range of categorical encoders and transformers to be used with the [MLJ](https://juliaai.github.io/MLJ.jl/dev/) package. Transformers help convert raw features into a representation that's better suited for downstream models. Meanwhile, categorical encoders are a type of transformer that specifically encodes categorical features into numerical forms.
44

55
## Installation
66

@@ -24,9 +24,11 @@ X = RDatasets.dataset("HSAUR", "Forbes2000");
2424
# 2. Load the model
2525
FrequencyEncoder = @load FrequencyEncoder pkg="MLJTransforms"
2626
encoder = FrequencyEncoder(
27-
features=[:Country, :Category],
28-
ignore=false, ordered_factor = false,
29-
normalize=true)
27+
features=[:Country, :Category], # The categorical columns to select
28+
ignore=false, # Whether to exclude or include selected columns
29+
ordered_factor = false, # Whether to also encode columns of ordered factor elements
30+
normalize=true # Whether to normalize the frequencies used for encoding
31+
)
3032

3133

3234
# 3. Wrap it in a machine and fit
@@ -35,15 +37,16 @@ Xnew = transform(mach, X)
3537
```
3638

3739
## Available Transformers
38-
In `MLJTransforms` we denote transformers that operate on columns with `Continuous` and/or `Count` [scientific types](https://juliaai.github.io/ScientificTypes.jl/dev/) as numerical transformers. Meanwhile, categorical transformers operate on `Multiclass` and/or `OrderedFactor` [scientific types](https://juliaai.github.io/ScientificTypes.jl/dev/). Most categorical transformers in this package operate by converting categorical values into numerical values or vectors, and are therefore considered categorical encoders.
40+
In `MLJTransforms` we denote transformers that can operate on columns with `Continuous` and/or `Count` [scientific types](https://juliaai.github.io/ScientificTypes.jl/dev/) as numerical transformers. Meanwhile, categorical transformers operate on `Multiclass` and/or `OrderedFactor` [scientific types](https://juliaai.github.io/ScientificTypes.jl/dev/). Most categorical transformers in this package operate by converting categorical values into numerical values or vectors, and are therefore considered categorical encoders.
3941

40-
Based on this, we categorize the methods as follows, with further distinctions for categorical encoders:
42+
Based on this, we categorize the methods in this package as follows, with further distinctions for categorical encoders:
4143

4244
| **Category** | **Description** |
4345
|:---------------------------:|:-------------------------------------------------------------------------------:|
44-
| **Numerical Transformers** | Transformers that operate on `Continuous` or `Count` columns in a given dataset.|
45-
| **Classical Encoders** | Widely recognized and frequently utilized categorical encoders. |
46-
| **Neural-based Encoders** | Categorical encoders based on neural networks. |
47-
| **Contrast Encoders** | Categorical encoders modeled via a contrast matrix. |
48-
| **Utility Encoders** | Categorical encoders meant to be used as preprocessors for other encoders or models.|
49-
| **Other Transformers** | Transformers that fall into other categories. |
46+
| [Numerical Transformers](transformers/numerical.md) | Transformers that operate on `Continuous` or `Count` columns in a given dataset.|
47+
| [Classical Encoders](transformers/classical.md) | Traditional categorical encoding algorithms and techniques. |
48+
| [Neural-based Encoders](transformers/neural.md) | Categorical encoders based on neural networks. |
49+
| [Contrast Encoders](transformers/contrast.md) | Categorical encoders that can be modeled via a contrast matrix. |
50+
| [Utility Encoders](transformers/utility.md) | Categorical encoders meant to be used as preprocessors for other transformers or models.|
51+
| [Other Transformers](transformers/others.md) | Transformers that operate on scientific types that are neither `Finite` nor `Infinite` |
52+
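
To illustrate what the `normalize` option in the `FrequencyEncoder` example controls, here is a minimal sketch in plain Julia. It is not the package's implementation: `frequency_encode` is a hypothetical helper, and it assumes frequencies are computed over the same vector that is being encoded.

```julia
# Hypothetical helper, not part of MLJTransforms:
# replace each value by how often it occurs in the vector.
function frequency_encode(values::AbstractVector; normalize::Bool=true)
    counts = Dict{eltype(values),Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1
    end
    n = length(values)
    # normalize=true maps each level to its relative frequency,
    # normalize=false maps it to its raw count (as a Float64).
    return [normalize ? counts[v] / n : float(counts[v]) for v in values]
end

country = ["US", "JP", "US", "DE", "US", "JP"]
relative = frequency_encode(country)                   # relative frequencies
raw = frequency_encode(country; normalize=false)       # raw counts
```

Either way the categorical column becomes a numerical one, which is what makes this a categorical encoder in the sense defined above.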

docs/src/transformers/all_transformers.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
| Transformer | Brief Description |
2+
|:----------:|:----------:|
3+
| [Standardizer](@ref) | Transforming columns of numerical features by standardization |
4+
| [BoxCoxTransformer](@ref) | Transforming columns of numerical features by BoxCox transformation |
5+
| [UnivariateBoxCoxTransformer](@ref) | Apply BoxCox transformation given a single vector |
6+
| [InteractionTransformer](@ref) | Transforming columns of numerical features to create new interaction features |
7+
| [UnivariateDiscretizer](@ref) | Discretize a continuous vector into an ordered factor |
8+
| [FillImputer](@ref) | Fill missing values of features belonging to any scientific type |
9+
| [UnivariateTimeTypeToContinuous](@ref) | Transform a vector of time type into continuous type |
10+
| [OneHotEncoder](@ref) | Encode categorical variables into one-hot vectors |
11+
| [ContinuousEncoder](@ref) | Adds type casting functionality to OneHotEncoder |
12+
| [OrdinalEncoder](@ref) | Encode categorical variables into ordered integers |
13+
| [FrequencyEncoder](@ref) | Encode categorical variables into their normalized or unnormalized frequencies |
14+
| [TargetEncoder](@ref) | Encode categorical variables into relevant target statistics |
15+
| [DummyEncoder](@ref) | Encodes by comparing each level to the reference level, intercept being the cell mean of the reference group |
16+
| [SumEncoder](@ref) | Encodes by comparing each level to the reference level, intercept being the grand mean |
17+
| [HelmertEncoder](@ref) | Encodes by comparing levels of a variable with the mean of the subsequent levels of the variable |
18+
| [ForwardDifferenceEncoder](@ref) | Encodes by comparing adjacent levels of a variable (each level minus the next level) |
19+
| [ContrastEncoder](@ref) | Allows defining a custom contrast encoder via a contrast matrix |
20+
| [HypothesisEncoder](@ref) | Allows defining a custom contrast encoder via a hypothesis matrix |
21+
| [EntityEmbedder](@ref) | Encode categorical variables into dense embedding vectors |
22+
| [CardinalityReducer](@ref) | Reduce cardinality of high cardinality categorical features by grouping infrequent categories |
23+
| [MissingnessEncoder](@ref) | Encode missing values of categorical features into new values |
24+
25+
26+
```@docs; canonical = false
27+
MLJTransforms.Standardizer
28+
```
29+
30+
```@docs; canonical = false
31+
MLJTransforms.InteractionTransformer
32+
```
33+
34+
```@docs; canonical = false
35+
MLJTransforms.BoxCoxTransformer
36+
```
37+
38+
```@docs; canonical = false
39+
MLJTransforms.UnivariateDiscretizer
40+
```
41+
42+
```@docs; canonical = false
43+
MLJTransforms.FillImputer
44+
```
45+
46+
```@docs; canonical = false
47+
MLJTransforms.UnivariateTimeTypeToContinuous
48+
```
49+
50+
```@docs; canonical = false
51+
MLJTransforms.OneHotEncoder
52+
```
53+
54+
```@docs; canonical = false
55+
MLJTransforms.ContinuousEncoder
56+
```
57+
58+
```@docs; canonical = false
59+
MLJTransforms.OrdinalEncoder
60+
```
61+
62+
```@docs; canonical = false
63+
MLJTransforms.FrequencyEncoder
64+
```
65+
66+
```@docs; canonical = false
67+
MLJTransforms.TargetEncoder
68+
```
69+
70+
```@docs; canonical = false
71+
MLJTransforms.ContrastEncoder
72+
```
73+
74+
```@docs; canonical = false
75+
MLJTransforms.CardinalityReducer
76+
```
77+
78+
```@docs; canonical = false
79+
MLJTransforms.MissingnessEncoder
80+
```
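
The `DummyEncoder` and `SumEncoder` rows in the table above can both be described by a contrast matrix with one row per level and one column per output feature. The sketch below builds the textbook matrices in plain Julia. The helper names are hypothetical, and taking the first level as the reference is an assumption made for illustration; MLJTransforms may order levels differently.

```julia
# Hypothetical helpers for illustration only, not the MLJTransforms API.

# Dummy coding for k levels: the reference level (assumed to be the first)
# is a row of zeros; every other level gets its own indicator column.
function dummy_contrast(k::Int)
    C = zeros(k, k - 1)
    for j in 1:(k - 1)
        C[j + 1, j] = 1.0
    end
    return C
end

# Sum coding: same indicator columns, but the reference row is all -1,
# so each column sums to zero.
function sum_contrast(k::Int)
    C = dummy_contrast(k)
    C[1, :] .= -1.0
    return C
end

D = dummy_contrast(3)
S = sum_contrast(3)
```

Encoding a value then amounts to looking up the row of the matrix that corresponds to its level.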

docs/src/transformers/neural.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Neural-based Encoders include categorical encoders based on neural networks:
22

33
| Transformer | Brief Description |
44
|:----------:|:----------:|
5-
| [EntityEmbedders](@ref) | Encode categorical variables into dense embedding vectors |
5+
| [EntityEmbedder](@ref) | Encode categorical variables into dense embedding vectors |
66

77

88
Entity Embedder docstring will go here.

docs/src/transformers/numerical.md

Lines changed: 5 additions & 0 deletions
@@ -7,6 +7,7 @@ Other Transformers include more generic transformers that go beyond categorical
77
| [UnivariateBoxCoxTransformer](@ref) | Apply BoxCox transformation given a single vector |
88
| [InteractionTransformer](@ref) | Transforming columns of numerical features to create new interaction features |
99
| [UnivariateDiscretizer](@ref) | Discretize a continuous vector into an ordered factor |
10+
| [FillImputer](@ref) | Fill missing values of features belonging to any finite or infinite scientific type |
1011

1112
```@docs
1213
MLJTransforms.Standardizer
@@ -23,3 +24,7 @@ MLJTransforms.BoxCoxTransformer
2324
```@docs
2425
MLJTransforms.UnivariateDiscretizer
2526
```
27+
28+
```@docs
29+
MLJTransforms.FillImputer
30+
```

docs/src/transformers/others.md

Lines changed: 1 addition & 6 deletions
@@ -1,14 +1,9 @@
1-
Transformers that operate on columns with general or specialized scientific types.
1+
Transformers that operate on scientific types that are neither `Finite` nor `Infinite`.
22

33
| Transformer | Brief Description |
44
|:----------:|:----------:|
5-
| [FillImputer](@ref) | Fill missing values of features belonging to any scientific type |
65
| [UnivariateTimeTypeToContinuous](@ref) | Transform a vector of time type into continuous type |
76

8-
```@docs
9-
MLJTransforms.FillImputer
10-
```
11-
127

138
```@docs
149
MLJTransforms.UnivariateTimeTypeToContinuous
