🌟 Initialize documentation #19
base: dev
Changes from 2 commits
@@ -0,0 +1,80 @@
| Transformer | Brief Description |
|:----------:|:----------:|
| [Standardizer](@ref) | Standardize columns of numerical features |
| [BoxCoxTransformer](@ref) | Apply a Box-Cox transformation to columns of numerical features |
| [UnivariateBoxCoxTransformer](@ref) | Apply a Box-Cox transformation to a single vector |
| [InteractionTransformer](@ref) | Create new interaction features from columns of numerical features |
| [UnivariateDiscretizer](@ref) | Discretize a continuous vector into an ordered factor |
| [FillImputer](@ref) | Fill missing values of features belonging to any scientific type |
| [UnivariateTimeTypeToContinuous](@ref) | Transform a vector of a time type into a continuous type |
| [OneHotEncoder](@ref) | Encode categorical variables into one-hot vectors |
| [ContinuousEncoder](@ref) | Extend OneHotEncoder with type casting so that all output features are continuous |
| [OrdinalEncoder](@ref) | Encode categorical variables into ordered integers |
| [FrequencyEncoder](@ref) | Encode categorical variables into their normalized or unnormalized frequencies |
| [TargetEncoder](@ref) | Encode categorical variables into relevant target statistics |
| [DummyEncoder](@ref) | Encode by comparing each level to the reference level; the intercept is the cell mean of the reference group |
| [SumEncoder](@ref) | Encode by comparing each level to the grand mean, which becomes the intercept |
| [HelmertEncoder](@ref) | Encode by comparing each level of a variable with the mean of the subsequent levels |
| [ForwardDifferenceEncoder](@ref) | Encode by comparing adjacent levels of a variable (each level minus the next level) |
| [ContrastEncoder](@ref) | Define a custom contrast encoding via a contrast matrix |
| [HypothesisEncoder](@ref) | Define a custom contrast encoding via a hypothesis matrix |
| [EntityEmbedders](@ref) | Encode categorical variables into dense embedding vectors |
| [CardinalityReducer](@ref) | Reduce the cardinality of high-cardinality categorical features by grouping infrequent categories |
| [MissingnessEncoder](@ref) | Encode missing values of categorical features as new values |

Yes! Thank you.

```@docs; canonical = false
MLJTransforms.Standardizer
```
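
For intuition about what standardization computes, here is a minimal plain-Julia sketch for a single column. It is not the `Standardizer` API: `fit_standardizer` and `standardize` are hypothetical helpers, and the sketch assumes the learned scale is the corrected sample standard deviation returned by `Statistics.std`.

```julia
using Statistics

# Hypothetical helper: learn the column's location and scale from training data.
# `std` uses the corrected (n - 1) estimator.
fit_standardizer(x::AbstractVector{<:Real}) = (μ = mean(x), σ = std(x))

# Hypothetical helper: apply the learned parameters to (possibly new) data.
standardize(p, x::AbstractVector{<:Real}) = (x .- p.μ) ./ p.σ

train = [1.0, 2.0, 3.0, 4.0, 5.0]
p = fit_standardizer(train)
z = standardize(p, train)   # zero mean and unit standard deviation on the training data
```

Keeping the learned `μ` and `σ` separate from their application mirrors the fit/transform split: new data is scaled with the training statistics rather than its own.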

```@docs; canonical = false
MLJTransforms.InteractionTransformer
```
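
The following sketch illustrates order-2 interaction features as pairwise products of numerical columns. `pairwise_interactions` is a hypothetical helper, and the `a_b` naming of the new columns is an assumption made for illustration, not necessarily what `InteractionTransformer` produces.

```julia
# Hypothetical helper: order-2 interaction features (pairwise elementwise
# products) from a NamedTuple of equally long numerical columns.
function pairwise_interactions(cols::NamedTuple)
    ks = collect(keys(cols))
    out = Dict{Symbol,Vector{Float64}}()
    for i in eachindex(ks), j in (i + 1):lastindex(ks)
        a, b = ks[i], ks[j]
        out[Symbol(a, :_, b)] = cols[a] .* cols[b]  # new column named e.g. :a_b
    end
    return out
end

X = (a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 6.0])
interactions = pairwise_interactions(X)   # keys :a_b, :a_c, :b_c
```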

```@docs; canonical = false
MLJTransforms.BoxCoxTransformer
```
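
A minimal sketch of the Box-Cox formula for strictly positive data and a fixed λ: (x^λ - 1)/λ for λ ≠ 0, and log(x) for λ = 0. `boxcox` is a hypothetical helper; a transformer would normally estimate λ from the training data, which this sketch does not attempt.

```julia
# Hypothetical helper: Box-Cox transform of a strictly positive vector for a given λ.
function boxcox(x::AbstractVector{<:Real}, λ::Real)
    all(>(0), x) || throw(ArgumentError("Box-Cox requires strictly positive values"))
    return λ == 0 ? log.(x) : (x .^ λ .- 1) ./ λ
end

boxcox([1.0, 2.0, 4.0], 0.5)   # square-root-like compression of large values
```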

```@docs; canonical = false
MLJTransforms.UnivariateDiscretizer
```
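
One common way to discretize a continuous vector into ordered classes is quantile binning. The sketch below assumes that approach (it is not necessarily the exact rule used by `UnivariateDiscretizer`); `fit_cutpoints` and `discretize` are hypothetical helpers, and they return integer codes rather than an ordered factor.

```julia
using Statistics

# Hypothetical helper: learn interior cut points at evenly spaced quantiles.
fit_cutpoints(x::AbstractVector{<:Real}, n_classes::Integer) =
    quantile(x, (1:(n_classes - 1)) ./ n_classes)

# Map each value to an ordered class code in 1:n_classes.
# `searchsortedlast` returns 0 for values below the first cut point.
discretize(cuts::AbstractVector, x::AbstractVector{<:Real}) =
    [searchsortedlast(cuts, v) + 1 for v in x]

cuts = fit_cutpoints(collect(1.0:8.0), 4)   # quartiles of the training data
codes = discretize(cuts, [1.0, 3.0, 5.0, 8.0])
```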

```@docs; canonical = false
MLJTransforms.FillImputer
```
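
A sketch of simple fill imputation, assuming the median is used for numerical vectors and the most frequent value (mode) for categorical ones. `fill_median` and `fill_mode` are hypothetical helpers, and ties for the mode are broken arbitrarily.

```julia
using Statistics

# Hypothetical helper: fill missing numerical entries with the observed median.
function fill_median(x::AbstractVector)
    m = median(skipmissing(x))
    return [ismissing(v) ? m : v for v in x]
end

# Hypothetical helper: fill missing categorical entries with the observed mode.
function fill_mode(x::AbstractVector)
    counts = Dict{Any,Int}()
    for v in skipmissing(x)
        counts[v] = get(counts, v, 0) + 1
    end
    best, bestcount = nothing, 0
    for (v, c) in counts          # pick the level with the highest count
        if c > bestcount
            best, bestcount = v, c
        end
    end
    return [ismissing(v) ? best : v for v in x]
end

fill_median([1.0, missing, 3.0, 10.0])
fill_mode(["a", missing, "b", "a"])
```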

```@docs; canonical = false
MLJTransforms.UnivariateTimeTypeToContinuous
```
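
A sketch of mapping a time type to a continuous value, assuming the encoding is the number of days elapsed since a chosen origin date. `dates_to_continuous` is hypothetical, and the origin and unit used by `UnivariateTimeTypeToContinuous` may differ.

```julia
using Dates

# Hypothetical helper: elapsed whole days since `origin`, as Float64.
# `t - origin` is a `Day` period; `Dates.value` extracts its integer count.
dates_to_continuous(ts::AbstractVector{Date}, origin::Date) =
    [Float64(Dates.value(t - origin)) for t in ts]

dates_to_continuous([Date(2024, 1, 1), Date(2024, 1, 11)], Date(2024, 1, 1))
```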

```@docs; canonical = false
MLJTransforms.OneHotEncoder
```
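
A minimal sketch of one-hot encoding with one 0/1 column per observed level, with levels sorted. `onehot` is a hypothetical helper that returns the levels alongside the indicator matrix; it is not the `OneHotEncoder` API.

```julia
# Hypothetical helper: one indicator column per observed (sorted) level.
function onehot(x::AbstractVector)
    lvls = sort(unique(x))
    M = zeros(Int, length(x), length(lvls))
    for (i, v) in enumerate(x)
        M[i, findfirst(==(v), lvls)] = 1   # mark the column of this row's level
    end
    return lvls, M
end

lvls, M = onehot(["b", "a", "c", "a"])
```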

```@docs; canonical = false
MLJTransforms.ContinuousEncoder
```

```@docs; canonical = false
MLJTransforms.OrdinalEncoder
```
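
Ordinal encoding replaces each level by an integer position. In this hypothetical `ordinal_encode` sketch the default order is lexicographic; for an ordered factor one would pass the declared level order instead.

```julia
# Hypothetical helper: map each level to its position in `order`.
function ordinal_encode(x::AbstractVector; order = sort(unique(x)))
    code = Dict(v => i for (i, v) in enumerate(order))
    return [code[v] for v in x]
end

ordinal_encode(["low", "high", "mid", "low"])                                  # sorted: high, low, mid
ordinal_encode(["low", "high", "mid", "low"]; order = ["low", "mid", "high"])  # declared order
```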

```@docs; canonical = false
MLJTransforms.FrequencyEncoder
```
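
A sketch of frequency encoding, where each category is replaced by its count in the data or, with `normalize = true`, by its relative frequency. `frequency_encode` and its keyword are hypothetical, not the `FrequencyEncoder` API.

```julia
# Hypothetical helper: count of each category, or its share when `normalize = true`.
function frequency_encode(x::AbstractVector; normalize::Bool = false)
    counts = Dict{eltype(x),Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1
    end
    n = length(x)
    return [normalize ? counts[v] / n : counts[v] for v in x]
end

frequency_encode(["a", "b", "a", "a"])                    # raw counts
frequency_encode(["a", "b", "a", "a"]; normalize = true)  # relative frequencies
```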

```@docs; canonical = false
MLJTransforms.TargetEncoder
```
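
Target encoding admits several variants. The sketch below uses one common choice, the per-category mean of a numerical target smoothed toward the global mean with an m-estimate; this is an assumption for illustration, not necessarily the statistic `TargetEncoder` computes, and `target_encode` is hypothetical. In practice the encoding should be learned on training data only, to avoid target leakage.

```julia
using Statistics

# Hypothetical helper: (sum_c + m * μ) / (n_c + m) for each category c,
# where μ is the global target mean; larger m shrinks more toward μ.
function target_encode(x::AbstractVector, y::AbstractVector{<:Real}; m::Real = 1.0)
    μ = mean(y)
    sums = Dict{eltype(x),Float64}()
    counts = Dict{eltype(x),Int}()
    for (v, t) in zip(x, y)
        sums[v] = get(sums, v, 0.0) + t
        counts[v] = get(counts, v, 0) + 1
    end
    enc = Dict(v => (sums[v] + m * μ) / (counts[v] + m) for v in keys(counts))
    return [enc[v] for v in x]
end

x = ["a", "a", "b"]
y = [1.0, 3.0, 5.0]
target_encode(x, y; m = 0)   # plain per-category means
target_encode(x, y)          # pulled toward the global mean 3.0
```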

```@docs; canonical = false
MLJTransforms.ContrastEncoder
```
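
To make the contrast-based encoders concrete, consider a variable with three levels a, b, c (matrix rows), with a as the reference level. Under one common convention (libraries differ, for instance in which level is the base for sum coding), the dummy, sum and forward-difference contrast matrices are:

```math
\underbrace{\begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}}_{\text{dummy}}
\qquad
\underbrace{\begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}}_{\text{sum}}
\qquad
\underbrace{\begin{pmatrix} \tfrac{2}{3} & \tfrac{1}{3} \\ -\tfrac{1}{3} & \tfrac{1}{3} \\ -\tfrac{1}{3} & -\tfrac{2}{3} \end{pmatrix}}_{\text{forward difference}}
```

Fitting y = β0 + β1 c1 + β2 c2 to the cell means μa, μb, μc then gives:

```math
\begin{aligned}
\text{dummy:} \quad & \beta_0 = \mu_a, \quad \beta_1 = \mu_b - \mu_a, \quad \beta_2 = \mu_c - \mu_a \\
\text{sum:} \quad & \beta_0 = \tfrac{1}{3}(\mu_a + \mu_b + \mu_c), \quad \beta_1 = \mu_b - \beta_0, \quad \beta_2 = \mu_c - \beta_0 \\
\text{forward difference:} \quad & \beta_0 = \tfrac{1}{3}(\mu_a + \mu_b + \mu_c), \quad \beta_1 = \mu_a - \mu_b, \quad \beta_2 = \mu_b - \mu_c
\end{aligned}
```

So dummy coding compares each level with the reference cell mean, sum coding compares each level with the grand mean, and forward-difference coding compares adjacent levels.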

```@docs; canonical = false
MLJTransforms.CardinalityReducer
```
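
A sketch of cardinality reduction, assuming any level whose relative frequency falls below a threshold is merged into a single catch-all label. `reduce_cardinality`, `min_frac` and the `"Other"` label are hypothetical, not `CardinalityReducer`'s actual hyperparameters.

```julia
# Hypothetical helper: merge every level whose relative frequency is below
# `min_frac` into a single catch-all label.
function reduce_cardinality(x::AbstractVector{<:AbstractString};
                            min_frac::Real = 0.1, other::AbstractString = "Other")
    counts = Dict{eltype(x),Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1
    end
    n = length(x)
    return [counts[v] / n < min_frac ? other : v for v in x]
end

x = vcat(fill("a", 5), fill("b", 4), ["c"])   # "c" has relative frequency 0.1
reduce_cardinality(x; min_frac = 0.2)           # "c" becomes "Other"
```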

```@docs; canonical = false
MLJTransforms.MissingnessEncoder
```
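
A sketch of the idea behind encoding missingness: each `missing` in a categorical vector becomes an explicit new level, so downstream models can use the fact that a value was absent. `encode_missing` and its default label are hypothetical.

```julia
# Hypothetical helper: replace each `missing` with an explicit new level.
encode_missing(x::AbstractVector; label::AbstractString = "missing") =
    [ismissing(v) ? label : v for v in x]

encode_missing(["a", missing, "b"])
```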
@@ -7,6 +7,7 @@ Other Transformers include more generic transformers that go beyond categorical
| [UnivariateBoxCoxTransformer](@ref) | Apply a Box-Cox transformation to a single vector |
| [InteractionTransformer](@ref) | Create new interaction features from columns of numerical features |
| [UnivariateDiscretizer](@ref) | Discretize a continuous vector into an ordered factor |
| [FillImputer](@ref) | Fill missing values of features belonging to any finite or infinite scientific type |

So it suffices that it can operate on infinite types.

Now I do perceive that the taxonomy makes a lot more sense for categorical encoders (as opposed to transformers that aren't simply encoders), especially since entity embeddings, contrast encoders and utility encoders are all nontypical encoders and deserve better exposure (aside from helping with organization). What do you think about the following, if I can do it by next Monday:

Split this package into two packages, MLJEncoding and MLJTransforms: the former to carry the encoder methods and the latter for the broader category of transformers?

It doesn't seem like a lot of effort to me, and it's intuitive in the sense that encoding packages do indeed tend to be standalone in other languages (e.g., Python), as they constitute a specific, widely needed type of transformer.

> So it suffices that it can operate on infinite types.

Not sure what you are getting at. My point is your taxonomy, as I understand it, splits algorithms according to whether they operate on numerical or categorical features, no? But

> What do you think about the following, if I can do it by next Monday:
> Split this package into two packages MLJEncoding and MLJTransforms. The former to carry the encoder methods and the latter for the broader category of transformers?

I still think we should be careful about usurping "encoder" as a word used exclusively in the context of categorical input data. Autoencoders and variational autoencoders are two very important examples where the input is not necessarily categorical (typically, it's just the output that is categorical, or a categorical pdf). Maybe we should say "categorical encoding".

If you want to split the package, believe this will help users, and have the time to do it quickly, I don't object. I don't think there is any maintenance benefit in doing so; in fact, it is probably more of a maintenance burden: extra code fragmentation that doesn't seem justified from a dev point of view. You could alternatively achieve the separation you are after through the way the documentation is organised. For example, you could have separate doc pages both living at MLJ.jl (which is where the current

But, I'll support whatever option you're happy to work out.

Sorry if I wasn't clear. I meant the definition

I don't claim this is the best approach, and I am open to recommendations. What do you think about adding another category

Okay, then I am no longer motivated to do that, and I think improving the taxonomy could be sufficient.

Never mind my recommendation, as numerical transformers already operate on multiple scientific types.

```@docs
MLJTransforms.Standardizer

@@ -23,3 +24,7 @@ MLJTransforms.BoxCoxTransformer
```@docs
MLJTransforms.UnivariateDiscretizer
```
```@docs
MLJTransforms.FillImputer
```