
Commit 957aeb1

Merge pull request #52 from JuliaAI/dev

For a 0.5.0 release

2 parents: 4bfce85 + 21ee57d

6 files changed: +560 additions, -388 deletions

Project.toml
Lines changed: 4 additions & 2 deletions

@@ -1,19 +1,21 @@
 name = "MLJMultivariateStatsInterface"
 uuid = "1b6a4a23-ba22-4f51-9698-8599985d3728"
 authors = ["Anthony D. Blaom <[email protected]>", "Thibaut Lienart <[email protected]>", "Okon Samuel <[email protected]>"]
-version = "0.4.0"
+version = "0.5.0"

 [deps]
+CategoricalDistributions = "af321ab8-2d2e-40a6-b165-3d674595d28e"
 Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 MLJModelInterface = "e80e1ace-859a-464e-9ed9-23947d8ae3ea"
 MultivariateStats = "6f286f6a-111f-5878-ab1e-185364afe411"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

 [compat]
+CategoricalDistributions = "0.1.9"
 Distances = "0.9,0.10"
 MLJModelInterface = "1.4"
-MultivariateStats = "0.9"
+MultivariateStats = "0.10"
 StatsBase = "0.32, 0.33"
 julia = "1.6"
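A sketch of how a downstream project would pick up this release, assuming the package is registered under this name in Julia's General registry; the resolver handles the new dependency and the compat bumps shown above:

```julia
using Pkg

# Request the release introduced by this merge. Resolution will also pull in
# CategoricalDistributions (new dependency) and MultivariateStats 0.10.
Pkg.add(name="MLJMultivariateStatsInterface", version="0.5.0")
```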

src/MLJMultivariateStatsInterface.jl
Lines changed: 1 addition & 0 deletions

@@ -9,6 +9,7 @@ import MultivariateStats
 import MultivariateStats: SimpleCovariance
 import StatsBase: CovarianceEstimator

+using CategoricalDistributions
 using Distances
 using LinearAlgebra

src/models/decomposition_models.jl
Lines changed: 44 additions & 36 deletions

@@ -284,8 +284,8 @@ MMI.fitted_params(::ICA, fr) = (projection=copy(fr.W), mean = copy(MS.mean(fr)))

 $(MMI.doc_header(PCA))

-Principal component analysis learns a linear projection onto a lower dimensional space while
-preserving most of the initial variance seen in the training data.
+Principal component analysis learns a linear projection onto a lower dimensional space
+while preserving most of the initial variance seen in the training data.

 # Training data
@@ -303,11 +303,11 @@ Train the machine using `fit!(mach, rows=...)`.
 # Hyper-parameters

 - `maxoutdim=0`: Together with `variance_ratio`, controls the output dimension `outdim`
-  chosen by the model. Specifically, suppose that `k` is the smallest integer such that
-  retaining the `k` most significant principal components accounts for `variance_ratio` of
-  the total variance in the training data. Then `outdim = min(outdim, maxoutdim)`. If
-  `maxoutdim=0` (default) then the effective `maxoutdim` is `min(n, indim - 1)` where `n`
-  is the number of observations and `indim` the number of features in the training data.
+  chosen by the model. Specifically, suppose that `k` is the smallest integer such that
+  retaining the `k` most significant principal components accounts for `variance_ratio` of
+  the total variance in the training data. Then `outdim = min(outdim, maxoutdim)`. If
+  `maxoutdim=0` (default) then the effective `maxoutdim` is `min(n, indim - 1)` where `n`
+  is the number of observations and `indim` the number of features in the training data.

 - `variance_ratio::Float64=0.99`: The ratio of variance preserved after the transformation
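The default rule quoted in the `maxoutdim` docstring can be sketched as a small helper; `effective_maxoutdim` is a hypothetical name for illustration, not part of the package API:

```julia
# Illustrative only: mirrors the documented default for PCA's `maxoutdim`.
# When `maxoutdim == 0`, the effective cap is `min(n, indim - 1)`, where `n`
# is the number of observations and `indim` the number of training features.
effective_maxoutdim(maxoutdim, n, indim) =
    maxoutdim == 0 ? min(n, indim - 1) : maxoutdim

effective_maxoutdim(0, 150, 4)  # default: at most 3 components
effective_maxoutdim(2, 150, 4)  # an explicit cap wins
```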
@@ -348,7 +348,8 @@ The fields of `fitted_params(mach)` are:

 The fields of `report(mach)` are:

-- `indim`: Dimension (number of columns) of the training data and new data to be transformed.
+- `indim`: Dimension (number of columns) of the training data and new data to be
+  transformed.

 - `outdim = min(n, indim, maxoutdim)` is the output dimension; here `n` is the number of
   observations.
@@ -428,14 +429,15 @@ Train the machine using `fit!(mach, rows=...)`.
 # Operations

 - `transform(mach, Xnew)`: Return a lower dimensional projection of the input `Xnew`, which
-  should have the same scitype as `X` above.
+  should have the same scitype as `X` above.

 - `inverse_transform(mach, Xsmall)`: For a dimension-reduced table `Xsmall`, such as
   returned by `transform`, reconstruct a table, having same the number of columns as the
   original training data `X`, that transforms to `Xsmall`. Mathematically,
   `inverse_transform` is a right-inverse for the PCA projection map, whose image is
-  orthogonal to the kernel of that map. In particular, if `Xsmall = transform(mach, Xnew)`,
-  then `inverse_transform(Xsmall)` is only an approximation to `Xnew`.
+  orthogonal to the kernel of that map. In particular, if
+  `Xsmall = transform(mach, Xnew)`, then `inverse_transform(Xsmall)` is only an
+  approximation to `Xnew`.

 # Fitted parameters
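The right-inverse property this hunk describes can be checked with plain SVD-based projection; a sketch of the underlying linear algebra using only stdlib, not the MLJ machine API itself:

```julia
using LinearAlgebra, Statistics

# Toy data: 4 observations × 3 features.
X = [1.0 2.0 0.1; 2.0 4.1 0.2; 3.0 5.9 0.3; 4.0 8.2 0.4]
mu = mean(X, dims=1)
Xc = X .- mu
P = svd(Xc).V[:, 1:2]      # projection onto the top 2 principal directions

Xsmall = Xc * P            # analogue of `transform`: reduced representation
Xrec = Xsmall * P' .+ mu   # analogue of `inverse_transform`: reconstruction

# Right-inverse: transforming the reconstruction recovers Xsmall exactly,
# because P' * P == I; but P * P' != I, so Xrec only approximates X.
(Xrec .- mu) * P ≈ Xsmall  # holds exactly (up to floating point)
```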
@@ -502,9 +504,11 @@ Train the machine using `fit!(mach, rows=...)`.

 # Hyper-parameters

-- `outdim::Int=0`: The number of independent components to recover, set automatically if `0`.
+- `outdim::Int=0`: The number of independent components to recover, set automatically
+  if `0`.

-- `alg::Symbol=:fastica`: The algorithm to use (only `:fastica` is supported at the moment).
+- `alg::Symbol=:fastica`: The algorithm to use (only `:fastica` is supported at the
+  moment).

 - `fun::Symbol=:tanh`: The approximate neg-entropy function, one of `:tanh`, `:gaus`.
@@ -515,18 +519,18 @@ Train the machine using `fit!(mach, rows=...)`.
 - `tol::Real=1e-6`: The convergence tolerance for change in the unmixing matrix W.

 - `mean::Union{Nothing, Real, Vector{Float64}}=nothing`: mean to use, if nothing (default)
-  centering is computed and applied, if zero, no centering; otherwise a vector of means can
-  be passed.
+  centering is computed and applied, if zero, no centering; otherwise a vector of means
+  can be passed.

-- `winit::Union{Nothing,Matrix{<:Real}}=nothing`: Initial guess for the unmixing matrix `W`:
-  either an empty matrix (for random initialization of `W`), a matrix of size `m × k` (if
-  `do_whiten` is true), or a matrix of size `m × k`. Here `m` is the number of components
-  (columns) of the input.
+- `winit::Union{Nothing,Matrix{<:Real}}=nothing`: Initial guess for the unmixing matrix
+  `W`: either an empty matrix (for random initialization of `W`), a matrix of size
+  `m × k` (if `do_whiten` is true), or a matrix of size `m × k`. Here `m` is the number
+  of components (columns) of the input.

 # Operations

-- `transform(mach, Xnew)`: Return the component-separated version of input
-  `Xnew`, which should have the same scitype as `X` above.
+- `transform(mach, Xnew)`: Return the component-separated version of input `Xnew`, which
+  should have the same scitype as `X` above.

 # Fitted parameters
@@ -540,7 +544,8 @@ The fields of `fitted_params(mach)` are:

 The fields of `report(mach)` are:

-- `indim`: Dimension (number of columns) of the training data and new data to be transformed.
+- `indim`: Dimension (number of columns) of the training data and new data to be
+  transformed.

 - `outdim`: Dimension of transformed data.
@@ -593,9 +598,9 @@ ICA
 $(MMI.doc_header(FactorAnalysis))

 Factor analysis is a linear-Gaussian latent variable model that is closely related to
-probabilistic PCA. In contrast to the probabilistic PCA model, the covariance of conditional
-distribution of the observed variable given the latent variable is diagonal rather than
-isotropic.
+probabilistic PCA. In contrast to the probabilistic PCA model, the covariance of
+conditional distribution of the observed variable given the latent variable is diagonal
+rather than isotropic.

 # Training data
@@ -606,7 +611,7 @@ In MLJ or MLJBase, bind an instance `model` to data with
 Here:

 - `X` is any table of input features (eg, a `DataFrame`) whose columns
-  are of scitype `Continuous`; check column scitypes with `schema(X)`.
+  are of scitype `Continuous`; check column scitypes with `schema(X)`.

 Train the machine using `fit!(mach, rows=...)`.
@@ -615,8 +620,8 @@ Train the machine using `fit!(mach, rows=...)`.
 - `method::Symbol=:cm`: Method to use to solve the problem, one of `:ml`, `:em`, `:bayes`.

 - `maxoutdim=0`: Controls the the dimension (number of columns) of the output,
-  `outdim`. Specifically, `outdim = min(n, indim, maxoutdim)`, where `n` is the number of
-  observations and `indim` the input dimension.
+  `outdim`. Specifically, `outdim = min(n, indim, maxoutdim)`, where `n` is the number of
+  observations and `indim` the input dimension.

 - `maxiter::Int=1000`: Maximum number of iterations.
@@ -625,8 +630,8 @@ Train the machine using `fit!(mach, rows=...)`.
 - `eta::Real=tol`: Variance lower bound.

 - `mean::Union{Nothing, Real, Vector{Float64}}=nothing`: If `nothing`, centering will be
-  computed and applied; if set to `0` no centering is applied (data is assumed
-  pre-centered); if a vector, the centering is done with that vector.
+  computed and applied; if set to `0` no centering is applied (data is assumed
+  pre-centered); if a vector, the centering is done with that vector.

 # Operations
@@ -653,7 +658,8 @@ The fields of `fitted_params(mach)` are:

 The fields of `report(mach)` are:

-- `indim`: Dimension (number of columns) of the training data and new data to be transformed.
+- `indim`: Dimension (number of columns) of the training data and new data to be
+  transformed.

 - `outdim`: Dimension of transformed data (number of factors).
@@ -737,8 +743,8 @@ Train the machine using `fit!(mach, rows=...)`.
   of columns as the original training data `X`, that transforms to `Xsmall`.
   Mathematically, `inverse_transform` is a right-inverse for the PCA projection
   map, whose image is orthogonal to the kernel of that map. In particular, if
-  `Xsmall = transform(mach, Xnew)`, then `inverse_transform(Xsmall)` is
-  only an approximation to `Xnew`.
+  `Xsmall = transform(mach, Xnew)`, then `inverse_transform(Xsmall)` is only an
+  approximation to `Xnew`.

 # Fitted parameters
@@ -752,13 +758,15 @@ The fields of `fitted_params(mach)` are:

 The fields of `report(mach)` are:

-- `indim`: Dimension (number of columns) of the training data and new data to be transformed.
+- `indim`: Dimension (number of columns) of the training data and new data to be
+  transformed.
+
 - `outdim`: Dimension of transformed data.

 - `tvat`: The variance of the components.

-- `loadings`: The models loadings, weights for each variable used when calculating principal
-  components.
+- `loadings`: The models loadings, weights for each variable used when calculating
+  principal components.

 # Examples
