Description
This issue is to discuss what functions should be ported from StatsBase to Statistics (#2). Some functions would better move to a separate package:
- statmodels.jl: should go to StatsAPI.jl
Most APIs have passed the test of time so they are probably good enough, but I find some of them are not completely satisfying:
- hist.jl: I don't know this part of the code enough to judge whether the API is OK. There have been proposals to move these to a separate package (Proposal: Move histograms to separate package StatsBase.jl#650).
- weights.jl: Weighted `sum` cannot be implemented via a `weights` keyword argument like other functions since the function lives in Base (RFC: Add weights argument to sum JuliaLang/julia#33310). We could either export `wsum` or keep it internal and not support it for now.
- counts.jl: `counts` sounds a bit too generic of a term for a function that only allows counting integer values. `countmap` is more general and its name is explicit. That said, `counts` could easily be extended to allow any type of levels -- its limitation is just that it returns a vector without names so the mapping to the levels has to be done by hand, which isn't user-friendly. APIs provided by FreqTables.jl are nicer to use, but they need NamedArrays.jl (or a similar package). Then there's the issue that `countmap` uses radix sort for performance with some types, but this needs SortingAlgorithms.jl, which isn't a stdlib (yet?).
- deviation.jl: Do we really need all of these small convenience functions? `counteq` and `countne` don't really sound like statistical functions and I'm not sure how commonly they are used. `sqL2dist`, `L2dist`, `L1dist`, `Linfdist` have an uppercase letter in their name; these and the remaining functions are redundant with functions provided in Distances.jl. That only leaves `psnr`.
- misc.jl: `indexmap` is just `indexin` so remove it. `levelsmap` and `indicatormat` sound a bit limited compared with what StatsModels provides. `rle` and `inverse_rle` are not really related to statistics.
- scalarstats.jl: `mean_and_var` and `mean_and_std` have weird names so I'm not sure whether we should keep them. `zscore` and `zscore!` are convenient but redundant with (more general and more verbose) functions in transformations.jl.
- transformations.jl: `transform` and `transform!` are too generic names; I propose overloading LinearAlgebra's `normalize` and `normalize!`, since that name is actually the commonly used term for such transformations. I wonder whether we really need `reconstruct` and `reconstruct!` (which could be called `unnormalize` if we keep them). I'm also not sure what's the use of allowing a separate `fit` operation before actually applying the transformation (I'd imagine one would always normalize the data immediately).
- moments.jl: `moment` is redundant with specific functions so I'd drop it.
- robust.jl: `trimvar(x)` could be `var(trim(x))` if `trim(x)` returned a special iterator type to dispatch on.
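
To make the robust.jl idea concrete, here is a minimal sketch of what "a special iterator type to dispatch on" could look like. The names `Trimmed`, `lazytrim` and `trimmed_values` are hypothetical and not part of StatsBase or Statistics. The trimming rule (drop `floor(n * prop)` values from each tail) is an assumption, and the `var` method simply computes the sample variance of the retained values as a placeholder rather than reproducing the formula `trimvar` uses.

```julia
using Statistics

# Hypothetical lazy wrapper: stores the data and the proportion to trim,
# without sorting or copying until a reduction asks for the values.
struct Trimmed{T<:AbstractVector{<:Real}}
    data::T
    prop::Float64
end

# Hypothetical stand-in for a `trim(x; prop=...)` that returns the wrapper.
lazytrim(x::AbstractVector{<:Real}; prop::Real=0.0) = Trimmed(x, Float64(prop))

# Materialize the retained values. Assumption: floor(n * prop) elements are
# dropped from each tail of the sorted data.
function trimmed_values(t::Trimmed)
    n = length(t.data)
    k = floor(Int, n * t.prop)
    return sort(t.data)[(k + 1):(n - k)]
end

# Ordinary reductions can fall back to the materialized values...
Statistics.mean(t::Trimmed) = mean(trimmed_values(t))

# ...while `var` gets its own method for the wrapper, which is where a
# trimming-aware estimator would go. The body below is only a placeholder.
Statistics.var(t::Trimmed) = var(trimmed_values(t))

x = [1.0, 2.0, 3.0, 4.0, 100.0]
t = lazytrim(x; prop=0.2)
mean(t), var(t)
```

With this design there would be no need for a separate `trimvar` name: users would compose `var` with the trimming wrapper, and the specialized estimator would live in the method for that wrapper type.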
See also my previous notes at JuliaLang/julia#27152 (comment).
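
For reference, the usability difference between `counts` and `countmap` described in the counts.jl item can be sketched in plain Julia. The functions `counts_sketch` and `countmap_sketch` below are hypothetical stand-ins written for illustration, not the StatsBase implementations, and the vector version assumes every value lies inside the given range.

```julia
# `counts`-style: tallies integer values over a known range and returns a
# bare vector, so position i corresponds to levels[i].
# Assumption: every value of x lies within `levels`.
function counts_sketch(x::AbstractVector{<:Integer}, levels::UnitRange{<:Integer})
    out = zeros(Int, length(levels))
    for v in x
        out[v - first(levels) + 1] += 1
    end
    return out
end

# `countmap`-style: works for any hashable values and keeps each level
# next to its count.
function countmap_sketch(x)
    out = Dict{eltype(x),Int}()
    for v in x
        out[v] = get(out, v, 0) + 1
    end
    return out
end

x = [2, 3, 3, 5]
c = counts_sketch(x, 2:5)   # vector; the caller must remember it covers 2:5
m = countmap_sketch(x)      # Dict from each observed value to its count

# Mapping the vector result back to its levels has to be done by hand:
named = Dict(zip(2:5, c))
```

Note that the vector result also reports a zero for the unobserved level 4, while the dictionary only contains values that actually occur.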