Skip to content

Partial code reviewΒ #15

@ablaom

Description

@ablaom

@aa25desh I've taken a look at the overall structure of the code and have some comments. I can see a lot of work has gone into this, particularly into core algorithms (which I have not, however, reviewed in any detail). Be great if you can look at this when you have some time.

cc @vollmersj @mloning

  • Please change the name of this repository to TimeSeriesClassification.jl,
    or something similar without MLJ prefix, which we are now
    reserving for packages providing core functionality.

https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/test/runtests.jl#L10

  • Where does the right hand side for this test come from? If this is
    the output of some alternative implementation (e.g., sk-learn), then
    please state this clearly in a comment. Otherwise, explain why you
    know the right hand side must be the correct output (Independent of
    your implementation). In any case, the test is not robust because you are
    comparing floats using ==. Instead, please use approx or β‰ˆ.

  • I think a lack of unit tests here is still a serious issue. What
    about a unit test for these functions?

    • _discrete_fourier_transform,
    • transform (at this
      line
    • _shorten_bags,
    • select_sort
    • InvFeaturesGen (version on this line
    • apply_kernel
    • apply_kernels

https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/Project.toml#L6

  • You should not have MLJBase as a dependency unless you
    absolutely need it. I see that you use accuracy. I'm guessing you only need it for a
    test? The idea is that the light-weight package MLJModelInterface
    should suffice. You will still need MLJBase as a dependency for
    testing, i.e., listed under [extras] and [targets]. Read
    this
    carefully. If you're not sure how to include dependencies for
    testing, look at the examples at MLJModels or elsewhere.

  • Perhaps review the inclusion of MultivariateStats and Distributions
    as dependencies, as these are pretty hefty. Note that you can use
    the UnivariateFinite constructor without Distributions or
    MLJBase.

  • Before registering your package, you will need to give every package
    in [deps] that is not part of the standard library and explicit
    [compa] entry. Then you can accelerate merging of patch and
    minor releases.

https://github.com/alan-turing-institute/MLJTime.jl

  • At a minimum, documentation needs to include a description of each
    model provided (Could be a table), ideally including an explanation
    of all hyper parameters. This could be as simple as a reproduction
    of the doc strings. I would put this directly in the readme. It is
    difficult to find this information quickly from the other
    "documentation" that you provide. If input data is matrix rather
    than tabular, say whether your observations correspond to
    rows or columns. Probably worth stating
    explicitly what input is allowed, since a lot of MLJ models use
    tabular data.

https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/src/MLJTime.jl#L12

  • You shouldn't need to export the methods predict, predict_mean,
    fitted_params, predict_mode, as you are overloading methods
    already defined in MLJModelInterface which are already exported by
    MLJ or MLJBase. The methods accuracy, fit!, and machine are
    also already exported by MLJ/MLJBase. At the moment, the user's
    work-flow would begin using MLJ; using MLJTime; .... After your package is
    registered, the work-flow will be using MLJ; @load TimeSeriesForestClassifier ... or similar.

  • predict_new looks like a private method; it is not part of the MLJ
    API; I don't think you need to export it.

  • ditto RandomForestClassifierFit.

  • I suggest exporting your model types here (for example,
    TimeSeriesForestClassifier). (Does the example on the read me
    actually work without this export?)

  • We need model metadata here. Here's a suggestion for the first
    classifier:

MMI.input_scitype(::Type{<:TimeSeriesForestClassifier}) = 
    AbstractMatrix(<:MMI.Continuous)
MMI.target_scitype(::Type{<:TimeSeriesForestClassifier}) = 
    AbstractVector{<:MMI.Finite}
MMI.load_path(::Type{<:TimeSeriesForestClassifier}) = 
    "TimeSeriesClassification.TimeSeriesForestClassifier"
MMI.package_name(::Type{<:TimeSeriesForestClassifier}) = 
    "TimeSeriesClassification"
MMI.package_uuid(::Type{<:TimeSeriesForestClassifier}) = 
    "2a643d68-9e49-4566-a2d5-26c3fb6c4a71"
MMI.package_url(::Type{<:TimeSeriesForestClassifier}) = "???"
MMI.is_pure_julia(::Type{<:TimeSeriesForestClassifier}) = true

I'm assuming here that your inputs are matrices of abstract floats. If
you change your requirements for the input type, then you'll need to
modify the input_trait accordingly.

https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/src/interface.jl#L93

  • Add a signature for the constructor here. Just like you have for
    the preceding model.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions