Partial code review

@aa25desh I've taken a look at the overall structure of the code and have some comments. I can see a lot of work has gone into this, particularly into core algorithms (which I have not, however, reviewed in any detail). Be great if you can look at this when you have some time.

cc @vollmersj @mloning 

- [x] Please change the name of this repository to TimeSeriesClassification.jl,
  or something similar without `MLJ` prefix, which we are now
  reserving for packages providing core functionality.

https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/test/runtests.jl#L10

- [ ] Where does the right hand side for this test come from? If this is
  the output of some alternative implementation (e.g., sk-learn), then
  please state this clearly in a comment. Otherwise, explain why you
  know the right hand side must be the correct output (Independent of
  your implementation).  In any case, the test is not robust because you are
  comparing floats using `==`. Instead, please use `approx` or `≈`.
  
- [ ] I think a lack of unit tests here is still a serious issue. What
  about a unit test for these functions?
  
  - [ ] `_discrete_fourier_transform`, 
  - [ ] `transform` (at [this
  line](https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/src/dictionary_based_classifier.jl#L111)
  - [ ] `_shorten_bags`, 
  - [x] `select_sort`
  - [x] `InvFeaturesGen` (version on [this line](https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/src/interval_based_forest.jl#L83)
  - [x] `apply_kernel`
  - [x] `apply_kernels`
  
https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/Project.toml#L6

- [ ] You should not have MLJBase as a dependency unless you
  absolutely need it. I see that you use `accuracy`.  I'm guessing you only need it for a
  test? The idea is that the light-weight package MLJModelInterface
  should suffice. You will still need MLJBase as a dependency for
  testing, i.e., listed under [extras] and [targets]. Read
  [this](https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Important-1)
  carefully. If you're not sure how to include dependencies for
  testing, look at the examples at MLJModels or elsewhere. 
  
- [ ] Perhaps review the inclusion of MultivariateStats and Distributions
  as dependencies, as these are pretty hefty. Note that you can use
  the `UnivariateFinite` constructor without `Distributions` or
  `MLJBase`.
  
- [ ] Before registering your package, you will need to give every package
  in [deps] that is not part of the standard library and explicit
  [compa] entry. Then you can accelerate merging of patch and
  minor releases.
  
https://github.com/alan-turing-institute/MLJTime.jl

- [ ] At a minimum, documentation needs to include a description of each
  model provided (Could be a table), ideally including an explanation
  of all hyper parameters. This could be as simple as a reproduction
  of the doc strings. I would put this directly in the readme. It is
  difficult to find this information quickly from the other
  "documentation" that you provide. If input data is matrix rather
  than tabular, say whether your observations correspond to
  rows or columns. Probably worth stating
  explicitly what input is allowed, since a lot of MLJ models use
  tabular data.
  
https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/src/MLJTime.jl#L12

- [ ] You shouldn't need to export the methods `predict`, `predict_mean`,
  `fitted_params`, `predict_mode`, as you are overloading methods
  already defined in MLJModelInterface which are already exported by
  MLJ or MLJBase. The methods `accuracy`, `fit!`, and `machine` are
  also already exported by MLJ/MLJBase. At the moment, the user's
  work-flow would begin `using MLJ; using MLJTime; ...`. After your package is
  registered, the work-flow will be `using MLJ; @load
  TimeSeriesForestClassifier ...` or similar.
  
- [ ] `predict_new` looks like a private method; it is not part of the MLJ
  API; I don't think you need to export it.

- [ ] ditto `RandomForestClassifierFit`.

- [ ] I suggest exporting your model types here (for example,
  `TimeSeriesForestClassifier`). (Does the example on the read me
  actually work without this export?)
  
- [ ] We need model metadata here. Here's a suggestion for the first
  classifier:
  
```julia 
MMI.input_scitype(::Type{<:TimeSeriesForestClassifier}) = 
    AbstractMatrix(<:MMI.Continuous)
MMI.target_scitype(::Type{<:TimeSeriesForestClassifier}) = 
    AbstractVector{<:MMI.Finite}
MMI.load_path(::Type{<:TimeSeriesForestClassifier}) = 
    "TimeSeriesClassification.TimeSeriesForestClassifier"
MMI.package_name(::Type{<:TimeSeriesForestClassifier}) = 
    "TimeSeriesClassification"
MMI.package_uuid(::Type{<:TimeSeriesForestClassifier}) = 
    "2a643d68-9e49-4566-a2d5-26c3fb6c4a71"
MMI.package_url(::Type{<:TimeSeriesForestClassifier}) = "???"
MMI.is_pure_julia(::Type{<:TimeSeriesForestClassifier}) = true

```

I'm assuming here that your inputs are matrices of abstract floats. If
you change your requirements for the input type, then you'll need to
modify the `input_trait` accordingly.
  
https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/src/interface.jl#L93

- [ ] Add a signature for the constructor here. Just like you have for
  the preceding model.
  

  









Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Partial code review #15

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Partial code review #15

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions