
Time series classification API for MLJ

aa25desh edited this page May 26, 2020 · 9 revisions

Contributors: @aa25desh, @mloning

Introduction

Classification

Time series (TS) classification assigns a class label to a time series based on its behavior over time. Typically, each TS is treated as a feature vector, and a set of such feature vectors is used to train a classification model, such as a decision tree. More formally, we observe $N$ i.i.d. panel-data training instances of feature-label pairs $(x_i, y_i)$, $i = 1, \dots, N$, where each feature instance is a time series $x_i = (x_i(t_1), \dots, x_i(t_T))$. The task is to use the training data to learn a predictor $f$ that accurately predicts the target value of a new input time series $x^*$, i.e. $\hat{y} = f(x^*)$.
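Below is a minimal sketch of this setup in Julia. It assumes the panel is stored as a vector of equal-length `Vector{Float64}` series. The predictor $f$ here is a 1-nearest-neighbour rule with Euclidean distance, a common baseline chosen only for illustration. `X_train`, `y_train`, `euclidean` and `predict_1nn` are hypothetical names, and the data are made up.

```julia
# Training panel: N = 4 i.i.d. instances, each a univariate series with T = 4.
# Values are made up for illustration only.
X_train = [[0.0, 0.1, 0.9, 1.0], [0.0, 0.2, 0.8, 1.0],
           [1.0, 0.9, 0.1, 0.0], [1.0, 0.8, 0.2, 0.0]]
y_train = ["rise", "rise", "fall", "fall"]

# Euclidean distance between two series of equal length.
euclidean(a, b) = sqrt(sum(abs2, a .- b))

# The learned predictor f: return the label of the closest training series.
function predict_1nn(X, y, xstar)
    dists = [euclidean(xi, xstar) for xi in X]
    return y[argmin(dists)]
end

yhat = predict_1nn(X_train, y_train, [0.1, 0.2, 0.7, 0.9])   # "rise"
```

Here "learning" $f$ amounts to storing the training panel. Model-based classifiers such as decision trees would instead compress the training data into parameters.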

Ex1

  • The GunPoint problem contains univariate tracking data of an actor's wrist as they either take a prop gun from a holster, point it, and return it to the holster, or simply mime the action without the prop. The classification task is to decide whether a motion trace is a case of gun or no gun. The section of the TS where the actor returns the gun to the holster is discriminative: if the gun prop is present, the actor's hand stops suddenly, but if it is not, the hand does not stop abruptly and instead continues subtly past the waist due to inertia.

Ex2

  • The Cricket X, Y and Z series are accelerometer data (in three dimensions) recorded from actors performing cricket gestures. The five classes are different umpire signals: Dead Ball, Four, Last Hour Leg Bye, Out, Penalty Runs. Two accelerometers are mounted orthogonally to each other, so acceleration is measured in 3D space. The accelerometers are housed in a small wristwatch-sized enclosure worn as a wrist band. Taken together, the data form a multivariate time series classification problem.
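One way to hold such multivariate panel data in memory is a single 3-D array. The sketch below assumes the layout (instances, dimensions, time points) and fills it with synthetic random numbers standing in for real Cricket data. The layout and all names are illustrative assumptions, not a decided format.

```julia
# Assumed layout: (instances, dimensions, time points).
n_instances, n_dims, n_timepoints = 4, 3, 50
X = randn(n_instances, n_dims, n_timepoints)   # synthetic stand-in, not real Cricket data

instance1 = X[1, :, :]   # one multivariate instance: 3 × 50 (x, y, z channels over time)
z_axis    = X[:, 3, :]   # one channel (z) across all instances: 4 × 50
```

Because Julia arrays are column-major, a (time points, dimensions, instances) layout would instead keep each individual series contiguous in memory. Which layout is preferable depends on the access pattern of the algorithms.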

We follow MLJ's approach of providing a unified interface, here applied to time series.


Key concepts and questions

Questions

  • What hardware will the code run on?
  • Should we provide a lite interface later on?
  • Should ensembles be parallelized via multithreading or distributed computing?
  • Under what conditions does the data container need to be copied?
  • Can we directly extend existing packages in the MLJ ecosystem?
  • Should we support training specific parts of a pre-trained model?
  • Should we support sparse matrices?

Data container

Requirements

  • Support a single time series as well as multiple i.i.d. instances of time series.
  • Support time-heterogeneous data, where time indices vary across instances and/or variables.
  • Keep track of the time index of each series.
  • Handle type-heterogeneous data.
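To make these requirements concrete, here is a minimal sketch of a container that satisfies them. `TSInstance`, `TSPanel` and `isconsistent` are hypothetical names, not types from any existing package. Each instance carries its own time index, so lengths can differ across instances, and each variable is stored as its own vector, so element types can differ as well.

```julia
# Hypothetical container types (not from any package).
# Each instance carries its own time index, so lengths can differ across instances.
struct TSInstance
    time::Vector{Float64}             # time stamps of this instance
    variables::Dict{Symbol,Vector}    # variable name => values; element types may differ
end

# A panel is a collection of i.i.d. instances; a single series is a panel of length 1.
struct TSPanel
    instances::Vector{TSInstance}
end

Base.length(p::TSPanel) = length(p.instances)

# Every variable must have one value per time stamp of its instance.
isconsistent(inst::TSInstance) =
    all(length(v) == length(inst.time) for v in values(inst.variables))

a = TSInstance([0.0, 0.5, 1.0],
               Dict(:accel => [0.1, 0.3, 0.2], :phase => ["draw", "point", "holster"]))
b = TSInstance([0.0, 1.0],
               Dict(:accel => [0.4, missing], :phase => ["point", "holster"]))
panel = TSPanel([a, b])
```

The untyped `Vector` value type trades some performance for flexibility. A production container would likely use concrete column types or a table type instead.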

Different options

  • TimeSeries

    • Pros
      • Supports NaN values.
      • Indexing follows common semantics, and the timestamps are checked to be in sequence.
      • Supports common transformations of time series data, such as lagging, leading, calculating changes, windowing and aggregation.
    • Cons
      • The TimeArray struct is immutable, so in-place operations are not possible.
      • Only supports data of equal length.
      • Save and load operations are not properly supported.
      • All values must be of the same type.
  • JuliaDB

    • Pros
      • Can perform basic operations such as join, select and reshape over extended types.
      • Supports saving results and operations over the data, including missing and NaN values.
      • Interfaces with online datasets, e.g. MySQL.
    • Cons
      • The JuliaDB package is quite large.
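The transformations listed for TimeSeries (lagging, leading, calculating changes, windowing and aggregation) are what any container choice has to support. To illustrate the operations themselves without depending on a particular package, here is a sketch on plain Julia vectors. `lag_values`, `lead_values`, `change` and `rolling` are hypothetical helpers and are not the TimeSeries.jl API.

```julia
using Statistics

# Hypothetical helpers on plain vectors; NOT the TimeSeries.jl API.
# Shifted positions are padded with `missing` so the length is preserved.
lag_values(x::AbstractVector, n::Int = 1)  = vcat(fill(missing, n), x[1:end-n])
lead_values(x::AbstractVector, n::Int = 1) = vcat(x[n+1:end], fill(missing, n))

# Change between consecutive observations.
change(x::AbstractVector) = vcat(missing, diff(x))

# Windowing: apply `f` to each trailing window of length `w`.
rolling(f, x::AbstractVector, w::Int) = [f(view(x, i-w+1:i)) for i in w:length(x)]

x = [1.0, 2.0, 4.0, 7.0]
lag_values(x)        # [missing, 1.0, 2.0, 4.0]
lead_values(x)       # [2.0, 4.0, 7.0, missing]
change(x)            # [missing, 1.0, 2.0, 3.0]
rolling(mean, x, 2)  # [1.5, 3.0, 5.5]
sum(x)               # aggregation over the whole series: 14.0
```

Padding with `missing` rather than dropping values keeps each transformed series aligned with its original time index. This relates directly to the container requirement of tracking the index of each series.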

Model interface

  • fit
    • fit(model, verbosity, X, y)
  • predict
    • predict(model, fitresult, X)
  • update
    • update(model, verbosity, fitresult, cache, X, y)
  • The fitresult stores the learned parameters, while the purpose of cache is to pass on "state" not included in the fitresult to an update method.
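The sketch below shows how the three signatures fit together for a toy time series classifier. It is a standalone illustration, assuming that `fit` and `update` return a `(fitresult, cache, report)` triple as in MLJ. The functions are defined locally and do not extend MLJModelInterface. `NearestCentroidTS`, `znorm` and `prep` are hypothetical names. In this sketch, `update` uses the cache to skip retraining when neither the hyperparameter nor the data changed.

```julia
using Statistics

# Hypothetical classifier illustrating the fit/predict/update signatures;
# these are local functions, not MLJModelInterface methods.
# Hyperparameter: z-normalise each series before comparing.
mutable struct NearestCentroidTS
    normalize::Bool
end

znorm(x) = (s = std(x); s == 0 ? x .- mean(x) : (x .- mean(x)) ./ s)
prep(model, X) = model.normalize ? [znorm(x) for x in X] : X

function fit(model::NearestCentroidTS, verbosity::Int, X, y)
    Xp = prep(model, X)
    classes = sort(unique(y))
    # Learned parameters: one mean series (centroid) per class.
    centroids = [reduce(+, Xp[y .== c]) ./ count(==(c), y) for c in classes]
    fitresult = (classes = classes, centroids = centroids)
    # State for update: the hyperparameter value and the data used in this fit.
    cache = (normalize = model.normalize, X = X, y = y)
    report = (n_classes = length(classes),)
    verbosity > 0 && @info "Fitted $(length(classes)) class centroids"
    return fitresult, cache, report
end

function predict(model::NearestCentroidTS, fitresult, Xnew)
    Xp = prep(model, Xnew)
    # Assign each series to the class with the nearest centroid (squared Euclidean).
    return [fitresult.classes[argmin([sum(abs2, x .- m) for m in fitresult.centroids])]
            for x in Xp]
end

function update(model::NearestCentroidTS, verbosity::Int, fitresult, cache, X, y)
    # Skip retraining if neither the hyperparameter nor the data changed.
    if model.normalize == cache.normalize && X === cache.X && y === cache.y
        verbosity > 0 && @info "Nothing changed; reusing fitresult"
        return fitresult, cache, (n_classes = length(fitresult.classes),)
    end
    return fit(model, verbosity, X, y)
end

X = [[0.0, 1.0, 2.0], [0.0, 1.1, 2.1], [2.0, 1.0, 0.0], [2.1, 0.9, 0.0]]
y = ["up", "up", "down", "down"]
model = NearestCentroidTS(false)
fitresult, cache, report = fit(model, 0, X, y)
predict(model, fitresult, [[0.1, 1.0, 2.2]])   # ["up"]
```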

Machine interface

  • Machine
    • mach = machine(model, X, y)
  • A machine stores the fitresult, cache and report variables generated by lower-level calls to fit or update.
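The following sketch shows how a machine can sit on top of the model interface. It is loosely modelled on how an MLJ machine calls `fit` the first time and `update` afterwards, but it is not MLJ's implementation. `MajorityClassTS`, `MachineSketch` and the local `machine`/`fit!` functions are hypothetical, and the trivial model exists only so the block runs on its own.

```julia
# Hypothetical stand-in model so the sketch runs on its own: it ignores the
# series and always predicts the most frequent training label.
struct MajorityClassTS end

function fit(::MajorityClassTS, verbosity::Int, X, y)
    counts = Dict{eltype(y),Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    # Most frequent label; ties are broken by sorting the labels.
    fitresult = first(sort(collect(counts), by = p -> (-p.second, p.first))).first
    return fitresult, nothing, (counts = counts,)
end

update(model::MajorityClassTS, verbosity::Int, fitresult, cache, X, y) =
    fit(model, verbosity, X, y)

predict(::MajorityClassTS, fitresult, Xnew) = fill(fitresult, length(Xnew))

# Hypothetical machine: binds a model to data and stores the state
# (fitresult, cache, report) returned by the lower-level fit/update calls.
mutable struct MachineSketch{M}
    model::M
    X::Any
    y::Any
    fitresult::Any
    cache::Any
    report::Any
    n_fits::Int
end

machine(model, X, y) = MachineSketch(model, X, y, nothing, nothing, nothing, 0)

# The first call uses fit; later calls go through update, which receives the cache.
function fit!(mach::MachineSketch; verbosity::Int = 1)
    if mach.n_fits == 0
        mach.fitresult, mach.cache, mach.report = fit(mach.model, verbosity, mach.X, mach.y)
    else
        mach.fitresult, mach.cache, mach.report =
            update(mach.model, verbosity, mach.fitresult, mach.cache, mach.X, mach.y)
    end
    mach.n_fits += 1
    return mach
end

predict(mach::MachineSketch, Xnew) = predict(mach.model, mach.fitresult, Xnew)

X = [rand(5) for _ in 1:4]
y = ["gun", "gun", "gun", "no gun"]
mach = fit!(machine(MajorityClassTS(), X, y); verbosity = 0)
predict(mach, [rand(5), rand(5)])   # ["gun", "gun"]
```

Keeping the state inside the machine means the user never handles `fitresult` or `cache` directly. The lower-level functions stay pure and easy to test.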

Wrapping sktime (ScikitLearn style)

  • api_map should be a Dict mapping APIs from Python to Julia.
  • BaseEstimator is the abstract type for all estimators in MLJTime.jl.
  • The set_params! function will pass the parameters to all estimators.
  • The @skt_import macro will import modules from sktime.
  • The IO stream will be passed as a text-IO object.
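This pure-Julia sketch shows the Julia side of the first three points and leaves out any Python interop. `SktimeClassifierWrapper` and the entries of `api_map` are hypothetical illustrations, not the actual MLJTime.jl code. `set_params!` here sets fields by keyword and rejects unknown names. A second method passes the same parameters on to every estimator in a collection.

```julia
# Abstract supertype for all (wrapped) estimators.
abstract type BaseEstimator end

# Hypothetical Julia-side wrapper; its fields mirror Python keyword arguments.
mutable struct SktimeClassifierWrapper <: BaseEstimator
    n_estimators::Int
    random_state::Union{Int,Nothing}
end

# Hypothetical api_map: Python method name => Julia function name.
const api_map = Dict("fit" => :fit!, "predict" => :predict, "set_params" => :set_params!)

# Set hyperparameters by keyword, rejecting names the estimator does not have.
function set_params!(est::BaseEstimator; kwargs...)
    for (name, value) in kwargs
        hasproperty(est, name) ||
            throw(ArgumentError("$(typeof(est)) has no parameter `$name`"))
        setproperty!(est, name, value)
    end
    return est
end

# Pass the same parameters on to every estimator in a collection.
function set_params!(ests::AbstractVector{<:BaseEstimator}; kwargs...)
    foreach(est -> set_params!(est; kwargs...), ests)
    return ests
end

est = SktimeClassifierWrapper(100, nothing)
set_params!(est; n_estimators = 50, random_state = 42)
```

Dispatching on the abstract `BaseEstimator` type lets every wrapper share one `set_params!` implementation instead of each wrapper defining its own.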

Re-implementing algorithms