|
7 | 7 | [CI](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-tsdownsample.yml) |
8 | 8 | <!-- TODO: codecov --> |
9 | 9 |
|
10 | | -Extremely fast **📈 time series downsampling** for visualization, written in Rust. |
| 10 | +Extremely fast **time series downsampling 📈** for visualization, written in Rust. |
11 | 11 |
|
12 | 12 | ## Features ✨ |
13 | 13 |
|
14 | 14 | * **Fast**: written in Rust with PyO3 bindings
15 | 15 | - leverages the optimized [argminmax](https://github.com/jvdd/argminmax) package, which is SIMD-accelerated with runtime feature detection
16 | 16 | - scales linearly with the number of data points |
| 17 | + <!-- TODO check if it scales sublinearly --> |
17 | 18 | - multithreaded with Rayon (in Rust) |
18 | 19 | <details> |
19 | 20 | <summary><i>Why we do not use Python multiprocessing</i></summary> |
@@ -62,14 +63,51 @@ s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000) |
62 | 63 | s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000) |
63 | 64 | ``` |
64 | 65 |
|
65 | | -## Limitations |
| 66 | +## Downsampling algorithms & API |
| 67 | + |
| 68 | +### Downsampling API 📑 |
| 69 | + |
| 70 | +Each downsampling algorithm is implemented as a class that exposes a `downsample` method.
| 71 | +The `downsample` method has the following signature:
| 72 | + |
| 73 | +``` |
| 74 | +downsample([x], y, n_out, **kwargs) -> ndarray[uint64] |
| 75 | +``` |
| 76 | + |
| 77 | +**Arguments**: |
| 78 | +- `x` is optional |
| 79 | +- `x` and `y` are both positional arguments |
| 80 | +- `n_out` is a mandatory keyword argument that defines the number of output values<sup>*</sup> |
| 81 | +- `**kwargs` are optional keyword arguments *(see [table below](#downsampling-algorithms-📈))*: |
| 82 | + - `parallel`: whether to use multi-threading (default: `False`)<sup>**</sup> |
| 83 | + - ... |
| 84 | + |
| 85 | +**Returns**: a `ndarray[uint64]` of indices that can be used to index the original data. |
| 86 | + |
| 87 | +<sup>*</sup><i>When there are gaps in the time series, fewer than `n_out` indices may be returned.</i> |
| 88 | +<sup>**</sup><i>`parallel` is not supported for `LTTBDownsampler`.</i> |
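For illustration, a minimal end-to-end sketch of this API (the `from tsdownsample import ...` import path is assumed from the usage section above):

```python
import numpy as np

from tsdownsample import MinMaxLTTBDownsampler

y = np.random.randn(10_000_000)
x = np.arange(len(y))

# Select (at most) 1_000 indices into x & y
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000, parallel=True)

# The returned ndarray[uint64] can be used to index the original data
x_ds, y_ds = x[s_ds], y[s_ds]
```
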
| 89 | +### Downsampling algorithms 📈 |
| 90 | + |
| 91 | +The following downsampling algorithms (classes) are implemented: |
| 92 | + |
| 93 | +| Downsampler | Description | `**kwargs` | |
| 94 | +| ---: | --- | --- |
| 95 | +| `MinMaxDownsampler` | selects the **min and max** value in each bin | `parallel` | |
| 96 | +| `M4Downsampler` | selects the [**min, max, first and last**](https://dl.acm.org/doi/pdf/10.14778/2732951.2732953) value in each bin | `parallel` | |
| 97 | +| `LTTBDownsampler` | performs the [**Largest Triangle Three Buckets**](https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf) algorithm | |
| 98 | +| `MinMaxLTTBDownsampler` | (*new two-step algorithm 🎉*) first selects `n_out` * `minmax_ratio` **min and max** values, then further reduces these to `n_out` values using the **Largest Triangle Three Buckets** algorithm | `parallel`, `minmax_ratio`<sup>*</sup> | |
| 99 | + |
| 100 | +<sup>*</sup><i>The default value for `minmax_ratio` is 30, which has empirically proven to be a good default. (More details in our upcoming paper.)</i>
| 101 | + |
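As an illustrative sketch of the table above (assuming all four classes can be imported from the top-level `tsdownsample` package), every downsampler exposes the same `downsample` API and only the optional `**kwargs` differ:

```python
import numpy as np

from tsdownsample import (
    LTTBDownsampler,
    M4Downsampler,
    MinMaxDownsampler,
    MinMaxLTTBDownsampler,
)

y = np.random.randn(1_000_000)

idx_minmax = MinMaxDownsampler().downsample(y, n_out=1000, parallel=True)
idx_m4 = M4Downsampler().downsample(y, n_out=1000, parallel=True)
idx_lttb = LTTBDownsampler().downsample(y, n_out=1000)  # no `parallel` kwarg
# First select 1000 * 4 min/max candidates, then reduce them to 1000 with LTTB
idx_minmaxlttb = MinMaxLTTBDownsampler().downsample(y, n_out=1000, minmax_ratio=4)
```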
| 102 | + |
| 103 | +## Limitations & assumptions 🚨 |
66 | 104 |
|
67 | 105 | Assumes:
68 | | -(i) x-data monotinically increasing (i.e., sorted) |
69 | | -(ii) no NaNs in the data |
| 106 | +1. `x`-data is (non-strictly) monotonically increasing (i.e., sorted)
| 107 | +2. no `NaN` values in the data
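A minimal sketch of how these assumptions could be checked with NumPy before downsampling (these checks are illustrative and not part of the library):

```python
import numpy as np

x = np.arange(1_000_000)
y = np.random.randn(1_000_000)

# 1. x-data must be (non-strictly) monotonically increasing
assert np.all(np.diff(x) >= 0), "x is not sorted"
# 2. the data must not contain NaNs (np.isnan requires a float dtype)
assert not np.isnan(y).any(), "y contains NaNs"
```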
70 | 108 |
|
71 | 109 | --- |
72 | 110 |
|
73 | 111 | <p align="center"> |
74 | 112 | 👤 <i>Jeroen Van Der Donckt</i> |
75 | | -</p> |
| 113 | +</p> |