|
7 | 7 | [CI](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-tsdownsample.yml) |
8 | 8 | <!-- TODO: codecov --> |
9 | 9 |
|
10 | | -Extremely fast **📈 time series downsampling** for visualization, written in Rust. |
| 10 | +Extremely fast **time series downsampling 📈** for visualization, written in Rust. |
11 | 11 |
|
12 | 12 | ## Features ✨ |
13 | 13 |
|
14 | 14 | * **Fast**: written in Rust with PyO3 bindings
15 | 15 | - leverages the optimized [argminmax](https://github.com/jvdd/argminmax) package, which is SIMD-accelerated with runtime feature detection
16 | 16 | - scales linearly with the number of data points |
| 17 | + <!-- TODO check if it scales sublinearly --> |
17 | 18 | - multithreaded with Rayon (in Rust) |
18 | 19 | <details> |
19 | 20 | <summary><i>Why we do not use Python multiprocessing</i></summary> |
@@ -62,14 +63,51 @@ s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000) |
62 | 63 | s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000) |
63 | 64 | ``` |
64 | 65 |
|
65 | | -## Limitations |
| 66 | +## Downsampling algorithms & API |
| 67 | + |
| 68 | +### Downsampling API 📑 |
| 69 | + |
| 70 | +Each downsampling algorithm is implemented as a class that exposes a `downsample` method.
| 71 | +The `downsample` method has the following signature:
| 72 | + |
| 73 | +``` |
| 74 | +downsample([x], y, n_out, **kwargs) -> ndarray[uint64] |
| 75 | +``` |
| 76 | + |
| 77 | +**Arguments**: |
| 78 | +- `x` is optional |
| 79 | +- `x` and `y` are both positional arguments |
| 80 | +- `n_out` is a mandatory keyword argument that defines the number of output values<sup>*</sup> |
| 81 | +- `**kwargs` are optional keyword arguments *(see [table below](#downsampling-algorithms-📈))*: |
| 82 | + - `parallel`: whether to use multi-threading (default: `False`)<sup>**</sup> |
| 83 | + - ... |
| 84 | + |
| 85 | +**Returns**: a `ndarray[uint64]` of indices that can be used to index the original data. |
| 86 | + |
| 87 | +<sup>*</sup><i>When there are gaps in the time series, fewer than `n_out` indices may be returned.</i> |
| 88 | +<sup>**</sup><i>`parallel` is not supported for `LTTBDownsampler`.</i> |
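For illustration, a minimal end-to-end sketch of this API (the `from tsdownsample import ...` import path is assumed from the usage section above):

```python
import numpy as np

from tsdownsample import MinMaxLTTBDownsampler

y = np.random.randn(10_000_000)
x = np.arange(len(y))

# Select (at most) 1_000 indices into x & y
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000, parallel=True)

# The returned ndarray[uint64] can be used to index the original data
x_ds, y_ds = x[s_ds], y[s_ds]
```
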
| 89 | +### Downsampling algorithms 📈 |
| 90 | + |
| 91 | +The following downsampling algorithms (classes) are implemented: |
| 92 | + |
| 93 | +| Downsampler | Description | `**kwargs` | |
| 94 | +| ---: | --- | --- |
| 95 | +| `MinMaxDownsampler` | selects the **min and max** value in each bin | `parallel` | |
| 96 | +| `M4Downsampler` | selects the [**min, max, first and last**](https://dl.acm.org/doi/pdf/10.14778/2732951.2732953) value in each bin | `parallel` | |
| 97 | +| `LTTBDownsampler` | performs the [**Largest Triangle Three Buckets**](https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf) algorithm | |
| 98 | +| `MinMaxLTTBDownsampler` | (*new two-step algorithm 🎉*) first selects `n_out` * `minmax_ratio` **min and max** values, then further reduces these to `n_out` values using the **Largest Triangle Three Buckets** algorithm | `parallel`, `minmax_ratio`<sup>*</sup> | |
| 99 | + |
| 100 | +<sup>*</sup><i>The default value for `minmax_ratio` is 30, which has empirically proven to be a good default. (More details in our upcoming paper.)</i>
| 101 | + |
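As an illustrative sketch of the table above (assuming all four classes can be imported from the top-level `tsdownsample` package), every downsampler exposes the same `downsample` API and only the optional `**kwargs` differ:

```python
import numpy as np

from tsdownsample import (
    LTTBDownsampler,
    M4Downsampler,
    MinMaxDownsampler,
    MinMaxLTTBDownsampler,
)

y = np.random.randn(1_000_000)

idx_minmax = MinMaxDownsampler().downsample(y, n_out=1000, parallel=True)
idx_m4 = M4Downsampler().downsample(y, n_out=1000, parallel=True)
idx_lttb = LTTBDownsampler().downsample(y, n_out=1000)  # no `parallel` kwarg
# First select 1000 * 4 min/max candidates, then reduce them to 1000 with LTTB
idx_minmaxlttb = MinMaxLTTBDownsampler().downsample(y, n_out=1000, minmax_ratio=4)
```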
| 102 | + |
| 103 | +## Limitations & assumptions 🚨 |
66 | 104 |
|
67 | 105 | Assumes:
68 | | -(i) x-data monotinically increasing (i.e., sorted) |
69 | | -(ii) no NaNs in the data |
| 106 | +1. `x`-data is (non-strictly) monotonically increasing (i.e., sorted)
| 107 | +2. no `NaN` values in the data
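A minimal sketch of how these assumptions could be checked with NumPy before downsampling (these checks are illustrative and not part of the library):

```python
import numpy as np

x = np.arange(1_000_000)
y = np.random.randn(1_000_000)

# 1. x-data must be (non-strictly) monotonically increasing
assert np.all(np.diff(x) >= 0), "x is not sorted"
# 2. the data must not contain NaNs (np.isnan requires a float dtype)
assert not np.isnan(y).any(), "y contains NaNs"
```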
70 | 108 |
|
71 | 109 | --- |
72 | 110 |
|
73 | 111 | <p align="center"> |
74 | 112 | 👤 <i>Jeroen Van Der Donckt</i> |
75 | | -</p> |
| 113 | +</p> |