Commit d715a2d

Merge pull request #62 from Quantmetry/dev
Dev
2 parents f943b26 + e2204ad commit d715a2d

46 files changed, with 3678 additions and 2418 deletions

.github/workflows/test.yml

Lines changed: 7 additions & 1 deletion
@@ -1,6 +1,12 @@
 name: Unit test Qolmat
 
-on: [push, pull_request, workflow_dispatch]
+on:
+  push:
+    branches:
+      -dev
+      -main
+  pull_request:
+  workflow_dispatch:
 
 jobs:
   build-linux:

CONTRIBUTING.rst

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ If you need to use tensorflow, enter the command:
 
 .. code:: sh
 
-    $ pip install -e .[tensorflow]
+    $ pip install -e .[pytorch]
 
 Once the environment is installed, pre-commit is installed, but need to be activated using the following command:

HISTORY.rst

Lines changed: 13 additions & 0 deletions
@@ -1,6 +1,19 @@
 =======
 History
 =======
+
+0.0.15 (2023-??-??)
+-------------------
+
+* Hyperparameters are now optimized in hyperparameters.py, with the maintained module hyperopt
+* The Imputer classes do not possess a dictionary attribute anymore, and all list attributes have
+  been changed into tuple attributes so that all are not immutable
+* All the tests from scikit-learn's check_estimator now pass for the class Imputer
+* Fix MLP imputer, created a builder for MLP imputer
+* Switch tensorflow by pytorch. Change Test, environment, benchmark and imputers for pytorch
+* Add new datasets
+* Added dcor metrics with a pattern-wise computation on data with missing values
+
 0.0.14 (2023-06-14)
 -------------------
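The changelog entry about replacing list attributes with tuples can be illustrated with a small, self-contained sketch. The two classes below are hypothetical and are not Qolmat's `Imputer`. They only show why a mutable default attribute is risky for estimator-style objects and how a tuple avoids the problem.

```python
class ListConfigImputer:
    """Hypothetical estimator that keeps its column selection in a list."""

    def __init__(self, columns=[]):  # the same list object is reused by every instance
        self.columns = columns


class TupleConfigImputer:
    """Hypothetical estimator that keeps its column selection in a tuple."""

    def __init__(self, columns=()):  # tuples cannot be modified in place
        self.columns = columns


first = ListConfigImputer()
second = ListConfigImputer()
first.columns.append("temperature")  # silently changes `second` as well

safe = TupleConfigImputer()
# Extending a tuple builds a new object, so other instances are unaffected.
extended = safe.columns + ("temperature",)
```

After running this, `second.columns` also contains `"temperature"`, while `safe.columns` is still empty. This is presumably one reason constructor parameters are kept immutable when an estimator has to pass generic consistency checks.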

README.rst

Lines changed: 7 additions & 7 deletions
@@ -62,7 +62,7 @@ To install directly from the github repository :
 
 Let us start with a basic imputation problem. Here, we generate one-dimensional noisy time series.
 
-.. code:: sh
+.. code-block:: python
 
     import matplotlib.pyplot as plt
     import numpy as np
@@ -75,7 +75,7 @@ Let us start with a basic imputation problem. Here, we generate one-dimensional
 
 For this demonstration, let us create artificial holes in our dataset.
 
-.. code:: sh
+.. code-block:: python
 
     from qolmat.utils.data import add_holes
     plt.rcParams.update({'font.size': 18})
@@ -101,7 +101,7 @@ For this demonstration, let us create artificial holes in our dataset.
 To impute missing data, there are several methods that can be imported with ``from qolmat.imputations import imputers``.
 The creation of an imputation dictionary will enable us to benchmark the various imputations.
 
-.. code:: sh
+.. code-block:: python
 
     from sklearn.linear_model import LinearRegression
     from qolmat.imputations import imputers
@@ -146,7 +146,7 @@ The creation of an imputation dictionary will enable us to benchmark the various
 
 It is possible to define a parameter dictionary for an imputer with three pieces of information: min, max and type. The aim of the dictionary is to determine the optimal parameters for data imputation. Here, we call this dictionary ``dict_config_opti``.
 
-.. code:: sh
+.. code-block:: python
 
     search_params = {
         "RPCA_opti": {
@@ -157,7 +157,7 @@ It is possible to define a parameter dictionary for an imputer with three pieces
 
 Then with the comparator function in ``from qolmat.benchmark import comparator``, we can compare the different imputation methods. This **does not use knowledge on missing values**, but it relies data masking instead. For more details on how imputors and comparator work, please see the following `link <https://qolmat.readthedocs.io/en/latest/explanation.html>`_.
 
-.. code:: sh
+.. code-block:: python
 
     from qolmat.benchmark import comparator
 
@@ -175,7 +175,7 @@ Then with the comparator function in ``from qolmat.benchmark import comparator``
 
 We can observe the benchmark results.
 
-.. code:: sh
+.. code-block:: python
 
     dfs_imputed = imputer_tsmle.fit_transform(df_with_nan)
 
@@ -196,7 +196,7 @@ We can observe the benchmark results.
 
 Finally, we keep the best ``TSMLE`` imputor we represent.
 
-.. code:: sh
+.. code-block:: python
 
     dfs_imputed = imputer_tsmle.fit_transform(df_with_nan)
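The README text in this diff says the comparator relies on data masking rather than on knowledge of the truly missing values. A minimal, dependency-free sketch of that idea follows. It is not Qolmat's `comparator`: the function names (`masked_mae`, `mean_impute`, `ffill_impute`) and the use of `None` for missing entries are assumptions made for illustration only.

```python
import random


def mean_impute(series):
    """Fill missing entries (None) with the mean of the known entries."""
    known = [v for v in series if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in series]


def ffill_impute(series):
    """Fill missing entries with the last known value (first known value for leading gaps)."""
    filled, last = [], None
    for v in series:
        last = v if v is not None else last
        filled.append(last)
    first_known = next(v for v in filled if v is not None)
    return [first_known if v is None else v for v in filled]


def masked_mae(values, impute, ratio=0.2, seed=0):
    """Hide a share of the observed values, impute, and score only the hidden positions."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(values) if v is not None]
    n_hidden = max(1, round(ratio * len(observed)))
    hidden = set(rng.sample(observed, n_hidden))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    imputed = impute(masked)
    return sum(abs(imputed[i] - values[i]) for i in hidden) / len(hidden)


# Toy series with two genuinely missing points; they never enter the score.
series = [float(t) for t in range(20)]
series[3] = None
series[11] = None

scores = {
    "mean": masked_mae(series, mean_impute),
    "ffill": masked_mae(series, ffill_impute),
}
```

Because the error is computed only on artificially hidden positions, the ground truth is always available, and every imputer is scored against the same mask when the seed is shared.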

environment.ci.yml

Lines changed: 9 additions & 8 deletions
@@ -6,12 +6,13 @@ dependencies:
   - pip=23.0.1
   - pip:
     - codecov
-    - flake8==6.0.0
-    - matplotlib==3.6.2
-    - mypy==1.1.1
-    - numpydoc==1.5.0
-    - pytest==7.2.0
-    - pytest-cov==4.0.0
-    - pytest-mock==3.10.0
-    - tensorflow
+    - flake8
+    - matplotlib
+    - mypy
+    - numpy
+    - numpydoc
+    - pytest
+    - pytest-cov
+    - pytest-mock
+    - torch==2.0.1
     - -e .

environment.dev.yml

Lines changed: 4 additions & 3 deletions
@@ -5,17 +5,18 @@ channels:
 dependencies:
   - bump2version=1.0.1
   - dcor=0.6
-  - ipykernel=5.1.4
+  - ipykernel=6.21.0
   - jupyter=1.0.0
   - jupyterlab=1.2.6
   - jupytext=1.14.4
-  - numpy=1.21
+  - hyperopt=0.2.7
+  - numpy=1.24.4
   - packaging=23.1
   - pandas=2.0.1
+  - python=3.8
   - pip=23.0.1
   - scipy=1.10.1
   - scikit-learn=1.2.2
-  - scikit-optimize=0.9
   - sphinx=6.2.1
   - sphinx-gallery=0.13.0
   - sphinx_rtd_theme=1.2.0
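This environment swaps scikit-optimize for hyperopt, which matches the README's description of a search dictionary with min, max and type per parameter. The sketch below shows how such a dictionary could drive a search. It uses plain random search from the standard library as a stand-in, not hyperopt and not Qolmat's `hyperparameters.py`. The parameter names, the `"Real"`/`"Integer"` type labels and the toy objective are all assumptions.

```python
import random

# Hypothetical search space in the min/max/type style described in the README.
search_space = {
    "tau": {"min": 0.5, "max": 5.0, "type": "Real"},
    "rank": {"min": 1, "max": 10, "type": "Integer"},
}


def sample_params(space, rng):
    """Draw one candidate per parameter, respecting its bounds and type."""
    params = {}
    for name, spec in space.items():
        if spec["type"] == "Integer":
            params[name] = rng.randint(spec["min"], spec["max"])  # inclusive bounds
        else:
            params[name] = rng.uniform(spec["min"], spec["max"])
    return params


def random_search(objective, space, n_trials=50, seed=0):
    """Keep the candidate with the lowest objective value over n_trials draws."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = sample_params(space, rng)
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_objective(params):
    """Stand-in for an imputation error; minimal at tau=2, rank=3."""
    return (params["tau"] - 2.0) ** 2 + (params["rank"] - 3) ** 2


best_params, best_loss = random_search(toy_objective, search_space)
```

In a real setup the objective would be an imputation error measured on masked data, and the sampler would be replaced by the optimizer the project actually uses.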

examples/RPCA.md

Lines changed: 30 additions & 32 deletions
@@ -1,15 +1,16 @@
 ---
 jupyter:
   jupytext:
+    formats: ipynb,md
     text_representation:
       extension: .md
       format_name: markdown
       format_version: '1.3'
       jupytext_version: 1.14.4
   kernelspec:
-    display_name: env_qolmat
+    display_name: Python 3 (ipykernel)
     language: python
-    name: env_qolmat
+    name: python3
 ---
 
 ```python
@@ -28,33 +29,27 @@ from math import pi
 from qolmat.utils import plot, data
 from qolmat.imputations.rpca.rpca_pcp import RPCAPCP
 from qolmat.imputations.rpca.rpca_noisy import RPCANoisy
+from qolmat.utils.data import generate_artificial_ts
 ```
 
 **Generate synthetic data**
 
 ```python
 n_samples = 1000
+periods = [100, 20]
+amp_anomalies = 0.5
+ratio_anomalies = 0.05
+amp_noise = 0.1
 
-mesh = np.arange(n_samples)
-X_true = np.zeros(n_samples)
-A_true = np.zeros(n_samples)
-E_true = np.zeros(n_samples)
-p1 = 100
-p2 = 20
-X_true = 1 + np.sin(2 * pi * mesh / p1) + np.sin(2 * pi * mesh / p2)
-noise = np.random.uniform(size=n_samples)
-amplitude_A = .5
-freq_A = .05
-A_true = amplitude_A * np.where(noise < freq_A, -np.log(noise), 0) * (2 * (np.random.uniform(size=n_samples) > .5) - 1)
-amplitude_E = .1
-E_true = amplitude_E * np.random.normal(size=n_samples)
-
-signal = X_true + E_true
-signal[A_true != 0] = A_true[A_true != 0]
-signal = signal.reshape(-1, 1)
+X_true, A_true, E_true = generate_artificial_ts(n_samples, periods, amp_anomalies, ratio_anomalies, amp_noise)
+
+signal = X_true + A_true + E_true
 
 # Adding missing data
-signal[5:20, 0] = np.nan
+#signal[5:20] = np.nan
+mask = np.random.choice(len(signal), round(len(signal) / 20))
+signal[mask] = np.nan
+
 ```
 
 ```python
@@ -73,7 +68,7 @@ plt.plot(E_true)
 
 ax = fig.add_subplot(4, 1, 4)
 ax.title.set_text("Corrupted signal")
-plt.plot(signal[:, 0])
+plt.plot(signal)
 
 plt.show()
 ```
@@ -82,26 +77,29 @@ plt.show()
 
 ```python
 %%time
+rpca_pcp = RPCAPCP(period=100, max_iterations=100, mu=.5, lam=0.1)
+X, A = rpca_pcp.decompose_rpca_signal(signal)
+imputed = signal - A
+```
 
-rpca_pcp = RPCAPCP(period=100, max_iter=5, mu=.5, lam=1)
-X = rpca_pcp.fit_transform(signal)
-corruptions = signal - X
+```python
+fig = plt.figure(figsize=(12, 4))
+plt.plot(X, color="black")
+plt.plot(imputed)
 ```
 
 ## Temporal RPCA
 
 ```python
-rpca_noisy = RPCANoisy(period=10, tau=2, lam=0.3, list_periods=[10], list_etas=[0.01], norm="L2")
-X = rpca_noisy.fit_transform(signal)
-corruptions = signal - X
-plot.plot_signal([signal[:,0], X[:,0], corruptions[:, 0]])
+%%time
+rpca_noisy = RPCANoisy(period=10, tau=1, lam=0.4, list_periods=[10], list_etas=[0.01], norm="L2")
+X, A = rpca_noisy.decompose_rpca_signal(signal)
 ```
 
 ```python
-rpca_noisy = RPCANoisy(period=10, tau=2, lam=0.3, list_periods=[], list_etas=[], norm="L2")
-X = rpca_noisy.fit_transform(signal)
-corruptions = signal - X
-plot.plot_signal([signal[:,0], X[:,0], corruptions[:, 0]])
+fig = plt.figure(figsize=(12, 4))
+plt.plot(X, color="black")
+plt.plot(imputed)
 ```
 
 ```python
