Commit 9b83266: Create sigpro_pipeline_demo.md (tutorials/sigpro_pipeline_demo.md, +299 lines)
# Processing Signals with Pipelines

Now that we have identified and/or generated several primitives for our signal feature generation, we would like to define a reusable *pipeline* for applying them.

First, let's import the required libraries and functions.

```python
import sigpro
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sigpro.demo import _load_demo as get_demo
```
## Defining Primitives

Recall that we can obtain the list of available primitives with the `get_primitives` function:

```python
from sigpro import get_primitives

get_primitives()
```

```
['sigpro.SigPro',
 'sigpro.aggregations.amplitude.statistical.crest_factor',
 'sigpro.aggregations.amplitude.statistical.kurtosis',
 'sigpro.aggregations.amplitude.statistical.mean',
 'sigpro.aggregations.amplitude.statistical.rms',
 'sigpro.aggregations.amplitude.statistical.skew',
 'sigpro.aggregations.amplitude.statistical.std',
 'sigpro.aggregations.amplitude.statistical.var',
 'sigpro.aggregations.frequency.band.band_mean',
 'sigpro.transformations.amplitude.identity.identity',
 'sigpro.transformations.amplitude.spectrum.power_spectrum',
 'sigpro.transformations.frequency.band.frequency_band',
 'sigpro.transformations.frequency.fft.fft',
 'sigpro.transformations.frequency.fft.fft_real',
 'sigpro.transformations.frequency_time.stft.stft',
 'sigpro.transformations.frequency_time.stft.stft_real']
```

In addition, we can also define our own custom primitives.
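At their core, SigPro primitives are plain Python functions that operate on the signal's amplitude values; the exact contribution and registration workflow is covered in the SigPro documentation. Purely as an illustration (the name `mean_abs` is hypothetical, not a shipped primitive), a custom aggregation function might look like:

```python
import numpy as np


def mean_abs(amplitude_values):
    """Hypothetical custom aggregation: mean absolute amplitude of a signal."""
    return float(np.mean(np.abs(amplitude_values)))
```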
## Building a Pipeline

Let's go ahead and define a feature processing pipeline that sequentially applies the `identity` and `fft` transformations before applying the `std` aggregation. To pass these primitives into the signal processor, we write each primitive as a dictionary with the following fields:

- `name`: Name of the transformation / aggregation.
- `primitive`: Name of the primitive to apply.
- `init_params`: Dictionary containing the initialization parameters for the primitive (optional).

Since we choose not to specify any initialization parameters, we do not set `init_params` in these dictionaries.

```python
identity_transform = {
    'name': 'identity1',
    'primitive': 'sigpro.transformations.amplitude.identity.identity'
}

fft_transform = {
    'name': 'fft1',
    'primitive': 'sigpro.transformations.frequency.fft.fft'
}

std_agg = {
    'name': 'std1',
    'primitive': 'sigpro.aggregations.amplitude.statistical.std'
}
```
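Conceptually, this chain computes, for each signal, the standard deviation of its FFT. A rough standalone sketch of that computation using `numpy` directly (SigPro's `fft` primitive may differ in details such as normalization or real-vs-complex handling, so treat this as illustrative only):

```python
import numpy as np

# Toy signal standing in for one row of sensor readings.
rng = np.random.default_rng(0)
signal = rng.normal(size=400)

# identity is a no-op; fft moves the signal into the frequency domain;
# std aggregates the spectrum down to a single feature value.
spectrum = np.fft.fft(signal)
feature = np.std(spectrum)
print(feature)
```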
We now define a new pipeline containing the primitives we would like to apply. At minimum, we will need to pass in a list of transformations and a list of aggregations; the full list of available arguments is given below.

- `transformations (list)`: List of dictionaries containing the transformation primitives.
- `aggregations (list)`: List of dictionaries containing the aggregation primitives.
- `values_column_name (str)` (optional): The name of the column that contains the signal values. Defaults to `'values'`.
- `keep_columns (Union[bool, list])` (optional): Whether to keep non-feature columns in the output DataFrame. If a list of column names is passed, those columns are kept. Defaults to `False`.
- `input_is_dataframe (bool)` (optional): Whether the input is a pandas DataFrame. Defaults to `True`.

Returning to the example:

```python
transformations = [identity_transform, fft_transform]

aggregations = [std_agg]

mypipeline = sigpro.SigPro(transformations, aggregations, values_column_name='yvalues', keep_columns=True)
```

SigPro will proceed to build an `MLPipeline` that can be reused to build features.
To check that `mypipeline` was defined correctly, we can inspect the input and output arguments with the `get_input_args` and `get_output_args` methods.

```python
input_args = mypipeline.get_input_args()
output_args = mypipeline.get_output_args()

print(input_args)
print(output_args)
```

```
[{'name': 'readings', 'keyword': 'data', 'type': 'pandas.DataFrame'}, {'name': 'feature_columns', 'default': None, 'type': 'list'}]
[{'name': 'readings', 'type': 'pandas.DataFrame'}, {'name': 'feature_columns', 'type': 'list'}]
```
## Applying a Pipeline with `process_signal`

Once our pipeline is correctly defined, we apply the `process_signal` method to a demo dataset. Recall that `process_signal` has the following signature:

```python
def process_signal(self, data=None, window=None, time_index=None, groupby_index=None,
                   feature_columns=None, **kwargs):
    ...
    return data, feature_columns
```
`process_signal` accepts the following arguments:

- `data (pd.DataFrame)`: DataFrame with a column containing signal values.
- `window (str)`: Duration of the window, e.g. `'1h'`.
- `time_index (str)`: Name of the column in `data` that represents the time index.
- `groupby_index (str or list[str])`: Column names to group by before taking the window over each group.
- `feature_columns (list)`: List of columns from the input data that should be considered as features (and not dropped).

`process_signal` returns the following:

- `data (pd.DataFrame)`: DataFrame containing the output feature values as constructed from the signal.
- `feature_columns (list)`: List of (generated) feature names.

We now apply our pipeline to a toy dataset, defined as follows:
```python
demo_dataset = get_demo()
demo_dataset.columns = ['turbine_id', 'signal_id', 'xvalues', 'yvalues', 'sampling_frequency']
demo_dataset.head()
```
<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>turbine_id</th>
      <th>signal_id</th>
      <th>xvalues</th>
      <th>yvalues</th>
      <th>sampling_frequency</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>T001</td>
      <td>Sensor1_signal1</td>
      <td>2020-01-01 00:00:00</td>
      <td>[0.43616983763682876, -0.17662312586241055, 0....</td>
      <td>1000</td>
    </tr>
    <tr>
      <th>1</th>
      <td>T001</td>
      <td>Sensor1_signal1</td>
      <td>2020-01-01 01:00:00</td>
      <td>[0.8023828754411122, -0.14122063493312714, -0....</td>
      <td>1000</td>
    </tr>
    <tr>
      <th>2</th>
      <td>T001</td>
      <td>Sensor1_signal1</td>
      <td>2020-01-01 02:00:00</td>
      <td>[-1.3143142430046044, -1.1055740033788437, -0....</td>
      <td>1000</td>
    </tr>
    <tr>
      <th>3</th>
      <td>T001</td>
      <td>Sensor1_signal1</td>
      <td>2020-01-01 03:00:00</td>
      <td>[-0.45981995520032104, -0.3255426061995603, -0...</td>
      <td>1000</td>
    </tr>
    <tr>
      <th>4</th>
      <td>T001</td>
      <td>Sensor1_signal1</td>
      <td>2020-01-01 04:00:00</td>
      <td>[-0.6380405111460377, -0.11924167777027689, 0....</td>
      <td>1000</td>
    </tr>
  </tbody>
</table>
</div>
Finally, we apply the `process_signal` method of our previously defined pipeline:

```python
processed_data, feature_columns = mypipeline.process_signal(demo_dataset, time_index='xvalues')

processed_data.head()
```
<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>turbine_id</th>
      <th>signal_id</th>
      <th>xvalues</th>
      <th>yvalues</th>
      <th>sampling_frequency</th>
      <th>identity1.fft1.std1.std_value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>T001</td>
      <td>Sensor1_signal1</td>
      <td>2020-01-01 00:00:00</td>
      <td>[0.43616983763682876, -0.17662312586241055, 0....</td>
      <td>1000</td>
      <td>14.444991</td>
    </tr>
    <tr>
      <th>1</th>
      <td>T001</td>
      <td>Sensor1_signal1</td>
      <td>2020-01-01 01:00:00</td>
      <td>[0.8023828754411122, -0.14122063493312714, -0....</td>
      <td>1000</td>
      <td>12.326223</td>
    </tr>
    <tr>
      <th>2</th>
      <td>T001</td>
      <td>Sensor1_signal1</td>
      <td>2020-01-01 02:00:00</td>
      <td>[-1.3143142430046044, -1.1055740033788437, -0....</td>
      <td>1000</td>
      <td>12.051415</td>
    </tr>
    <tr>
      <th>3</th>
      <td>T001</td>
      <td>Sensor1_signal1</td>
      <td>2020-01-01 03:00:00</td>
      <td>[-0.45981995520032104, -0.3255426061995603, -0...</td>
      <td>1000</td>
      <td>10.657243</td>
    </tr>
    <tr>
      <th>4</th>
      <td>T001</td>
      <td>Sensor1_signal1</td>
      <td>2020-01-01 04:00:00</td>
      <td>[-0.6380405111460377, -0.11924167777027689, 0....</td>
      <td>1000</td>
      <td>12.640728</td>
    </tr>
  </tbody>
</table>
</div>
Success! We have managed to apply the primitives to generate features on the input dataset.
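Note how the generated feature name reflects the pipeline structure: the `name` fields of the transformations and the aggregation are joined with dots, followed by the aggregation's output name (`std_value` here). A small illustration of the pattern observed in the output above:

```python
# Reconstruct the feature column name seen in processed_data above from the
# primitive names we chose when defining the pipeline.
transformation_names = ['identity1', 'fft1']
aggregation_name = 'std1'
output_name = 'std_value'

feature_column = '.'.join(transformation_names + [aggregation_name, output_name])
print(feature_column)  # identity1.fft1.std1.std_value
```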
