src/hydroserverpy/etl/README.md
@@ -55,7 +55,7 @@ extractor = LocalFileExtractor(

### Transformers

Transformers parse the extracted payload into a DataFrame with columns `timestamp`, `value`, and `target_id`.

**CSV:**

```python
# …
```
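The long format a transformer produces can be sketched with pandas; the column names come from the description above, while the sample values and datastream IDs are purely illustrative:

```python
import pandas as pd

# Illustrative payload already parsed by a transformer: one row per
# observation, in the long format described above.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2024-01-15T08:00:00Z", "2024-01-15T08:00:00Z", "2024-01-15T09:00:00Z"]
        ),
        "value": [1.2, 18.4, 1.3],
        "target_id": ["<datastream-a>", "<datastream-b>", "<datastream-a>"],
    }
)

# Each target datastream's series can be recovered by filtering on target_id.
series_a = df[df["target_id"] == "<datastream-a>"]
```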
@@ -71,6 +71,8 @@ transformer = CSVTransformer(

Use `identifier_type="index"` to reference columns by 1-based position instead of name:

```python
from hydroserverpy.etl.transformers import CSVTransformer

transformer = CSVTransformer(
    timestamp_key="1",
    identifier_type="index",
    # …
)
```
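For intuition, a rough pandas sketch of position-based access (not hydroserverpy's internals): a headerless CSV has no column names, so the 1-based key `"1"` corresponds to the file's first column:

```python
import io

import pandas as pd

# Headerless CSV: columns can only be addressed by position.
raw = io.StringIO("2024-01-15T08:00:00,1.2\n2024-01-15T09:00:00,1.3\n")
df = pd.read_csv(raw, header=None)

# identifier_type="index" uses 1-based positions, so timestamp_key="1"
# corresponds to pandas' 0-based column 0.
timestamps = pd.to_datetime(df[0])
values = df[1]
```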
@@ -102,6 +104,8 @@ Every transformer requires a `timestamp_key` identifying the source column that

|`"custom"`| Parses timestamps using a `strftime`-compatible format string provided via `timestamp_format`. Required when the source timestamps are not in a standard ISO 8601 format. |

```python
from hydroserverpy.etl.transformers import CSVTransformer

# Custom format example — timestamps like "01/15/2024 08:30:00"
# …
```
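As a hedged aside, the `strftime` pattern matching the sample timestamp above would be `"%m/%d/%Y %H:%M:%S"`; a minimal pandas sketch of parsing with it:

```python
import pandas as pd

# strftime-compatible pattern matching "01/15/2024 08:30:00"
fmt = "%m/%d/%Y %H:%M:%S"
parsed = pd.to_datetime(["01/15/2024 08:30:00"], format=fmt)
```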
|`"iana"`| Treats timestamps as naive and applies a named IANA timezone. Strips any embedded offset if present. |`timezone` as a valid IANA name |

```python
from hydroserverpy.etl.transformers import CSVTransformer

# Fixed UTC offset — timestamps are in US Mountain Standard Time
utc_transformer = CSVTransformer(
    timestamp_key="datetime",
    timezone_type="offset",
    timezone="-0700",
)

# IANA timezone — handles daylight saving time automatically
iana_transformer = CSVTransformer(
    timestamp_key="datetime",
    timezone_type="iana",
    timezone="America/Denver",
)
```
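Both settings can be sketched outside hydroserverpy with pandas, assuming naive source timestamps. The fixed offset and the IANA result agree here because Denver is on UTC-7 (MST) in January; a summer timestamp would differ by an hour:

```python
from datetime import timedelta, timezone

import pandas as pd

naive = pd.to_datetime(["2024-01-15 08:00:00"])  # timezone-naive timestamps

# timezone_type="offset": fixed UTC-7, never adjusted for DST
fixed = naive.tz_localize(timezone(timedelta(hours=-7))).tz_convert("UTC")

# timezone_type="iana": named zone, DST-aware (Denver is UTC-7 in January)
named = naive.tz_localize("America/Denver").tz_convert("UTC")
```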
@@ -148,7 +154,7 @@ All timestamps are normalized to UTC before loading regardless of the source tim

### Data Mappings

Data mappings connect source columns (by name or index) to HydroServer datastream IDs. Each mapping can fan out to multiple target datastreams, and optional data operations can be applied per target path.

```python
from hydroserverpy.etl.transformers import ETLDataMapping, ETLTargetPath
# …
```
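Because the constructor arguments are elided in this hunk, here is only a conceptual sketch of fan-out using plain dicts; the field names are illustrative and are not the actual `ETLDataMapping` signature:

```python
# Conceptual fan-out: one source column feeding two target datastreams.
# Field names here are illustrative, not the ETLDataMapping signature.
mapping = {
    "source_identifier": "discharge_cfs",
    "targets": ["<datastream-uuid-1>", "<datastream-uuid-2>"],
}

# A single extracted value is loaded once per target.
rows = [{"target_id": target, "value": 42.0} for target in mapping["targets"]]
```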
@@ -171,18 +177,18 @@ data_mappings = [

#### Data Operations

Target paths can include a sequence of data operations applied to the source values before loading. Operations are applied in order — the output of each becomes the input of the next. Supported operations are arithmetic expressions, rating curves, and temporal aggregation.

**Arithmetic expression** — applies a Python arithmetic expression where `x` represents the source value. Only `+`, `-`, `*`, `/`, numeric literals, and the variable `x` are permitted.

```python
from hydroserverpy.etl.transformers import ETLTargetPath
from hydroserverpy.etl.operations import ArithmeticExpressionOperation

ETLTargetPath(
    target_identifier="<datastream-uuid>",
    data_operations=[
        ArithmeticExpressionOperation(
            expression="(x - 32) / 1.8",  # Fahrenheit to Celsius
            target_identifier="<datastream-uuid>",
        )
    ],
)
```
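One common way to enforce such an operator whitelist (a sketch, not necessarily hydroserverpy's implementation) is to walk the expression's `ast` and reject any node outside the allowed set:

```python
import ast
import operator

# Whitelisted binary operators, matching the restriction described above.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_expression(expression: str, x: float) -> float:
    """Evaluate a restricted arithmetic expression in terms of x."""

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "x":
            return x
        raise ValueError("disallowed syntax in expression")

    return walk(ast.parse(expression, mode="eval").body)


# Fahrenheit to Celsius, as in the example above
celsius = eval_expression("(x - 32) / 1.8", 212.0)
```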
@@ -193,44 +199,40 @@ ETLTargetPath(

**Rating curve** — maps input values to output values using linear interpolation against a two-column CSV lookup table (input, output), retrieved from a URL.

```python
from hydroserverpy.etl.transformers import ETLTargetPath
from hydroserverpy.etl.operations import RatingCurveDataOperation
# …
```
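The interpolation itself can be sketched with `numpy.interp`; the table values below are illustrative, and real tables are fetched from a URL as described above:

```python
import numpy as np

# Illustrative two-column lookup table: stage (input) -> discharge (output).
stage = [1.0, 2.0, 3.0]
discharge = [10.0, 40.0, 90.0]

# Linear interpolation between table rows, as the rating curve operation does.
result = float(np.interp(2.5, stage, discharge))
```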
**Temporal aggregation** — reduces per-observation values into period-level summaries. When included, it should be the last operation in the sequence, as it changes the shape of the data from one row per observation to one row per aggregation window.

```python
from hydroserverpy.etl.transformers import ETLTargetPath
from hydroserverpy.etl.operations import TemporalAggregationOperation

ETLTargetPath(
    target_identifier="<datastream-uuid>",
    data_operations=[
        TemporalAggregationOperation(
            aggregation_statistic="simple_mean",
            aggregation_interval=1,
            aggregation_interval_unit="day",
            target_identifier="<datastream-uuid>",
        )
    ],
)
```
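Conceptually, `simple_mean` over one-day windows averages all observations that fall in the same day; a pandas sketch (not hydroserverpy's code):

```python
import pandas as pd

obs = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2024-01-01 06:00", "2024-01-01 18:00", "2024-01-02 12:00"]
        ),
        "value": [10.0, 20.0, 30.0],
    }
)

# simple_mean with a 1-day interval: average all observations per day.
daily = obs.set_index("timestamp")["value"].resample("1D").mean().dropna()
```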
225
234
226
-
Pass it to the transformer at construction time:
227
-
228
-
```python
229
-
transformer = CSVTransformer(
230
-
timestamp_key="datetime",
231
-
temporal_aggregation=aggregation,
232
-
)
233
-
```
235
+
Because temporal aggregation is a per-target operation, different targets fed by the same source can use different statistics, intervals, or timezone alignments independently.
#### Aggregation statistic

@@ -256,37 +258,40 @@ Window boundaries are aligned to local midnight in the configured timezone. The

|`"iana"`| Local midnight in a named timezone, handling DST automatically |`timezone` as a valid IANA name |

```python
from hydroserverpy.etl.operations import TemporalAggregationOperation

# Daily windows aligned to US Mountain Time (UTC-7, DST-aware)
TemporalAggregationOperation(
    aggregation_statistic="simple_mean",
    aggregation_interval=1,
    aggregation_interval_unit="day",
    timezone_type="iana",
    timezone="America/Denver",
    target_identifier="<datastream-uuid>",
)

# Daily windows at a fixed offset (no DST adjustment)
TemporalAggregationOperation(
    aggregation_statistic="time_weighted_mean",
    aggregation_interval=1,
    aggregation_interval_unit="day",
    timezone_type="offset",
    timezone="-0700",
    target_identifier="<datastream-uuid>",
)
```
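The local-midnight alignment can be sketched with pandas: convert each UTC timestamp to the configured zone, then floor to the day. Observations that straddle local midnight land in different windows:

```python
import pandas as pd

# UTC observations that straddle local midnight in Denver (UTC-7 in winter).
ts = pd.to_datetime(["2024-01-02 05:00", "2024-01-02 08:00"], utc=True)

# Local-midnight alignment: convert to the configured zone, floor to the day.
local_days = ts.tz_convert("America/Denver").floor("D")
```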
**Window boundary semantics:** Windows run from the local midnight of the day containing the first observation to the local midnight of the day containing the last observation. That final midnight is an exclusive upper boundary, so observations on the last local day are not aggregated. To include a given period, make sure the source data extends at least one day past it.

Days with no observations are omitted from the output rather than filled with null values.
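A hedged pandas sketch of the exclusive-upper-boundary behavior described above (an interpretation of the semantics, not hydroserverpy's code): drop observations on or after the local midnight of the day containing the last observation, then aggregate:

```python
import pandas as pd

ts = pd.to_datetime(["2024-01-01 06:00", "2024-01-02 06:00", "2024-01-03 06:00"])
values = pd.Series([1.0, 2.0, 3.0], index=ts)

# Exclusive upper boundary: the midnight of the day containing the last
# observation ends the final window, so that day's observations are dropped.
upper = ts.max().normalize()
aggregated = values[values.index < upper].resample("1D").mean()
```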
### Loader

```python
from hydroserverpy import HydroServer
from hydroserverpy.etl.loaders import HydroServerLoader
# …
```