src/hydroserverpy/etl/README.md
Lines changed: 76 additions & 5 deletions
@@ -115,8 +115,8 @@ transformer = CSVTransformer(
 
 |`timezone_type`| Behaviour | Requires |
 |---|---|---|
-|`"utc"` (default) |Treats all timestamps as UTC. | — |
-|`"embedded"`|Reads timezone offset from the timestamp string itself. Falls back to UTC if the timestamps are naive. | — |
+|`None` (default) |Reads timezone offset from the timestamp string itself. Falls back to UTC if the timestamps are naive. | — |
+|`"utc"`|Treats all timestamps as UTC. | — |
 |`"offset"`| Treats timestamps as naive and applies a fixed UTC offset. Strips any embedded offset if present. |`timezone` in `±HHMM` or `±HH:MM` format |
 |`"iana"`| Treats timestamps as naive and applies a named IANA timezone. Strips any embedded offset if present. |`timezone` as a valid IANA name |
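The behaviour described in the `timezone_type` table can be sketched in plain Python. This is an illustration of the documented semantics, not hydroserverpy's implementation: `parse_offset` and `resolve_timestamp` are hypothetical helper names, and the `"utc"` branch assumes that any embedded offset is discarded, which the table does not state explicitly.

```python
import re
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def parse_offset(text):
    """Parse a '+HHMM' / '-HH:MM' style string into a fixed-offset tzinfo."""
    match = re.fullmatch(r"([+-])(\d{2}):?(\d{2})", text)
    if match is None:
        raise ValueError(f"invalid UTC offset: {text!r}")
    sign = 1 if match.group(1) == "+" else -1
    delta = timedelta(hours=int(match.group(2)), minutes=int(match.group(3)))
    return timezone(sign * delta)


def resolve_timestamp(raw, timezone_type=None, tz=None):
    """Return an aware datetime for an ISO 8601 string (hypothetical helper)."""
    ts = datetime.fromisoformat(raw)
    if timezone_type is None:
        # Keep the embedded offset; naive timestamps fall back to UTC.
        return ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)

    naive = ts.replace(tzinfo=None)  # strip any embedded offset
    if timezone_type == "utc":
        # Assumption: embedded offsets are dropped, as for "offset" and "iana".
        return naive.replace(tzinfo=timezone.utc)
    if timezone_type == "offset":
        return naive.replace(tzinfo=parse_offset(tz))
    if timezone_type == "iana":
        return naive.replace(tzinfo=ZoneInfo(tz))
    raise ValueError(f"unknown timezone_type: {timezone_type!r}")
```

Note that `"offset"` and `"iana"` relabel the wall-clock time rather than convert it, so `08:30-07:00` read with `timezone="+05:30"` becomes `08:30+05:30`.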
-# Embedded offset — timestamps include their own offset, e.g. "2024-01-15T08:30:00-07:00"
+# Embedded offsets — timestamps include their own offset, e.g. "2024-01-15T08:30:00-07:00"
+# Omit timezone_type (or set it to None) to read offsets from the timestamps directly.
 transformer = CSVTransformer(
     timestamp_key="datetime",
-    timezone_type="embedded",
 )
 ```
 
@@ -209,6 +209,77 @@ ETLTargetPath(
 
 Operations are applied in order. The output of each operation becomes the input of the next.
 
+### Temporal Aggregation
+
+Temporal aggregation is an optional step that reduces the per-observation DataFrame produced by the transformer into period-level summaries before loading. When configured, the same aggregation is applied uniformly to every target series in the pipeline.
+
+```python
+from hydroserverpy.etl.models import TemporalAggregation
+
+aggregation = TemporalAggregation(
+    aggregation_statistic="simple_mean",
+    aggregation_interval=1,
+    aggregation_interval_unit="day",
+)
+```
+
+Pass it to the transformer at construction time:
+
+```python
+transformer = CSVTransformer(
+    timestamp_key="datetime",
+    temporal_aggregation=aggregation,
+)
+```
+
+#### Aggregation statistic
+
+|`aggregation_statistic`| Behaviour |
+|---|---|
+|`"simple_mean"`| Arithmetic mean of all observations within the window. |
+|`"time_weighted_mean"`| Mean weighted by the time between observations, computed via trapezoidal integration. Values at window boundaries are estimated by linear interpolation from the nearest surrounding observations. |
+|`"last_value_of_period"`| The last observation within the window. |
+
+#### Aggregation interval
+
+`aggregation_interval` (integer, default `1`) and `aggregation_interval_unit` (currently `"day"`) together define the window width. An `aggregation_interval` of `3` with unit `"day"` produces 3-day windows.
+
+#### Timezone
+
+Window boundaries are aligned to local midnight in the configured timezone. The timezone fields follow the same conventions as the transformer timestamp configuration, with `None` (the default) falling back to UTC-day boundaries.
+|`"offset"`| Local midnight at a fixed UTC offset |`timezone` in `±HHMM` or `±HH:MM` format |
+|`"iana"`| Local midnight in a named timezone, handling DST automatically |`timezone` as a valid IANA name |
+
+```python
+# Daily windows aligned to US Mountain Time (UTC-7, DST-aware)
+aggregation = TemporalAggregation(
+    aggregation_statistic="simple_mean",
+    aggregation_interval=1,
+    aggregation_interval_unit="day",
+    timezone_type="iana",
+    timezone="America/Denver",
+)
+
+# Daily windows at a fixed offset (no DST adjustment)
+aggregation = TemporalAggregation(
+    aggregation_statistic="time_weighted_mean",
+    aggregation_interval=1,
+    aggregation_interval_unit="day",
+    timezone_type="offset",
+    timezone="-0700",
+)
+```
+
+**Window boundary semantics:** Windows run from the local midnight that contains the first observation to the local midnight that contains the last observation. The last observation defines the exclusive upper boundary — observations on that final local day are not aggregated. Ensure your source data extends at least one day past the last period you want included, or that the last observation falls on the day following the final window.
+
+Days with no observations are omitted from the output rather than filled with null values.
+
+
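The window boundary rules and the `time_weighted_mean` statistic described above can be sketched with the standard library alone. This is one reading of the README text, not the library's code: the function names are hypothetical, boundary values outside the observed range are held at the nearest observation (an assumption), and a fixed-offset `tzinfo` stands in for the configured timezone.

```python
from datetime import datetime, timedelta, timezone


def local_midnight(ts, tz):
    """Midnight (in tz) of the local day that contains ts."""
    return ts.astimezone(tz).replace(hour=0, minute=0, second=0, microsecond=0)


def interpolate(t, times, values):
    """Linear interpolation at epoch second t.

    Assumption: outside the observed range the nearest value is held.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def daily_time_weighted_means(observations, tz):
    """observations: (aware datetime, value) pairs, sorted, unique timestamps."""
    # Work in epoch seconds so elapsed time is correct across DST changes.
    times = [ts.timestamp() for ts, _ in observations]
    values = [v for _, v in observations]
    window_start = local_midnight(observations[0][0], tz)
    end = local_midnight(observations[-1][0], tz)  # exclusive upper boundary

    results = {}
    while window_start < end:
        window_end = window_start + timedelta(days=1)  # next local midnight
        a, b = window_start.timestamp(), window_end.timestamp()
        inside = [(t, v) for t, v in zip(times, values) if a <= t < b]
        if inside:  # days without observations are omitted
            points = [(a, interpolate(a, times, values))]
            points += [(t, v) for t, v in inside if t > a]
            points.append((b, interpolate(b, times, values)))
            # Trapezoidal integration, then divide by the window length.
            area = sum(
                (t1 - t0) * (v0 + v1) / 2
                for (t0, v0), (t1, v1) in zip(points, points[1:])
            )
            results[window_start.date().isoformat()] = area / (b - a)
        window_start = window_end
    return results


mountain_standard = timezone(timedelta(hours=-7))
observations = [
    (datetime(2024, 1, 15, 0, 0, tzinfo=mountain_standard), 10.0),
    (datetime(2024, 1, 15, 6, 0, tzinfo=mountain_standard), 20.0),
    (datetime(2024, 1, 16, 0, 0, tzinfo=mountain_standard), 20.0),
    (datetime(2024, 1, 16, 9, 0, tzinfo=mountain_standard), 40.0),
]
daily = daily_time_weighted_means(observations, mountain_standard)
print(daily)  # {'2024-01-15': 18.75}
```

The observations on 16 January only fix the upper boundary and the interpolated value at midnight, so that day produces no row. The simple mean for 15 January would be 15.0; the time-weighted result is higher because the value sits at 20.0 for 18 of the 24 hours.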
 ### Loader
 
 ```python
@@ -349,4 +420,4 @@ for target_id, target in context.results.target_results.items():
 | Error | Likely cause |
 |---|---|
 |`Missing datastream IDs: ...`| One or more target datastream UUIDs don't exist on the HydroServer instance |
-|`HydroServer loader failed to retrieve datastream`| A network or authentication error occurred while looking up a datastream |
+|`HydroServer loader failed to retrieve datastream`| A network or authentication error occurred while looking up a datastream |