Open
Labels
- part:data-pipeline (Affects the data pipeline)
- priority:high (Address this as soon as possible)
- type:enhancement (New feature or enhancement visible to users)
Description
What's needed?
Support for streaming historical data through the data pipeline.
Tasks
- Data sourcing actor support for reading parquet files
- Resampler class support for using Sample timestamps to identify resampling windows
- Resampler detection of live data to transition to system clock
- Data sourcing actor transition to live data after latest historical data has been streamed
Data sourcing actor support for reading parquet files
- There should be a single Python task that accepts all subscriptions, reads data from parquet files row-wise, and sends the data to channels in a round-robin fashion.
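The round-robin dispatch described above can be sketched as follows. This is a minimal illustration, not the SDK's actual API: `Sample` is a stand-in namedtuple, and plain lists stand in for the real channel senders.

```python
from collections import namedtuple

# Hypothetical stand-in for the pipeline's Sample type.
Sample = namedtuple("Sample", ["timestamp", "value"])


def dispatch_round_robin(rows, channels):
    """Send parquet rows to per-metric channels in a round-robin fashion.

    `rows` is an iterable of (metric_name, timestamp, value) tuples, as if
    read row-wise from parquet files; `channels` maps metric names to
    list-like sinks (placeholders for real channel senders).
    """
    # Group rows per metric, preserving row order within each metric.
    per_metric = {name: [] for name in channels}
    for name, ts, value in rows:
        per_metric[name].append(Sample(ts, value))

    # Interleave: one message per metric per pass, until all queues drain.
    sent = True
    while sent:
        sent = False
        for name, queue in per_metric.items():
            if queue:
                channels[name].append(queue.pop(0))
                sent = True
```

In a real actor each `channels[name].append(...)` would be an async channel send, but the interleaving order is the point of the sketch: no single subscription can starve the others.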
- When any metric has a time gap, the next available message should be sent out immediately to inform downstream about the gap; the task can then wait on that stream until the other streams have caught up to it. If there is no data after the beginning of the gap because of an ongoing outage, it should send out a Sample(datetime.now(), None).

Possible edge cases:
- A component ID was present historically but no longer exists, or it is a new component with no history.
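A sketch of the ongoing-outage case above, assuming a hypothetical `Sample` namedtuple and a caller-supplied gap threshold (neither is defined in this issue):

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the pipeline's Sample type.
Sample = namedtuple("Sample", ["timestamp", "value"])


def fill_ongoing_gap(last_sample, now, max_gap):
    """Emit a Sample(now, None) when a stream has an ongoing outage.

    If no data has arrived since `last_sample` for longer than `max_gap`,
    downstream consumers are told about the gap with a None-valued sample;
    otherwise nothing is emitted.
    """
    if now - last_sample.timestamp > max_gap:
        return Sample(now, None)
    return None
```

The None value marks the gap explicitly, so a downstream resampler can distinguish "no data" from "data not yet delivered".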
Resampler class support for using Sample timestamps to identify resampling windows
- When starting up, if incoming messages have historical timestamps, start by calculating resampling windows, adding the resampling interval uniformly from the first Samples' timestamps; wait for data in all channels, then resample and send out as usual.
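Window calculation from the first sample's timestamp can be sketched as below; the function name and signature are illustrative, not from the SDK:

```python
from datetime import datetime, timedelta


def resampling_windows(first_timestamp, interval, count):
    """Calculate `count` resampling window boundaries by adding the
    resampling interval uniformly, starting at the first Sample's timestamp
    instead of the system clock."""
    return [
        (first_timestamp + i * interval, first_timestamp + (i + 1) * interval)
        for i in range(count)
    ]
```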
Resampler detection of live data to transition to system clock
- When the timestamps in the incoming Samples cross the current time (or the app start time, followed by the current time, in stages), switch to using the system clock for deciding when to resample.
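The staged transition above might look like this; the stage names and function are hypothetical, but they capture the idea that timestamps first cross the app start time and then the current time:

```python
def clock_source(sample_timestamp, app_start, now):
    """Decide which clock drives resampling, in stages.

    Historical samples are paced by their own timestamps; once timestamps
    cross the app start time the stream is catching up, and once they cross
    the current time the system clock takes over.
    """
    if sample_timestamp >= now:
        return "system_clock"
    if sample_timestamp >= app_start:
        return "catching_up"
    return "historical"
```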
Data sourcing actor transition to live data after latest historical data has been streamed
- Parquet files are generated once every 5 minutes. If the actor and the historical data streaming start 1 minute after the last parquet file, we have to wait for the next parquet file to fill that 1-minute gap and only then switch to live data. This also requires caching live data from the moment the app starts, so that we can pick up seamlessly from the last parquet file.
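The switch-over decision above boils down to a coverage check: the switch is seamless only if the live-data cache already covers the end of the last parquet file. The function names here are illustrative sketches, not SDK API:

```python
from datetime import datetime, timedelta

FILE_PERIOD = timedelta(minutes=5)  # parquet files are generated every 5 minutes


def can_switch_to_live(last_file_end, live_cache_start):
    """True if the live-data cache (started when the app started) already
    covers the end of the last parquet file, so switching leaves no gap."""
    return live_cache_start <= last_file_end


def next_switch_opportunity(last_file_end):
    """If there is a gap, the earliest retry is when the next parquet file
    is generated."""
    return last_file_end + FILE_PERIOD
```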
Status
To do