Open
Labels
- part:data-pipeline (Affects the data pipeline)
- priority:high (Address this as soon as possible)
- type:enhancement (New feature or enhancement visible to users)
Description
What's needed?
Support for streaming historical data through the data pipeline.
Tasks
- Data sourcing actor support for reading parquet files
- Resampler class support for using Sample timestamps to identify resampling windows
- Resampler detection of live data to transition to system clock
- Data sourcing actor transition to live data after latest historical data has been streamed
Data sourcing actor support for reading parquet files
- There should be a single Python task that accepts all subscriptions, reads data from parquet files row-wise, and sends the data to channels in a round-robin fashion.
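The round-robin dispatch described above can be sketched as follows. This is a minimal illustration, not the SDK's actual API: `Sample` is a stand-in namedtuple, and plain lists stand in for the real channel senders.

```python
from collections import namedtuple

# Hypothetical stand-in for the pipeline's Sample type.
Sample = namedtuple("Sample", ["timestamp", "value"])


def dispatch_round_robin(rows, channels):
    """Send parquet rows to per-metric channels in a round-robin fashion.

    `rows` is an iterable of (metric_name, timestamp, value) tuples, as if
    read row-wise from parquet files; `channels` maps metric names to
    list-like sinks (placeholders for real channel senders).
    """
    # Group rows per metric, preserving row order within each metric.
    per_metric = {name: [] for name in channels}
    for name, ts, value in rows:
        per_metric[name].append(Sample(ts, value))

    # Interleave: one message per metric per pass, until all queues drain.
    sent = True
    while sent:
        sent = False
        for name, queue in per_metric.items():
            if queue:
                channels[name].append(queue.pop(0))
                sent = True
```

In a real actor each `channels[name].append(...)` would be an async channel send, but the interleaving order is the point of the sketch: no single subscription can starve the others.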
- When any metric has a time gap, the next available message should be sent out immediately to inform downstream about the gap; the task can then wait on that stream until the other streams have caught up to it. If there is no data after the beginning of the gap because of an ongoing outage, it should send out a Sample(datetime.now(), None).

Possible edge cases:
- A component ID was present historically but no longer exists, or it is a new component with no history.
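A sketch of the ongoing-outage case above, assuming a hypothetical `Sample` namedtuple and a caller-supplied gap threshold (neither is defined in this issue):

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the pipeline's Sample type.
Sample = namedtuple("Sample", ["timestamp", "value"])


def fill_ongoing_gap(last_sample, now, max_gap):
    """Emit a Sample(now, None) when a stream has an ongoing outage.

    If no data has arrived since `last_sample` for longer than `max_gap`,
    downstream consumers are told about the gap with a None-valued sample;
    otherwise nothing is emitted.
    """
    if now - last_sample.timestamp > max_gap:
        return Sample(now, None)
    return None
```

The None value marks the gap explicitly, so a downstream resampler can distinguish "no data" from "data not yet delivered".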
Resampler class support for using Sample timestamps to identify resampling windows
- When starting up, if incoming messages have historical timestamps, start by calculating resampling windows, adding the resampling interval uniformly from the first Samples' timestamps; wait for data in all channels, then resample and send out as usual.
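Window calculation from the first sample's timestamp can be sketched as below; the function name and signature are illustrative, not from the SDK:

```python
from datetime import datetime, timedelta


def resampling_windows(first_timestamp, interval, count):
    """Calculate `count` resampling window boundaries by adding the
    resampling interval uniformly, starting at the first Sample's timestamp
    instead of the system clock."""
    return [
        (first_timestamp + i * interval, first_timestamp + (i + 1) * interval)
        for i in range(count)
    ]
```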
Resampler detection of live data to transition to system clock
- When the timestamps in the incoming Samples cross the current time (or the app start time, followed by the current time, in stages), switch to using the system clock for deciding when to resample.
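The staged transition above might look like this; the stage names and function are hypothetical, but they capture the idea that timestamps first cross the app start time and then the current time:

```python
def clock_source(sample_timestamp, app_start, now):
    """Decide which clock drives resampling, in stages.

    Historical samples are paced by their own timestamps; once timestamps
    cross the app start time the stream is catching up, and once they cross
    the current time the system clock takes over.
    """
    if sample_timestamp >= now:
        return "system_clock"
    if sample_timestamp >= app_start:
        return "catching_up"
    return "historical"
```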
Data sourcing actor transition to live data after latest historical data has been streamed
- Parquet files are generated once every 5 minutes. If the actor and the historical data streaming start 1 minute after the last parquet file, we have to wait for the next parquet file to fill that 1-minute gap and only then switch to live data. This also requires caching live data from the moment the app starts, so that we can pick up seamlessly from the last parquet file.
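The switch-over decision above boils down to a coverage check: the switch is seamless only if the live-data cache already covers the end of the last parquet file. The function names here are illustrative sketches, not SDK API:

```python
from datetime import datetime, timedelta

FILE_PERIOD = timedelta(minutes=5)  # parquet files are generated every 5 minutes


def can_switch_to_live(last_file_end, live_cache_start):
    """True if the live-data cache (started when the app started) already
    covers the end of the last parquet file, so switching leaves no gap."""
    return live_cache_start <= last_file_end


def next_switch_opportunity(last_file_end):
    """If there is a gap, the earliest retry is when the next parquet file
    is generated."""
    return last_file_end + FILE_PERIOD
```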
Status
To do