Make the resampler class not use channels to ingest data #303

@leandro-lucarella-frequenz

Description

What's needed?

There are cases when using the Resampler class where the input data is not received via a channel. In these cases, creating a channel just to feed the Resampler is inconvenient and inefficient.

Proposed solution

There are several possible solutions that I can think of:

  1. Move the handling of channels to the resampling actor and have the resampler just take samples by calling a new function, for example: resampler.add_sample(timeseries_name, sample).
  2. Make the Resampler have 2 methods, add_timeseries(name, source) which still takes an async iterator as it is now, and source = add_timeseries(name) which returns a source object where you can send samples via source.add_sample(sample).
  3. Change the resampler to only provide a source = add_timeseries(name) and refactor the _StreamHelper to make it public and take a source returned by the add_timeseries() function, like StreamHelper(source, async_iterator), so each time new data comes from an async iterator (receiving in a task), it will call source.add_sample(sample).

I think 1 is the simplest but least flexible, 3 is the best design because of its flexibility (although a bit more complicated to use), and 2 sits in between.
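A minimal sketch of how option 3 could fit together, under some assumptions: Sample is a simplified stand-in for the SDK's sample type, and Source, StreamHelper.start()/wait() and the buffering behaviour are hypothetical names chosen for illustration, not the SDK's API. The actual resampling step is omitted; sources only buffer what they receive. It shows that data already in hand is pushed with a plain method call, while data arriving through an async iterator is bridged by a separate helper running in a task.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Sample:
    """Hypothetical stand-in for the SDK's sample type."""

    timestamp: datetime
    value: float


class Source:
    """Hypothetical push-based input for a single timeseries."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.buffer: list[Sample] = []

    def add_sample(self, sample: Sample) -> None:
        # Only buffers here; a real resampling step would consume this buffer.
        self.buffer.append(sample)


class Resampler:
    """Option 3: only hands out sources and knows nothing about channels."""

    def __init__(self) -> None:
        self.sources: dict[str, Source] = {}

    def add_timeseries(self, name: str) -> Source:
        source = Source(name)
        self.sources[name] = source
        return source


class StreamHelper:
    """Hypothetical public helper pumping an async iterator into a source."""

    def __init__(self, source: Source, stream: AsyncIterator[Sample]) -> None:
        self._source = source
        self._stream = stream
        self._task: asyncio.Task[None] | None = None

    def start(self) -> None:
        # Receive in a background task, as option 3 describes.
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        async for sample in self._stream:
            self._source.add_sample(sample)

    async def wait(self) -> None:
        if self._task is not None:
            await self._task


async def fake_stream() -> AsyncIterator[Sample]:
    # Simulates data that does arrive through an async iterator.
    for value in (1.0, 2.0, 3.0):
        yield Sample(datetime.now(timezone.utc), value)


async def main() -> Resampler:
    resampler = Resampler()
    # Data already in hand: push it directly, no channel needed.
    direct = resampler.add_timeseries("direct")
    direct.add_sample(Sample(datetime.now(timezone.utc), 42.0))
    # Data from an async iterator: bridge it with the helper.
    helper = StreamHelper(resampler.add_timeseries("streamed"), fake_stream())
    helper.start()
    await helper.wait()
    return resampler


resampler = asyncio.run(main())
```

Option 2 would roughly be the same Resampler with an extra add_timeseries(name, source) overload that creates the helper internally.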

Looking at this, I'm starting to feel that we should have a resampler that doesn't work on channels at all. Going through a channel here makes things more complicated (and inefficient), when you could just call resampler.add(sample) while handling the data:

            async for sample in self._resampled_data_recv:
                log.debug("Received new sample: %s", sample)
                if self._resampler:
                    # Feed the resampler directly, with no intermediate channel.
                    self._resampler.add(sample)
                else:
                    self._buffer.update(sample)

There is already a _ResampleHelper that handles resampling a single metric, but it doesn't run its own timer, so we would need something that manages its own timer and, ideally, can resample multiple metrics.
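A rough sketch of such a component, assuming option 1's add_sample(name, sample) call style. TimedResampler, the Sample stand-in, the callback-based output and the use of a plain mean are all hypothetical choices for illustration, not how the SDK resampler works. It shows one internal timer task resampling every metric it has received samples for.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict
from collections.abc import Callable
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Sample:
    """Hypothetical stand-in for the SDK's sample type."""

    timestamp: datetime
    value: float


class TimedResampler:
    """Hypothetical resampler that owns its timer and handles many metrics."""

    def __init__(
        self, period_s: float, on_resampled: Callable[[str, Sample], None]
    ) -> None:
        self._period_s = period_s
        self._on_resampled = on_resampled
        # One window of raw values per metric name, filled by add_sample().
        self._windows: defaultdict[str, list[float]] = defaultdict(list)
        self._task: asyncio.Task[None] | None = None

    def add_sample(self, name: str, sample: Sample) -> None:
        # A plain method call from the data-handling code; no channel.
        self._windows[name].append(sample.value)

    def start(self) -> None:
        self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass

    async def _run(self) -> None:
        # Naive sleep-based timer: it drifts and is not aligned to wall-clock
        # period boundaries, which a real implementation would need.
        while True:
            await asyncio.sleep(self._period_s)
            self._resample_all()

    def _resample_all(self) -> None:
        # Emit the mean of each non-empty window, then start a new window.
        now = datetime.now(timezone.utc)
        for name, values in self._windows.items():
            if values:
                self._on_resampled(name, Sample(now, sum(values) / len(values)))
                values.clear()


async def main() -> list[tuple[str, float]]:
    results: list[tuple[str, float]] = []
    resampler = TimedResampler(
        0.05, lambda name, sample: results.append((name, sample.value))
    )
    resampler.start()
    for value in (1.0, 2.0, 3.0):
        resampler.add_sample("power", Sample(datetime.now(timezone.utc), value))
    resampler.add_sample("current", Sample(datetime.now(timezone.utc), 10.0))
    await asyncio.sleep(0.08)  # long enough for one timer tick
    await resampler.stop()
    return results


results = asyncio.run(main())
```

Here one tick emits ("power", 2.0) and ("current", 10.0). Keeping all metrics behind a single timer means one task per resampler instead of one per metric.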

Use cases

Alternatives and workarounds

  • Create a channel.
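To illustrate the overhead of this workaround, here is a small sketch that uses asyncio.Queue as a stand-in for a channel (the real channel library is not used here), with hypothetical queue_to_iterator() and consume() helpers standing in for the channel receiver and for a resampler that only accepts an async iterator. Values the handler already holds still have to be enqueued and pass through an extra task before the consumer sees them.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator


async def queue_to_iterator(
    queue: asyncio.Queue[float | None],
) -> AsyncIterator[float]:
    """Expose a queue as an async iterator; None marks end of stream."""
    while True:
        item = await queue.get()
        if item is None:
            return
        yield item


async def consume(stream: AsyncIterator[float]) -> list[float]:
    """Stand-in for a resampler input that only accepts an async iterator."""
    return [value async for value in stream]


async def main() -> list[float]:
    queue: asyncio.Queue[float | None] = asyncio.Queue()
    consumer = asyncio.create_task(consume(queue_to_iterator(queue)))
    # The handler already holds each value, yet it has to go through the
    # queue and an extra task before the "resampler" sees it.
    for value in (1.0, 2.0, 3.0):
        queue.put_nowait(value)
    queue.put_nowait(None)
    return await consumer


received = asyncio.run(main())
```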

Additional context

No response

Metadata

Assignees

No one assigned

Labels

  • part:data-pipeline: Affects the data pipeline
  • priority:low: This should be addressed only if there is nothing else on the table
  • type:enhancement: New feature or enhancement visible to users

Type

No type

Projects

Status

To do

Relationships

None yet

Development

No branches or pull requests
