EPIC - Time Grid #1057

@jawache

Sub of #1025

Tasks:

  • Look at a deeper level to understand why using time sync in different places produces different results
  • Document the explanation / best practice

Terminology

  • Time Series - A set of observations that can have duplicate entries for the same time and duration.
  • Unique Time Series - A set of observations with only one entry for each time and duration.
  • Time Grid - A Time Series aligned to a globally configured grid.
  • Time Slot - A single slot of a given duration on the Time Grid, also referred to below as a time bucket or cell.
  • Observation - A set of data that sits in a Time Slot.
  • Time Syncing - The process of aligning a Time Series to a Time Grid.
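
To make the distinction between a Time Series and a Unique Time Series concrete, here is a minimal Python sketch. The `Observation` shape and the `is_unique` helper are hypothetical names for illustration only and are not part of IF.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    timestamp: datetime  # start of the observation
    duration: int        # length in seconds (assumed unit)
    value: float         # hypothetical single metric


def is_unique(series: list[Observation]) -> bool:
    """True when no two observations share the same (timestamp, duration)."""
    counts = Counter((o.timestamp, o.duration) for o in series)
    return all(n == 1 for n in counts.values())


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Two entries for the same time and duration: a Time Series, but not unique.
series = [Observation(t0, 60, 1.0), Observation(t0, 60, 2.5)]

# One entry per time and duration: a Unique Time Series.
unique_series = [Observation(t0, 60, 3.5)]
```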

Background

Since close to the inception of Impact Framework, we've ensured that time is one of the dimensions along which IF reports impacts. Where it makes sense, we want to surface how impacts change over time so we can highlight the moments where they are high and allow people to dive into the why.

The problem is that we need to sync up the start, end, durations, and number of observations across each component to ensure everything snaps to a grid so that we can sum up data across all components for every time bucket.

When is having a synced set of observations essential?

  • It's essential for aggregation to function; all observations must be on the time grid.
  • It's essential for a tiny set of plugins, e.g., WattTime. But most plugins don't care about the observations being on a time grid.

Most plugins don't need the data snapped to a grid; it's primarily useful for aggregation.

This is not easy; it has caused several problems and complexities, making writing manifest files unusually hard.

Problem

What determines the time window and durations of a component?

  • The static observations OR the first plugin in the pipeline defines the time window and duration.
  • So, if we want everything on a particular grid, we need to configure every component so the first plugin returns the same time series OR statically add the observations to every component with the same time series.
  • There is the helper TimeSync plugin, but you must be careful where you put it in the pipeline. It can only go in certain places or the manifest errors out, and it's tough for the end user to know intuitively where to place it.
  • It is very manual, something you must ensure is aligned for every component. If even one component is off, the whole manifest and aggregation fail.

It's up to the user, component by component, to make sure the observations are snapped to one standard global time grid.

There is a complex relationship between grouping and time-syncing

  • Since TimeSync bleeds from cells next to it, a unique time series is needed for it to work.
  • If a Time Series is not unique, we need to group it into components with a Unique Time Series to run TimeSync on the Unique Time Series.
  • So, grouping must ensure every component has a unique time series for TimeSync to work. We can't just group by whatever makes sense for the end user; grouping must also guarantee that every component ends up with a unique time series.
  • We DON'T need a unique time-series for aggregation. A time series can have multiple observations for the same time bucket. If all the observations are on the same time-grid, horizontal and vertical syncing can happen.

Grouping is needed for time syncing, and it's not always clear where you have to put the TimeSync plugin. It has to be after grouping, but it might not be immediately after. Time syncing reduces the usefulness of grouping since you can't group however you want.

Solution

Have the user define a global time grid and make the framework (including plugins) responsible for automatically aligning all the time series to that time grid.

We define a global time-grid setting with a start, an end, and a window. start and end can be hardcoded date times or relative offsets, e.g., start can be 30 mins ago, end can be now, and window can be 60.
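
As a rough illustration of such a setting, the sketch below builds a grid that starts 30 minutes in the past and ends now. The `TimeGrid` class, the `resolve` helper, and the choice of seconds as the unit for window and for relative offsets are all assumptions made for this example, not an agreed IF design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TimeGrid:
    start: datetime
    end: datetime
    window: int  # slot length, assumed here to be in seconds

    def slots(self) -> list[datetime]:
        """Start time of every slot T0, T1, ... that fits in [start, end)."""
        count = int((self.end - self.start).total_seconds()) // self.window
        return [self.start + timedelta(seconds=i * self.window) for i in range(count)]


def resolve(value, now: datetime) -> datetime:
    """Accept a hardcoded datetime, or a relative offset in seconds from `now`
    (negative means in the past)."""
    if isinstance(value, datetime):
        return value
    return now + timedelta(seconds=value)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grid = TimeGrid(start=resolve(-30 * 60, now), end=resolve(0, now), window=60)
slots = grid.slots()  # one entry per one-minute slot
```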

We start by making sure everything is snapped to the globally configured time grid:

  • By default, the framework passes a set of observations snapped to that time grid to each component, even if the component has no configured inputs.
  • If the component has static data already configured, and that static data is not snapped to that time grid, the framework errors out.
  • So, straight away, the starting point for all components is a set of observations snapped to a time grid.
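
A minimal sketch of that starting point, assuming a fixed three-slot grid and observations represented as plain dicts. The names `initial_inputs` and `is_on_grid` are hypothetical, and the rule that an on-grid observation must have a duration equal to the window is an assumption of this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

START = datetime(2024, 1, 1, tzinfo=timezone.utc)
END = START + timedelta(minutes=3)
WINDOW = 60  # seconds (assumed unit)


def is_on_grid(obs: dict) -> bool:
    """An observation is on the grid if it starts on a slot boundary inside
    [START, END) and lasts exactly one window."""
    offset = (obs["timestamp"] - START).total_seconds()
    return (
        START <= obs["timestamp"] < END
        and offset % WINDOW == 0
        and obs["duration"] == WINDOW
    )


def initial_inputs(static_inputs: Optional[list]) -> list:
    """Inputs handed to the first plugin of a component."""
    if not static_inputs:
        # No configured inputs: seed one empty observation per slot.
        count = int((END - START).total_seconds()) // WINDOW
        return [
            {"timestamp": START + timedelta(seconds=i * WINDOW), "duration": WINDOW}
            for i in range(count)
        ]
    for obs in static_inputs:
        if not is_on_grid(obs):
            raise ValueError(f"static input at {obs['timestamp'].isoformat()} is off the time grid")
    return static_inputs
```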

Plugins are responsible for ensuring that any returning observations are snapped to that global time grid:

  • If a plugin returns observations that are not snapped to that time grid, the framework errors out.
  • Importer plugins can look at their inputs to see the start, end, and window and use that information to determine the structure of the outputs they return.
  • The WattTime plugin already uses the input data to decide what to return, so this solves that problem.
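
The sketch below shows one way these two responsibilities could fit together: an importer derives the grid from its inputs, and a framework-side wrapper rejects off-grid outputs. `grid_from_inputs`, `fake_importer`, and `run_checked` are hypothetical names, and the code assumes the inputs are non-empty, sorted, and already on the grid.

```python
from datetime import datetime, timedelta, timezone


def grid_from_inputs(inputs):
    """Recover (start, end, window) from on-grid inputs (assumed sorted, non-empty)."""
    start = inputs[0]["timestamp"]
    window = inputs[0]["duration"]
    end = inputs[-1]["timestamp"] + timedelta(seconds=window)
    return start, end, window


def fake_importer(inputs):
    """Hypothetical importer: returns one observation per slot implied by its inputs."""
    start, end, window = grid_from_inputs(inputs)
    outputs = []
    t = start
    while t < end:
        outputs.append({"timestamp": t, "duration": window, "cpu/utilization": 0.5})
        t += timedelta(seconds=window)
    return outputs


def run_checked(plugin, inputs):
    """Run a plugin, then error out if any returned observation is off the grid."""
    start, end, window = grid_from_inputs(inputs)
    outputs = plugin(inputs)
    for obs in outputs:
        offset = (obs["timestamp"] - start).total_seconds()
        on_grid = (
            start <= obs["timestamp"] < end
            and offset % window == 0
            and obs["duration"] == window
        )
        if not on_grid:
            raise ValueError(f"off-grid observation at {obs['timestamp'].isoformat()}")
    return outputs


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
inputs = [{"timestamp": t0 + timedelta(seconds=60 * i), "duration": 60} for i in range(3)]
outputs = run_checked(fake_importer, inputs)
```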

We provide plugin authors some support in helping them align their time series to the time grid:

  • Provide a TimeGrid utility function to every plugin, which they can use to snap data to that grid if it isn't already aligned.
  • We can configure a plugin in the initialize section with snap-to-time-grid: true, and then we would automatically run time-sync on the output.
    • This is also useful for backward compatibility. We can make all existing plugins work in the new architecture by running TimeSync on their outputs.
  • Or a plugin can tell IF, through its returned config, to snap-to-time-grid: true, but again, that would only work if the plugin returns a unique time series.
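
A heavily simplified sketch of what a snapping helper could look like. `snap_to_grid` is a hypothetical name, and this version only reassigns each observation to the slot that contains its timestamp and drops anything outside the grid. It does not resample or split values across slots, which a real TimeSync step would likely need to do.

```python
from datetime import datetime, timedelta, timezone


def snap_to_grid(observations, start, end, window):
    """Move each observation onto the slot containing its timestamp.

    Simplification: values are carried over unchanged, and observations
    outside [start, end) are dropped.
    """
    snapped = []
    for obs in observations:
        offset = (obs["timestamp"] - start).total_seconds()
        if offset < 0 or obs["timestamp"] >= end:
            continue
        index = int(offset // window)
        snapped.append({
            **obs,
            "timestamp": start + timedelta(seconds=index * window),
            "duration": window,
        })
    return snapped


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(minutes=3)
raw = [
    {"timestamp": start - timedelta(seconds=10), "duration": 30, "energy": 1.0},  # before the grid
    {"timestamp": start + timedelta(seconds=90), "duration": 30, "energy": 2.0},  # inside slot T1
    {"timestamp": end, "duration": 30, "energy": 3.0},                            # after the grid
]
snapped = snap_to_grid(raw, start, end, window=60)
```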

How does aggregation work in a world where all observations are guaranteed to snap to a time-grid?

For example, consider a component that does not have a unique time series.

  • T0 is the cell at index 0 in the Time Grid.
  • There can be three observations at the T0 cell for the same component.
  • This means this component does NOT have a unique time series, but it doesn't matter since we just aggregate all T0 cells up to the parent grouping node.

E.g. Component has a sparse time series

The component is missing data in some cells. That's OK: we just sum up every cell it does have, up the tree.
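
Both cases can be sketched with a single per-cell sum. The `aggregate` function, the integer `cell` index, and the `carbon` metric are assumptions for illustration and do not reflect IF's actual aggregation code.

```python
from collections import defaultdict


def aggregate(components, metric):
    """Sum a metric per grid cell across all observations of all components."""
    totals = defaultdict(float)
    for observations in components.values():
        for obs in observations:
            totals[obs["cell"]] += obs[metric]
    return dict(totals)


components = {
    # Non-unique: three observations land in cell T0.
    "non-unique": [
        {"cell": 0, "carbon": 1.0},
        {"cell": 0, "carbon": 2.0},
        {"cell": 0, "carbon": 3.0},
        {"cell": 1, "carbon": 4.0},
    ],
    # Sparse: nothing reported for cell T0.
    "sparse": [
        {"cell": 1, "carbon": 0.5},
    ],
}

totals = aggregate(components, "carbon")
```

Neither uniqueness nor completeness is needed here; the only requirement is that every observation carries a valid cell on the shared grid.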

E.g. Component represents servers spinning up and then down.

  • This is an example of a sparse, non-unique time series.
  • I imagine importers will often return sparse, non-unique time series like these, e.g., if you import all VM data for a Kubernetes cluster, you will import data that represents servers spinning up and spinning down.
  • The pressure to make sense of this data should now be placed on the importers themselves instead of on the watchers and the other plugins in the framework.
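
As an illustration of an importer doing that work, the sketch below turns VM lifetimes into on-grid observations, one per slot in which the VM was running. The grid constants, the `vm_observations` helper, and the VM names are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

START = datetime(2024, 1, 1, tzinfo=timezone.utc)
WINDOW = 60  # seconds (assumed unit)
SLOTS = 5


def vm_observations(vm_id, up, down):
    """One observation per slot that overlaps the VM's lifetime [up, down)."""
    out = []
    for i in range(SLOTS):
        slot_start = START + timedelta(seconds=i * WINDOW)
        slot_end = slot_start + timedelta(seconds=WINDOW)
        if up < slot_end and down > slot_start:
            out.append({"vm": vm_id, "cell": i, "timestamp": slot_start, "duration": WINDOW})
    return out


lifetimes = [
    ("vm-a", START, START + timedelta(minutes=5)),                         # up for the whole grid
    ("vm-b", START + timedelta(minutes=1), START + timedelta(minutes=3)),  # spins up, then down
]

# Sparse (vm-b has no data for T0, T3, T4) and non-unique (T1 and T2 hold two entries).
series = [obs for vm in lifetimes for obs in vm_observations(*vm)]
```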

Advantages of this approach

  • All observations become atomic through this approach, and watchers don't need to care about time syncing. It takes the pressure off the end user: they don't have to worry about time syncing because it's implicit. Some plugins might need to work harder, but that's OK.
  • Plugins are best placed to figure out how to TimeSync their own data. We don't need to maintain a units.yml to provide the data needed to understand how to TimeSync.
  • You could do re-grouping without re-computation. You could even do zero grouping, one single component for every server in your fleet, and still get a time series.
  • We return to just one "pipeline" followed by an optional grouping.
  • The time ID (T0, T1, index of the TimeGrid) connects back to the visualizer.
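
The re-grouping point can be sketched as follows: once every observation is atomic and carries its grid cell, the same computed observations can be grouped by any label afterwards. `group_and_sum` and the `region`, `type`, and `carbon` fields are hypothetical.

```python
from collections import defaultdict


def group_and_sum(observations, key, metric):
    """Group already-computed observations by any label and sum a metric per cell."""
    result = defaultdict(lambda: defaultdict(float))
    for obs in observations:
        result[obs[key]][obs["cell"]] += obs[metric]
    return {group: dict(cells) for group, cells in result.items()}


observations = [
    {"cell": 0, "region": "eu", "type": "small", "carbon": 1.0},
    {"cell": 0, "region": "us", "type": "small", "carbon": 2.0},
    {"cell": 1, "region": "eu", "type": "large", "carbon": 4.0},
]

# Same observations, two different groupings, no pipeline re-run.
by_region = group_and_sum(observations, "region", "carbon")
by_type = group_and_sum(observations, "type", "carbon")
```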

Metadata

Labels: EPIC (Used to denote an issue that represents a whole epic. Core team only)

Status: Pending Review