Decide on formula aggregation error schema, and document this in the API spec #52

@stefan-brus-frequenz

Description

What's needed?

It is proposed to enhance the API responses for aggregated microgrid component data by including the status of each component involved in the aggregation. This means adding a new component_statuses field to the ListAggregatedMicrogridComponentsDataResponse and ReceiveAggregatedMicrogridComponentsDataStreamResponse messages. Each entry is a ComponentAggregationStatus message that reports whether the component's data was included, was unavailable, or could not be retrieved because of an error, along with an optional error message. This improves transparency and lets clients handle unavailable data and errors more effectively.

Proposed solution

+// Message defining the status of a component in the aggregation process.
+message ComponentAggregationStatus {
+  // The ID of the component.
+  uint64 component_id = 1;
+
+  // The status of the component in the aggregation.
+  enum Status {
+    STATUS_UNSPECIFIED = 0;
+    STATUS_OK = 1; // Data was successfully included.
+    STATUS_DATA_UNAVAILABLE = 2; // Data was unavailable.
+    STATUS_ERROR = 3; // An error occurred.
+  }
+  Status status = 2;
+
+  // Optional error message if an error occurred.
+  string error_message = 3;
+}
+

// Message defining the response format for a request that fetches aggregated historical
// metrics based on custom aggregation formulas.
//
// !!! note
//     At least one formula and metric must have been specified in the corresponding request.
//     The aggregation results for these metrics are returned in the samples field.
//
// !!! example
//     Example output structure is the following:
//     ```
//     results: [
//       {
//         aggregation_config: {
//           microgrid_id: 1,
//           metric: "DC_VOLTAGE_V",
//           aggregation_formula: "avg(3,5,6)"
//         },
//         samples: [
//           { sampled_at: "2023-10-01T00:00:00Z", sample: { value: 220.1 } },
//           { sampled_at: "2023-10-01T00:05:00Z", sample: { value: 215.2 } }
-//         ]
+//         ],
+//         component_statuses: [
+//           {
+//             component_id: 3,
+//             status: STATUS_OK
+//           },
+//           {
+//             component_id: 6,
+//             status: STATUS_DATA_UNAVAILABLE
+//           }
+//         ]
//       },
//       {
//         aggregation_config: {
//           microgrid_id: 2,
//           metric: "DC_CURRENT_A",
//           aggregation_formula: "sum(1,2,3,4)"
//         },
//         samples: [
//           { sampled_at: "2023-10-01T00:00:00Z", sample: { value: 1310.7 } },
//           { sampled_at: "2023-10-01T00:05:00Z", sample: { value: 1422.2 } }
-//         ]
+//         ],
+//         component_statuses: []
//       }
//     ]
//     ```
//
message ListAggregatedMicrogridComponentsDataResponse {
  // Encapsulates the result of aggregating a metric.
  message AggregatedResult {
    // Metric and related formula provided for aggregation.
    AggregationConfig aggregation_config = 1;

    // A list of aggregated metrics.
    repeated SimpleAggregatedMetricValue samples = 2;

+    // Information about the status of each component involved in the aggregation.
+    repeated ComponentAggregationStatus component_statuses = 3;
  }

  // List of aggregated results, each corresponding to a metric and custom aggregation
  // formula.
  //
  // !!! note
  //     Each entry in this list contains the aggregation formula config and the
  //     corresponding aggregated metric samples for the requested timeframe.
  repeated AggregatedResult results = 1;

  // Metadata for pagination, containing the token for the next page of results.
  //
  // !!! note
  //     If `pagination_info` is populated, it implies that more data is available to fetch.
  frequenz.api.common.v1.pagination.PaginationInfo pagination_info = 2;
}

// Message defining the response format for a stream that fetches aggregated real-time metrics
// for the provided custom aggregation formulas.
//
// !!! note
//     The formula and metric must have been specified in the corresponding request.
//     A single aggregated sample for the metric is returned in the sample field. Each message
//     covers a single formula. For multiple formulas provided in the request, expect sequential
//     messages in the stream.
//
// !!! example
//     Given a stream output, a single sample might be:
//     ```
//     {
//       aggregation_config: {
//         microgrid_id: 1,
//         metric: "DC_VOLTAGE_V",
//         aggregation_formula: "avg(1,2,3)"
//       },
//       sample {
//         sampled_at: '2023-10-01T00:00:00Z',
//         sample: { value: 42.5 }
-//       }
+//       },
+//       component_statuses: [
+//         {
+//           component_id: 1,
+//           status: STATUS_OK
+//         },
+//         {
+//           component_id: 2,
+//           status: STATUS_DATA_UNAVAILABLE
+//         },
+//         {
+//           component_id: 3,
+//           status: STATUS_ERROR,
+//           error_message: "Component data retrieval failed due to timeout."
+//         }
+//       ]
//     }
//     ```
//
message ReceiveAggregatedMicrogridComponentsDataStreamResponse {
  // The metric and formula that has been used to aggregate the sample.
  AggregationConfig aggregation_config = 1;

  // Aggregated sample value and corresponding UTC timestamp when it was sampled.
  SimpleAggregatedMetricValue sample = 2;

+  // Information about the status of each component involved in the aggregation.
+  repeated ComponentAggregationStatus component_statuses = 3;
}
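
To illustrate how a client might consume the proposed field, here is a minimal Python sketch. It does not use the generated protobuf classes: `Status` and `ComponentAggregationStatus` below are hypothetical stand-ins that mirror the proposed message, and `group_by_status` is an illustrative helper, not part of any existing client library. The input reproduces the stream example above.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Mirrors the proposed ComponentAggregationStatus.Status enum."""

    STATUS_UNSPECIFIED = 0
    STATUS_OK = 1
    STATUS_DATA_UNAVAILABLE = 2
    STATUS_ERROR = 3


@dataclass(frozen=True)
class ComponentAggregationStatus:
    """Hypothetical stand-in for the generated protobuf message class."""

    component_id: int
    status: Status
    error_message: str = ""


def group_by_status(statuses):
    """Return a dict mapping each status to the component IDs that reported it."""
    grouped = defaultdict(list)
    for entry in statuses:
        grouped[entry.status].append(entry.component_id)
    return dict(grouped)


# Statuses taken from the stream response example for "avg(1,2,3)".
statuses = [
    ComponentAggregationStatus(1, Status.STATUS_OK),
    ComponentAggregationStatus(2, Status.STATUS_DATA_UNAVAILABLE),
    ComponentAggregationStatus(
        3, Status.STATUS_ERROR, "Component data retrieval failed due to timeout."
    ),
]

grouped = group_by_status(statuses)
# Error messages for components that failed, keyed by component ID.
errors = {
    s.component_id: s.error_message
    for s in statuses
    if s.status is Status.STATUS_ERROR
}
```

With this grouping, a client could decide, for example, to flag the aggregated sample as partial whenever any component is not in the `STATUS_OK` group.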

Use cases

In complex microgrids, aggregating data from multiple components (e.g., batteries, inverters) on the server side is crucial for monitoring and analysis. However, data from one or more components may be unavailable or may fail to be retrieved, for example because of communication issues, maintenance, or errors.

Current challenges:

  1. Lack of Visibility: Clients receiving aggregated data have no insight into which components contributed to the aggregation and which did not.
  2. Error Handling: Without detailed information, it's challenging for clients to understand discrepancies in aggregated values or to troubleshoot issues related to missing data.
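
As a rough sketch of how the proposed field could address the visibility problem, a client could compare the components referenced in a formula with the components that reported a status. The helper names below are hypothetical, and the sketch assumes that every integer in the formula string is a component ID, which the issue does not specify.

```python
import re


def referenced_component_ids(formula):
    """Extract component IDs from a formula such as "avg(3,5,6)".

    Assumption: every integer in the formula is a component ID.
    """
    return {int(match) for match in re.findall(r"\d+", formula)}


def components_without_status(formula, reported_ids):
    """Return the referenced component IDs that have no status entry, sorted."""
    return sorted(referenced_component_ids(formula) - set(reported_ids))


# In the list response example, "avg(3,5,6)" reports statuses for 3 and 6 only.
missing = components_without_status("avg(3,5,6)", [3, 6])
```

Here `missing` contains component 5, which appears in the formula but has no status entry. Whether the server should always report every referenced component is a question the schema decision would need to settle.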

Alternatives and workarounds

No response

Additional context

No response

Labels

part:proto: Affects the protocol buffer definition files
type:enhancement: New feature or enhancement visible to users
