
New component: azureencodingextension #41725

@Fiery-Fenix

Description

The purpose and use-cases of the new component

Goal

Create a configurable, maintainable and reusable Encoding Extension for all telemetry data types available via Azure Diagnostic Settings export

Description

Azure Monitor currently supports exporting Logs, Metrics and Traces via Azure Diagnostic Settings to different destinations that can be consumed by the OpenTelemetry Collector.

Sources of telemetry data exposed via Azure Diagnostic Settings, with their corresponding OTEL data types

Supported Azure Diagnostic Settings destinations, with the OpenTelemetry Collector receiver components that can consume them:

  • Azure Event Hubs - can be consumed by azureeventhubreceiver (AMQP-based) or kafkareceiver (Kafka protocol based)
  • Azure Storage - can be consumed by azureblobreceiver, but this component has no support for Azure-specific data unmarshaling

Current state of support of Azure-specific telemetry data types

In general, exported Azure telemetry data is just JSON with a specific structure for each data type (and each log category), so it can be consumed as-is; however, in that case no normalization or alignment with OTEL SemConv happens (a rough sketch of the common record envelope follows the list below).
There are already some packages and code that help translate Azure-specific telemetry data into the correct OTEL data types with proper Resource and Record Attributes:

  • pkg/translator/azure - is used for Logs and Traces, but Logs Attribute names are not translated to the corresponding OTEL SemConv naming (for example, httpMethod is not translated to http.request.method)
  • pkg/translator/azurelogs - is used for Logs only and translates Logs Attribute names into OTEL SemConv (this is a WIP)
  • azureeventhubreceiver - has a built-in translator for Metrics, while using pkg/translator/azure and pkg/translator/azurelogs for Traces and Logs
  • kafkareceiver - uses pkg/translator/azure for Logs, but lacks any support for Azure Traces or Metrics translation
  • azureblobreceiver - has no Azure-specific telemetry data support at all, only plain JSON
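
To make "specific structure" concrete, below is a rough Go sketch of the common envelope used by Azure Diagnostic Settings exports: a top-level records array whose per-record fields vary by category. The struct and field names are illustrative only, not the exact types used in pkg/translator/azure.

package azureencodingextension

// azureRecords/azureLogRecord are an illustrative sketch of the export envelope;
// the real structures live in pkg/translator/azure and carry more category-dependent fields.
type azureRecords struct {
    Records []azureLogRecord `json:"records"`
}

type azureLogRecord struct {
    Time          string         `json:"time"`
    ResourceID    string         `json:"resourceId"`
    Category      string         `json:"category"`
    OperationName string         `json:"operationName"`
    Level         string         `json:"level,omitempty"`
    Properties    map[string]any `json:"properties,omitempty"`
}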

Problem statement

From the descriptions above I can identify the following problems with consuming Azure-specific telemetry data:

  • Not all OpenTelemetry receivers that could be used actually support consuming Azure-specific telemetry data: azureeventhubreceiver supports all data types (but not all with proper SemConv translation), kafkareceiver only Logs (also without proper SemConv translation), and azureblobreceiver has no support at all
  • Azure-specific translation code is scattered across multiple components (pkg/translator/azure, pkg/translator/azurelogs and azureeventhubreceiver), has duplicate functionality (Logs, for example), sometimes lacks tests, and uses different approaches to translating data into OTEL format
  • Azure-specific translation code is hard to configure; for example, support for multiple time formats had to be copy-pasted into multiple components to be enabled (see here and here), while kafkareceiver was missed and doesn't have this functionality at all
  • Some portions of code are duplicated; for example, the asTimestamp function that parses timestamps has 3 copies: here, here and here (a consolidated sketch follows this list)
  • Last but not least, adding support for Azure-specific telemetry data, for example to azureblobreceiver, has become quite a hard task because of all the points described above
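
To illustrate the kind of consolidation this would enable, here is a minimal Go sketch of a single, configurable timestamp parser that could replace the three copies. The name matches the duplicated asTimestamp helper, but the signature (taking extra layouts from the extension configuration) is my assumption, not existing code.

package azureencodingextension

import (
    "fmt"
    "time"

    "go.opentelemetry.io/collector/pdata/pcommon"
)

// asTimestamp parses an Azure record timestamp, trying RFC3339Nano first and then
// any additional layouts supplied via the (hypothetical) time_formats option.
func asTimestamp(s string, layouts []string) (pcommon.Timestamp, error) {
    if t, err := time.Parse(time.RFC3339Nano, s); err == nil {
        return pcommon.NewTimestampFromTime(t), nil
    }
    for _, layout := range layouts {
        if t, err := time.Parse(layout, s); err == nil {
            return pcommon.NewTimestampFromTime(t), nil
        }
    }
    return 0, fmt.Errorf("unable to parse timestamp %q", s)
}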

Proposed solution

From my perspective, the most appropriate solution to mitigate the mentioned issues is to create an Azure Encoding Extension that will:

  • Consolidate all relevant code in a single component instead of 3 different ones. This will greatly increase the maintainability of this code
  • Define a single approach to translating Azure-specific telemetry data into OTEL telemetry data. This already happens for Logs in pkg/translator/azurelogs, but I believe it should apply to the other data types as well: the resulting OTEL telemetry data should conform to SemConv as much as possible
  • Greatly decrease the number of breaking changes required in other involved components. The new component can be developed independently and, only after stabilizing, introduce a single breaking change for migration
  • Make Azure-specific translation code more flexible by providing additional configuration options that don't impact the receivers that actually use this code
  • Allow Azure-specific translation code to be used in any component that supports (or will support) encoding extensions, which covers more specific use cases for advanced users (see the interface sketch after this list)
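
As a rough sketch of what that surface could look like, the extension would follow the usual encoding-extension pattern and implement the pdata Unmarshaler interfaces for all three signals. Everything below (type name, placeholder method bodies) is illustrative and assumes that pattern; it is not the final implementation.

package azureencodingextension

import (
    "context"

    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/pdata/plog"
    "go.opentelemetry.io/collector/pdata/pmetric"
    "go.opentelemetry.io/collector/pdata/ptrace"
)

// azureEncodingExtension is a placeholder; each Unmarshal* method would delegate
// to the consolidated Azure translators described in this proposal.
type azureEncodingExtension struct {
    // per-signal configuration (time formats, etc.) would live here
}

// Compile-time checks: any receiver that supports encoding extensions for a given
// signal can use this extension through the corresponding interface.
var (
    _ plog.Unmarshaler    = (*azureEncodingExtension)(nil)
    _ pmetric.Unmarshaler = (*azureEncodingExtension)(nil)
    _ ptrace.Unmarshaler  = (*azureEncodingExtension)(nil)
)

func (e *azureEncodingExtension) Start(context.Context, component.Host) error { return nil }
func (e *azureEncodingExtension) Shutdown(context.Context) error              { return nil }

func (e *azureEncodingExtension) UnmarshalLogs(buf []byte) (plog.Logs, error) {
    return plog.NewLogs(), nil // would call the consolidated Logs translator
}

func (e *azureEncodingExtension) UnmarshalMetrics(buf []byte) (pmetric.Metrics, error) {
    return pmetric.NewMetrics(), nil // would call the Metrics translator copied from azureeventhubreceiver
}

func (e *azureEncodingExtension) UnmarshalTraces(buf []byte) (ptrace.Traces, error) {
    return ptrace.NewTraces(), nil // would call the Traces translator copied from pkg/translator/azure
}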

Proposed implementation plan

Important: most of the code for the proposed Azure Encoding Extension is already present in this repository, just spread across multiple components. So this is not development from scratch; it is rather a smart refactoring of the existing code.

  1. Introduce a skeleton for the new Azure Encoding Extension, then discuss and align on the proposed configuration options (see the config sketch after this list)
  2. Implement Traces and Logs translation as a thin wrapper around the existing Unmarshalers in pkg/translator/azure and pkg/translator/azurelogs. Unfortunately, the Metrics unmarshaler in azureeventhubreceiver is not exported and can't be used this way; see the next step
  3. Copy (to maintain compatibility) the Metrics Unmarshaler from azureeventhubreceiver to the new extension, add missing unit tests and validate that the produced result is aligned with OTEL SemConv
  4. Copy (to maintain compatibility) the Traces Unmarshaler from pkg/translator/azure to the new extension, add missing unit tests and validate that the produced result is aligned with OTEL SemConv
  5. Copy (to maintain compatibility) the Logs Unmarshaler from pkg/translator/azurelogs to the new extension
  6. Add the missing Azure Resource Logs translator code for the specific categories of Azure Logs that are not translated yet (this is the current WIP in pkg/translator/azurelogs)
  7. Release the component; at this point it will be available as an option for at least the kafkareceiver component
  8. Implement optional support for encoding extensions in receiver/azureeventhubreceiver to provide the ability to use the new encoding extension (still no breaking changes, BTW)
  9. Deprecate the usage of logs.encoding=azure_resource_logs in kafkareceiver and format=azure in azureeventhubreceiver in favor of azureencodingextension
  10. After the deprecation period, remove pkg/translator/azure, pkg/translator/azurelogs and the respective unmarshaling code from azureeventhubreceiver (this is the only breaking change required)
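
For step 1, here is a possible shape of the configuration skeleton in Go, mirroring the YAML example below; the struct and field names are my assumptions and would be settled during review of the skeleton PR.

package azureencodingextension

// Config mirrors the example configuration below: one block per signal.
type Config struct {
    Logs    SignalConfig `mapstructure:"logs"`
    Metrics SignalConfig `mapstructure:"metrics"`
    Traces  SignalConfig `mapstructure:"traces"`
}

// SignalConfig holds per-signal translation options.
type SignalConfig struct {
    // TimeFormats lists additional Go time layouts to try when parsing record timestamps.
    TimeFormats []string `mapstructure:"time_formats"`
}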

Example configuration for the component

extensions:
  azure_encoding:
    logs:
      time_formats: ["01/02/2006 15:04:05", "2006-01-02T15:04:05Z"]
      # other logs-specific settings
    metrics:
      time_formats: ["01/02/2006 15:04:05", "2006-01-02T15:04:05Z"]
      # other metrics-specific settings
    traces:
      time_formats: ["01/02/2006 15:04:05", "2006-01-02T15:04:05Z"]
      # other traces-specific settings

receivers:
  kafka/azure:
    encoding: azure_encoding

Telemetry data types supported

traces, metrics and logs

Code Owner(s)

@constanca-m, @zmoog (eventually, pending membership), @Fiery-Fenix (eventually, pending membership), others welcome

Sponsor (optional)

@axw

Additional context

There are existing issues that would be at least partially addressed by this one:

I would like to contribute some of my time to help create this Azure Encoding Extension; as a starting point for discussion I have also created an initial PR: #41708

Pinging the current maintainers of pkg/translator/azure, pkg/translator/azurelogs and azureeventhubreceiver, since your code is planned to be used for this extension: @atoulme, @cparkins, @MikeGoldsmith, @constanca-m

As for sponsorship of this new extension:
Taking into account that the code is already in place and will simply be refactored into a new component, I would love to see one of the maintainers of the existing code (pkg/translator/azure, pkg/translator/azurelogs and azureeventhubreceiver) as a Sponsor, if one is required of course

Open to your thoughts
