
Metrics Reporting #485

@DerGut

Description


Feature Request / Improvement

Iceberg's Metrics Reporting API

We've started a discussion about the Metrics Reporting API in apache/iceberg-rust#1466. It's part of the catalog spec and concerns itself with monitoring Iceberg clients' access to files in object storage, e.g. the number of files considered, scanned, and skipped during scan planning (and similar counters for commits). These metrics are otherwise not visible to the catalog, and the Metrics Reporting API provides a standard interface for aggregating them across clients.

Currently, only Iceberg Java ships with an implementation of metrics reporting. The interface is pluggable, and it comes with the default implementations LoggingMetricsReporter and RestMetricsReporter. The latter is used in combination with REST catalogs and sends recorded metrics to the catalog for server-side processing.

Existing Telemetry APIs

On the draft implementation, @sdd raised a good point that we now have other, often more idiomatic interfaces available: apache/iceberg-rust#1496 (comment). In Rust, for example, we've decided to use the `metrics` facade crate, which users can back with any exporter they like, offering simple integration with existing observability systems. In Go, opentelemetry offers similar functionality.

Using existing telemetry APIs, the reporting code could be much simpler, and wiring up backends would be easier (no custom exporter code needed).
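The counter-based style could look roughly like the sketch below: the library increments counters inline during scan planning instead of accumulating a report object. The `Counter` type here is a stdlib stand-in for what would, in practice, be something like an opentelemetry `Int64Counter` backed by whichever exporter the user wires up; `planFiles` and all names are hypothetical.

```go
package main

import "fmt"

// Counter is a stand-in for a facade-provided counter instrument.
type Counter struct {
	Name  string
	Value int64
}

func (c *Counter) Add(n int64) { c.Value += n }

// Counters the library would emit to directly.
var (
	filesScanned = &Counter{Name: "iceberg.scan.data_files.scanned"}
	filesSkipped = &Counter{Name: "iceberg.scan.data_files.skipped"}
)

// planFiles emits metrics inline instead of bundling them into a report.
func planFiles(files []string, keep func(string) bool) []string {
	var out []string
	for _, f := range files {
		if keep(f) {
			filesScanned.Add(1)
			out = append(out, f)
		} else {
			filesSkipped.Add(1)
		}
	}
	return out
}

func main() {
	planFiles([]string{"a.parquet", "b.parquet", "c.parquet"},
		func(f string) bool { return f != "b.parquet" })
	fmt.Println(filesScanned.Value, filesSkipped.Value)
}
```

With a real facade, the backing integration comes for free: the user configures an exporter once, and every counter the library touches flows through it.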

Metric Names

Emitting metrics straight from the library means we also need to standardize metric names; otherwise implementations could diverge, defeating the goal of a unified way of monitoring Iceberg clients.

I would like to propose a naming scheme similar to @sdd's PoC, composed as

`iceberg.<operation>.<resource>.<count-type>`

for example `iceberg.scan.data_files.scanned`, `iceberg.scan.delete_manifests.skipped`, or `iceberg.commit.delete_files.added`. The existing set of metrics can be taken from ScanMetricsResult.java and CommitMetricsResult.java.
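The scheme above is mechanical enough to capture in a tiny helper, which would also keep names consistent across the codebase. `metricName` is a hypothetical function, shown only to illustrate the proposed structure.

```go
package main

import "fmt"

// metricName assembles a metric name following the proposed
// iceberg.<operation>.<resource>.<count-type> scheme.
func metricName(operation, resource, countType string) string {
	return fmt.Sprintf("iceberg.%s.%s.%s", operation, resource, countType)
}

func main() {
	fmt.Println(metricName("scan", "data_files", "scanned"))   // iceberg.scan.data_files.scanned
	fmt.Println(metricName("commit", "delete_files", "added")) // iceberg.commit.delete_files.added
}
```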

Catalog Spec

The Metrics Reporting API is part of the catalog spec, which suggests we should consider implementing it regardless. If an experiment can show that (for example) an opentelemetry exporter can consume a spec-compliant reporter interface, we should be good; if it can't, we need to take this into consideration.
With the spec's API, multiple metrics are bundled together into a single report. This bundling doesn't seem natural for other metrics APIs and could become an implementation burden.
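One possible shape for that experiment: an adapter that unbundles a spec-style report into individual counter increments, so a spec-compliant reporter interface can sit in front of a counter-based API. This is a sketch under stated assumptions; `ScanReport`, `CounterSink`, and `ReportToCounters` are hypothetical names, and a real version would target an opentelemetry meter rather than a map.

```go
package main

import "fmt"

// ScanReport mirrors the bundled shape of the spec's metrics report:
// many counters delivered together in one payload.
type ScanReport struct {
	Counters map[string]int64 // e.g. "iceberg.scan.data_files.scanned" -> 7
}

// CounterSink is the counter-style side of the adapter; in practice
// this would be backed by an opentelemetry meter.
type CounterSink interface {
	Add(name string, delta int64)
}

// mapSink is an in-memory sink used here for demonstration.
type mapSink map[string]int64

func (m mapSink) Add(name string, delta int64) { m[name] += delta }

// ReportToCounters forwards each bundled value as a counter increment,
// unbundling the report for counter-based APIs.
func ReportToCounters(r ScanReport, sink CounterSink) {
	for name, v := range r.Counters {
		sink.Add(name, v)
	}
}

func main() {
	sink := mapSink{}
	ReportToCounters(ScanReport{Counters: map[string]int64{
		"iceberg.scan.data_files.scanned": 7,
		"iceberg.scan.data_files.skipped": 2,
	}}, sink)
	fmt.Println(sink["iceberg.scan.data_files.scanned"]) // 7
}
```

If this adapter works cleanly, the spec-compliant interface and the idiomatic counter style need not be mutually exclusive.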


I want to use this issue to:

  1. start a general discussion about metrics reporting in Go, because I find it tremendously useful when working with many clients and would like to contribute such functionality

  2. extend the discussion about following the Java implementation vs. using more idiomatic approaches, because I would like to see the different implementations moving in a similar direction

     1. find agreement on metric names if we choose this path

See also apache/iceberg-python#474 (comment) for a similar discussion in Python.
