Add per catalog metastore metrics to QueryStats #26900
Conversation
Reviewer's Guide

This PR extends the metadata API to collect and expose per-catalog metastore call metrics in QueryStats and QueryInfo. It adds SPI methods for listing active catalogs and fetching connector metrics, wraps Hive metastore calls with timing and failure counting, integrates metrics capture into QueryStateMachine on query completion or failure, and updates connectors and tests to support and validate the new metrics field.

Class diagram for MeasuredHiveMetastore and metastore metrics integration:

```mermaid
classDiagram
    class MeasuredHiveMetastore {
        -HiveMetastore delegate
        -MetastoreApiCallStats allApiCallsStats
        -Map<String, MetastoreApiCallStats> apiCallStats
        -Ticker ticker
        +Metrics getMetrics()
        +<all HiveMetastore methods> (wrapped)
    }
    class MetastoreApiCallStats {
        -TDigest timeNanosDistribution
        -long totalTimeNanos
        -long totalFailures
        +addTime(long)
        +addFailure()
        +put(ImmutableMap.Builder<String, Metric<?>>, String)
    }
    class MeasuredMetastoreFactory {
        -HiveMetastoreFactory metastoreFactory
        +createMetastore(Optional<ConnectorIdentity>)
        +isImpersonationEnabled()
    }
    MeasuredHiveMetastore --> HiveMetastore : delegates
    MeasuredHiveMetastore --> MetastoreApiCallStats : uses
    MeasuredMetastoreFactory --> MeasuredHiveMetastore : creates
    MeasuredMetastoreFactory --> HiveMetastoreFactory : delegates
    class HiveMetastore {
        <<interface>>
        +getMetrics() : Metrics
        +<other methods>
    }
    MeasuredHiveMetastore ..|> HiveMetastore
    class Metrics {
        +Map<String, Metric<?>> metrics
    }
    MeasuredHiveMetastore --> Metrics : returns
    MetastoreApiCallStats --> Metric : builds
    class Metric {
        <<interface>>
    }
```
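The wrapping pattern the guide describes — time every delegated call, count failures, and aggregate both per API name and overall — can be sketched roughly like this. This is a simplified, hypothetical stand-in, not the PR's actual MeasuredHiveMetastore: the names are illustrative, and the real class additionally records a TDigest-based time distribution per call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch of a measuring delegate: every call is timed and
// failures are counted, both per API name and in one aggregate bucket.
class MeasuredCallSketch
{
    static final class CallStats
    {
        final AtomicLong totalTimeNanos = new AtomicLong();
        final AtomicLong totalFailures = new AtomicLong();
    }

    private final Map<String, CallStats> statsByApi = new ConcurrentHashMap<>();
    private final CallStats allCalls = new CallStats();

    public <T> T measure(String apiName, Supplier<T> call)
    {
        CallStats stats = statsByApi.computeIfAbsent(apiName, key -> new CallStats());
        long start = System.nanoTime();
        try {
            return call.get();
        }
        catch (RuntimeException e) {
            // count the failure; elapsed time is still recorded in the finally block
            stats.totalFailures.incrementAndGet();
            allCalls.totalFailures.incrementAndGet();
            throw e;
        }
        finally {
            long elapsed = System.nanoTime() - start;
            stats.totalTimeNanos.addAndGet(elapsed);
            allCalls.totalTimeNanos.addAndGet(elapsed);
        }
    }

    public long failures(String apiName)
    {
        CallStats stats = statsByApi.get(apiName);
        return stats == null ? 0 : stats.totalFailures.get();
    }

    public long totalFailures()
    {
        return allCalls.totalFailures.get();
    }
}
```

Each wrapped metastore method then reduces to a one-liner such as `measure("getTable", () -> delegate.getTable(databaseName, tableName))`.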
Class diagram for QueryStats and the catalogMetadataMetrics field:

```mermaid
classDiagram
    class QueryStats {
        +Map<String, Metrics> catalogMetadataMetrics
        +getCatalogMetadataMetrics()
        +<other fields and methods>
    }
    QueryStats --> Metrics : contains
    class Metrics {
        +Map<String, Metric<?>> metrics
    }
    Metrics --> Metric : contains
    class Metric {
        <<interface>>
    }
```
File-Level Changes
Hey there - I've reviewed your changes - here's some feedback:
- Consolidate the duplicated assertCountMetricExists/assertDistributionMetricExists helpers in BaseHiveConnectorTest and BaseIcebergConnectorTest into a shared test utility to reduce code duplication.
- Centralize the collectCatalogMetadataMetrics invocation in QueryStateMachine (rather than calling it separately in both transitionToFinishing and transitionToFailed) to DRY up the code and ensure consistency.
- Consider refactoring the very large MeasuredHiveMetastore class by extracting the stats‐collection logic into smaller, focused components or utility classes to improve readability and maintainability.
## Individual Comments
### Comment 1
<location> `plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java:9225-9226` </location>
<code_context>
assertQuerySucceeds("CALL system.flush_metadata_cache()");
}
+ @Test
+ public void testCatalogMetadataMetrics()
+ {
+ MaterializedResultWithPlan result = getQueryRunner().executeWithPlan(
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test for metrics with multiple catalogs.
Please add a test that runs a query across multiple catalogs to ensure metrics are tracked separately for each.
Suggested implementation:
```java
@Test
public void testCatalogMetadataMetrics()
{
MaterializedResultWithPlan result = getQueryRunner().executeWithPlan(
getSession(),
"SELECT count(*) FROM region r, nation n WHERE r.regionkey = n.regionkey");
Map<String, Metrics> metrics = getCatalogMetadataMetrics(result.queryId());
assertCountMetricExists(metrics, "iceberg", "metastore.all.time.total");
assertDistributionMetricExists(metrics, "iceberg", "metastore.all.time.distribution");
assertCountMetricExists(metrics, "iceberg", "metastore.getTable.time.total");
assertDistributionMetricExists(metrics, "iceberg", "metastore.getTable.time.distribution");
}
@Test
public void testCatalogMetadataMetricsWithMultipleCatalogs()
{
// Assume "iceberg" and "tpch" catalogs are available for testing
MaterializedResultWithPlan result = getQueryRunner().executeWithPlan(
getSession(),
"SELECT count(*) FROM iceberg.region r JOIN tpch.nation n ON r.regionkey = n.regionkey");
Map<String, Metrics> metrics = getCatalogMetadataMetrics(result.queryId());
// Assert metrics for iceberg catalog
assertCountMetricExists(metrics, "iceberg", "metastore.all.time.total");
assertDistributionMetricExists(metrics, "iceberg", "metastore.all.time.distribution");
assertCountMetricExists(metrics, "iceberg", "metastore.getTable.time.total");
assertDistributionMetricExists(metrics, "iceberg", "metastore.getTable.time.distribution");
// Assert metrics for tpch catalog (replace with actual metric names if different)
assertCountMetricExists(metrics, "tpch", "metastore.all.time.total");
assertDistributionMetricExists(metrics, "tpch", "metastore.all.time.distribution");
assertCountMetricExists(metrics, "tpch", "metastore.getTable.time.total");
assertDistributionMetricExists(metrics, "tpch", "metastore.getTable.time.distribution");
}
```
- If the "tpch" catalog does not support the same metrics, adjust the metric names or assertions accordingly.
- Ensure that the catalogs "iceberg" and "tpch" are available and configured in your test environment.
- If you use different catalogs, update the catalog names in the test.
</issue_to_address>
There are related CI failures. Moving to draft until I fix them.
Hey there - I've reviewed your changes - here's some feedback:
- The MeasuredHiveMetastore class manually wraps every HiveMetastore method, which leads to a lot of boilerplate; consider using a dynamic proxy or an abstract base wrapper to automatically instrument all methods and reduce duplication.
- The Hive and Iceberg connector tests duplicate the same metric-assertion logic; extracting the `assertCountMetricExists` and `assertDistributionMetricExists` helpers into a shared base test would DRY up the code and centralize metric validation.
- Since connectors now rely on the new getMetrics/listActiveCatalogs SPI methods, add a quick check or lint to ensure every connector overrides these (or explicitly opts out) so no catalog is left without metrics by accident.
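The dynamic-proxy idea from the first bullet can be sketched with the JDK's java.lang.reflect.Proxy, which intercepts every interface method through one handler instead of hand-wrapping each method. The two-method Metastore interface and plain time map below are hypothetical stand-ins for HiveMetastore and its stats classes, not Trino code.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: instrument all methods of an interface with a single
// InvocationHandler instead of overriding each method by hand.
class TimingProxySketch
{
    // stand-in for the real HiveMetastore interface
    interface Metastore
    {
        String getTable(String name);

        List<String> getAllDatabases();
    }

    public static Metastore timed(Metastore delegate, Map<String, AtomicLong> timeNanosByMethod)
    {
        return (Metastore) Proxy.newProxyInstance(
                Metastore.class.getClassLoader(),
                new Class<?>[] {Metastore.class},
                (proxy, method, args) -> {
                    long start = System.nanoTime();
                    try {
                        return method.invoke(delegate, args);
                    }
                    catch (InvocationTargetException e) {
                        // unwrap so callers see the delegate's original exception
                        throw e.getCause();
                    }
                    finally {
                        timeNanosByMethod
                                .computeIfAbsent(method.getName(), key -> new AtomicLong())
                                .addAndGet(System.nanoTime() - start);
                    }
                });
    }
}
```

A real version would also need failure counting and would have to exclude the getMetrics() accessor itself from instrumentation.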
## Individual Comments
### Comment 1
<location> `core/trino-main/src/main/java/io/trino/execution/QueryStateMachine.java:392-390` </location>
<code_context>
return queryStateMachine;
}
+ private void collectCatalogMetadataMetrics()
+ {
+ // collect the metrics only once. This avoid issue with transaction being removed
+ // after the check but before the metrics collection
+ if (catalogMetadataMetricsCollected.compareAndSet(false, true)) {
+ if (session.getTransactionId().filter(transactionManager::transactionExists).isEmpty()) {
+ // The metrics collection depends on active transaction as the metrics
+ // are stored in the transactional ConnectorMetadata, but the collection can be
+ // run after the query has failed e.g., via cancel.
+ return;
+ }
+
+ ImmutableMap.Builder<String, Metrics> catalogMetadataMetrics = ImmutableMap.builder();
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider handling exceptions during metrics collection to avoid masking query state transitions.
Unexpected exceptions during metrics collection may interrupt query state transitions. Wrapping this logic in a try-catch and logging errors will improve robustness, especially during transitions like finishing or failure.
Suggested implementation:
```java
private void collectCatalogMetadataMetrics()
{
// collect the metrics only once. This avoid issue with transaction being removed
// after the check but before the metrics collection
if (catalogMetadataMetricsCollected.compareAndSet(false, true)) {
if (session.getTransactionId().filter(transactionManager::transactionExists).isEmpty()) {
// The metrics collection depends on active transaction as the metrics
// are stored in the transactional ConnectorMetadata, but the collection can be
// run after the query has failed e.g., via cancel.
return;
}
try {
ImmutableMap.Builder<String, Metrics> catalogMetadataMetrics = ImmutableMap.builder();
for (CatalogInfo activeCatalog : metadata.listActiveCatalogs(session)) {
Metrics metrics = metadata.getMetrics(session, activeCatalog.catalogName());
if (!metrics.getMetrics().isEmpty()) {
catalogMetadataMetrics.put(activeCatalog.catalogName(), metrics);
}
}
this.catalogMetadataMetrics.set(catalogMetadataMetrics.buildOrThrow());
}
catch (Exception e) {
log.error(e, "Error collecting catalog metadata metrics for query %s", queryId);
}
}
}
```
If the logger (`log`) is not already defined in this class, you should add:
```java
private static final Logger log = Logger.get(QueryStateMachine.class);
```
at the top of the class, with the appropriate import:
```java
import io.airlift.log.Logger;
```
</issue_to_address>
### Comment 2
<location> `lib/trino-metastore/src/main/java/io/trino/metastore/MeasuredHiveMetastore.java:46-54` </location>
<code_context>
+ private final HiveMetastore delegate;
+ private final MetastoreApiCallStats allApiCallsStats = new MetastoreApiCallStats();
+ private final Map<String, MetastoreApiCallStats> apiCallStats = new ConcurrentHashMap<>();
+ private final Ticker ticker = Ticker.systemTicker();
+
+ public MeasuredHiveMetastore(HiveMetastore delegate)
</code_context>
<issue_to_address>
**suggestion:** Consider allowing ticker injection for testability.
Injecting the ticker through the constructor would make it easier to test time-dependent logic.
```suggestion
private final HiveMetastore delegate;
private final MetastoreApiCallStats allApiCallsStats = new MetastoreApiCallStats();
private final Map<String, MetastoreApiCallStats> apiCallStats = new ConcurrentHashMap<>();
private final Ticker ticker;
public MeasuredHiveMetastore(HiveMetastore delegate)
{
this(delegate, Ticker.systemTicker());
}
public MeasuredHiveMetastore(HiveMetastore delegate, Ticker ticker)
{
this.delegate = requireNonNull(delegate, "delegate is null");
this.ticker = requireNonNull(ticker, "ticker is null");
}
```
</issue_to_address>
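To illustrate why the injected ticker helps, here is a hypothetical sketch in which java.util.function.LongSupplier stands in for Guava's Ticker; a test can then supply a fake clock and advance it deterministically instead of sleeping. The class is an illustration, not the PR's MeasuredHiveMetastore.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: the clock is injected rather than hard-coded, so
// tests can control elapsed time deterministically.
class InjectedTickerSketch
{
    private final LongSupplier tickerNanos;
    private final AtomicLong totalTimeNanos = new AtomicLong();

    public InjectedTickerSketch(LongSupplier tickerNanos)
    {
        this.tickerNanos = tickerNanos;
    }

    // production code would pass System::nanoTime (or Ticker.systemTicker())
    public void timedCall(Runnable call)
    {
        long start = tickerNanos.getAsLong();
        try {
            call.run();
        }
        finally {
            totalTimeNanos.addAndGet(tickerNanos.getAsLong() - start);
        }
    }

    public long totalTimeNanos()
    {
        return totalTimeNanos.get();
    }
}
```

A test can back the supplier with a counter that advances a fixed amount per read, making the recorded durations exact rather than wall-clock dependent.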
@findepi @raunaqmorarka This is ready for review. There is one CI failure, but it is unrelated.
test (plugin/trino-lakehouse): this job hanged; it contains a couple of errors like this, and then — are they related?
Yes, the impl for the
Metadata and QueryStateMachine must use the same `TransactionManager` instance.
Thanks for the review @findepi ! I addressed the comments
Missed some comments
@findepi @raunaqmorarka There were issues with collecting the metadata metrics concurrently with updating the final query info. The query info could be triggered asynchronously, and thus, missing the collected metrics. To fix this, I added a metadata metrics collection before every query stats collection. This also makes the metadata metrics available before the query is done, which is an additional benefit.
transitionToFailed(e);
    return true;
}
collectCatalogMetadataMetrics();
We make metadata calls during FINISHING state as well, so we should collect metrics at the end of that
We remove the transaction right after this call, so there is no way to collect the metrics anymore.
Optional<Object> getInfo(Session session, TableHandle handle);

/**
 * Return connector-specific, metadata operations metrics for the given session.
connector -> catalog
Well, the metrics are connector-specific, but we return them for the provided catalog.
@Override
public Metrics getMetrics(ConnectorSession session)
{
    Span span = startSpan("getMetrics");
Do we really want to have Span for get metrics collection call ?
If we're calling this every time QueryStats is constructed, the traces are going to show this Span a lot
There should be only a few of those calls made + since this is connector specific, we don't really know what is there. I think it is ok to keep it consistent with the rest of the class.
I would still suggest skipping tracing for this API. It should always only collect in-memory objects and not make remote calls. There's no real value in tracing this. You can add a code comment explaining why it is not consistent with the rest of the code. We know that the biggest problem when working with traces today is the sheer amount of things in it.
ok, tracing removed
{
    // cross TX metastore cache is enabled wrapper with caching metastore
-   return sharedHiveMetastoreCache.createCachingHiveMetastoreFactory(metastoreFactory);
+   return sharedHiveMetastoreCache.createCachingHiveMetastoreFactory(new MeasuredMetastoreFactory(metastoreFactory));
Will cached metastore calls show up in the metrics?
Also, please check that the metrics tracking is working for the Glue metastore as well; the way that is set up is a bit different.
No, cached calls are not measured.
I will check the glue
It works for Glue. Example from glue_20251016_184141_00003_txnrc.json:
"activeCatalogs": [
{
"name": "hive",
"version": "95607a88f6655db4de4dbfdc0c255acf246148f0550ddf891c903b8aaea7f7b6",
"connectorName": "hive",
"properties": {
"hive.metastore": "glue",
"hive.security": "allow-all",
"hive.metastore.glue.default-warehouse-dir": "local:///glue",
"hive.max-partitions-per-scan": "1000",
"fs.hadoop.enabled": "true",
"hive.max-partitions-for-eager-load": "1000"
}
}
],
"catalogMetadataMetrics": {
"hive": {
"metastore.getPartitionNamesByFilter.time.distribution": {
"@class": "io.trino.plugin.base.metrics.DistributionSnapshot",
"total": 1,
"min": 1600526792,
"max": 1600526792,
"p01": 1600526792,
"p05": 1600526792,
"p10": 1600526792,
"p25": 1600526792,
"p50": 1600526792,
"p75": 1600526792,
"p90": 1600526792,
"p95": 1600526792,
"p99": 1600526792
},
"metastore.getTable.time.total": {
"@class": "io.trino.plugin.base.metrics.LongCount",
"total": 210967292
},
"metastore.all.time.distribution": {
"@class": "io.trino.plugin.base.metrics.DistributionSnapshot",
"total": 15,
"min": 169969959,
"max": 1600526792,
"p01": 169969959,
"p05": 169969959,
"p10": 176701709,
"p25": 194937458,
"p50": 206969875,
"p75": 214762042,
"p90": 362961542,
"p95": 1600526792,
"p99": 1600526792
},
"metastore.all.time.total": {
"@class": "io.trino.plugin.base.metrics.LongCount",
"total": 4566443044
},
"metastore.getTable.time.distribution": {
"@class": "io.trino.plugin.base.metrics.DistributionSnapshot",
"total": 1,
"min": 210967292,
"max": 210967292,
"p01": 210967292,
"p05": 210967292,
"p10": 210967292,
"p25": 210967292,
"p50": 210967292,
"p75": 210967292,
"p90": 210967292,
"p95": 210967292,
"p99": 210967292
},
"metastore.getPartitionNamesByFilter.time.total": {
"@class": "io.trino.plugin.base.metrics.LongCount",
"total": 1600526792
},
"metastore.getPartitionsByNames.time.total": {
"@class": "io.trino.plugin.base.metrics.LongCount",
"total": 2754948960
},
"metastore.getPartitionsByNames.time.distribution": {
"@class": "io.trino.plugin.base.metrics.DistributionSnapshot",
"total": 13,
"min": 169969959,
"max": 362961542,
"p01": 169969959,
"p05": 169969959,
"p10": 176701709,
"p25": 194937458,
"p50": 203406792,
"p75": 210784958,
"p90": 219027333,
"p95": 362961542,
"p99": 362961542
}
}
},
@Override
public Metrics getMetrics()
{
    return metastore.getMetrics();
I think we would at least want tracking in REST and Glue catalogs as well
The goal of this PR is to add Hive metastore metrics, as this is often the source of perf problems.
Support for REST and glue catalogs can be added as a follow-up.
@Override
public io.trino.spi.metrics.Metrics getMetrics(ConnectorSession session)
{
    return catalog.getMetrics();
This approach is only catching the metadata calls made to the metastore, which covers relatively few operations in Iceberg. The bulk of metadata fetching in Iceberg happens through the Iceberg APIs used in this class. You'll have to register an InMemoryMetricsReporter using org.apache.iceberg.Scan#metricsReporter and collect those metrics like in IcebergSplitSource.
That will still not catch the metadata calls in commit operations, but I don't see a way in the Iceberg APIs to register a metrics reporter for commit operations. Maybe something to improve in upstream Iceberg.
fyi @ebyhr @findepi
Yes, for now, I'm only adding support to metastore metrics. This could be later extended.
The goal is to expose in QueryStats, per catalog, connector-specific metrics like metastore api call stats.
comments addressed
Description

Slow metastore can be a root cause of slow analysis or planning. This adds explicit metrics to the QueryStats with remote metastore call stats made for a given query. This is what this looks like in the query.json:
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( X) Release notes are required, with the following suggested text:
Summary by Sourcery
Add per-catalog metastore metrics to QueryStats by extending the Metadata API and collecting metrics from each active catalog on query completion or failure, instrument connectors with MeasuredHiveMetastore, and verify behavior with new connector tests.