Change `RemoteEndpoints` interface to support remote engines pruning by SuperPaintman · Pull Request #680 · thanos-io/promql-engine

SuperPaintman · 2026-01-06T07:02:25Z

This PR changes the RemoteEndpoints interface so that it accepts 'hints' from the caller when selecting remote engines. These hints will be used to prune TSDBInfos (as described in thanos-io/thanos#8599).
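For orientation, here is a minimal sketch of the shape of the change, assuming the method ends up as `Engines(mint, maxt int64)` as in the `staticEndpoints` snippet quoted further down in this thread. `RemoteEngine` is reduced to a hypothetical two-method stand-in and `fakeEngine` exists only for illustration; the real interfaces in promql-engine carry more methods.

```go
package main

import "fmt"

// RemoteEngine is a reduced, hypothetical stand-in for the real interface;
// only the time-range accessors discussed in this PR are modeled.
type RemoteEngine interface {
	MinT() int64
	MaxT() int64
}

// RemoteEndpoints sketches the changed contract: the caller passes the
// query's time range so an implementation can prune its internal metadata.
type RemoteEndpoints interface {
	Engines(mint, maxt int64) []RemoteEngine
}

// fakeEngine is an illustration-only engine with a fixed time range.
type fakeEngine struct{ mint, maxt int64 }

func (e fakeEngine) MinT() int64 { return e.mint }
func (e fakeEngine) MaxT() int64 { return e.maxt }

// staticEndpoints accepts the range but still returns every engine,
// which is the behavior the review thread below settles on.
type staticEndpoints struct{ engines []RemoteEngine }

func (m staticEndpoints) Engines(mint, maxt int64) []RemoteEngine {
	return m.engines
}

func main() {
	var eps RemoteEndpoints = staticEndpoints{engines: []RemoteEngine{
		fakeEngine{0, 100}, fakeEngine{200, 300},
	}}
	// Neither engine overlaps [150, 180], yet both are returned.
	fmt.Println(len(eps.Engines(150, 180)))
}
```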

Notes

Note 1:

~~I added two interfaces: RemoteEndpointsV2 (as suggested here thanos-io/thanos#8599 (comment)), and RemoteEndpointsV3 which can be extended without making any new breaking changes.~~

~~Let me know which one you prefer, and I'll remove the other one.~~

Verification

Unit tests.
- Let's discuss the interface / implementation first. Once that's settled, I'll add unit tests.
Benchmark
- I'm going to add a benchmark for dist-engine optimizers with synthetic data (as described in the main issue: query: distributed query mode is too slow when the number of external labels is high (~1kk) thanos#8597).
- I think they should be added in the thanos repo.

--

Issue: thanos-io/thanos#8597

CC: @MichaHoffmann

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>

fpetkovski · 2026-01-06T07:32:09Z

 }

+func (m staticEndpoints) EnginesV2(mint, maxt int64) []RemoteEngine {
+	return m.engines


Should we apply filtering here?

I tried, but since pruning is optional, I decided to not change DistributedExecutionOptimizer implementation, because it assumes that some engines might not have the requested range and fall back.

Some unit tests check for that, and if we apply the filter in static endpoints, they fail.

But now that I'm thinking about it again, I'm not sure that's an issue anymore.

I think yes, the API contract suggests that we should only return intersecting engines.

Are you suggesting something like this?

func (m staticEndpoints) Engines(mint, maxt int64) []RemoteEngine {
	var engines []RemoteEngine
	for _, e := range m.engines {
		if e.MaxT() < mint || e.MinT() > maxt {
			continue
		}
		engines = append(engines, e)
	}
	return engines
}

If so, I'm a bit hesitant about filtering engines themselves here (Maybe I don't fully understand the original implementation).

The optimizers seem to expect all engines to be available, especially for handling absent() (in DistributedExecutionOptimizer.distributeAbsent). If an engine is removed from the query (e.g. because it has a data gap for the requested range), the distributed engine will return incorrect results.

Also, filtering engines breaks TestDistributedExecutionWithLongSelectorRanges/skip_distributing_queries_with_timestamps_outside_of_the_range_of_an_engine:

--- Expected
+++ Actual
@@ -1 +1 @@
-sum(sum by (region) (metric @ 18000.000))
+sum(dedup(remote(sum by (region) (metric @ 18000.000))))

Let me know if I'm mistaken, but it seems like an invalid result.

The current "real" implementation also assumes all engines are returned.

The intent of the mint/maxt is to prune internal metadata (like TSDBInfos) within each engine, to reduce unnecessary computations later.

Does that make sense, or am I missing something?
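The distinction drawn above (prune metadata inside each engine, but never drop an engine) can be sketched as follows. This is only an illustration under stated assumptions: `tsdbInfo`, `engine`, and `pruneEngines` are hypothetical names, and the real TSDBInfos entries in Thanos carry more than a time range.

```go
package main

import "fmt"

// tsdbInfo is a hypothetical, reduced stand-in for one TSDBInfos entry.
type tsdbInfo struct {
	MinT, MaxT int64
}

// engine is a hypothetical remote engine that carries per-TSDB metadata.
type engine struct {
	name  string
	infos []tsdbInfo
}

// overlaps reports whether [aMin, aMax] intersects [bMin, bMax].
func overlaps(aMin, aMax, bMin, bMax int64) bool {
	return aMax >= bMin && aMin <= bMax
}

// pruneEngines keeps every engine but drops the metadata entries that
// cannot contribute to the [mint, maxt] range.
func pruneEngines(engines []engine, mint, maxt int64) []engine {
	out := make([]engine, 0, len(engines))
	for _, e := range engines {
		pruned := engine{name: e.name}
		for _, info := range e.infos {
			if overlaps(info.MinT, info.MaxT, mint, maxt) {
				pruned.infos = append(pruned.infos, info)
			}
		}
		// The engine is appended even when no metadata survives.
		out = append(out, pruned)
	}
	return out
}

func main() {
	engines := []engine{
		{name: "a", infos: []tsdbInfo{{0, 100}, {200, 300}}},
		{name: "b", infos: []tsdbInfo{{400, 500}}},
	}
	for _, e := range pruneEngines(engines, 250, 350) {
		fmt.Println(e.name, len(e.infos))
	}
}
```

In this sketch engine "b" stays in the result with no metadata left, so downstream logic such as the absent() handling still sees it.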

Gotcha, this indeed makes sense, especially since you brought up absent. I think we can leave this as it is and optimize later.

fpetkovski · 2026-01-06T08:32:40Z

I think the contention is between having API parameters vs hints. Our experience with Prometheus hints hasn't been too positive since they complicate everything downstream. Each subsequent change has to take into account that the hint might not be enforced. I would vote for the V2 implementation which treats parameters as part of a hard API. I would also be fine with adding a new method to the interface instead of introducing a new one.

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>

SuperPaintman · 2026-01-21T07:05:49Z

I've added a few unit tests for the caching behavior in this PR, but I think the main test suite and benchmarking should happen in the thanos repo itself, where we can test the actual TSDBInfos pruning.
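As a rough illustration of what "caching behavior" could mean here, the sketch below wraps an endpoints implementation and resolves the engines only once. It is an assumption, not the PR's code: this thread does not show how CachedEndpoints is implemented, and `cachedEndpoints`, `countingEndpoints`, and the string-based `RemoteEngine` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// RemoteEngine is reduced to a name for this sketch (hypothetical).
type RemoteEngine string

// RemoteEndpoints follows the signature discussed in this PR.
type RemoteEndpoints interface {
	Engines(mint, maxt int64) []RemoteEngine
}

// countingEndpoints records how often the underlying lookup runs.
type countingEndpoints struct{ calls int }

func (c *countingEndpoints) Engines(mint, maxt int64) []RemoteEngine {
	c.calls++
	return []RemoteEngine{"a", "b"}
}

// cachedEndpoints resolves engines on the first call and reuses the result.
type cachedEndpoints struct {
	inner   RemoteEndpoints
	once    sync.Once
	engines []RemoteEngine
}

func (c *cachedEndpoints) Engines(mint, maxt int64) []RemoteEngine {
	c.once.Do(func() { c.engines = c.inner.Engines(mint, maxt) })
	return c.engines
}

func main() {
	inner := &countingEndpoints{}
	cached := &cachedEndpoints{inner: inner}
	cached.Engines(0, 100)
	cached.Engines(0, 100)
	fmt.Println(inner.calls) // the inner lookup ran once
}
```

Note that in this sketch only the first call's range reaches the wrapped implementation, so such a cache would only be safe when all calls share one query range.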

Also, I removed RemoteEndpointsV2/V3 to simplify the transition to the new API (ultimately, we will have to make breaking changes anyway if we don't want to support two interfaces), and eliminate ambiguity.

So now this PR introduces a breaking change to the RemoteEndpoints API. @MichaHoffmann mentioned that the Thanos maintainers are okay with breaking changes like this one. I'll send a separate one-line PR to thanos repo to update the "real" implementation before this PR is merged (the mint/maxt will be ignored for now).

fpetkovski

This seems good to me. It still has to be tested through Thanos, but I am okay with merging the change.

fpetkovski · 2026-01-29T06:05:44Z

 }

+func (m staticEndpoints) EnginesV2(mint, maxt int64) []RemoteEngine {
+	return m.engines


Gotcha, this indeed makes sense, especially since you brought up absent. I think we can leave this as it is and optimize later.

SuperPaintman · 2026-01-30T06:42:08Z

Prepared a small PR to Thanos, where we only update the function signature: thanos-io/thanos#8653

@fpetkovski / @MichaHoffmann could you merge this PR? It seems I don't have permission.

Add RemoteEndpointsV2/V3 that requests pruned remote engines

63ff685

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>

SuperPaintman force-pushed the query-distributed-mode-perf-improvements branch from d1885ac to 63ff685 Compare January 6, 2026 07:03

This was referenced Jan 6, 2026

WIP: query: prune TSDBInfos in query.remoteEndpoints.Engines() thanos-io/thanos#8599

Draft

query: cache engines in remoteEndpoints to reuse computed MinT / MaxT / LabelSets values across Engines() calls thanos-io/thanos#8598

Open

fpetkovski reviewed Jan 6, 2026

SuperPaintman mentioned this pull request Jan 6, 2026

query: distributed query mode is too slow when the number of external labels is high (~1kk) thanos-io/thanos#8597

Open

SuperPaintman changed the title ~~Add RemoteEndpointsV2/V3 that requests pruned remote engines~~ Add RemoteEndpointsV2/V3 that returns pruned remote engines Jan 6, 2026

SuperPaintman added 6 commits January 21, 2026 02:51

Remove RemoteEndpointsV3

ba2303e

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>

Fix default mint/maxt

b51950a

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>

Add unit tests for CachedEndpoints

75d41f3

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>

Replace RemoteEndpoints with RemoteEndpointsV2

808c854

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>

Simplify

8840588

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>

Update doc comments

e975db8

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>

SuperPaintman requested review from MichaHoffmann and fpetkovski January 21, 2026 07:07

SuperPaintman changed the title ~~Add RemoteEndpointsV2/V3 that returns pruned remote engines~~ Change RemoteEndpoints interface to support remote engines pruning Jan 21, 2026

fpetkovski approved these changes Jan 29, 2026

SuperPaintman mentioned this pull request Jan 30, 2026

query: prepare remoteEndpoints for remote engine pruning thanos-io/thanos#8653

Open

SuperPaintman wants to merge 7 commits into thanos-io:main from SuperPaintman:query-distributed-mode-perf-improvements

3 participants
