Change RemoteEndpoints interface to support remote engines pruning #680

Open
SuperPaintman wants to merge 7 commits into thanos-io:main from SuperPaintman:query-distributed-mode-perf-improvements

@SuperPaintman SuperPaintman commented Jan 6, 2026

This PR changes the RemoteEndpoints interface so that it accepts 'hints' from the caller when selecting remote engines. These hints will be used to prune TSDBInfos (as described in thanos-io/thanos#8599).

Notes

Note 1:

I added two interfaces: RemoteEndpointsV2 (as suggested in thanos-io/thanos#8599 (comment)), and RemoteEndpointsV3, which can be extended without making any new breaking changes.

Let me know which one you prefer, and I'll remove the other one.

Verification

--

Issue: thanos-io/thanos#8597

CC: @MichaHoffmann

Signed-off-by: Aleksandr Krivoshchekov <SuperPaintmanDeveloper@gmail.com>
Comment thread api/remote.go Outdated
Comment thread api/remote.go
}

func (m staticEndpoints) EnginesV2(mint, maxt int64) []RemoteEngine {
	return m.engines
Collaborator

Should we apply filtering here?

Author

I tried, but since pruning is optional, I decided not to change the DistributedExecutionOptimizer implementation, because it assumes that some engines might not have the requested range and falls back in that case.

Some unit tests check for that, and if we apply the filter in static endpoints, they fail.

But now that I think about it again, I'm not sure that's an issue anymore.

Contributor

I think yes, the API contract suggests that we should only return intersecting engines.

Author

Are you suggesting something like this?

func (m staticEndpoints) Engines(mint, maxt int64) []RemoteEngine {
	var engines []RemoteEngine
	for _, e := range m.engines {
		if e.MaxT() < mint || e.MinT() > maxt {
			continue
		}
		engines = append(engines, e)
	}
	return engines
}

Author

@SuperPaintman SuperPaintman Jan 21, 2026

If so, I'm a bit hesitant about filtering the engines themselves here (maybe I don't fully understand the original implementation).

The optimizers seem to expect all engines to be available, especially for handling absent() (in DistributedExecutionOptimizer.distributeAbsent). If an engine is removed from the query (e.g. because it has a data gap for the requested range), the distributed engine will return incorrect results.

Also, filtering engines breaks TestDistributedExecutionWithLongSelectorRanges/skip_distributing_queries_with_timestamps_outside_of_the_range_of_an_engine:

--- Expected
+++ Actual
@@ -1 +1 @@
-sum(sum by (region) (metric @ 18000.000))
+sum(dedup(remote(sum by (region) (metric @ 18000.000))))

Let me know if I'm mistaken, but it seems like an invalid result.

The current "real" implementation also assumes all engines are returned.

The intent of the mint/maxt is to prune internal metadata (like TSDBInfos) within each engine, to reduce unnecessary computations later.

Does that make sense, or am I missing something?

Collaborator

Gotcha, this indeed makes sense, especially since you brought up absent. I think we can leave this as it is and optimize later.

@SuperPaintman SuperPaintman changed the title from "Add RemoteEndpointsV2/V3 that requests pruned remote engines" to "Add RemoteEndpointsV2/V3 that returns pruned remote engines" on Jan 6, 2026
@fpetkovski
Collaborator

I think the contention is between having API parameters vs hints. Our experience with Prometheus hints hasn't been too positive since they complicate everything downstream. Each subsequent change has to take into account that the hint might not be enforced. I would vote for the V2 implementation, which treats parameters as part of a hard API. I would also be fine with adding a new method to the existing interface instead of introducing a new interface.

Author

SuperPaintman commented Jan 21, 2026

I've added a few unit tests for the caching behavior in this PR, but I think the main test suite and benchmarking should happen in the thanos repo itself, where we can test the actual TSDBInfos pruning.

Also, I removed RemoteEndpointsV2/V3 to simplify the transition to the new API (ultimately, we will have to make breaking changes anyway if we don't want to support two interfaces), and eliminate ambiguity.

So now this PR introduces a breaking change to the RemoteEndpoints API. @MichaHoffmann mentioned that the Thanos maintainers are okay with breaking changes like this one. I'll send a separate one-line PR to the thanos repo to update the "real" implementation before this PR is merged (the mint/maxt will be ignored for now).

@SuperPaintman SuperPaintman changed the title from "Add RemoteEndpointsV2/V3 that returns pruned remote engines" to "Change RemoteEndpoints interface to support remote engines pruning" on Jan 21, 2026
Collaborator

@fpetkovski fpetkovski left a comment

This looks good to me. It still has to be tested through Thanos, but I am okay with merging the change.

@SuperPaintman
Author

Prepared a small PR to Thanos, where we only update the function signature: thanos-io/thanos#8653

@fpetkovski / @MichaHoffmann could you merge this PR? It seems I don't have permission.
