
scheduler: fragment queue and querier pick-up coordination #6968


Open. Wants to merge 1 commit into base: master.

rubywtl (Contributor) commented Aug 13, 2025

What this PR does:
This PR introduces a Fragmenter interface that splits a logical query plan into fragments when distributed execution is enabled. The Fragmenter attaches metadata to each fragment for tracking, and the scheduler uses that metadata to route fragments to the appropriate queriers. The scheduler also keeps a mapping from fragments to querier addresses so it can track where each fragment is located across the distributed system.
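To make the fragmentation step concrete, here is a toy sketch. It is not the PR's implementation: it assumes a hypothetical `PlanNode` type in place of `logicalplan.Node` and a hypothetical `Remote` flag marking where the plan is cut. Fragment IDs are assigned children-first, so each fragment's metadata can list the child fragments whose results it consumes, and the root fragment comes last.

```go
package main

import "fmt"

// PlanNode is a hypothetical stand-in for logicalplan.Node.
type PlanNode struct {
	Name     string
	Remote   bool // hypothetical marker: this subtree becomes its own fragment
	Children []*PlanNode
}

// FragmentMeta is hypothetical tracking metadata attached to each fragment.
type FragmentMeta struct {
	QueryID    uint64
	FragmentID uint64
	ChildIDs   []uint64 // fragments whose results this fragment consumes
}

// Fragment pairs a plan subtree with its metadata.
type Fragment struct {
	Root *PlanNode
	Meta FragmentMeta
}

// splitPlan cuts the tree at Remote nodes. Children are emitted before their
// parents, so every ChildIDs entry refers to an already-emitted fragment.
// The tree itself is not rewritten in this sketch.
func splitPlan(queryID uint64, root *PlanNode) []Fragment {
	var out []Fragment
	var nextID uint64

	var emit func(n *PlanNode) uint64
	var collect func(n *PlanNode) []uint64

	// collect returns the IDs of the nearest Remote descendants of n.
	collect = func(n *PlanNode) []uint64 {
		var ids []uint64
		for _, c := range n.Children {
			if c.Remote {
				ids = append(ids, emit(c))
			} else {
				ids = append(ids, collect(c)...)
			}
		}
		return ids
	}

	// emit creates the fragment rooted at n after emitting its children.
	emit = func(n *PlanNode) uint64 {
		children := collect(n)
		nextID++
		out = append(out, Fragment{
			Root: n,
			Meta: FragmentMeta{QueryID: queryID, FragmentID: nextID, ChildIDs: children},
		})
		return nextID
	}

	emit(root)
	return out
}

func main() {
	plan := &PlanNode{Name: "sum", Children: []*PlanNode{
		{Name: "rate-a", Remote: true},
		{Name: "rate-b", Remote: true},
	}}
	for _, f := range splitPlan(7, plan) {
		fmt.Printf("%s: %+v\n", f.Root.Name, f.Meta)
	}
}
```

A real fragmenter would presumably also replace each remote subtree in the parent with a node that reads results from the child fragment; that part is omitted here.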

Which issue(s) this PR fixes:

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@@ -414,7 +416,12 @@ func (t *Cortex) initQuerier() (serv services.Service, err error) {

t.Cfg.Worker.MaxConcurrentRequests = t.Cfg.Querier.MaxConcurrent
t.Cfg.Worker.TargetHeaders = t.Cfg.API.HTTPRequestHeadersToLog
return querier_worker.NewQuerierWorker(t.Cfg.Worker, httpgrpc_server.NewServer(internalQuerierRouter), util_log.Logger, prometheus.DefaultRegisterer)
ipAddr, err := ring.GetInstanceAddr(t.Cfg.Alertmanager.ShardingRing.InstanceAddr, t.Cfg.Alertmanager.ShardingRing.InstanceInterfaceNames, util_log.Logger)
Contributor commented:

Why use the Alertmanager config here?

@@ -0,0 +1,21 @@
package distributed_execution

type FragmentKey struct {
Contributor commented:

Can you add comments for publicly exposed types?
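A sketch of what documented public types could look like, following Go's convention that each exported identifier gets a doc comment beginning with its name. The struct and constructor match the diff; the accessor methods `GetQueryID` and `GetFragmentID` are hypothetical additions, since the fields are unexported.

```go
package main

import "fmt"

// FragmentKey identifies a single fragment of a distributed query.
// Fragment IDs are scoped to a query, so both fields are needed to
// address a fragment uniquely in the scheduler.
type FragmentKey struct {
	queryID    uint64
	fragmentID uint64
}

// MakeFragmentKey returns the key for fragment fragmentID of query queryID.
func MakeFragmentKey(queryID uint64, fragmentID uint64) *FragmentKey {
	return &FragmentKey{queryID: queryID, fragmentID: fragmentID}
}

// GetQueryID returns the ID of the query the fragment belongs to.
// (Hypothetical accessor, not part of the diff.)
func (k FragmentKey) GetQueryID() uint64 { return k.queryID }

// GetFragmentID returns the fragment's ID within its query.
// (Hypothetical accessor, not part of the diff.)
func (k FragmentKey) GetFragmentID() uint64 { return k.fragmentID }

func main() {
	k := MakeFragmentKey(42, 3)
	fmt.Println(k.GetQueryID(), k.GetFragmentID())
}
```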


type FragmentKey struct {
queryID uint64
fragmentID uint64
Contributor commented:

Probably not a big deal, but the fragment ID doesn't need to be a uint64. We don't expect that many fragments, since fragment IDs are already scoped by query ID.

fragmentID uint64
}

func MakeFragmentKey(queryID uint64, fragmentID uint64) *FragmentKey {
Contributor commented:

I don't think this needs to return a pointer.
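Taking the fragment-ID and pointer suggestions together, a minimal sketch might look like this. `uint32` is an assumed choice for the narrower ID type. Returning the key by value lets callers use it directly as a map key, so a line like `f.mappings[*key] = addr` from the diff no longer needs the dereference.

```go
package main

import "fmt"

// FragmentKey identifies a fragment within a query. fragmentID is narrowed
// to uint32 here (an assumption; any small unsigned type would do).
type FragmentKey struct {
	queryID    uint64
	fragmentID uint32
}

// MakeFragmentKey returns the key by value. It is a small comparable struct,
// so it can be used directly as a map key.
func MakeFragmentKey(queryID uint64, fragmentID uint32) FragmentKey {
	return FragmentKey{queryID: queryID, fragmentID: fragmentID}
}

func main() {
	addrs := map[FragmentKey]string{}
	addrs[MakeFragmentKey(42, 1)] = "10.0.0.1:9095"

	// Independently constructed keys with equal fields compare equal,
	// so lookups need no pointer handling.
	addr, ok := addrs[MakeFragmentKey(42, 1)]
	fmt.Println(addr, ok)
}
```

On 64-bit platforms alignment padding keeps this struct at 16 bytes either way, so narrowing the field mostly documents intent rather than saving memory.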


type Fragmenter interface {
Fragment(node logicalplan.Node) ([]Fragment, error)
getNewID() uint64
Contributor commented:

Does this need to be part of the interface? It is odd for one method in the interface to be public and the other private.
I would remove this from the interface.
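One way to apply this, sketched under assumptions: the interface keeps only `Fragment`, and ID generation becomes an unexported helper on a concrete type. `Node` stands in for `logicalplan.Node`, and `counterFragmenter` with its atomic counter (`atomic.Uint64`, Go 1.19+) is hypothetical, not the PR's code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Node is a hypothetical stand-in for logicalplan.Node.
type Node interface{ String() string }

// Fragment is a hypothetical, trimmed-down fragment type.
type Fragment struct {
	FragmentID uint64
	Node       Node
}

// Fragmenter exposes only what callers need.
type Fragmenter interface {
	Fragment(node Node) ([]Fragment, error)
}

// counterFragmenter keeps ID generation as an internal detail.
type counterFragmenter struct {
	lastID atomic.Uint64
}

// newID is unexported and not part of the Fragmenter interface.
func (f *counterFragmenter) newID() uint64 { return f.lastID.Add(1) }

// Fragment treats the whole plan as a single fragment (toy behavior).
func (f *counterFragmenter) Fragment(node Node) ([]Fragment, error) {
	return []Fragment{{FragmentID: f.newID(), Node: node}}, nil
}

// Compile-time check that counterFragmenter satisfies Fragmenter.
var _ Fragmenter = (*counterFragmenter)(nil)

type literal string

func (l literal) String() string { return string(l) }

func main() {
	var fr Fragmenter = &counterFragmenter{}
	frags, _ := fr.Fragment(literal("up"))
	fmt.Println(frags[0].FragmentID, frags[0].Node)
}
```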

}

func (f *DummyFragmenter) getNewID() uint64 {
return 1 // for dummy plan_fragments testing
Contributor commented:

If it is just for testing, you can hardcode 1 in the Fragment function.
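A minimal sketch of that suggestion, assuming the same hypothetical `Node` and `Fragment` stand-ins as before: the test fragmenter needs no ID helper at all.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for logicalplan.Node.
type Node interface{ String() string }

// Fragment is a hypothetical, trimmed-down fragment type.
type Fragment struct {
	FragmentID uint64
	Node       Node
}

// DummyFragmenter returns the whole plan as a single fragment. It is only
// used in tests, so the fragment ID is hardcoded instead of generated.
type DummyFragmenter struct{}

func (DummyFragmenter) Fragment(node Node) ([]Fragment, error) {
	return []Fragment{{FragmentID: 1, Node: node}}, nil
}

type literal string

func (l literal) String() string { return string(l) }

func main() {
	frags, _ := DummyFragmenter{}.Fragment(literal("up"))
	fmt.Println(len(frags), frags[0].FragmentID, frags[0].Node)
}
```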

f.mappings[*key] = addr
}

func (f *FragmentTable) GetMappings(queryID uint64, fragmentIDs []uint64) ([]string, bool) {
Contributor commented:

Can we find a more descriptive name? It took me a while to understand which mapping this is. If it gets the child querier addresses, we should find a better name.
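A sketch of a more descriptive method, keeping the `map[FragmentKey]string` layout from the diff. `GetChildQuerierAddrs` and `AddAddress` are hypothetical names, and the meaning of the returned bool (false if any child fragment has no address yet) is an assumption about the original `GetMappings`.

```go
package main

import (
	"fmt"
	"sync"
)

type FragmentKey struct {
	queryID    uint64
	fragmentID uint64
}

// FragmentTable records which querier picked up each fragment.
type FragmentTable struct {
	mu       sync.Mutex
	mappings map[FragmentKey]string
}

func NewFragmentTable() *FragmentTable {
	return &FragmentTable{mappings: make(map[FragmentKey]string)}
}

// AddAddress (hypothetical) records the querier address for a fragment.
func (f *FragmentTable) AddAddress(queryID, fragmentID uint64, addr string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.mappings[FragmentKey{queryID, fragmentID}] = addr
}

// GetChildQuerierAddrs (hypothetical rename of GetMappings) returns the
// querier addresses of the given child fragments, in order. It returns
// false if any child has not been assigned to a querier yet.
func (f *FragmentTable) GetChildQuerierAddrs(queryID uint64, childIDs []uint64) ([]string, bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	addrs := make([]string, 0, len(childIDs))
	for _, id := range childIDs {
		addr, ok := f.mappings[FragmentKey{queryID, id}]
		if !ok {
			return nil, false
		}
		addrs = append(addrs, addr)
	}
	return addrs, true
}

func main() {
	table := NewFragmentTable()
	table.AddAddress(1, 1, "q1:9095")
	table.AddAddress(1, 2, "q2:9095")
	fmt.Println(table.GetChildQuerierAddrs(1, []uint64{1, 2}))
}
```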

defer f.mu.Unlock()

keysToDelete := make([]distributed_execution.FragmentKey, 0)
for key := range f.mappings {
Contributor commented:

Looking at the methods you have, would it be easier to change mappings from mappings map[distributed_execution.FragmentKey]string to map[uint64]map[uint64]string?

Then you can find a query's fragments with a single lookup.
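A sketch of the nested-map layout (queryID to fragmentID to querier address). The method names are hypothetical, and `ClearQuery` assumes the `keysToDelete` loop in the diff exists to drop all fragments of one query. With the nested map, that becomes a single `delete` instead of a scan over every key.

```go
package main

import (
	"fmt"
	"sync"
)

// FragmentTable maps queryID -> fragmentID -> querier address.
type FragmentTable struct {
	mu       sync.Mutex
	mappings map[uint64]map[uint64]string
}

func NewFragmentTable() *FragmentTable {
	return &FragmentTable{mappings: make(map[uint64]map[uint64]string)}
}

// AddAddress (hypothetical) records where a fragment was picked up.
func (f *FragmentTable) AddAddress(queryID, fragmentID uint64, addr string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	frags, ok := f.mappings[queryID]
	if !ok {
		frags = make(map[uint64]string)
		f.mappings[queryID] = frags
	}
	frags[fragmentID] = addr
}

// GetChildQuerierAddrs (hypothetical) finds the query's fragments with one
// lookup, then resolves each child ID.
func (f *FragmentTable) GetChildQuerierAddrs(queryID uint64, childIDs []uint64) ([]string, bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	frags := f.mappings[queryID] // nil if absent; reads from a nil map are safe
	addrs := make([]string, 0, len(childIDs))
	for _, id := range childIDs {
		addr, ok := frags[id]
		if !ok {
			return nil, false
		}
		addrs = append(addrs, addr)
	}
	return addrs, true
}

// ClearQuery (hypothetical) drops every fragment of a query at once.
func (f *FragmentTable) ClearQuery(queryID uint64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.mappings, queryID)
}

func main() {
	table := NewFragmentTable()
	table.AddAddress(1, 1, "q1:9095")
	table.ClearQuery(1)
	fmt.Println(table.GetChildQuerierAddrs(1, []uint64{1}))
}
```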


import "github.com/thanos-io/promql-engine/logicalplan"

type Fragmenter interface {
Contributor commented:

It would be better to move the Fragmenter to distributed_execution, since fragmentation is specific to remote distribution.

The fragment table can just be moved to the scheduler folder.


2 participants