scheduler: wait for inflight queries before shutting down

### Context

When query-schedulers shut down, the query-frontends log errors like this:

```
ts=2025-09-04T03:40:59.450600734Z caller=handler.go:430 level=info msg="query stats" component=query-frontend method=POST path=/prometheus/api/v1/query route_name=prometheus_api_v1_query user_agent=Go-http-client/1.1 status_code=499 response_time=17.158379ms response_size_bytes=0 query_wall_time_seconds=0.010860358 fetched_series_count=0 fetched_chunk_bytes=0 fetched_chunks_count=0 fetched_index_bytes=0 sharded_queries=0 split_queries=0 spun_off_subqueries=0 estimated_series_count=0 queue_time_seconds=3.2402e-05 encode_time_seconds=0 samples_processed=0 samples_processed_cache_adjusted=0 param_query="<redacted>" param_time=2025-09-04T03:40:50Z length=34m59.999s time_since_min_time=35m9.432405588s time_since_max_time=9.433405588s results_cache_hit_bytes=0 results_cache_miss_bytes=0 
status=failed 
err="context canceled: query cancelled: rpc error: code = Canceled desc = context canceled: frontend disconnected"
```

This is confusing because the client didn't cancel the query, yet it observed the HTTP 499 error. For example, this is Grafana alerting reporting the error:

```
2025-09-04 14:40:15.816,"[sse.dataQueryError] failed to execute query [A]: unexpected response with status code 499: {""status"":""error"",""errorType"":""canceled"",""error"":""context canceled: query cancelled: rpc error: code = Canceled desc = context canceled: frontend disconnected""}"
```



### Problem

Today the query-scheduler does not wait for active queries to finish before shutting down. Although it waits for all inflight queries to be flushed and for all querier workers to disconnect, it also immediately closes its connections to the query-frontends, which in turn cancels all inflight queries from those frontends.


https://github.com/grafana/mimir/blob/51cc6612304e689743b6b536e6f1512dee9e10a9/pkg/scheduler/scheduler.go#L249-L250


#### Shutdown

There are two places which are responsible for shutting down:

A. where the scheduler communicates to the queriers that the frontend has disconnected and they can discard their queries
https://github.com/grafana/mimir/blob/51cc6612304e689743b6b536e6f1512dee9e10a9/pkg/scheduler/scheduler.go#L240


B. where the frontend closes the loop with the scheduler (which will in turn trigger A.)

https://github.com/grafana/mimir/blob/699a122aba83da75ccef884d63af468da4f5127e/pkg/frontend/v2/frontend_scheduler_worker.go#L322-L325


The following image shows the correlation between the scheduler shutting down and the query-frontend rejecting queries with HTTP 499:

<img width="1686" height="591" alt="Image" src="https://github.com/user-attachments/assets/d4982e0a-c5b9-4adc-9553-e3885fc72208" />


### Proposal 

We should change both A. and B. above.

A: Scheduler: cancel the queries' context and close the gRPC stream only after we know that all queries have been answered and/or all querier workers have disconnected.
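A minimal Go sketch of the ordering proposed for the scheduler: wait for inflight queries to drain (bounded by a timeout) and only then cancel the context. The names `drainThenCancel` and `runDemo`, the `sync.WaitGroup` used to track queries, and the timeout are assumptions for illustration, not Mimir code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// drainThenCancel is a hypothetical helper. It waits until every tracked
// query has finished, or until the timeout elapses, and only then cancels
// the context shared with the frontend streams. It reports whether the
// drain completed before the timeout.
func drainThenCancel(inflight *sync.WaitGroup, cancel context.CancelFunc, timeout time.Duration) bool {
	defer cancel() // cancel last, after the wait below has ended

	drained := make(chan struct{})
	go func() {
		// Note: this goroutine outlives the call if the timeout fires first.
		inflight.Wait()
		close(drained)
	}()

	select {
	case <-drained:
		return true
	case <-time.After(timeout):
		return false
	}
}

// runDemo simulates one inflight query and a shutdown that waits for it.
func runDemo() (bool, string) {
	ctx, cancel := context.WithCancel(context.Background())
	var inflight sync.WaitGroup
	results := make(chan string, 1)

	inflight.Add(1)
	go func() {
		defer inflight.Done()
		select {
		case <-time.After(20 * time.Millisecond): // simulated querier work
			results <- "ok"
		case <-ctx.Done(): // what happens today: cancelled mid-flight
			results <- "canceled"
		}
	}()

	drained := drainThenCancel(&inflight, cancel, time.Second)
	return drained, <-results
}

func main() {
	drained, result := runDemo()
	fmt.Println(drained, result)
}
```

Running this prints `true ok`: the simulated query completes because the context is cancelled only after the wait group has drained. Cancelling first would produce `canceled`, which is the behaviour clients currently see as HTTP 499.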

B: Frontend: stop sending new queries to the same scheduler (e.g. don't read from `requestsCh`), but do not disconnect from the scheduler, so that the frontend can still send e.g. cancellation notices via the scheduler to the queriers.
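One possible way to express the frontend side in Go is to set the request channel to `nil` once the scheduler is stopping: a nil channel is never ready in a `select`, so no new queries are read while cancellations keep flowing. Everything here (`workerLoop`, `runWorkerDemo`, the channel names other than the idea of a requests channel, and the `send` callback standing in for the gRPC stream) is hypothetical and only illustrates the idea.

```go
package main

import "fmt"

// workerLoop is a hypothetical stand-in for the frontend's per-scheduler loop.
// send represents writing a message on the stream to the scheduler.
func workerLoop(requests, cancellations <-chan string, stopping, drained <-chan struct{}, send func(string)) {
	for {
		select {
		case q := <-requests:
			send("enqueue " + q)
		case q := <-cancellations:
			// Cancellations are still forwarded while stopping.
			send("cancel " + q)
		case <-stopping:
			// Receiving from a nil channel blocks forever, so these two
			// cases are effectively disabled from now on.
			requests = nil
			stopping = nil
		case <-drained:
			// All inflight queries are accounted for: safe to disconnect.
			return
		}
	}
}

// runWorkerDemo drives the loop through a shutdown and returns what was sent.
func runWorkerDemo() []string {
	requests := make(chan string)
	cancellations := make(chan string)
	stopping := make(chan struct{})
	drained := make(chan struct{})
	exited := make(chan struct{})

	var sent []string
	go func() {
		defer close(exited)
		workerLoop(requests, cancellations, stopping, drained, func(m string) {
			sent = append(sent, m)
		})
	}()

	requests <- "q1"       // accepted before shutdown starts
	stopping <- struct{}{} // scheduler begins stopping
	select {
	case requests <- "q2": // never taken: the loop no longer reads requests
	default:
	}
	cancellations <- "q1" // still forwarded to the scheduler
	drained <- struct{}{}
	<-exited // after this, reading sent is race-free
	return sent
}

func main() {
	fmt.Println(runWorkerDemo())
}
```

The demo prints `[enqueue q1 cancel q1]`: the second query is never enqueued on the stopping scheduler, while the cancellation for the first one still goes through.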


Referenced code from `pkg/scheduler/scheduler.go#L249-L250`:

```go
	// We stop accepting new queries in Stopping state. By returning quickly, we disconnect frontends, which in turns
	// cancels all their queries.
```

Referenced code from `pkg/frontend/v2/frontend_scheduler_worker.go#L322-L325`:

```go
	loopErr = w.schedulerLoop(loop)
	if closeErr := util.CloseAndExhaust[*schedulerpb.SchedulerToFrontend](loop); closeErr != nil {
		level.Debug(w.log).Log("msg", "failed to close frontend loop", "err", closeErr, "addr", w.schedulerAddr)
	}
```
