fix: Coordinator memory, validate only coordinator heap, use worker heap capacity #27162

Open

patdevinwilson wants to merge 6 commits into prestodb:master

Conversation
…ised capacity for limits

- LocalMemoryManager: coordinator-only validation when include-coordinator=false (only heap headroom validated; no reserved pool)
- LocalMemoryManagerProvider: wire coordinator-only path in ServerMainModule
- MemoryManagerConfig: query.use-worker-advertised-memory-for-limit (default true)
- ClusterMemoryManager: cap query limits by sum of worker general pool capacity
- Tests and admin docs for new config and validation
Contributor
Reviewer's Guide

Implements coordinator-only memory validation and worker-advertised memory caps so that a non-scheduling coordinator can run with a smaller heap while still enforcing safe query limits based on worker capacity.

Sequence diagram for worker-advertised memory limits during query processing

```mermaid
sequenceDiagram
actor User
participant Coordinator
participant ClusterMemoryManager
participant Worker1
participant Worker2
participant GeneralPool as GeneralPool_cluster
User->>Coordinator: submitQuery()
Coordinator->>ClusterMemoryManager: registerQuery(query)
loop periodicMemoryUpdates
Worker1->>ClusterMemoryManager: sendMemoryInfo(generalPoolMaxBytes_1)
Worker2->>ClusterMemoryManager: sendMemoryInfo(generalPoolMaxBytes_2)
ClusterMemoryManager->>GeneralPool: updateTotalDistributedBytes()
end
loop memoryManagementCycle
Coordinator->>ClusterMemoryManager: process(runningQueries)
ClusterMemoryManager->>ClusterMemoryManager: readConfigFlags(isWorkScheduledOnCoordinator,useWorkerAdvertisedMemoryForLimit)
alt coordinatorDoesNotScheduleWork_and_flagTrue
ClusterMemoryManager->>GeneralPool: getTotalDistributedBytes()
GeneralPool-->>ClusterMemoryManager: workerTotalCapacity
ClusterMemoryManager->>ClusterMemoryManager: effectiveMaxQueryTotalMemoryInBytes = min(configuredMaxQueryTotalMemoryInBytes,workerTotalCapacity)
ClusterMemoryManager->>ClusterMemoryManager: effectiveMaxQueryMemoryInBytes = min(configuredMaxQueryMemoryInBytes,effectiveMaxQueryTotalMemoryInBytes)
else flagFalse_or_coordinatorSchedulesWork
ClusterMemoryManager->>ClusterMemoryManager: effectiveMaxQueryMemoryInBytes = configuredMaxQueryMemoryInBytes
ClusterMemoryManager->>ClusterMemoryManager: effectiveMaxQueryTotalMemoryInBytes = configuredMaxQueryTotalMemoryInBytes
end
ClusterMemoryManager->>ClusterMemoryManager: userMemoryLimit = min(effectiveMaxQueryMemoryInBytes,sessionQueryMaxMemory)
ClusterMemoryManager->>ClusterMemoryManager: totalMemoryLimit = min(effectiveMaxQueryTotalMemoryInBytes,otherLimits)
ClusterMemoryManager-->>Coordinator: enforceLimits_or_failQuery()
end
```
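The capping branch in the loop above reduces to two `min` operations. A minimal sketch of that logic as a free-standing helper (`EffectiveLimits` and `computeEffectiveLimits` are illustrative names, not the PR's actual code):

```java
// Illustrative sketch of the effective-limit capping shown in the sequence diagram.
// EffectiveLimits and computeEffectiveLimits are hypothetical names.
final class EffectiveLimits
{
    final long userLimitBytes;
    final long totalLimitBytes;

    EffectiveLimits(long userLimitBytes, long totalLimitBytes)
    {
        this.userLimitBytes = userLimitBytes;
        this.totalLimitBytes = totalLimitBytes;
    }

    static EffectiveLimits computeEffectiveLimits(
            long configuredMaxQueryMemory,
            long configuredMaxQueryTotalMemory,
            long workerTotalCapacity,
            boolean coordinatorSchedulesWork,
            boolean useWorkerAdvertisedMemoryForLimit)
    {
        if (coordinatorSchedulesWork || !useWorkerAdvertisedMemoryForLimit) {
            // Flag off or coordinator schedules work: configured limits apply unchanged
            return new EffectiveLimits(configuredMaxQueryMemory, configuredMaxQueryTotalMemory);
        }
        // Cap the total limit by the sum of worker-advertised general pool capacity,
        // then keep the user limit no larger than the capped total limit
        long effectiveTotal = Math.min(configuredMaxQueryTotalMemory, workerTotalCapacity);
        long effectiveUser = Math.min(configuredMaxQueryMemory, effectiveTotal);
        return new EffectiveLimits(effectiveUser, effectiveTotal);
    }
}
```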
Class diagram for updated memory management components

```mermaid
classDiagram
class LocalMemoryManager {
+DataSize maxMemory
+Map_pools
+LocalMemoryManager(NodeMemoryConfig_config)
+LocalMemoryManager(NodeMemoryConfig_config,long_availableMemory)
+LocalMemoryManager(NodeMemoryConfig_config,long_availableMemory,boolean_useCoordinatorOnlyValidation)
-configureMemoryPools(NodeMemoryConfig_config,long_availableMemory,boolean_useCoordinatorOnlyValidation)
+MemoryInfo getInfo()
+static void validateHeapHeadroom(NodeMemoryConfig_config,long_availableMemory)
+static void validateCoordinatorHeapHeadroom(NodeMemoryConfig_config,long_availableMemory)
}
class LocalMemoryManagerProvider {
-NodeMemoryConfig nodeMemoryConfig
-ServerConfig serverConfig
-NodeSchedulerConfig nodeSchedulerConfig
+LocalMemoryManagerProvider(NodeMemoryConfig_nodeMemoryConfig,ServerConfig_serverConfig,NodeSchedulerConfig_nodeSchedulerConfig)
+LocalMemoryManager get()
}
class ClusterMemoryManager {
-boolean isWorkScheduledOnCoordinator
-boolean isBinaryTransportEnabled
-boolean useWorkerAdvertisedMemoryForLimit
-Map_pools
-long maxQueryMemoryInBytes
-long maxQueryTotalMemoryInBytes
+ClusterMemoryManager(MemoryManagerConfig_config,NodeSchedulerConfig_schedulerConfig,ServerConfig_serverConfig,QueryManagerConfig_queryManagerConfig,MemoryManagerConfig_memoryManagerConfig,FeaturesConfig_featuresConfig,NodeTaskMap_nodeTaskMap,MemoryPool_assigner,QueryIdGenerator_queryIdGenerator)
+void process(Iterable_runningQueries)
}
class MemoryManagerConfig {
-String lowMemoryKillerPolicy
-Duration killOnOutOfMemoryDelay
-boolean tableFinishOperatorMemoryTrackingEnabled
-boolean useWorkerAdvertisedMemoryForLimit
+boolean isUseWorkerAdvertisedMemoryForLimit()
+MemoryManagerConfig setUseWorkerAdvertisedMemoryForLimit(boolean_useWorkerAdvertisedMemoryForLimit)
}
class NodeMemoryConfig
class ServerConfig {
+boolean isCoordinator()
}
class NodeSchedulerConfig {
+boolean isIncludeCoordinator()
}
LocalMemoryManagerProvider ..> LocalMemoryManager : creates
LocalMemoryManagerProvider --> NodeMemoryConfig : uses
LocalMemoryManagerProvider --> ServerConfig : uses
LocalMemoryManagerProvider --> NodeSchedulerConfig : uses
ClusterMemoryManager --> MemoryManagerConfig : reads_limits
ClusterMemoryManager --> NodeSchedulerConfig : reads_includeCoordinator
ServerConfig ..> ClusterMemoryManager
NodeMemoryConfig ..> LocalMemoryManager
MemoryManagerConfig ..> ClusterMemoryManager
```
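Reading the class diagram, the provider is the decision point for which validation path a node uses. A rough sketch of how `LocalMemoryManagerProvider.get()` might wire this (the actual constructor arguments and heap sizing in the PR may differ):

```java
// Sketch of the provider wiring implied by the class diagram; details may differ from the PR.
public LocalMemoryManager get()
{
    // Coordinator-only validation applies when this node is a coordinator
    // that is excluded from scheduling (node-scheduler.include-coordinator=false)
    boolean coordinatorOnlyValidation =
            serverConfig.isCoordinator() && !nodeSchedulerConfig.isIncludeCoordinator();
    long availableMemory = Runtime.getRuntime().maxMemory();
    return new LocalMemoryManager(nodeMemoryConfig, availableMemory, coordinatorOnlyValidation);
}
```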
Contributor
Hey - I've found 1 issue, and left some high level feedback:

- The `LocalMemoryManager(NodeMemoryConfig, long, boolean)` constructor is annotated `@VisibleForTesting` but is now used in production via `LocalMemoryManagerProvider`; either remove the annotation or introduce a separate production-facing factory to avoid misleading the intent.
- In `ClusterMemoryManager.process`, when worker-advertised capacity reduces `effectiveMaxQueryMemoryInBytes` / `effectiveMaxQueryTotalMemoryInBytes`, consider emitting a debug log with the capped values and worker capacity to aid in diagnosing cluster-wide memory limit behavior.
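For the debug-log suggestion, something along these lines would do (a hypothetical helper; it assumes the airlift `Logger` and `DataSize.succinctBytes` already used throughout Presto):

```java
import com.facebook.airlift.log.Logger;

import static io.airlift.units.DataSize.succinctBytes;

// Hypothetical helper: log whenever worker-advertised capacity lowers a configured limit.
final class CapLogging
{
    private static final Logger log = Logger.get(CapLogging.class);

    private CapLogging() {}

    static void logIfCapped(String limitName, long configuredBytes, long effectiveBytes)
    {
        if (effectiveBytes < configuredBytes) {
            log.debug("%s capped by worker-advertised capacity: %s (configured %s)",
                    limitName, succinctBytes(effectiveBytes), succinctBytes(configuredBytes));
        }
    }
}
```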
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `LocalMemoryManager(NodeMemoryConfig, long, boolean)` constructor is annotated `@VisibleForTesting` but is now used in production via `LocalMemoryManagerProvider`; either remove the annotation or introduce a separate production-facing factory to avoid misleading the intent.
- In `ClusterMemoryManager.process`, when worker-advertised capacity reduces `effectiveMaxQueryMemoryInBytes` / `effectiveMaxQueryTotalMemoryInBytes`, consider emitting a debug log with the capped values and worker capacity to aid in diagnosing cluster-wide memory limit behavior.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/test/java/com/facebook/presto/memory/TestMemoryManagerConfig.java:59-62` </location>
<code_context>
.put("query.max-total-memory", "3GB")
.put("query.soft-max-total-memory", "2GB")
.put("table-finish-operator-memory-tracking-enabled", "true")
+ .put("query.use-worker-advertised-memory-for-limit", "false")
.build();
</code_context>
<issue_to_address>
**issue (testing):** Missing behavioral tests for worker-advertised capacity capping of query limits in ClusterMemoryManager.
Config-level coverage is good, but we still lack tests that exercise this behavior in `ClusterMemoryManager`. Please add tests (in the existing `ClusterMemoryManager` test suite) for at least: (1) `useWorkerAdvertisedMemoryForLimit = true` with worker capacity smaller than configured limits, asserting effective user/total limits are capped; (2) capacity larger than configured limits, asserting configured limits still apply; and (3) `useWorkerAdvertisedMemoryForLimit = false` or coordinator scheduling work, asserting behavior is unchanged. This will verify the new flag and config interaction with cluster-level memory enforcement end‑to‑end.
</issue_to_address>
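One way to cover the three requested scenarios, sketched here against the hypothetical `computeEffectiveLimits` helper from the sequence-diagram section above (the real tests would drive `ClusterMemoryManager.process` with fake worker `MemoryInfo` instead):

```java
import org.testng.annotations.Test;

import static org.testng.Assert.assertEquals;

public class TestEffectiveLimits
{
    private static final long GB = 1L << 30;

    @Test
    public void testWorkerCapacityBelowConfiguredLimitsCaps()
    {
        // Worker capacity (8GB) is below both configured limits, so both are capped
        EffectiveLimits limits = EffectiveLimits.computeEffectiveLimits(10 * GB, 20 * GB, 8 * GB, false, true);
        assertEquals(limits.totalLimitBytes, 8 * GB);
        assertEquals(limits.userLimitBytes, 8 * GB);
    }

    @Test
    public void testWorkerCapacityAboveConfiguredLimitsKeepsConfigured()
    {
        EffectiveLimits limits = EffectiveLimits.computeEffectiveLimits(10 * GB, 20 * GB, 100 * GB, false, true);
        assertEquals(limits.totalLimitBytes, 20 * GB);
        assertEquals(limits.userLimitBytes, 10 * GB);
    }

    @Test
    public void testFlagOffOrCoordinatorSchedulingUnchanged()
    {
        EffectiveLimits flagOff = EffectiveLimits.computeEffectiveLimits(10 * GB, 20 * GB, 8 * GB, false, false);
        assertEquals(flagOff.totalLimitBytes, 20 * GB);

        EffectiveLimits scheduling = EffectiveLimits.computeEffectiveLimits(10 * GB, 20 * GB, 8 * GB, true, true);
        assertEquals(scheduling.userLimitBytes, 10 * GB);
    }
}
```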
Contributor

Please sign the Presto CLA as mentioned in this comment. Thanks!
steveburnett requested changes Feb 18, 2026
Contributor

steveburnett left a comment

Thanks for the documentation! Looks good, just a nit of phrasing.
Co-authored-by: Steve Burnett <burnett@pobox.com>
steveburnett previously approved these changes Feb 19, 2026

Contributor

steveburnett left a comment

LGTM! (docs)

Pulled updated branch, new local doc build, looks good. Thanks!
Force-pushed from b6a60e1 to e4471e9
…-advertised capacity for limits

- LazyOutputBuffer: no-op when delegate is null and state is terminal to avoid IllegalStateException 'Buffer has not been initialized' on teardown/races
- TestMetadata.testShowTables: use information_schema.tables instead of SHOW TABLES LIKE so the expected (Java) query runs reliably in native-vs-java tests
- Listener: return early when task revocable memory <= threshold (don't schedule)
- Visitor: only request revocation when !isMemoryRevokingRequested() to avoid stale revoking-requested flags (fixes TestMemoryRevokingScheduler.testTaskThresholdRevokingSchedulerImmediate)

Co-authored-by: Cursor <cursoragent@cursor.com>
Description
This PR improves how the coordinator handles memory when it does not run tasks (`node-scheduler.include-coordinator=false`):

Coordinator-only memory validation
The coordinator no longer validates that `query.max-memory-per-node` / `query.max-total-memory-per-node` fit in its own JVM heap. It only checks that `memory.heap-headroom-per-node` fits, and sizes a single general pool to (heap − headroom). Workers still enforce the full per-node limits. This allows a small-heap coordinator to start with the same config as large-heap workers.
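A minimal sketch of what the coordinator-only check amounts to (illustrative; the PR's `validateCoordinatorHeapHeadroom` takes a `NodeMemoryConfig`, and these helpers simplify it to raw byte counts):

```java
// Illustrative sketch of coordinator-only validation: only heap headroom must fit;
// per-node query limits are deliberately not checked on the coordinator.
static void validateCoordinatorHeapHeadroom(long heapHeadroomBytes, long availableMemoryBytes)
{
    if (heapHeadroomBytes >= availableMemoryBytes) {
        throw new IllegalArgumentException(
                "memory.heap-headroom-per-node must be less than the available heap memory");
    }
}

static long coordinatorGeneralPoolBytes(long heapHeadroomBytes, long availableMemoryBytes)
{
    // The single general pool gets everything above the headroom (heap − headroom)
    return availableMemoryBytes - heapHeadroomBytes;
}
```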
Worker-advertised capacity for query limits

When the coordinator does not schedule work, it can cap query memory limits using the sum of workers' advertised general pool capacity. Effective limits become `min(configured query.max-memory / query.max-total-memory, sum of worker capacities)`. This is controlled by `query.use-worker-advertised-memory-for-limit` (default `true`). An example configuration follows.
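For example, a small-heap, non-scheduling coordinator might now run with a config along these lines (all values here are illustrative):

```properties
# coordinator etc/config.properties (illustrative values)
coordinator=true
node-scheduler.include-coordinator=false
# worker-sized per-node limits no longer need to fit in the coordinator heap
query.max-memory-per-node=50GB
query.max-total-memory-per-node=60GB
# only this must fit in the coordinator's (small) heap
memory.heap-headroom-per-node=2GB
# cap effective query limits by summed worker general pool capacity (the default)
query.use-worker-advertised-memory-for-limit=true
```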
Changes:

- `LocalMemoryManager`: new `useCoordinatorOnlyValidation` flag; when true, only heap headroom is validated and only the general pool is created (no reserved pool).
- `LocalMemoryManagerProvider`: creates `LocalMemoryManager` with coordinator-only validation when `serverConfig.isCoordinator() && !nodeSchedulerConfig.isIncludeCoordinator()`.
- `ServerMainModule`: binds `LocalMemoryManager` via `LocalMemoryManagerProvider`.
- `MemoryManagerConfig`: new `query.use-worker-advertised-memory-for-limit` config (default `true`).
- `ClusterMemoryManager`: caps query limits by the sum of workers' advertised general pool `maxBytes` (from `MemoryInfo`).
Today, the coordinator runs the same `LocalMemoryManager` validation as workers, so `query.max-memory-per-node` (and thus `query.max-total-memory-per-node`) must fit in the coordinator's heap. With `node-scheduler.include-coordinator=false`, the coordinator does not run tasks but still had to pass that check, forcing the same per-node value for the whole cluster and blocking small-heap coordinators when workers use larger limits.

A better design is: the coordinator only validates its own (small) heap, and workers advertise capacity; the coordinator uses worker-advertised capacity (capped by config) for scheduling and OOM decisions. This PR implements that.
Impact
- New config: `query.use-worker-advertised-memory-for-limit` (boolean, default `true`). Documented in admin properties.
- With `node-scheduler.include-coordinator=false`, the coordinator can start with large `query.max-memory-per-node` / `query.max-total-memory-per-node` (for workers) as long as `memory.heap-headroom-per-node` fits in its heap. When the new config is true, effective query limits are capped by the sum of worker general pool capacity.
- Uses the `MemoryInfo` already gathered for pool updates.

Test Plan
- `TestLocalMemoryManager` – coordinator-only path allows large per-node config with a small heap and fails when headroom ≥ heap.
- `TestNodeMemoryConfig` – `validateCoordinatorHeapHeadroom` passes/fails as expected.
- `TestMemoryManagerConfig` – default and explicit mapping for `query.use-worker-advertised-memory-for-limit`.
- Manual: a coordinator with `node-scheduler.include-coordinator=false`, a small heap, and worker-sized `query.max-memory-per-node` starts successfully; with workers up, query limits are effectively capped by worker-advertised capacity when the new config is true.

Contributor checklist
- Documented `query.use-worker-advertised-memory-for-limit` (default `true`) in admin properties.

Release Notes
Summary by Sourcery
Adjust coordinator memory handling to support small-heap coordinators that do not schedule work, and cap query memory limits using worker-advertised capacity.