
fix: Coordinator memory, validate only coordinator heap, use worker heap capacity #27162

Open

patdevinwilson wants to merge 6 commits into prestodb:master from patdevinwilson:pwilson/coordinator-memory-worker-advertised-limit

Conversation


@patdevinwilson patdevinwilson commented Feb 18, 2026

Description

This PR improves how the coordinator handles memory when it does not run tasks (node-scheduler.include-coordinator=false):

  1. Coordinator-only memory validation
    The coordinator no longer validates that query.max-memory-per-node / query.max-total-memory-per-node fit in its own JVM heap. It only checks that memory.heap-headroom-per-node fits and sizes a single general pool to (heap − headroom). Workers still enforce the full per-node limits. This allows a small-heap coordinator to start with the same config as large-heap workers; see the sketch after this list.

  2. Worker-advertised capacity for query limits
    When the coordinator does not schedule work, it can cap query memory limits using the sum of workers' advertised general pool capacity. Effective limits become min(configured query.max-memory / query.max-total-memory, sum of worker capacities). This is controlled by query.use-worker-advertised-memory-for-limit (default true).
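
A minimal sketch of the coordinator-only validation in item 1, assuming hypothetical class and method names (only LocalMemoryManager and the property names come from this PR):

```java
import static com.google.common.base.Preconditions.checkArgument;

// Sketch, not the PR diff: the coordinator-only path validates only the heap
// headroom and sizes a single general pool; no reserved pool is created.
final class CoordinatorOnlyValidationSketch
{
    static long coordinatorGeneralPoolBytes(long availableHeapBytes, long heapHeadroomBytes)
    {
        checkArgument(heapHeadroomBytes >= 0,
                "memory.heap-headroom-per-node must be non-negative");
        checkArgument(heapHeadroomBytes < availableHeapBytes,
                "memory.heap-headroom-per-node (%s) must be less than available heap (%s)",
                heapHeadroomBytes, availableHeapBytes);
        // query.max-memory-per-node is intentionally NOT checked here;
        // workers still enforce it against their own heaps.
        return availableHeapBytes - heapHeadroomBytes;
    }
}
```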

Changes:

  • LocalMemoryManager: New constructor with useCoordinatorOnlyValidation; when true, only heap headroom is validated and only the general pool is created (no reserved pool).
  • LocalMemoryManagerProvider: Builds LocalMemoryManager with coordinator-only validation when serverConfig.isCoordinator() && !nodeSchedulerConfig.isIncludeCoordinator() (sketched after this list).
  • ServerMainModule: Binds LocalMemoryManager via LocalMemoryManagerProvider.
  • MemoryManagerConfig: New query.use-worker-advertised-memory-for-limit (default true).
  • ClusterMemoryManager: When the flag is true and work is not scheduled on the coordinator, effective user and total query limits are capped by the sum of worker general pool maxBytes (from MemoryInfo).
  • Tests for coordinator-only validation and for the new config; admin docs updated for the new property.
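
A sketch of the provider wiring from the first three bullets; the constructor signature and the decision logic follow the class diagram later in this thread, but package names and the null checks are assumptions:

```java
import com.facebook.presto.execution.scheduler.NodeSchedulerConfig;
import com.facebook.presto.memory.LocalMemoryManager;
import com.facebook.presto.memory.NodeMemoryConfig;
import com.facebook.presto.server.ServerConfig;

import javax.inject.Inject;
import javax.inject.Provider;

import static java.util.Objects.requireNonNull;

public class LocalMemoryManagerProvider
        implements Provider<LocalMemoryManager>
{
    private final NodeMemoryConfig nodeMemoryConfig;
    private final ServerConfig serverConfig;
    private final NodeSchedulerConfig nodeSchedulerConfig;

    @Inject
    public LocalMemoryManagerProvider(NodeMemoryConfig nodeMemoryConfig, ServerConfig serverConfig, NodeSchedulerConfig nodeSchedulerConfig)
    {
        this.nodeMemoryConfig = requireNonNull(nodeMemoryConfig, "nodeMemoryConfig is null");
        this.serverConfig = requireNonNull(serverConfig, "serverConfig is null");
        this.nodeSchedulerConfig = requireNonNull(nodeSchedulerConfig, "nodeSchedulerConfig is null");
    }

    @Override
    public LocalMemoryManager get()
    {
        // Coordinator-only validation applies only when this coordinator does
        // not schedule work on itself (node-scheduler.include-coordinator=false).
        boolean coordinatorOnly = serverConfig.isCoordinator() && !nodeSchedulerConfig.isIncludeCoordinator();
        return new LocalMemoryManager(nodeMemoryConfig, Runtime.getRuntime().maxMemory(), coordinatorOnly);
    }
}
```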

Motivation and Context

Today, the coordinator runs the same LocalMemoryManager validation as workers, so query.max-memory-per-node (and thus query.max-total-memory-per-node) must fit in the coordinator's heap. With node-scheduler.include-coordinator=false, the coordinator does not run tasks but still has to pass that check, forcing the same per-node value across the whole cluster and blocking small-heap coordinators when workers use larger limits.

A better design: the coordinator validates only its own (small) heap, workers advertise their capacity, and the coordinator uses that worker-advertised capacity (capped by config) for scheduling and OOM decisions. This PR implements that design.

Impact

  • Public API / config: New config query.use-worker-advertised-memory-for-limit (boolean, default true). Documented in admin properties.
  • Behavior: With node-scheduler.include-coordinator=false, the coordinator can start with large query.max-memory-per-node / query.max-total-memory-per-node (for workers) as long as memory.heap-headroom-per-node fits in its heap. When the new config is true, effective query limits are capped by the sum of worker general pool capacity (an example coordinator config follows this list).
  • Performance: No intentional performance change; coordinator uses existing worker MemoryInfo already gathered for pool updates.
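
For illustration, a coordinator etc/config.properties consistent with this behavior might look like the following; the values are made up, and only the property names come from this PR and the Presto admin docs:

```properties
coordinator=true
node-scheduler.include-coordinator=false

# Worker-sized per-node limits that no longer need to fit the coordinator heap
query.max-memory-per-node=60GB
query.max-total-memory-per-node=70GB

# Must still fit in the coordinator's (small) heap
memory.heap-headroom-per-node=2GB

# Cap effective query limits by worker-advertised capacity (default true)
query.use-worker-advertised-memory-for-limit=true
```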

Test Plan

  • Unit: TestLocalMemoryManager – coordinator-only path allows a large per-node config with a small heap and fails when headroom ≥ heap. TestNodeMemoryConfig – validateCoordinatorHeapHeadroom passes/fails as expected. TestMemoryManagerConfig – default and explicit mapping for query.use-worker-advertised-memory-for-limit. A sketch of the coordinator-only test follows this list.
  • Manual: Coordinator with node-scheduler.include-coordinator=false, small heap, and worker-sized query.max-memory-per-node starts successfully; with workers up, query limits are effectively capped by worker-advertised capacity when the new config is true.
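
A sketch of what the coordinator-only unit test could look like (TestNG, matching Presto's test style); the NodeMemoryConfig setter names are inferred from the property names, the values are invented, and the expected exception type is assumed:

```java
import com.facebook.presto.memory.LocalMemoryManager;
import com.facebook.presto.memory.NodeMemoryConfig;
import io.airlift.units.DataSize;
import org.testng.annotations.Test;

import static io.airlift.units.DataSize.Unit.GIGABYTE;

public class TestCoordinatorOnlyValidation
{
    @Test
    public void testWorkerSizedLimitsAllowedOnSmallHeap()
    {
        NodeMemoryConfig config = new NodeMemoryConfig()
                .setMaxQueryMemoryPerNode(new DataSize(60, GIGABYTE)) // worker-sized
                .setHeapHeadroom(new DataSize(1, GIGABYTE));
        // 4 GB coordinator heap: would fail worker validation, but the
        // coordinator-only path (third argument true) validates headroom only.
        new LocalMemoryManager(config, new DataSize(4, GIGABYTE).toBytes(), true);
    }

    @Test(expectedExceptions = IllegalArgumentException.class) // exception type assumed
    public void testFailsWhenHeadroomNotLessThanHeap()
    {
        NodeMemoryConfig config = new NodeMemoryConfig()
                .setHeapHeadroom(new DataSize(4, GIGABYTE));
        new LocalMemoryManager(config, new DataSize(4, GIGABYTE).toBytes(), true);
    }
}
```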

Contributor checklist

  • Submission complies with contributing guide (code style and commit standards).
  • PR description addresses the change accurately.
  • Documented new property query.use-worker-advertised-memory-for-limit (default true) in admin properties.
  • Release notes filled below if required.
  • Tests added for coordinator-only validation and new config.
  • CI passed.
  • No new dependencies.

Release Notes

Summary by Sourcery

Adjust coordinator memory handling to support small-heap coordinators that do not schedule work, and cap query memory limits using worker-advertised capacity.

New Features:

  • Introduce coordinator-only LocalMemoryManager validation mode that sizes only a general pool based on heap minus headroom and allows worker-sized per-node limits on a small-heap coordinator.
  • Add an optional mechanism to cap per-query user and total memory limits by the sum of workers' advertised general pool capacity when the coordinator does not schedule work.

Enhancements:

  • Provide a LocalMemoryManagerProvider that configures LocalMemoryManager differently on coordinators that do not include themselves in scheduling.
  • Wire LocalMemoryManager through dependency injection so its behavior can depend on server and scheduler configuration.

Documentation:

  • Document the new query.use-worker-advertised-memory-for-limit configuration property in the admin properties.

Tests:

  • Extend memory-related unit tests to cover coordinator-only heap-headroom validation, the new configuration property defaults and mappings, and the worker-capacity-based limit behavior.

Coordinator memory: validate only coordinator heap; use worker-advertised capacity for limits

- LocalMemoryManager: coordinator-only validation when include-coordinator=false
  (only heap headroom validated; no reserved pool)
- LocalMemoryManagerProvider: wire coordinator-only path in ServerMainModule
- MemoryManagerConfig: query.use-worker-advertised-memory-for-limit (default true)
- ClusterMemoryManager: cap query limits by sum of worker general pool capacity
- Tests and admin docs for new config and validation

linux-foundation-easycla bot commented Feb 18, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

Contributor

sourcery-ai bot commented Feb 18, 2026

Reviewer's Guide

Implements coordinator-only memory validation and worker-advertised memory caps so that a non-scheduling coordinator can run with a smaller heap while still enforcing safe query limits based on worker capacity.

Sequence diagram for worker-advertised memory limits during query processing

sequenceDiagram
    actor User
    participant Coordinator
    participant ClusterMemoryManager
    participant Worker1
    participant Worker2
    participant GeneralPool as GeneralPool_cluster

    User->>Coordinator: submitQuery()
    Coordinator->>ClusterMemoryManager: registerQuery(query)

    loop periodicMemoryUpdates
        Worker1->>ClusterMemoryManager: sendMemoryInfo(generalPoolMaxBytes_1)
        Worker2->>ClusterMemoryManager: sendMemoryInfo(generalPoolMaxBytes_2)
        ClusterMemoryManager->>GeneralPool: updateTotalDistributedBytes()
    end

    loop memoryManagementCycle
        Coordinator->>ClusterMemoryManager: process(runningQueries)
        ClusterMemoryManager->>ClusterMemoryManager: readConfigFlags(isWorkScheduledOnCoordinator,useWorkerAdvertisedMemoryForLimit)
        alt coordinatorDoesNotScheduleWork_and_flagTrue
            ClusterMemoryManager->>GeneralPool: getTotalDistributedBytes()
            GeneralPool-->>ClusterMemoryManager: workerTotalCapacity
            ClusterMemoryManager->>ClusterMemoryManager: effectiveMaxQueryTotalMemoryInBytes = min(configuredMaxQueryTotalMemoryInBytes,workerTotalCapacity)
            ClusterMemoryManager->>ClusterMemoryManager: effectiveMaxQueryMemoryInBytes = min(configuredMaxQueryMemoryInBytes,effectiveMaxQueryTotalMemoryInBytes)
        else flagFalse_or_coordinatorSchedulesWork
            ClusterMemoryManager->>ClusterMemoryManager: effectiveMaxQueryMemoryInBytes = configuredMaxQueryMemoryInBytes
            ClusterMemoryManager->>ClusterMemoryManager: effectiveMaxQueryTotalMemoryInBytes = configuredMaxQueryTotalMemoryInBytes
        end

        ClusterMemoryManager->>ClusterMemoryManager: userMemoryLimit = min(effectiveMaxQueryMemoryInBytes,sessionQueryMaxMemory)
        ClusterMemoryManager->>ClusterMemoryManager: totalMemoryLimit = min(effectiveMaxQueryTotalMemoryInBytes,otherLimits)
        ClusterMemoryManager-->>Coordinator: enforceLimits_or_failQuery()
    end

Class diagram for updated memory management components

classDiagram
    class LocalMemoryManager {
        +DataSize maxMemory
        +Map_pools
        +LocalMemoryManager(NodeMemoryConfig_config)
        +LocalMemoryManager(NodeMemoryConfig_config,long_availableMemory)
        +LocalMemoryManager(NodeMemoryConfig_config,long_availableMemory,boolean_useCoordinatorOnlyValidation)
        -configureMemoryPools(NodeMemoryConfig_config,long_availableMemory,boolean_useCoordinatorOnlyValidation)
        +MemoryInfo getInfo()
        +static void validateHeapHeadroom(NodeMemoryConfig_config,long_availableMemory)
        +static void validateCoordinatorHeapHeadroom(NodeMemoryConfig_config,long_availableMemory)
    }

    class LocalMemoryManagerProvider {
        -NodeMemoryConfig nodeMemoryConfig
        -ServerConfig serverConfig
        -NodeSchedulerConfig nodeSchedulerConfig
        +LocalMemoryManagerProvider(NodeMemoryConfig_nodeMemoryConfig,ServerConfig_serverConfig,NodeSchedulerConfig_nodeSchedulerConfig)
        +LocalMemoryManager get()
    }

    class ClusterMemoryManager {
        -boolean isWorkScheduledOnCoordinator
        -boolean isBinaryTransportEnabled
        -boolean useWorkerAdvertisedMemoryForLimit
        -Map_pools
        -long maxQueryMemoryInBytes
        -long maxQueryTotalMemoryInBytes
        +ClusterMemoryManager(MemoryManagerConfig_config,NodeSchedulerConfig_schedulerConfig,ServerConfig_serverConfig,QueryManagerConfig_queryManagerConfig,MemoryManagerConfig_memoryManagerConfig,FeaturesConfig_featuresConfig,NodeTaskMap_nodeTaskMap,MemoryPool_assigner,QueryIdGenerator_queryIdGenerator)
        +void process(Iterable_runningQueries)
    }

    class MemoryManagerConfig {
        -String lowMemoryKillerPolicy
        -Duration killOnOutOfMemoryDelay
        -boolean tableFinishOperatorMemoryTrackingEnabled
        -boolean useWorkerAdvertisedMemoryForLimit
        +boolean isUseWorkerAdvertisedMemoryForLimit()
        +MemoryManagerConfig setUseWorkerAdvertisedMemoryForLimit(boolean_useWorkerAdvertisedMemoryForLimit)
    }

    class NodeMemoryConfig
    class ServerConfig {
        +boolean isCoordinator()
    }
    class NodeSchedulerConfig {
        +boolean isIncludeCoordinator()
    }

    LocalMemoryManagerProvider ..> LocalMemoryManager : creates
    LocalMemoryManagerProvider --> NodeMemoryConfig : uses
    LocalMemoryManagerProvider --> ServerConfig : uses
    LocalMemoryManagerProvider --> NodeSchedulerConfig : uses

    ClusterMemoryManager --> MemoryManagerConfig : reads_limits
    ClusterMemoryManager --> NodeSchedulerConfig : reads_includeCoordinator

    ServerConfig ..> ClusterMemoryManager
    NodeMemoryConfig ..> LocalMemoryManager
    MemoryManagerConfig ..> ClusterMemoryManager

File-Level Changes

Add coordinator-only memory validation path that only checks heap headroom and creates a single general pool without a reserved pool.
  • Introduce a three-argument LocalMemoryManager constructor taking useCoordinatorOnlyValidation and delegate existing two-arg constructor to it with false.
  • Update configureMemoryPools to branch on useCoordinatorOnlyValidation, validating only coordinator heap headroom, sizing the general pool to (heap - headroom), skipping reserved pool creation, and otherwise preserving the existing validation logic.
  • Add validateCoordinatorHeapHeadroom helper that ensures heap headroom is non-negative and less than available memory, and use it in the coordinator-only path.
  • Extend LocalMemoryManager tests to cover coordinator-only validation success with large per-node limits and failure when headroom exceeds heap.
presto-main-base/src/main/java/com/facebook/presto/memory/LocalMemoryManager.java
presto-main-base/src/test/java/com/facebook/presto/memory/TestLocalMemoryManager.java
presto-main-base/src/test/java/com/facebook/presto/memory/TestNodeMemoryConfig.java
Make LocalMemoryManager construction environment-aware via a provider that enables coordinator-only validation when the coordinator does not schedule work on itself.
  • Introduce LocalMemoryManagerProvider that inspects ServerConfig and NodeSchedulerConfig to decide whether to enable coordinator-only validation and constructs LocalMemoryManager with Runtime max heap.
  • Bind LocalMemoryManager in ServerMainModule to be provided by LocalMemoryManagerProvider instead of direct singleton binding.
presto-main/src/main/java/com/facebook/presto/server/LocalMemoryManagerProvider.java
presto-main/src/main/java/com/facebook/presto/server/ServerMainModule.java
Add configuration flag to cap query memory limits based on worker-advertised general pool capacity and wire it into cluster memory management.
  • Extend MemoryManagerConfig with boolean useWorkerAdvertisedMemoryForLimit (default true) and expose it as config property query.use-worker-advertised-memory-for-limit with description.
  • Update ClusterMemoryManager to read the new flag, compute effectiveMaxQueryMemoryInBytes and effectiveMaxQueryTotalMemoryInBytes based on worker general pool total capacity when the coordinator does not schedule work, and use these effective limits for user and total query memory limit checks (sketched below).
  • Update MemoryManagerConfig tests to cover defaults and explicit mappings for the new property.
presto-main-base/src/main/java/com/facebook/presto/memory/MemoryManagerConfig.java
presto-main/src/main/java/com/facebook/presto/memory/ClusterMemoryManager.java
presto-main-base/src/test/java/com/facebook/presto/memory/TestMemoryManagerConfig.java
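
The capping in the second bullet reduces to two min() operations, mirroring the sequence diagram above. A standalone sketch follows; in the PR this logic lives inside ClusterMemoryManager.process, which is not quoted here:

```java
import static java.lang.Math.min;

// Sketch of the effective-limit computation described above.
final class EffectiveLimitsSketch
{
    final long effectiveMaxQueryMemoryInBytes;
    final long effectiveMaxQueryTotalMemoryInBytes;

    EffectiveLimitsSketch(
            long configuredMaxQueryMemoryInBytes,
            long configuredMaxQueryTotalMemoryInBytes,
            long workerTotalCapacity, // sum of worker general pool maxBytes from MemoryInfo
            boolean isWorkScheduledOnCoordinator,
            boolean useWorkerAdvertisedMemoryForLimit)
    {
        if (!isWorkScheduledOnCoordinator && useWorkerAdvertisedMemoryForLimit) {
            // Cap configured limits by the sum of worker-advertised general pool capacity
            effectiveMaxQueryTotalMemoryInBytes = min(configuredMaxQueryTotalMemoryInBytes, workerTotalCapacity);
            effectiveMaxQueryMemoryInBytes = min(configuredMaxQueryMemoryInBytes, effectiveMaxQueryTotalMemoryInBytes);
        }
        else {
            effectiveMaxQueryTotalMemoryInBytes = configuredMaxQueryTotalMemoryInBytes;
            effectiveMaxQueryMemoryInBytes = configuredMaxQueryMemoryInBytes;
        }
    }
}
```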
Update documentation for the new memory manager configuration property.
  • Document query.use-worker-advertised-memory-for-limit in admin properties, describing its behavior and default value.
presto-docs/src/main/sphinx/admin/properties.rst

Possibly linked issues

  • #(unlisted): PR implements coordinator-only validation so query per-node memory needn’t fit coordinator heap, directly resolving the issue


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • The LocalMemoryManager(NodeMemoryConfig, long, boolean) constructor is annotated @VisibleForTesting but is now used in production via LocalMemoryManagerProvider; either remove the annotation or introduce a separate production-facing factory to avoid misleading the intent.
  • In ClusterMemoryManager.process, when worker-advertised capacity reduces effectiveMaxQueryMemoryInBytes / effectiveMaxQueryTotalMemoryInBytes, consider emitting a debug log with the capped values and worker capacity to aid in diagnosing cluster-wide memory limit behavior.

Individual Comments

Comment 1

Location: presto-main-base/src/test/java/com/facebook/presto/memory/TestMemoryManagerConfig.java:59-62

Code context:
                 .put("query.max-total-memory", "3GB")
                 .put("query.soft-max-total-memory", "2GB")
                 .put("table-finish-operator-memory-tracking-enabled", "true")
+                .put("query.use-worker-advertised-memory-for-limit", "false")
                 .build();

**issue (testing):** Missing behavioral tests for worker-advertised capacity capping of query limits in ClusterMemoryManager.

Config-level coverage is good, but we still lack tests that exercise this behavior in `ClusterMemoryManager`. Please add tests (in the existing `ClusterMemoryManager` test suite) for at least: (1) `useWorkerAdvertisedMemoryForLimit = true` with worker capacity smaller than configured limits, asserting effective user/total limits are capped; (2) capacity larger than configured limits, asserting configured limits still apply; and (3) `useWorkerAdvertisedMemoryForLimit = false` or coordinator scheduling work, asserting behavior is unchanged. This will verify the new flag and config interaction with cluster-level memory enforcement end‑to‑end.


@steveburnett
Contributor

Please sign the Presto CLA as mentioned in this comment. Thanks!

Contributor

@steveburnett steveburnett left a comment


Thanks for the documentation! Looks good, just a nit of phrasing.

Co-authored-by: Steve Burnett <burnett@pobox.com>
steveburnett previously approved these changes Feb 19, 2026
Contributor

@steveburnett steveburnett left a comment


LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

@patdevinwilson patdevinwilson force-pushed the pwilson/coordinator-memory-worker-advertised-limit branch from b6a60e1 to e4471e9 on February 19, 2026 17:31
@patdevinwilson patdevinwilson changed the title from "Coordinator memory: validate only coordinator heap; use worker-advert…" to "fix (Coordinator memory: validate only coordinator heap; use worker-advert…" on Feb 19, 2026
@patdevinwilson patdevinwilson changed the title from "fix (Coordinator memory: validate only coordinator heap; use worker-advert…" to "fix: Coordinator memory, validate only coordinator heap, use worker heap capacity" on Feb 19, 2026
- LazyOutputBuffer: no-op when delegate is null and state is terminal to avoid
  IllegalStateException 'Buffer has not been initialized' on teardown/races
- TestMetadata.testShowTables: use information_schema.tables instead of
  SHOW TABLES LIKE so expected (Java) query runs reliably in native-vs-java tests
@patdevinwilson patdevinwilson requested a review from a team as a code owner February 20, 2026 14:50
- Listener: return early when task revocable memory <= threshold (don't schedule)
- Visitor: only request revocation when !isMemoryRevokingRequested() to avoid
  stale revoking-requested flags (fixes TestMemoryRevokingScheduler.testTaskThresholdRevokingSchedulerImmediate)

Co-authored-by: Cursor <cursoragent@cursor.com>
