
Pause writing data to disk for merges when disk almost full #88606

@DaveCTurner

Description


If a node exceeds the flood-stage disk watermark then we block further writes to its indices but we allow merges to continue. Merges can temporarily consume a very large amount of disk space, more than enough to fill up the gap between any reasonable flood-stage watermark and a completely full disk. When a node completely fills its disk, it basically dies.

We could pause merge-related writes in this situation, for instance by overriding Store$StoreDirectory#createOutput and adjusting the output's behaviour according to the supplied IOContext.
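The following is a minimal, JDK-only sketch of the idea, not Elasticsearch or Lucene code. It assumes three hypothetical stand-ins. `WriteContext` loosely mirrors Lucene's `IOContext.Context`. `PausingOutputs.createOutput` plays the role of an overridden `Store$StoreDirectory#createOutput`. `DiskSpaceGate` is an imagined switch that some disk monitor would close when the disk is nearly full. Only merge outputs are wrapped; flushes and other writes go straight through.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;

// Hypothetical stand-in for Lucene's IOContext.Context (not the real API).
enum WriteContext { MERGE, FLUSH, DEFAULT }

// Hypothetical gate that a disk-usage monitor would close when the disk is nearly full.
final class DiskSpaceGate {
    private boolean paused = false;

    synchronized void pause() { paused = true; }

    synchronized void resume() {
        paused = false;
        notifyAll(); // wake any merge threads blocked in awaitOpen()
    }

    synchronized boolean isPaused() { return paused; }

    // Blocks the calling thread until the gate is open.
    synchronized void awaitOpen() throws InterruptedException {
        while (paused) {
            wait();
        }
    }
}

// Wraps a merge output so that every write first waits for the gate to open.
final class PausingOutputStream extends FilterOutputStream {
    private final DiskSpaceGate gate;

    PausingOutputStream(OutputStream out, DiskSpaceGate gate) {
        super(out);
        this.gate = gate;
    }

    private void awaitSpace() throws IOException {
        try {
            gate.awaitOpen();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while waiting for disk space");
        }
    }

    @Override
    public void write(int b) throws IOException {
        awaitSpace();
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        awaitSpace();
        out.write(b, off, len);
    }
}

final class PausingOutputs {
    // Analogue of overriding createOutput: only merge writes are wrapped,
    // so flushes/refreshes are never blocked by the gate.
    static OutputStream createOutput(OutputStream raw, WriteContext context, DiskSpaceGate gate) {
        return context == WriteContext.MERGE ? new PausingOutputStream(raw, gate) : raw;
    }
}
```

Pausing inside each write, rather than refusing to open the output, means a merge that is already running simply stalls in place and resumes once space is available, without the merge itself having to be aborted.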

We probably don't want to do this for all writes: for example, a primary may need to refresh before it can relocate elsewhere, and blocking arbitrary write threads seems like a recipe for deadlocks. Blocking merge threads seems OK, though. We may also need to take the size of the merge into account (see IOContext.mergeInfo.estimatedMergeBytes and IOContext.flushInfo.estimatedSegmentSize), since the merge-on-refresh feature may soon trigger smaller merges.
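A size-aware filter on top of that could look roughly like the sketch below. `MergePausePolicy`, its nested `Kind` enum, and the `minPausableMergeBytes` threshold are all hypothetical. The `estimatedMergeBytes` argument is assumed to come from something like `IOContext.mergeInfo.estimatedMergeBytes`. The sketch only encodes the two rules stated above: never pause non-merge writes, and let small merges through.

```java
// Hypothetical policy deciding which writes are subject to pausing.
final class MergePausePolicy {
    // Hypothetical stand-in for Lucene's IOContext.Context.
    enum Kind { MERGE, FLUSH, DEFAULT }

    // Hypothetical tunable: merges estimated below this size are never paused.
    private final long minPausableMergeBytes;

    MergePausePolicy(long minPausableMergeBytes) {
        if (minPausableMergeBytes < 0) {
            throw new IllegalArgumentException("threshold must be non-negative");
        }
        this.minPausableMergeBytes = minPausableMergeBytes;
    }

    boolean appliesTo(Kind kind, long estimatedMergeBytes) {
        // Flushes and other writes are never paused: a primary may need to refresh
        // before relocating, and blocking arbitrary write threads risks deadlocks.
        if (kind != Kind.MERGE) {
            return false;
        }
        // Small merges (e.g. ones triggered as part of a refresh) are let through.
        return estimatedMergeBytes >= minPausableMergeBytes;
    }
}
```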

It's unclear whether to base this on the read_only_allow_delete block (which is applied per index, so it also affects other nodes holding that index that are still below the flood-stage watermark) or on the node's actual disk usage (although the node may not know the flood-stage watermark that the master is using).
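For the second option, a node-local check might look like the sketch below. `LocalDiskCheck` and its parameters are hypothetical. The rule "required free space is the percentage-based gap, capped by a max headroom" is an assumption modelled loosely on how a flood-stage watermark with a max headroom behaves, and the node's local values could differ from what the master applies. The example values 95% and 100GB are illustrative only.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical node-local "disk nearly full" check; thresholds are assumptions.
final class LocalDiskCheck {
    private final int floodStageUsedPercent; // e.g. 95 means "95% used"
    private final long maxHeadroomBytes;     // caps the free space required on large disks

    LocalDiskCheck(int floodStageUsedPercent, long maxHeadroomBytes) {
        if (floodStageUsedPercent < 0 || floodStageUsedPercent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        this.floodStageUsedPercent = floodStageUsedPercent;
        this.maxHeadroomBytes = maxHeadroomBytes;
    }

    // Free space to keep: the percentage-based gap, capped by the max headroom.
    // Integer arithmetic; overflows only for disks larger than roughly 90 PB.
    long requiredFreeBytes(long totalBytes) {
        long byPercentage = totalBytes * (100 - floodStageUsedPercent) / 100;
        return Math.min(byPercentage, maxHeadroomBytes);
    }

    boolean isNearlyFull(long totalBytes, long usableBytes) {
        return usableBytes < requiredFreeBytes(totalBytes);
    }

    // Reads the live figures for the filesystem holding dataPath.
    boolean isNearlyFull(Path dataPath) throws IOException {
        FileStore store = Files.getFileStore(dataPath);
        return isNearlyFull(store.getTotalSpace(), store.getUsableSpace());
    }
}
```

With these example values a 1TB disk must keep 50GB free, while on a 10TB disk the requirement is capped at 100GB rather than 500GB. That capped gap is all a large merge has to fit into before the disk is completely full.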


NB: once we have better protection against merges consuming all the remaining space, we could also consider reducing the flood-stage max headroom.

Metadata


Assignees

No one assigned

    Labels

    :Distributed Indexing/CRUD (A catch all label for issues around indexing, updating and getting a doc by id. Not search.)
    Supportability (Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better.)
    Team:Distributed Indexing (Meta label for Distributed Indexing team)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
