Threadpool merge scheduler #120869
Merged: albertzaharovits merged 236 commits into elastic:main from albertzaharovits:threadpool-merge-scheduler-sort-all-merges-take-2 on Mar 18, 2025
      
    
                
              
    
  
  
    
    
  This was referenced Jun 9, 2025 
      
    
  albertzaharovits 
      added a commit
        to albertzaharovits/elasticsearch
      that referenced
      this pull request
    
      Jun 9, 2025 
    
    
      
  
    
      
    
  
This adds a new merge scheduler implementation that uses a new, dedicated thread pool to run merges. The number of concurrent merges is thereby limited to the number of threads in the pool (i.e. the number of processors allocated to the ES JVM). It implements dynamic IO throttling (roughly the same target IO rate for all merges, with caveats), adjusted based on the number of currently active (queued + running) merges. Smaller merges are always preferred to larger ones, irrespective of the index shard they come from. The implementation also supports the per-shard "max thread count" and "max merge count" settings, the latter being used today for indexing throttling. Note that IO throttling, max merge count, and max thread count work similarly, but not identically, to their counterparts in the ConcurrentMergeScheduler. The per-shard merge statistics are not affected, and the thread-pool statistics should reflect the merge ones (i.e. the thread pool's completed count reflects the total number of merges, across shards, per node).
    
  albertzaharovits 
      added a commit
        to albertzaharovits/elasticsearch
      that referenced
      this pull request
    
      Jun 9, 2025 
    
    
      
  
    
      
    
  
…ep up with the merge load (elastic#125654) Fixes an issue where indexing throttling kicks in while disk IO is still being throttled. Instead, disk IO should unthrottle first, and only then, if merging still cannot keep up with the load, should indexing throttling start. Fixes elastic/elasticsearch-benchmarks#2437 Relates elastic#120869
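The ordering described in this commit message can be pictured with a small decision function. This is only a sketch under stated assumptions: the class name `BackpressureOrderSketch`, the `Action` enum, and all four parameters are hypothetical and are not taken from the Elasticsearch code; the sketch only encodes "unthrottle disk IO first, throttle indexing last".

```java
/** Hypothetical illustration of the intended back-pressure order; not Elasticsearch code. */
final class BackpressureOrderSketch {

    enum Action { NONE, RAISE_IO_RATE, THROTTLE_INDEXING }

    /**
     * Picks the next back-pressure step for one shard.
     * All parameters are assumptions made for this sketch.
     */
    static Action decide(int pendingMerges, int maxMergeCount, double ioRateMbPerSec, double maxIoRateMbPerSec) {
        if (pendingMerges <= maxMergeCount) {
            return Action.NONE;               // merging keeps up, nothing to do
        }
        if (ioRateMbPerSec < maxIoRateMbPerSec) {
            return Action.RAISE_IO_RATE;      // first relax the merge IO throttle
        }
        return Action.THROTTLE_INDEXING;      // IO is already unthrottled, so slow indexing down
    }
}
```

With a backlog of 8 merges against a limit of 5, the sketch asks for a higher IO rate as long as the rate is below its ceiling, and only returns `THROTTLE_INDEXING` once the ceiling has been reached.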
    
  albertzaharovits 
      added a commit
        to albertzaharovits/elasticsearch
      that referenced
      this pull request
    
      Jun 9, 2025 
    
    
      
  
    
      
    
  
The intent here is to aim for fewer to-do merges enqueued for execution, and to unthrottle disk IO at a faster rate when the queue grows longer. Overall this results in less merge disk throttling. Relates elastic/elasticsearch-benchmarks#2437 elastic#120869
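One way to picture "unthrottle disk IO at a faster rate when the queue grows longer" is a rate update whose step size depends on the backlog. The sketch below is an assumption-laden illustration, not the actual implementation: the class name, the thresholds (multiples of the pool size), the multipliers, and the rate bounds are all hypothetical values chosen for readability.

```java
/** Hypothetical illustration of backlog-driven IO throttling; not Elasticsearch code. */
final class MergeIoThrottleSketch {
    // Assumed bounds for the shared target IO rate, in MB/s.
    static final double MIN_MB_PER_SEC = 5.0;
    static final double MAX_MB_PER_SEC = 10240.0;

    /**
     * Computes the next shared target IO rate from the number of
     * active (queued + running) merges and the pool size.
     */
    static double nextRate(double currentMbPerSec, int activeMerges, int poolThreads) {
        double factor;
        if (activeMerges < poolThreads) {
            factor = 0.9;   // spare capacity: throttle merges a little more
        } else if (activeMerges < 2 * poolThreads) {
            factor = 1.0;   // short backlog: keep the current rate
        } else if (activeMerges < 4 * poolThreads) {
            factor = 1.1;   // growing backlog: unthrottle slowly
        } else {
            factor = 1.5;   // long backlog: unthrottle faster
        }
        double next = currentMbPerSec * factor;
        // Keep the rate inside the assumed bounds.
        return Math.max(MIN_MB_PER_SEC, Math.min(MAX_MB_PER_SEC, next));
    }
}
```

Calling `nextRate` periodically with the current backlog makes the rate climb more steeply the longer the queue is, which matches the stated goal of keeping fewer merges enqueued.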
    
  albertzaharovits 
      added a commit
        to albertzaharovits/elasticsearch
      that referenced
      this pull request
    
      Jun 9, 2025 
    
    
  
  This was referenced Jun 9, 2025 
      
    
  albertzaharovits 
      added a commit
        to albertzaharovits/elasticsearch
      that referenced
      this pull request
    
      Jun 17, 2025 
    
    
      
  
    
      
    
  
…gMergeTasks (elastic#126058) Fixes elastic#125842 Relates elastic#120869
    
    
  albertzaharovits 
      added a commit
      that referenced
      this pull request
    
      Jun 17, 2025 
    
    
  
    
  albertzaharovits 
      added a commit
      that referenced
      this pull request
    
      Jun 17, 2025 
    
    
  
    
  elasticsearchmachine 
      pushed a commit
      that referenced
      this pull request
    
      Jun 18, 2025 
    
    
      
  
    
      
    
  
This deprecates the `indices.merge.scheduler.use_thread_pool` setting that was introduced in #120869, because this setting should not normally be used unless engineering instructs you to do so in order to work around temporary issues with the new threadpool-based merge scheduler.
    
  albertzaharovits 
      added a commit
        to albertzaharovits/elasticsearch
      that referenced
      this pull request
    
      Jun 18, 2025 
    
    
      
  
    
      
    
  
…9464) This deprecates the `indices.merge.scheduler.use_thread_pool` setting that was introduced in elastic#120869, because this setting should not normally be used unless engineering instructs you to do so in order to work around temporary issues with the new threadpool-based merge scheduler.
    
  elasticsearchmachine 
      pushed a commit
      that referenced
      this pull request
    
      Jun 18, 2025 
    
    
      
  
    
      
    
  
…9464) (#129628) * Deprecate indices.merge.scheduler.use_thread_pool setting (#129464) This deprecates the `indices.merge.scheduler.use_thread_pool` setting that was introduced in #120869, because this setting should not normally be used unless engineering instructs you to do so in order to work around temporary issues with the new threadpool-based merge scheduler. * Update warning msg
    
  kderusso 
      pushed a commit
        to kderusso/elasticsearch
      that referenced
      this pull request
    
      Jun 23, 2025 
    
    
      
  
    
      
    
  
…9464) This deprecates the `indices.merge.scheduler.use_thread_pool` setting that was introduced in elastic#120869, because this setting should not normally be used unless engineering instructs you to do so in order to work around temporary issues with the new threadpool-based merge scheduler.
    
  mridula-s109 
      pushed a commit
        to mridula-s109/elasticsearch
      that referenced
      this pull request
    
      Jun 25, 2025 
    
    
      
  
    
      
    
  
…9464) This deprecates the `indices.merge.scheduler.use_thread_pool` setting that was introduced in elastic#120869, because this setting should not normally be used unless engineering instructs you to do so in order to work around temporary issues with the new threadpool-based merge scheduler.
    
  elasticsearchmachine 
      pushed a commit
      that referenced
      this pull request
    
      Jul 2, 2025 
    
    
  
    
  albertzaharovits 
      added a commit
        to albertzaharovits/elasticsearch
      that referenced
      this pull request
    
      Jul 2, 2025 
    
    
      
  
    
      
    
  
This documents the new threadpool-based merge scheduler, which is disk-space aware and blocks merges when disk space is low. The code changes were mostly introduced in elastic#120869 and elastic#127613.
    
    
  elasticsearchmachine 
      pushed a commit
      that referenced
      this pull request
    
      Jul 2, 2025 
    
    
  
    
  elasticsearchmachine 
      pushed a commit
      that referenced
      this pull request
    
      Jul 2, 2025 
    
    
  
    
  albertzaharovits 
      added a commit
        to albertzaharovits/elasticsearch
      that referenced
      this pull request
    
      Jul 3, 2025 
    
    
      
  
    
      
    
  
…k space aware, and blocks merges when disk space is low. The code changes were mostly introduced in elastic#120869 and elastic#127613.
    
  mridula-s109 
      pushed a commit
        to mridula-s109/elasticsearch
      that referenced
      this pull request
    
      Jul 3, 2025 
    
    
      
  
    
      
    
  
This documents the new threadpool-based merge scheduler, which is disk-space aware and blocks merges when disk space is low. The code changes were mostly introduced in elastic#120869 and elastic#127613.
  
  
      Labels
      
    :Distributed Indexing/Engine
  Anything around managing Lucene and the Translog in an open shard. 
  
    >feature
  
    serverless-linked
  Added by automation, don't add manually 
  
    Team:Distributed Indexing
  Meta label for Distributed Indexing team 
  
    v9.1.0
  
    
  
    
This adds a new merge scheduler implementation that uses a new, dedicated thread pool to run merges. The number of concurrent merges is thereby limited to the number of threads in the pool (i.e. the number of processors allocated to the ES JVM).
It implements dynamic IO throttling (roughly the same target IO rate for all merges, with caveats), adjusted based on the number of currently active (queued + running) merges.
Smaller merges are always preferred to larger ones, irrespective of the index shard they come from.
The implementation also supports the per-shard "max thread count" and "max merge count" settings, the latter being used today for indexing throttling.
Note that IO throttling, max merge count, and max thread count work similarly, but not identically, to their counterparts in the ConcurrentMergeScheduler.
The per-shard merge statistics are not affected, and the thread-pool statistics should reflect the merge ones (i.e. the thread pool's completed count reflects the total number of merges, across shards, per node).
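
As a rough illustration of the two core ideas in this description (a fixed-size pool that caps merge concurrency, and a backlog that always yields the smallest merge first, whatever shard it came from), here is a minimal sketch built only on `java.util.concurrent`. It is not the code in this PR: `SizedMerge` and `SmallestFirstMergePool` are hypothetical names, the estimated size is an assumed ordering key, and IO throttling, per-shard limits, and statistics are left out.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a merge; only its estimated size drives the ordering. */
final class SizedMerge implements Runnable, Comparable<SizedMerge> {
    final long estimatedBytes;
    private final Runnable body;

    SizedMerge(long estimatedBytes, Runnable body) {
        this.estimatedBytes = estimatedBytes;
        this.body = body;
    }

    @Override
    public void run() {
        body.run();
    }

    @Override
    public int compareTo(SizedMerge other) {
        // Smaller merges sort first, regardless of which shard submitted them.
        return Long.compare(estimatedBytes, other.estimatedBytes);
    }
}

/** A fixed-size pool whose backlog is drained smallest-merge-first. */
final class SmallestFirstMergePool {
    private final ThreadPoolExecutor executor;

    SmallestFirstMergePool(int threads) {
        // core == max, so at most `threads` merges run concurrently;
        // every other merge waits in the priority queue.
        this.executor = new ThreadPoolExecutor(
            threads, threads, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());
    }

    void submit(SizedMerge merge) {
        // execute(), not submit(): submit() would wrap the task in a
        // FutureTask, which is not Comparable and would break the queue.
        executor.execute(merge);
    }

    /** Stops accepting merges, lets queued ones finish, and waits for them. */
    void shutdownAndWait() {
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a single-thread pool that is busy with one merge, merges of 500, 10, and 100 bytes submitted in that order would run as 10, 100, 500 once the thread frees up. Merges of equal size have no guaranteed relative order in this sketch.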