Commit 1b7be8c

Decrease minimum deletes percentage in TMP (#14893)
1 parent 980ab4a commit 1b7be8c

2 files changed: 13 additions & 3 deletions

lucene/CHANGES.txt

Lines changed: 3 additions & 0 deletions
@@ -120,6 +120,9 @@ API Changes
 * GITHUB#14426: Support determining desired off-heap memory requirements through
   KnnVectorsReader::getOffHeapByteSize (Chris Hegarty)
 
+* GITHUB#14893: TieredMergePolicy minimum deletes percentage decreased from 5% inclusive to 0% exclusive.
+  (Stefan Vodita)
+
 * GITHUB#14899: Deprecate MergeSpecification#segString(Directory) (kitoha)
 
 * GITHUB#14978: Add a bulk scoring interface to RandomVectorScorer

lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java

Lines changed: 10 additions & 3 deletions
@@ -130,18 +130,25 @@ public double getMaxMergedSegmentMB() {
   /**
    * Sets the maximum percentage of doc id space taken by deleted docs. The denominator includes
    * both active and deleted documents. Lower values make the index more space efficient at the
-   * expense of increased CPU and I/O activity. Values must be between 5 and 50. Default value is
+   * expense of increased CPU and I/O activity. Values must be between 0 and 50. Default value is
    * 20.
    *
    * <p>When the maximum delete percentage is lowered, the indexing thread will call for merges more
    * often, meaning that write amplification factor will be increased. Write amplification factor
    * measures the number of times each document in the index is written. A higher write
    * amplification factor will lead to higher CPU and I/O activity as indicated above.
+   *
+   * <p>Values below 5% can lead to exceptionally high merge cost where indexing will continuously
+   * merge nearly all segments, and select newly merged segments immediately for merging again,
+   * often forcing degenerate merge selection like singleton merges. If you venture into this dark
+   * forest, consider limiting the maximum number of concurrent merges and threads (see {@link
+   * ConcurrentMergeScheduler#setMaxMergesAndThreads}) as a coarse attempt to bound the otherwise
+   * pathological indexing behavior.
    */
   public TieredMergePolicy setDeletesPctAllowed(double v) {
-    if (v < 5 || v > 50) {
+    if (v <= 0 || v > 50) {
       throw new IllegalArgumentException(
-          "indexPctDeletedTarget must be >= 5.0 and <= 50 (got " + v + ")");
+          "indexPctDeletedTarget must be > 0 and <= 50 (got " + v + ")");
     }
     deletesPctAllowed = v;
     return this;
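
To make the boundary change concrete, the following is a minimal, self-contained sketch that copies the two bounds checks from the diff above into a stand-in class and compares which values each one accepts. The class `DeletesPctBounds` and its methods `acceptedBefore`, `acceptedAfter`, and `deletedPct` are hypothetical names introduced only for illustration and are not part of Lucene. The `deletedPct` helper is one reading of the Javadoc's statement that the denominator includes both active and deleted documents; it is not taken from the Lucene source.

```java
// Stand-in for illustration only: this is NOT the Lucene TieredMergePolicy class.
// It mirrors the bounds checks shown in the diff so the old and new ranges can be compared.
public class DeletesPctBounds {

  /** Old check from the diff: a value is rejected if v < 5 or v > 50. */
  static boolean acceptedBefore(double v) {
    return !(v < 5 || v > 50);
  }

  /** New check from the diff: a value is rejected if v <= 0 or v > 50. */
  static boolean acceptedAfter(double v) {
    return !(v <= 0 || v > 50);
  }

  /**
   * Assumed reading of the Javadoc: deleted docs as a percentage of the doc id
   * space, where the denominator counts both live and deleted documents.
   */
  static double deletedPct(long liveDocs, long deletedDocs) {
    long total = liveDocs + deletedDocs;
    return total == 0 ? 0.0 : 100.0 * deletedDocs / total;
  }

  public static void main(String[] args) {
    double[] candidates = {0.0, 0.5, 5.0, 20.0, 50.0, 50.1};
    for (double v : candidates) {
      System.out.println(
          v + " acceptedBefore=" + acceptedBefore(v) + " acceptedAfter=" + acceptedAfter(v));
    }
    // 50 deleted docs out of 1000 total (950 live + 50 deleted).
    System.out.println("deletedPct(950, 50) = " + deletedPct(950, 50));
  }
}
```

Under this sketch, a value such as 0.5 is rejected by the old check and accepted by the new one, while 0 itself is still rejected because the lower bound is exclusive. As the new Javadoc warns, accepted values below 5 may still be expensive in practice.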
