
Commit 12ecb01

sha1-lookup: make selection of 'middle' less aggressive
If we pick 'mi' between 'lo' and 'hi' at 50%, which was what the simple binary search did, we are halving the search space whether the entry at 'mi' is lower or higher than the target.

The previous patch was about picking not the middle but closer to 'hi', when we know the target is a lot closer to 'hi' than it is to 'lo'. However, if it turns out that the entry at 'mi' is higher than the target, we would end up reducing the search space only by the difference between 'mi' and 'hi' (which by definition is less than 50% --- that was the whole point of not using the simple binary search), which made the search less efficient. And the risk of overshooting becomes very high, if we try to be too precise.

This tweaks the selection of 'mi' to be a bit closer to the middle than we would otherwise pick to avoid the problem.

Signed-off-by: Junio C Hamano <[email protected]>
1 parent 628522e commit 12ecb01
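
The trade-off described in the commit message can be illustrated with a little arithmetic. The following is a minimal sketch, not code from sha1-lookup.c: the helper names `damped_key` and `overshoot_cut_percent` are hypothetical, and the value range 0..256 is an assumption chosen to match the 222/256 figure in the file's comment. Only the blend expression `(kyv * 6 + lov + hiv) / 8` is taken from the diff in this commit.

```c
#include <assert.h>

/*
 * Hypothetical helper: blend the key value with the two bounds using
 * the 6:1:1 weights from this commit. This pulls the result a quarter
 * of the way from the key toward the midpoint of [lov, hiv].
 */
static unsigned damped_key(unsigned lov, unsigned hiv, unsigned kyv)
{
	assert(lov < hiv && lov <= kyv && kyv <= hiv);
	return (kyv * 6 + lov + hiv) / 8;
}

/*
 * Hypothetical helper: if the entry probed at 'pick' turns out to be
 * higher than the target, only the part between 'pick' and 'hiv' is
 * discarded. Return that part as a (truncated) percentage of the range.
 */
static unsigned overshoot_cut_percent(unsigned lov, unsigned hiv, unsigned pick)
{
	assert(lov < hiv && lov <= pick && pick <= hiv);
	return (hiv - pick) * 100 / (hiv - lov);
}
```

Under these assumptions, `damped_key(0, 256, 222)` gives 198. An overshoot at the precise pick 222 discards about 13% of the range, an overshoot at the damped pick 198 discards about 22%, and an overshoot at the plain midpoint 128 discards 50%. The damped pick gives up some precision in exchange for a less costly miss.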

1 file changed: +26 -7 lines changed

sha1-lookup.c

Lines changed: 26 additions & 7 deletions
@@ -50,6 +50,12 @@
  * the midway of the table. It can reasonably be expected to be near
  * 87% (222/256) from the top of the table.
  *
+ * However, we do not want to pick "mi" too precisely. If the entry at
+ * the 87% in the above example turns out to be higher than the target
+ * we are looking for, we would end up narrowing the search space down
+ * only by 13%, instead of 50% we would get if we did a simple binary
+ * search. So we would want to hedge our bets by being less aggressive.
+ *
  * The table at "table" holds at least "nr" entries of "elem_size"
  * bytes each. Each entry has the SHA-1 key at "key_offset". The
  * table is sorted by the SHA-1 key of the entries. The caller wants
@@ -119,11 +125,25 @@ int sha1_entry_pos(const void *table,
 		if (hiv < kyv)
 			return -1 - hi;
 
-		if (kyv == lov && lov < hiv - 1)
-			kyv++;
-		else if (kyv == hiv - 1 && lov < kyv)
-			kyv--;
-
+		/*
+		 * Even if we know the target is much closer to 'hi'
+		 * than 'lo', if we pick too precisely and overshoot
+		 * (e.g. when we know 'mi' is closer to 'hi' than to
+		 * 'lo', pick 'mi' that is higher than the target), we
+		 * end up narrowing the search space by a smaller
+		 * amount (i.e. the distance between 'mi' and 'hi')
+		 * than what we would have (i.e. about half of 'lo'
+		 * and 'hi'). Hedge our bets to pick 'mi' less
+		 * aggressively, i.e. make 'mi' a bit closer to the
+		 * middle than we would otherwise pick.
+		 */
+		kyv = (kyv * 6 + lov + hiv) / 8;
+		if (lov < hiv - 1) {
+			if (kyv == lov)
+				kyv++;
+			else if (kyv == hiv)
+				kyv--;
+		}
 		mi = (range - 1) * (kyv - lov) / (hiv - lov) + lo;
 
 		if (debug_lookup) {
@@ -142,8 +162,7 @@ int sha1_entry_pos(const void *table,
 		if (cmp > 0) {
 			hi = mi;
 			hi_key = mi_key;
-		}
-		else {
+		} else {
 			lo = mi + 1;
 			lo_key = mi_key + elem_size;
 		}
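
To make the new selection step easier to follow in isolation, here is a minimal sketch that lifts the added lines into a standalone function. The function name `pick_mi` and its parameter list are hypothetical. In the real code these values are locals of `sha1_entry_pos`, and the sketch assumes its preconditions: `lo < hi`, `lov < hiv`, and `lov <= kyv <= hiv`.

```c
#include <assert.h>

/*
 * Hypothetical standalone version of the probe selection added in this
 * commit. 'lo' and 'hi' bound the candidate index range [lo, hi);
 * 'lov' and 'hiv' are the key values at those bounds, and 'kyv' is the
 * value of the key being searched for.
 */
static unsigned pick_mi(unsigned lo, unsigned hi,
			unsigned lov, unsigned hiv, unsigned kyv)
{
	unsigned range = hi - lo;

	assert(lo < hi && lov < hiv && lov <= kyv && kyv <= hiv);

	/* Damp the key value toward the middle of [lov, hiv]. */
	kyv = (kyv * 6 + lov + hiv) / 8;

	/* If there is room, nudge the value off either boundary. */
	if (lov < hiv - 1) {
		if (kyv == lov)
			kyv++;
		else if (kyv == hiv)
			kyv--;
	}

	/* Map the damped value linearly onto an index in [lo, hi). */
	return (range - 1) * (kyv - lov) / (hiv - lov) + lo;
}
```

As a worked case under these assumptions, `pick_mi(0, 1000, 0, 255, 222)` damps the key value to 198 and returns index 775, whereas an undamped linear interpolation of 222 would land on index 869. When the bounds are close together, as in `pick_mi(0, 100, 10, 12, 10)`, the damped value equals `lov`, gets nudged to 11, and the probe lands at index 49 instead of 0.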
