Commit 507ddbc
committed
fix: fix bug when
There was a latent bug that I wasn't seeing with `height` 30 but I was
seeing with `height` 1000. I did some bisecting, and found the magic
number where the bug started to manifest more obviously was around a
height of 326. At that height, the bug would show up as zero files
coming back, or 4, or 20, or 31. Closing and reopening in succession
would produce different results, suggesting a racey ordering issue.
Let's see if I can explain it.
- A `height` of 326 ends up being passed in as a `limit` in the
matcher. In the directory where I was seeing the bug, I had about
3K files, and 10 CPU cores (and therefore 10 worker threads).
- So, each worker thread would end up getting as many as 326 files,
for a total of 3,260 in all (or speaking with more precision, each
worker would have a heap of size 326, but because there are only 3K
files in the repo, the last worker would actually end up with an empty
heap and the second-to-last one would only have a semi-full one).
- Because `count` (3,260) is greater than `limit` (326), we would re-set
count to `326`, and allocate a `result_t` struct with storage for 326
candidate strings.
- Then in the `for` loop below we would loop over the first 326 entries
and copy any `candidate` with a non-zero score.
The obvious bug, then, is that you might end up skipping over a bunch of
results because they have zero scores, and in the worst case, the first
326 results you look at would all have zero scores and you would copy
nothing!
Bug was introduced in 08f4ce1 ("fix: use C11 atomic functions
for concurrent counter updates", 2024-08-17), so that makes that the
second dumb thing I've discovered so far that I did in that commit 🤦:
```diff
free(threads);
free(worker_args);
+ unsigned count = atomic_load(&matches_count);
if (needle_length == 0 || (needle_length == 1 && matcher->needle[0] == '.')) {
// Alphabetic order if search string is only "" or "."
- qsort(matches, matches_count, sizeof(haystack_t *), cmp_alpha_p);
+ qsort(matches, count, sizeof(haystack_t *), cmp_alpha_p);
} else {
- qsort(matches, matches_count, sizeof(haystack_t *), cmp_score_p);
+ qsort(matches, count, sizeof(haystack_t *), cmp_score_p);
}
+ if (count > limit) {
+ count = limit;
+ }
result_t *results = xmalloc(sizeof(result_t));
- unsigned count = matches_count > limit ? limit : matches_count;
results->matches = xmalloc(count * sizeof(const char *));
results->match_count = 0;
results->candidate_count = candidate_count;
```
Fix is straightforward:
- Don't reset `count` to `limit`. So, loop over all `count` results if
you have to in order to reach the requested `limit`.
The `for` loop can just stay as if was.
You may be wondering why I didn't notice this with `height` 30? Let's
compare, assuming an empty query (""). In this situation, given typical
configuration, most files will have a score of 1.0, but dotfiles will
have a score of 0.0, because they are hidden by default.
- With `height` (and therefore `limit`) of 30, we'd wind up with 10
worker threads as before, and each would add 30 items to the list of
matches: 300 in all. In my test repo, these are all non-dotfiles, with
a score of 1.0
- With `height` 326, as shown above, we would have 3,260 matches (or
more accurately, about 3K matches) in all. But now, because the repo
only has 3K files, we'll have all of our non-dotfiles (with score 1.0)
plus all of our dotfiles (score 0.0).
So, with a smaller set of matches, you won't have any zero-scored
results which would cause you to waste slots that could otherwise be
used to show a match.
But with the bigger set, here's the kicker: we sort alphabetically,
which means all the dotfiles will be at the top of the list; these
zero-scored results waste all of the slots, and the user sees nothing.
(Now, I'm not _totally_ convinced by this explanation, because I'm not
sure why it wasn't 100% deterministic, and I would get different results
for repeated attempts. But still, it's probably close to being valid.)height is set to a large value1 parent 2e9f17b commit 507ddbc
1 file changed
+1
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
229 | 229 | | |
230 | 230 | | |
231 | 231 | | |
232 | | - | |
233 | | - | |
234 | | - | |
235 | 232 | | |
236 | | - | |
| 233 | + | |
237 | 234 | | |
238 | 235 | | |
239 | 236 | | |
| |||
0 commit comments