Skip to content

Commit 05676b9

Browse files
jeffhostetlerdscho
authored andcommitted
fsmonitor: force update index after large responses
Set the `FSMONITOR_CHANGED` bit on `istate->cache_changed` when FSMonitor returns a very large repsonse to ensure that the index is written to disk. Normally, when the FSMonitor response includes a tracked file, the index is always updated. Similarly, the index might be updated when the response alters the untracked-cache (when enabled). However, in cases where neither of those cause the index to be considered changed, the FSMonitor response is wasted. Subsequent Git commands will make requests with the same token and receive the same response. If that response is very large, performance may suffer. It would be more efficient to force update the index now (and the token in the index extension) in order to reduce the size of the response received by future commands. This was observed on Windows after a large checkout. On Windows, the kernel emits events for the files that are changed as they are changed. However, it might delay events for the containing directories until the system is more idle (or someone scans the directory (so it seems)). The first status following a checkout would get the list of files. The subsequent status commands would get the list of directories as the events trickled out. But they would never catch up because the token was not advanced because the index wasn't updated. This list of directories caused `wt_status_collect_untracked()` to unnecessarily spend time actually scanning them during each command. Signed-off-by: Jeff Hostetler <[email protected]>
1 parent 9c07b5d commit 05676b9

File tree

1 file changed

+49
-1
lines changed

1 file changed

+49
-1
lines changed

fsmonitor.c

Lines changed: 49 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -233,6 +233,45 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
233233
untracked_cache_invalidate_path(istate, name, 0);
234234
}
235235

236+
/*
237+
* The number of pathnames that we need to receive from FSMonitor
238+
* before we force the index to be updated.
239+
*
240+
* Note that any pathname within the set of received paths MAY cause
241+
* cache-entry or istate flag bits to be updated and thus cause the
242+
* index to be updated on disk.
243+
*
244+
* However, the response may contain many paths (such as ignored
245+
* paths) that will not update any flag bits. And thus not force the
246+
* index to be updated. (This is fine and normal.) It also means
247+
* that the token will not be updated in the FSMonitor index
248+
* extension. So the next Git command will find the same token in the
249+
* index, make the same token-relative request, and receive the same
250+
* response (plus any newly changed paths). If this response is large
251+
* (and continues to grow), performance could be impacted.
252+
*
253+
* For example, if the user runs a build and it writes 100K object
254+
* files but doesn't modify any source files, the index would not need
255+
* to be updated. The FSMonitor response (after the build and
256+
* relative to a pre-build token) might be 5MB. Each subsequent Git
257+
* command will receive that same 100K/5MB response until something
258+
* causes the index to be updated. And `refresh_fsmonitor()` will
259+
* have to iterate over those 100K paths each time.
260+
*
261+
* Performance could be improved if we optionally force update the
262+
* index after a very large response and get an updated token into
263+
* the FSMonitor index extension. This should allow subsequent
264+
* commands to get smaller and more current responses.
265+
*
266+
* The value chosen here does not need to be precise. The index
267+
* will be updated automatically the first time the user touches
268+
* a tracked file and causes a command like `git status` to
269+
* update an mtime to be updated and/or set a flag bit.
270+
*
271+
* NEEDSWORK: Does this need to be a config value?
272+
*/
273+
static int fsmonitor_force_update_threshold = 100;
274+
236275
void refresh_fsmonitor(struct index_state *istate)
237276
{
238277
struct strbuf query_result = STRBUF_INIT;
@@ -370,19 +409,28 @@ void refresh_fsmonitor(struct index_state *istate)
370409
*
371410
* This updates both the cache-entries and the untracked-cache.
372411
*/
412+
int count = 0;
413+
373414
buf = query_result.buf;
374415
for (i = bol; i < query_result.len; i++) {
375416
if (buf[i] != '\0')
376417
continue;
377418
fsmonitor_refresh_callback(istate, buf + bol);
378419
bol = i + 1;
420+
count++;
379421
}
380-
if (bol < query_result.len)
422+
if (bol < query_result.len) {
381423
fsmonitor_refresh_callback(istate, buf + bol);
424+
count++;
425+
}
382426

383427
/* Now mark the untracked cache for fsmonitor usage */
384428
if (istate->untracked)
385429
istate->untracked->use_fsmonitor = 1;
430+
431+
if (count > fsmonitor_force_update_threshold)
432+
istate->cache_changed |= FSMONITOR_CHANGED;
433+
386434
} else {
387435
/*
388436
* We received a trivial response, so invalidate everything.

0 commit comments

Comments
 (0)