Skip to content

Commit 26b9f34

Browse files
jeffhostetlergitster
authored andcommitted
fsmonitor: force update index after large responses
Measure the time taken to apply the FSMonitor query result to the index and the untracked-cache. Set the `FSMONITOR_CHANGED` bit on `istate->cache_changed` when FSMonitor returns a very large repsonse to ensure that the index is written to disk. Normally, when the FSMonitor response includes a tracked file, the index is always updated. Similarly, the index might be updated when the response alters the untracked-cache (when enabled). However, in cases where neither of those cause the index to be considered changed, the FSMonitor response is wasted. Subsequent Git commands will make requests with the same token and receive the same response. If that response is very large, performance may suffer. It would be more efficient to force update the index now (and the token in the index extension) in order to reduce the size of the response received by future commands. This was observed on Windows after a large checkout. On Windows, the kernel emits events for the files that are changed as they are changed. However, it might delay events for the containing directories until the system is more idle (or someone scans the directory (so it seems)). The first status following a checkout would get the list of files. The subsequent status commands would get the list of directories as the events trickled out. But they would never catch up because the token was not advanced because the index wasn't updated. This list of directories caused `wt_status_collect_untracked()` to unnecessarily spend time actually scanning them during each command. Signed-off-by: Jeff Hostetler <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent b05880d commit 26b9f34

File tree

1 file changed

+54
-1
lines changed

1 file changed

+54
-1
lines changed

fsmonitor.c

Lines changed: 54 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -219,6 +219,43 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
219219
untracked_cache_invalidate_path(istate, name, 0);
220220
}
221221

222+
/*
223+
* The number of pathnames that we need to receive from FSMonitor
224+
* before we force the index to be updated.
225+
*
226+
* Note that any pathname within the set of received paths MAY cause
227+
* cache-entry or istate flag bits to be updated and thus cause the
228+
* index to be updated on disk.
229+
*
230+
* However, the response may contain many paths (such as ignored
231+
* paths) that will not update any flag bits. And thus not force the
232+
* index to be updated. (This is fine and normal.) It also means
233+
* that the token will not be updated in the FSMonitor index
234+
* extension. So the next Git command will find the same token in the
235+
* index, make the same token-relative request, and receive the same
236+
* response (plus any newly changed paths). If this response is large
237+
* (and continues to grow), performance could be impacted.
238+
*
239+
* For example, if the user runs a build and it writes 100K object
240+
* files but doesn't modify any source files, the index would not need
241+
* to be updated. The FSMonitor response (after the build and
242+
* relative to a pre-build token) might be 5MB. Each subsequent Git
243+
* command will receive that same 100K/5MB response until something
244+
* causes the index to be updated. And `refresh_fsmonitor()` will
245+
* have to iterate over those 100K paths each time.
246+
*
247+
* Performance could be improved if we optionally force update the
248+
* index after a very large response and get an updated token into
249+
* the FSMonitor index extension. This should allow subsequent
250+
* commands to get smaller and more current responses.
251+
*
252+
* The value chosen here does not need to be precise. The index
253+
* will be updated automatically the first time the user touches
254+
* a tracked file and causes a command like `git status` to
255+
* update an mtime to be updated and/or set a flag bit.
256+
*/
257+
static int fsmonitor_force_update_threshold = 100;
258+
222259
void refresh_fsmonitor(struct index_state *istate)
223260
{
224261
struct strbuf query_result = STRBUF_INIT;
@@ -362,25 +399,39 @@ void refresh_fsmonitor(struct index_state *istate)
362399
* information and that we should consider everything
363400
* invalid. We call this a trivial response.
364401
*/
402+
trace2_region_enter("fsmonitor", "apply_results", istate->repo);
403+
365404
if (query_success && !is_trivial) {
366405
/*
367406
* Mark all pathnames returned by the monitor as dirty.
368407
*
369408
* This updates both the cache-entries and the untracked-cache.
370409
*/
410+
int count = 0;
411+
371412
buf = query_result.buf;
372413
for (i = bol; i < query_result.len; i++) {
373414
if (buf[i] != '\0')
374415
continue;
375416
fsmonitor_refresh_callback(istate, buf + bol);
376417
bol = i + 1;
418+
count++;
377419
}
378-
if (bol < query_result.len)
420+
if (bol < query_result.len) {
379421
fsmonitor_refresh_callback(istate, buf + bol);
422+
count++;
423+
}
380424

381425
/* Now mark the untracked cache for fsmonitor usage */
382426
if (istate->untracked)
383427
istate->untracked->use_fsmonitor = 1;
428+
429+
if (count > fsmonitor_force_update_threshold)
430+
istate->cache_changed |= FSMONITOR_CHANGED;
431+
432+
trace2_data_intmax("fsmonitor", istate->repo, "apply_count",
433+
count);
434+
384435
} else {
385436
/*
386437
* We failed to get a response or received a trivial response,
@@ -409,6 +460,8 @@ void refresh_fsmonitor(struct index_state *istate)
409460
if (istate->untracked)
410461
istate->untracked->use_fsmonitor = 0;
411462
}
463+
trace2_region_leave("fsmonitor", "apply_results", istate->repo);
464+
412465
strbuf_release(&query_result);
413466

414467
/* Now that we've updated istate, save the last_update_token */

0 commit comments

Comments
 (0)