Skip to content

Commit b4be391

Browse files
neerajsi-msftdscho
authored andcommitted
core.fsyncobjectfiles: batched disk flushes
When adding many objects to a repo with core.fsyncObjectFiles set to true, the cost of fsync'ing each object file can become prohibitive. One major source of the cost of fsync is the implied flush of the hardware writeback cache within the disk drive. Fortunately, Windows, and macOS offer mechanisms to write data from the filesystem page cache without initiating a hardware flush. Linux has the sync_file_range API, which issues a pagecache writeback request reliably after version 5.2. This patch introduces a new 'core.fsyncObjectFiles = batch' option that batches up hardware flushes. It hooks into the bulk-checkin plugging and unplugging functionality and takes advantage of tmp-objdir. When the new mode is enabled do the following for each new object: 1. Create the object in a tmp-objdir. 2. Issue a pagecache writeback request and wait for it to complete. At the end of the entire transaction when unplugging bulk checkin: 1. Issue an fsync against a dummy file to flush the hardware writeback cache, which should by now have processed the tmp-objdir writes. 2. Rename all of the tmp-objdir files to their final names. 3. When updating the index and/or refs, we assume that Git will issue another fsync internal to that operation. This is not the case today, but may be a good extension to those components. On a filesystem with a singular journal that is updated during name operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS we would expect the fsync to trigger a journal writeout so that this sequence is enough to ensure that the user's data is durable by the time the git command returns. This change also updates the macOS code to trigger a real hardware flush via fnctl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on macOS there was no guarantee of durability since a simple fsync(2) call does not flush any hardware caches. _Performance numbers_: Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD. Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD. Windows - Same host as Linux, a preview version of Windows 11. This number is from a patch later in the series. Adding 500 files to the repo with 'git add' Times reported in seconds. core.fsyncObjectFiles | Linux | Mac | Windows ----------------------|-------|-------|-------- false | 0.06 | 0.35 | 0.61 true | 1.88 | 11.18 | 2.47 batch | 0.15 | 0.41 | 1.53 Signed-off-by: Neeraj Singh <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 117ba9f commit b4be391

File tree

13 files changed

+185
-11
lines changed

13 files changed

+185
-11
lines changed

Documentation/config/core.txt

Lines changed: 23 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -548,12 +548,29 @@ core.whitespace::
548548
errors. The default tab width is 8. Allowed values are 1 to 63.
549549

550550
core.fsyncObjectFiles::
551-
This boolean will enable 'fsync()' when writing object files.
552-
+
553-
This is a total waste of time and effort on a filesystem that orders
554-
data writes properly, but can be useful for filesystems that do not use
555-
journalling (traditional UNIX filesystems) or that only journal metadata
556-
and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
551+
A value indicating the level of effort Git will expend in
552+
trying to make objects added to the repo durable in the event
553+
of an unclean system shutdown. This setting currently only
554+
controls loose objects in the object store, so updates to any
555+
refs or the index may not be equally durable.
556+
+
557+
* `false` allows data to remain in file system caches according to
558+
operating system policy, whence it may be lost if the system loses power
559+
or crashes.
560+
* `true` triggers a data integrity flush for each loose object added to the
561+
object store. This is the safest setting that is likely to ensure durability
562+
across all operating systems and file systems that honor the 'fsync' system
563+
call. However, this setting comes with a significant performance cost on
564+
common hardware. Git does not currently fsync parent directories for
565+
newly-added files, so some filesystems may still allow data to be lost on
566+
system crash.
567+
* `batch` enables an experimental mode that uses interfaces available in some
568+
operating systems to write loose object data with a minimal set of FLUSH
569+
CACHE (or equivalent) commands sent to the storage controller. If the
570+
operating system interfaces are not available, this mode behaves the same as
571+
`true`. This mode is expected to be as safe as `true` on macOS for repos
572+
stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or
573+
ReFS.
557574

558575
core.preloadIndex::
559576
Enable parallel index preload for operations like 'git diff'

Makefile

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -405,6 +405,8 @@ all::
405405
#
406406
# Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
407407
#
408+
# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
409+
#
408410
# Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
409411
# before 2.17) for clock_gettime and CLOCK_MONOTONIC.
410412
#
@@ -1892,6 +1894,10 @@ ifdef HAVE_CLOCK_MONOTONIC
18921894
BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
18931895
endif
18941896

1897+
ifdef HAVE_SYNC_FILE_RANGE
1898+
BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
1899+
endif
1900+
18951901
ifdef NEEDS_LIBRT
18961902
EXTLIBS += -lrt
18971903
endif

bulk-checkin.c

Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,14 +3,20 @@
33
*/
44
#include "cache.h"
55
#include "bulk-checkin.h"
6+
#include "lockfile.h"
67
#include "repository.h"
78
#include "csum-file.h"
89
#include "pack.h"
910
#include "strbuf.h"
11+
#include "string-list.h"
12+
#include "tmp-objdir.h"
1013
#include "packfile.h"
1114
#include "object-store.h"
1215

1316
static int bulk_checkin_plugged;
17+
static int needs_batch_fsync;
18+
19+
static struct tmp_objdir *bulk_fsync_objdir;
1420

1521
static struct bulk_checkin_state {
1622
char *pack_tmp_name;
@@ -79,6 +85,34 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
7985
reprepare_packed_git(the_repository);
8086
}
8187

88+
/*
89+
* Cleanup after batch-mode fsync_object_files.
90+
*/
91+
static void do_batch_fsync(void)
92+
{
93+
/*
94+
* Issue a full hardware flush against a temporary file to ensure
95+
* that all objects are durable before any renames occur. The code in
96+
* fsync_loose_object_bulk_checkin has already issued a writeout
97+
* request, but it has not flushed any writeback cache in the storage
98+
* hardware.
99+
*/
100+
101+
if (needs_batch_fsync) {
102+
struct strbuf temp_path = STRBUF_INIT;
103+
struct tempfile *temp;
104+
105+
strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
106+
temp = xmks_tempfile(temp_path.buf);
107+
fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
108+
delete_tempfile(&temp);
109+
strbuf_release(&temp_path);
110+
}
111+
112+
if (bulk_fsync_objdir)
113+
tmp_objdir_migrate(bulk_fsync_objdir);
114+
}
115+
82116
static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
83117
{
84118
int i;
@@ -273,6 +307,25 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
273307
return 0;
274308
}
275309

310+
void fsync_loose_object_bulk_checkin(int fd)
311+
{
312+
assert(fsync_object_files == FSYNC_OBJECT_FILES_BATCH);
313+
314+
/*
315+
* If we have a plugged bulk checkin, we issue a call that
316+
* cleans the filesystem page cache but avoids a hardware flush
317+
* command. Later on we will issue a single hardware flush
318+
* before as part of do_batch_fsync.
319+
*/
320+
if (bulk_checkin_plugged &&
321+
git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
322+
if (!needs_batch_fsync)
323+
needs_batch_fsync = 1;
324+
} else {
325+
fsync_or_die(fd, "loose object file");
326+
}
327+
}
328+
276329
int index_bulk_checkin(struct object_id *oid,
277330
int fd, size_t size, enum object_type type,
278331
const char *path, unsigned flags)
@@ -287,6 +340,19 @@ int index_bulk_checkin(struct object_id *oid,
287340
void plug_bulk_checkin(void)
288341
{
289342
assert(!bulk_checkin_plugged);
343+
344+
/*
345+
* A temporary object directory is used to hold the files
346+
* while they are not fsynced.
347+
*/
348+
if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH) {
349+
bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
350+
if (!bulk_fsync_objdir)
351+
die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
352+
353+
tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
354+
}
355+
290356
bulk_checkin_plugged = 1;
291357
}
292358

@@ -296,4 +362,6 @@ void unplug_bulk_checkin(void)
296362
bulk_checkin_plugged = 0;
297363
if (bulk_checkin_state.f)
298364
finish_bulk_checkin(&bulk_checkin_state);
365+
366+
do_batch_fsync();
299367
}

bulk-checkin.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,8 @@
66

77
#include "cache.h"
88

9+
void fsync_loose_object_bulk_checkin(int fd);
10+
911
int index_bulk_checkin(struct object_id *oid,
1012
int fd, size_t size, enum object_type type,
1113
const char *path, unsigned flags);

cache.h

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -993,7 +993,13 @@ void reset_shared_repository(void);
993993
extern int read_replace_refs;
994994
extern char *git_replace_ref_base;
995995

996-
extern int fsync_object_files;
996+
enum fsync_object_files_mode {
997+
FSYNC_OBJECT_FILES_OFF,
998+
FSYNC_OBJECT_FILES_ON,
999+
FSYNC_OBJECT_FILES_BATCH
1000+
};
1001+
1002+
extern enum fsync_object_files_mode fsync_object_files;
9971003
extern int use_fsync;
9981004
extern int core_preload_index;
9991005
extern int precomposed_unicode;

config.c

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1491,7 +1491,12 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
14911491
}
14921492

14931493
if (!strcmp(var, "core.fsyncobjectfiles")) {
1494-
fsync_object_files = git_config_bool(var, value);
1494+
if (value && !strcmp(value, "batch"))
1495+
fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
1496+
else if (git_config_bool(var, value))
1497+
fsync_object_files = FSYNC_OBJECT_FILES_ON;
1498+
else
1499+
fsync_object_files = FSYNC_OBJECT_FILES_OFF;
14951500
return 0;
14961501
}
14971502

config.mak.uname

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -57,6 +57,7 @@ ifeq ($(uname_S),Linux)
5757
HAVE_CLOCK_MONOTONIC = YesPlease
5858
# -lrt is needed for clock_gettime on glibc <= 2.16
5959
NEEDS_LIBRT = YesPlease
60+
HAVE_SYNC_FILE_RANGE = YesPlease
6061
HAVE_GETDELIM = YesPlease
6162
FREAD_READS_DIRECTORIES = UnfortunatelyYes
6263
BASIC_CFLAGS += -DHAVE_SYSINFO

configure.ac

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1095,6 +1095,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
10951095
[AC_MSG_RESULT([no])
10961096
HAVE_CLOCK_MONOTONIC=])
10971097
GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
1098+
1099+
#
1100+
# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
1101+
GIT_CHECK_FUNC(sync_file_range,
1102+
[HAVE_SYNC_FILE_RANGE=YesPlease],
1103+
[HAVE_SYNC_FILE_RANGE])
1104+
GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
1105+
10981106
#
10991107
# Define NO_SETITIMER if you don't have setitimer.
11001108
GIT_CHECK_FUNC(setitimer,

environment.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,7 @@ const char *git_attributes_file;
4242
const char *git_hooks_path;
4343
int zlib_compression_level = Z_BEST_SPEED;
4444
int pack_compression_level = Z_DEFAULT_COMPRESSION;
45-
int fsync_object_files;
45+
enum fsync_object_files_mode fsync_object_files;
4646
int use_fsync = -1;
4747
size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
4848
size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;

git-compat-util.h

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1277,6 +1277,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
12771277
void BUG(const char *fmt, ...);
12781278
#endif
12791279

1280+
enum fsync_action {
1281+
FSYNC_WRITEOUT_ONLY,
1282+
FSYNC_HARDWARE_FLUSH
1283+
};
1284+
1285+
int git_fsync(int fd, enum fsync_action action);
1286+
12801287
/*
12811288
* Preserves errno, prints a message, but gives no warning for ENOENT.
12821289
* Returns 0 on success, which includes trying to unlink an object that does

0 commit comments

Comments
 (0)