Commit cffda32

neerajsi-msft authored and dscho committed
core.fsyncmethod: batched disk flushes for loose-objects
When adding many objects to a repo with `core.fsync=loose-object`,
the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. This commit introduces
a new `core.fsyncMethod=batch` option that batches up hardware flushes.
It hooks into the bulk-checkin odb-transaction functionality, takes
advantage of tmp-objdir, and uses the writeout-only support code.

When the new mode is enabled, we do the following for each new object:
1a. Create the object in a tmp-objdir.
2a. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin:
1b. Issue an fsync against a dummy file to flush the log and hardware
    writeback cache, which should by now have seen the tmp-objdir writes.
2b. Rename all of the tmp-objdir files to their final names.
3b. When updating the index and/or refs, we assume that Git will issue
    another fsync internal to that operation. This is not the default
    today, but the user now has the option of syncing the index and there
    is a separate patch series to implement syncing of refs.

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns. This sequence also ensures that no object files
appear in the main object store unless they are fsync-durable.

Batch mode is only enabled if core.fsync includes loose-objects. If
the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does
not include loose-objects, we will use file-by-file fsyncing.

In step (1a) of the sequence, the tmp-objdir is created lazily to avoid
work if no loose objects are ever added to the ODB. We use a tmp-objdir
to maintain the invariant that no loose-objects are visible in the main
ODB unless they are properly fsync-durable. This is important since
future ODB operations that try to create an object with specific
contents will silently drop the new data if an object with the target
hash exists without checking that the loose-object contents match the
hash. Only a full git-fsck would restore the ODB to a functional state
where dataloss doesn't occur.

In step (1b) of the sequence, we issue a fsync against a dummy file
created specifically for the purpose. This method has a little higher
cost than using one of the input object files, but makes adding new
callers of this mechanism easier, since we don't need to figure out
which object file is "last" or risk sharing violations by caching the fd
of the last object file.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.

Adding 500 files to the repo with 'git add'. Times reported in seconds.

object file syncing | Linux | Mac   | Windows
--------------------|-------|-------|--------
           disabled | 0.06  |  0.35 | 0.61
              fsync | 1.88  | 11.18 | 2.47
              batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 17740a1 commit cffda32
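
The sequence described in the commit message (stage each file with a writeout-only flush, issue one full fsync as a barrier, then rename) can be sketched outside of Git with plain POSIX calls. This is a minimal illustration, not Git's implementation: `writeout_only`, `stage_file`, `flush_barrier`, and `batch_add` are hypothetical names, the staging and final directories are assumed to be on the same filesystem, and whether a single fsync of a dummy file really makes the earlier writes durable depends on the filesystem, as the message itself notes.

```c
#define _GNU_SOURCE /* for sync_file_range() on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Step 2a: ask the kernel to write the file's dirty pages to the device
 * and wait, without requesting a hardware cache flush. Falls back to a
 * full fsync() when no writeout-only primitive exists or when it fails.
 */
static int writeout_only(int fd)
{
#ifdef __linux__
	if (!sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
					 SYNC_FILE_RANGE_WRITE |
					 SYNC_FILE_RANGE_WAIT_AFTER))
		return 0;
#endif
	return fsync(fd);
}

/* Steps 1a and 2a: create one file in the staging directory. */
static int stage_file(const char *stage_dir, const char *name,
		      const char *data)
{
	char path[4096];
	size_t len = strlen(data);
	int fd, ret;

	snprintf(path, sizeof(path), "%s/%s", stage_dir, name);
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);
	if (fd < 0)
		return -1;
	ret = (write(fd, data, len) == (ssize_t)len) ? writeout_only(fd) : -1;
	if (close(fd) < 0)
		ret = -1;
	return ret;
}

/* Step 1b: one full fsync() of a throwaway file acts as the barrier. */
static int flush_barrier(const char *dir)
{
	char path[4096];
	int fd, ret;

	snprintf(path, sizeof(path), "%s/bulk_fsync_XXXXXX", dir);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	ret = fsync(fd);
	close(fd);
	unlink(path);
	return ret;
}

/* Stage every file, flush once, then publish by rename (step 2b). */
int batch_add(const char *stage_dir, const char *final_dir,
	      const char **names, const char **data, size_t nr)
{
	char from[4096], to[4096];
	size_t i;

	assert(stage_dir && final_dir);
	for (i = 0; i < nr; i++)
		if (stage_file(stage_dir, names[i], data[i]) < 0)
			return -1;
	if (flush_barrier(final_dir) < 0)
		return -1;
	for (i = 0; i < nr; i++) {
		snprintf(from, sizeof(from), "%s/%s", stage_dir, names[i]);
		snprintf(to, sizeof(to), "%s/%s", final_dir, names[i]);
		if (rename(from, to) < 0)
			return -1;
	}
	return 0;
}
```

A caller would create two directories on the same filesystem and pass them to `batch_add` together with parallel arrays of names and contents. No file appears under `final_dir` until the barrier has returned, which is the invariant the commit message argues for.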

6 files changed: +97 -2 lines changed

Documentation/config/core.txt

Lines changed: 8 additions & 0 deletions
@@ -628,6 +628,14 @@ core.fsyncMethod::
 * `writeout-only` issues pagecache writeback requests, but depending on the
   filesystem and storage hardware, data added to the repository may not be
   durable in the event of a system crash. This is the default mode on macOS.
+* `batch` enables a mode that uses writeout-only flushes to stage multiple
+  updates in the disk writeback cache and then does a single full fsync of
+  a dummy file to trigger the disk cache flush at the end of the operation.
++
+  Currently `batch` mode only applies to loose-object files. Other repository
+  data is made durable as if `fsync` was specified. This mode is expected to
+  be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems
+  and on Windows for repos stored on NTFS or ReFS filesystems.
 
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.

bulk-checkin.c

Lines changed: 71 additions & 0 deletions
@@ -3,15 +3,20 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
+#include "tmp-objdir.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int odb_transaction_nesting;
 
+static struct tmp_objdir *bulk_fsync_objdir;
+
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
@@ -80,6 +85,40 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 	reprepare_packed_git(the_repository);
 }
 
+/*
+ * Cleanup after batch-mode fsync_object_files.
+ */
+static void do_batch_fsync(void)
+{
+	struct strbuf temp_path = STRBUF_INIT;
+	struct tempfile *temp;
+
+	if (!bulk_fsync_objdir)
+		return;
+
+	/*
+	 * Issue a full hardware flush against a temporary file to ensure
+	 * that all objects are durable before any renames occur. The code in
+	 * fsync_loose_object_bulk_checkin has already issued a writeout
+	 * request, but it has not flushed any writeback cache in the storage
+	 * hardware or any filesystem logs. This fsync call acts as a barrier
+	 * to ensure that the data in each new object file is durable before
+	 * the final name is visible.
+	 */
+	strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
+	temp = xmks_tempfile(temp_path.buf);
+	fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
+	delete_tempfile(&temp);
+	strbuf_release(&temp_path);
+
+	/*
+	 * Make the object files visible in the primary ODB after their data is
+	 * fully durable.
+	 */
+	tmp_objdir_migrate(bulk_fsync_objdir);
+	bulk_fsync_objdir = NULL;
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -274,6 +313,36 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+void prepare_loose_object_bulk_checkin(void)
+{
+	/*
+	 * We lazily create the temporary object directory
+	 * the first time an object might be added, since
+	 * callers may not know whether any objects will be
+	 * added at the time they call begin_odb_transaction.
+	 */
+	if (!odb_transaction_nesting || bulk_fsync_objdir)
+		return;
+
+	bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
+	if (bulk_fsync_objdir)
+		tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
+}
+
+void fsync_loose_object_bulk_checkin(int fd, const char *filename)
+{
+	/*
+	 * If we have an active ODB transaction, we issue a call that
+	 * cleans the filesystem page cache but avoids a hardware flush
+	 * command. Later on we will issue a single hardware flush
+	 * before as part of do_batch_fsync.
+	 */
+	if (!bulk_fsync_objdir ||
+	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0) {
+		fsync_or_die(fd, filename);
+	}
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -301,4 +370,6 @@ void end_odb_transaction(void)
 
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_batch_fsync();
 }
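
The interplay between lazy staging-directory creation and the flush at the end of the transaction can be modelled with a few counters. The sketch below is a toy state machine, not Git code: every name in it is hypothetical, the counters stand in for real I/O, and it assumes that only the outermost end of a nested transaction triggers the flush, which the diff context above does not show.

```c
#include <assert.h>

static int txn_nesting;      /* plays the role of odb_transaction_nesting */
static int staging_active;   /* stands in for bulk_fsync_objdir != NULL */
static int hardware_flushes; /* counts simulated full fsyncs */
static int staged;           /* objects written to staging, not yet visible */
static int published;        /* objects visible in the main store */

void begin_txn(void)
{
	txn_nesting++;
}

/* Set up staging lazily, and only while a transaction is open. */
static void prepare_staging(void)
{
	if (!txn_nesting || staging_active)
		return;
	staging_active = 1;
}

void add_object(void)
{
	prepare_staging();
	if (staging_active) {
		staged++;           /* writeout-only; published later */
	} else {
		hardware_flushes++; /* no transaction: full fsync per object */
		published++;
	}
}

void end_txn(void)
{
	assert(txn_nesting > 0);
	if (--txn_nesting)
		return;             /* assumed: inner ends do not flush */
	if (!staging_active)
		return;             /* nothing was staged, nothing to do */
	hardware_flushes++;         /* the single barrier flush */
	published += staged;        /* migrate staged objects */
	staged = 0;
	staging_active = 0;
}
```

Adding three objects inside a transaction costs one simulated hardware flush in this model, while adding them outside any transaction costs three, which is the effect the performance table in the commit message measures on real hardware.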

bulk-checkin.h

Lines changed: 3 additions & 0 deletions
@@ -6,6 +6,9 @@
 
 #include "cache.h"
 
+void prepare_loose_object_bulk_checkin(void);
+void fsync_loose_object_bulk_checkin(int fd, const char *filename);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);

cache.h

Lines changed: 7 additions & 1 deletion
@@ -1040,7 +1040,8 @@ extern int use_fsync;
 
 enum fsync_method {
 	FSYNC_METHOD_FSYNC,
-	FSYNC_METHOD_WRITEOUT_ONLY
+	FSYNC_METHOD_WRITEOUT_ONLY,
+	FSYNC_METHOD_BATCH,
 };
 
 extern enum fsync_method fsync_method;
@@ -1766,6 +1767,11 @@ void fsync_or_die(int fd, const char *);
 int fsync_component(enum fsync_component component, int fd);
 void fsync_component_or_die(enum fsync_component component, int fd, const char *msg);
 
+static inline int batch_fsync_enabled(enum fsync_component component)
+{
+	return (fsync_components & component) && (fsync_method == FSYNC_METHOD_BATCH);
+}
+
 ssize_t read_in_full(int fd, void *buf, size_t count);
 ssize_t write_in_full(int fd, const void *buf, size_t count);
 ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);

config.c

Lines changed: 2 additions & 0 deletions
@@ -1688,6 +1688,8 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 			fsync_method = FSYNC_METHOD_FSYNC;
 		else if (!strcmp(value, "writeout-only"))
 			fsync_method = FSYNC_METHOD_WRITEOUT_ONLY;
+		else if (!strcmp(value, "batch"))
+			fsync_method = FSYNC_METHOD_BATCH;
 		else
 			warning(_("ignoring unknown core.fsyncMethod value '%s'"), value);
 
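
The parsing rule in this hunk (map a known string to an enum value, warn and keep the previous setting otherwise) can be tried in isolation. The sketch below is a standalone approximation: `parse_fsync_method` and the `SKETCH_*` constants are hypothetical, the `fsync` spelling is taken from the documentation text rather than from the visible diff lines, and a plain `fprintf` stands in for Git's `warning()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum fsync_method_sketch {
	SKETCH_FSYNC,
	SKETCH_WRITEOUT_ONLY,
	SKETCH_BATCH,
};

/*
 * Returns 0 and stores the parsed method in *out, or returns -1 and
 * leaves *out untouched when the value is not recognized.
 */
int parse_fsync_method(const char *value, enum fsync_method_sketch *out)
{
	static const struct {
		const char *name;
		enum fsync_method_sketch method;
	} table[] = {
		{ "fsync", SKETCH_FSYNC },
		{ "writeout-only", SKETCH_WRITEOUT_ONLY },
		{ "batch", SKETCH_BATCH },
	};
	size_t i;

	assert(value && out);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (!strcmp(value, table[i].name)) {
			*out = table[i].method;
			return 0;
		}
	}
	fprintf(stderr, "warning: ignoring unknown core.fsyncMethod value '%s'\n",
		value);
	return -1;
}
```

A table keeps the sketch short; the real code uses an if/else chain, and adding `batch` there is the two-line change shown in the hunk.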

object-file.c

Lines changed: 6 additions & 1 deletion
@@ -1893,7 +1893,9 @@ static void close_loose_object(int fd, const char *filename)
 	if (the_repository->objects->odb->will_destroy)
 		goto out;
 
-	if (fsync_object_files > 0)
+	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
+		fsync_loose_object_bulk_checkin(fd, filename);
+	else if (fsync_object_files > 0)
 		fsync_or_die(fd, filename);
 	else
 		fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd,
@@ -1961,6 +1963,9 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	static struct strbuf filename = STRBUF_INIT;
 
+	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
+		prepare_loose_object_bulk_checkin();
+
 	loose_object_path(the_repository, &filename, oid);
 
 	fd = create_tmpfile(&tmp_file, filename.buf);
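
The precedence that `close_loose_object()` now applies (batch mode first, then the legacy `core.fsyncObjectFiles` setting, then the per-component check) can be written as a small pure function. The sketch below is illustrative only: the enum names and bit values are hypothetical stand-ins for the definitions in cache.h, and the function returns a label instead of performing any I/O.

```c
/* Hypothetical component bits; the real values are defined in cache.h. */
enum component {
	COMPONENT_LOOSE_OBJECT = 1 << 0,
	COMPONENT_PACK         = 1 << 1,
};

enum method { METHOD_FSYNC, METHOD_WRITEOUT_ONLY, METHOD_BATCH };

enum action {
	ACTION_BATCH,           /* writeout-only now, one flush later */
	ACTION_LEGACY_FSYNC,    /* file-by-file fsync */
	ACTION_COMPONENT_FSYNC, /* defer to the per-component setting */
};

/* Same shape as batch_fsync_enabled(): both conditions must hold. */
static int batch_enabled(unsigned components, enum method method,
			 enum component c)
{
	return (components & c) && method == METHOD_BATCH;
}

/* Mirrors the if / else if / else chain in close_loose_object(). */
enum action choose_action(unsigned components, enum method method,
			  int legacy_fsync_object_files)
{
	if (batch_enabled(components, method, COMPONENT_LOOSE_OBJECT))
		return ACTION_BATCH;
	if (legacy_fsync_object_files > 0)
		return ACTION_LEGACY_FSYNC;
	return ACTION_COMPONENT_FSYNC;
}
```

With the method set to batch but the loose-object bit missing from the component mask, the function falls through to the legacy or per-component branch, matching the statement in the commit message that batch mode is only enabled when core.fsync includes loose objects.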
