Commit 788cef8

Merge branch 'nd/split-index'
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.

* nd/split-index: (32 commits)
  t1700: new tests for split-index mode
  t2104: make sure split index mode is off for the version test
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  read-tree: note about dropping split-index mode or index version
  read-tree: force split-index mode off on --index-output
  rev-parse: add --shared-index-path to get shared index path
  update-index --split-index: do not split if $GIT_DIR is read only
  update-index: new options to enable/disable split index mode
  split-index: strip pathname of on-disk replaced entries
  split-index: do not invalidate cache-tree at read time
  split-index: the reading part
  split-index: the writing part
  read-cache: mark updated entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark new entries for split index
  read-cache: split-index mode
  read-cache: save index SHA-1 after reading
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  cache-tree: mark istate->cache_changed on cache tree update
  ...
2 parents e0a064a + 3e52f70 commit 788cef8

41 files changed, 1088 insertions(+), 193 deletions(-)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -181,6 +181,7 @@
 /test-date
 /test-delta
 /test-dump-cache-tree
+/test-dump-split-index
 /test-scrap-cache-tree
 /test-genrandom
 /test-hashmap

Documentation/git-rev-parse.txt

Lines changed: 4 additions & 0 deletions
@@ -245,6 +245,10 @@ print a message to stderr and exit with nonzero status.
 --show-toplevel::
 	Show the absolute path of the top-level directory.
 
+--shared-index-path::
+	Show the path to the shared index file in split index mode, or
+	empty if not in split-index mode.
+
 Other Options
 ~~~~~~~~~~~~~
 

Documentation/git-update-index.txt

Lines changed: 11 additions & 0 deletions
@@ -161,6 +161,17 @@ may not support it yet.
 	Only meaningful with `--stdin` or `--index-info`; paths are
 	separated with NUL character instead of LF.
 
+--split-index::
+--no-split-index::
+	Enable or disable split index mode. If enabled, the index is
+	split into two files, $GIT_DIR/index and $GIT_DIR/sharedindex.<SHA-1>.
+	Changes are accumulated in $GIT_DIR/index while the shared
+	index file contains all index entries stays unchanged. If
+	split-index mode is already enabled and `--split-index` is
+	given again, all changes in $GIT_DIR/index are pushed back to
+	the shared index file. This mode is designed for very large
+	indexes that take a signficant amount of time to read or write.
+
 \--::
 	Do not interpret any more arguments as options.
 

Documentation/gitrepository-layout.txt

Lines changed: 4 additions & 0 deletions
@@ -155,6 +155,10 @@ index::
 	The current index file for the repository. It is
 	usually not found in a bare repository.
 
+sharedindex.<SHA-1>::
+	The shared index part, to be referenced by $GIT_DIR/index and
+	other temporary index files. Only valid in split index mode.
+
 info::
 	Additional information about the repository is recorded
 	in this directory.
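
The naming rule for the shared index file can be illustrated with a small, self-contained sketch. This is not code from the commit: `shared_index_path()` and `sha1_is_null()` are hypothetical helpers written for illustration, and the sketch assumes the `<SHA-1>` part of the file name is the usual 40-digit lowercase hex form of the 20-byte value. The all-zero case follows the index-format note that such a value means no shared index file is required.

```c
#include <stdio.h>
#include <string.h>

#define SHA1_RAWSZ 20	/* 160 bits */

/* Returns 1 if all 160 bits are zero, i.e. no shared index file is needed. */
static int sha1_is_null(const unsigned char *sha1)
{
	size_t i;

	for (i = 0; i < SHA1_RAWSZ; i++)
		if (sha1[i])
			return 0;
	return 1;
}

/*
 * Hypothetical helper: write "<git_dir>/sharedindex.<40 hex digits>" into
 * buf. Returns 0 on success, -1 if the SHA-1 is all zeros (no shared
 * index) or if buf is too small.
 */
static int shared_index_path(const char *git_dir, const unsigned char *sha1,
			     char *buf, size_t len)
{
	static const char hex[] = "0123456789abcdef";
	static const char prefix[] = "/sharedindex.";
	size_t need = strlen(git_dir) + (sizeof(prefix) - 1) + 2 * SHA1_RAWSZ + 1;
	size_t i;
	int n;

	if (sha1_is_null(sha1) || need > len)
		return -1;
	n = snprintf(buf, len, "%s%s", git_dir, prefix);
	if (n < 0)
		return -1;
	/* Append two hex digits per raw byte. */
	for (i = 0; i < SHA1_RAWSZ; i++) {
		buf[n++] = hex[sha1[i] >> 4];
		buf[n++] = hex[sha1[i] & 0xf];
	}
	buf[n] = '\0';
	return 0;
}
```

In a real repository the same path is what `git rev-parse --shared-index-path` is documented to print when split index mode is on.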

Documentation/technical/index-format.txt

Lines changed: 35 additions & 0 deletions
@@ -129,6 +129,9 @@ Git index format
     (Version 4) In version 4, the padding after the pathname does not
     exist.
 
+  Interpretation of index entries in split index mode is completely
+  different. See below for details.
+
 == Extensions
 
 === Cached tree
@@ -198,3 +201,35 @@ Git index format
   - At most three 160-bit object names of the entry in stages from 1 to 3
     (nothing is written for a missing stage).
 
+=== Split index
+
+  In split index mode, the majority of index entries could be stored
+  in a separate file. This extension records the changes to be made on
+  top of that to produce the final index.
+
+  The signature for this extension is { 'l', 'i, 'n', 'k' }.
+
+  The extension consists of:
+
+  - 160-bit SHA-1 of the shared index file. The shared index file path
+    is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
+    index does not require a shared index file.
+
+  - An ewah-encoded delete bitmap, each bit represents an entry in the
+    shared index. If a bit is set, its corresponding entry in the
+    shared index will be removed from the final index. Note, because
+    a delete operation changes index entry positions, but we do need
+    original positions in replace phase, it's best to just mark
+    entries for removal, then do a mass deletion after replacement.
+
+  - An ewah-encoded replace bitmap, each bit represents an entry in
+    the shared index. If a bit is set, its corresponding entry in the
+    shared index will be replaced with an entry in this index
+    file. All replaced entries are stored in sorted order in this
+    index. The first "1" bit in the replace bitmap corresponds to the
+    first index entry, the second "1" bit to the second entry and so
+    on. Replaced entries may have empty path names to save space.
+
+  The remaining index entries after replaced ones will be added to the
+  final index. These added entries are also sorted by entry namme then
+  stage.
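
The way the final index is assembled from the two files can be sketched in a few lines of C. This is a simplified illustration under stated assumptions, not the code in split-index.c: `struct entry` and `merge_split_index()` are hypothetical, the bitmaps are plain byte arrays instead of ewah-encoded data, entries are ordered by name only (stages are ignored), and added entries are assumed never to share a name with a surviving shared entry. The sketch walks the shared index by original position, so replacement and deletion never disturb the positions the bitmaps refer to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified entry; a real index entry also carries stat
 * data, mode, object name, flags and stage. */
struct entry {
	const char *name;
	int version;	/* stand-in for the entry's content */
};

/*
 * Assemble the final index from a shared index and a split index.
 *
 * delete_bits[i] and replace_bits[i] use one byte per shared entry
 * (non-zero means the bit is set). split[] holds the entries of the
 * split index file: first one entry per set replace bit, in bitmap
 * order, then the added entries sorted by name. out[] must have room
 * for shared_nr + split_nr entries. Returns the number of entries
 * written to out[].
 */
static size_t merge_split_index(const struct entry *shared, size_t shared_nr,
				const unsigned char *delete_bits,
				const unsigned char *replace_bits,
				const struct entry *split, size_t split_nr,
				struct entry *out)
{
	size_t replaced_nr = 0, r = 0, a, i, n = 0;

	for (i = 0; i < shared_nr; i++)
		if (replace_bits[i])
			replaced_nr++;
	assert(replaced_nr <= split_nr);
	a = replaced_nr;	/* added entries follow the replacements */

	for (i = 0; i < shared_nr; i++) {
		struct entry e = shared[i];

		if (replace_bits[i]) {
			e = split[r++];
			if (!*e.name)	/* stripped path: reuse the shared one */
				e.name = shared[i].name;
		}
		if (delete_bits[i])
			continue;	/* dropped from the final index */
		/* Emit added entries that sort before this one. */
		while (a < split_nr && strcmp(split[a].name, e.name) < 0)
			out[n++] = split[a++];
		out[n++] = e;
	}
	/* Any remaining added entries sort after all shared entries. */
	while (a < split_nr)
		out[n++] = split[a++];
	return n;
}
```

For example, with shared entries a, b, c, d, the delete bit set for b, the replace bit set for c, and a split index holding one replacement with an empty name followed by added entries bb and e, the result is a, bb, the replaced c, d, e.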

Makefile

Lines changed: 2 additions & 0 deletions
@@ -552,6 +552,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-hashmap
 TEST_PROGRAMS_NEED_X += test-index-version
@@ -875,6 +876,7 @@ LIB_OBJS += sha1_name.o
 LIB_OBJS += shallow.o
 LIB_OBJS += sideband.o
 LIB_OBJS += sigchain.o
+LIB_OBJS += split-index.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o

builtin/add.c

Lines changed: 2 additions & 4 deletions
@@ -299,7 +299,6 @@ static int add_files(struct dir_struct *dir, int flags)
 int cmd_add(int argc, const char **argv, const char *prefix)
 {
 	int exit_status = 0;
-	int newfd;
 	struct pathspec pathspec;
 	struct dir_struct dir;
 	int flags;
@@ -345,7 +344,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	add_new_files = !take_worktree_changes && !refresh_only;
 	require_pathspec = !take_worktree_changes;
 
-	newfd = hold_locked_index(&lock_file, 1);
+	hold_locked_index(&lock_file, 1);
 
 	flags = ((verbose ? ADD_CACHE_VERBOSE : 0) |
 		 (show_only ? ADD_CACHE_PRETEND : 0) |
@@ -443,8 +442,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 finish:
 	if (active_cache_changed) {
-		if (write_cache(newfd, active_cache, active_nr) ||
-		    commit_locked_index(&lock_file))
+		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 			die(_("Unable to write new index file"));
 	}
 

builtin/apply.c

Lines changed: 9 additions & 8 deletions
@@ -3084,13 +3084,15 @@ static void prepare_fn_table(struct patch *patch)
 	}
 }
 
-static int checkout_target(struct cache_entry *ce, struct stat *st)
+static int checkout_target(struct index_state *istate,
+			   struct cache_entry *ce, struct stat *st)
 {
 	struct checkout costate;
 
 	memset(&costate, 0, sizeof(costate));
 	costate.base_dir = "";
 	costate.refresh_cache = 1;
+	costate.istate = istate;
 	if (checkout_entry(ce, &costate, NULL) || lstat(ce->name, st))
 		return error(_("cannot checkout %s"), ce->name);
 	return 0;
@@ -3257,7 +3259,7 @@ static int load_current(struct image *image, struct patch *patch)
 	if (lstat(name, &st)) {
 		if (errno != ENOENT)
 			return error(_("%s: %s"), name, strerror(errno));
-		if (checkout_target(ce, &st))
+		if (checkout_target(&the_index, ce, &st))
 			return -1;
 	}
 	if (verify_index_match(ce, &st))
@@ -3411,7 +3413,7 @@ static int check_preimage(struct patch *patch, struct cache_entry **ce, struct s
 		}
 		*ce = active_cache[pos];
 		if (stat_ret < 0) {
-			if (checkout_target(*ce, st))
+			if (checkout_target(&the_index, *ce, st))
 				return -1;
 		}
 		if (!cached && verify_index_match(*ce, st))
@@ -3644,7 +3646,7 @@ static void build_fake_ancestor(struct patch *list, const char *filename)
 {
 	struct patch *patch;
 	struct index_state result = { NULL };
-	int fd;
+	static struct lock_file lock;
 
 	/* Once we start supporting the reverse patch, it may be
 	 * worth showing the new sha1 prefix, but until then...
@@ -3682,8 +3684,8 @@ static void build_fake_ancestor(struct patch *list, const char *filename)
 			die ("Could not add %s to temporary index", name);
 	}
 
-	fd = open(filename, O_WRONLY | O_CREAT, 0666);
-	if (fd < 0 || write_index(&result, fd) || close(fd))
+	hold_lock_file_for_update(&lock, filename, LOCK_DIE_ON_ERROR);
+	if (write_locked_index(&result, &lock, COMMIT_LOCK))
 		die ("Could not write temporary index to %s", filename);
 
 	discard_index(&result);
@@ -4502,8 +4504,7 @@ int cmd_apply(int argc, const char **argv, const char *prefix_)
 	}
 
 	if (update_index) {
-		if (write_cache(newfd, active_cache, active_nr) ||
-		    commit_locked_index(&lock_file))
+		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 			die(_("Unable to write new index file"));
 	}
 

builtin/blame.c

Lines changed: 1 addition & 1 deletion
@@ -2389,7 +2389,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 	 * right now, but someday we might optimize diff-index --cached
 	 * with cache-tree information.
 	 */
-	cache_tree_invalidate_path(active_cache_tree, path);
+	cache_tree_invalidate_path(&the_index, path);
 
 	return commit;
 }

builtin/checkout-index.c

Lines changed: 2 additions & 2 deletions
@@ -135,6 +135,7 @@ static int option_parse_u(const struct option *opt,
 	int *newfd = opt->value;
 
 	state.refresh_cache = 1;
+	state.istate = &the_index;
 	if (*newfd < 0)
 		*newfd = hold_locked_index(&lock_file, 1);
 	return 0;
@@ -279,8 +280,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 		checkout_all(prefix, prefix_length);
 
 	if (0 <= newfd &&
-	    (write_cache(newfd, active_cache, active_nr) ||
-	     commit_locked_index(&lock_file)))
+	    write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die("Unable to write new index file");
 	return 0;
 }
