Commit a50036d

Merge branch 'tb/cruft-packs'
A mechanism to pack unreachable objects into a "cruft pack", instead of
ejecting them into loose form to be reclaimed later, has been introduced.

* tb/cruft-packs:
  sha1-file.c: don't freshen cruft packs
  builtin/gc.c: conditionally avoid pruning objects via loose
  builtin/repack.c: add cruft packs to MIDX during geometric repack
  builtin/repack.c: use named flags for existing_packs
  builtin/repack.c: allow configuring cruft pack generation
  builtin/repack.c: support generating a cruft pack
  builtin/pack-objects.c: --cruft with expiration
  reachable: report precise timestamps from objects in cruft packs
  reachable: add options to add_unseen_recent_objects_to_traversal
  builtin/pack-objects.c: --cruft without expiration
  builtin/pack-objects.c: return from create_object_entry()
  t/helper: add 'pack-mtimes' test-tool
  pack-mtimes: support writing pack .mtimes files
  chunk-format.h: extract oid_version()
  pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
  pack-mtimes: support reading .mtimes files
  Documentation/technical: add cruft-packs.txt
2 parents 37d4ae5 + a613164 commit a50036d

32 files changed, +1853 -102 lines changed

Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ TECH_DOCS += MyFirstObjectWalk
9595
TECH_DOCS += SubmittingPatches
9696
TECH_DOCS += ToolsForGit
9797
TECH_DOCS += technical/bundle-format
98+
TECH_DOCS += technical/cruft-packs
9899
TECH_DOCS += technical/hash-function-transition
99100
TECH_DOCS += technical/http-protocol
100101
TECH_DOCS += technical/index-format

Documentation/config/gc.txt

Lines changed: 14 additions & 7 deletions
@@ -81,14 +81,21 @@ gc.packRefs::
8181
to enable it within all non-bare repos or it can be set to a
8282
boolean value. The default is `true`.
8383

84+
gc.cruftPacks::
85+
Store unreachable objects in a cruft pack (see
86+
linkgit:git-repack[1]) instead of as loose objects. The default
87+
is `false`.
88+
8489
gc.pruneExpire::
85-
When 'git gc' is run, it will call 'prune --expire 2.weeks.ago'.
86-
Override the grace period with this config variable. The value
87-
"now" may be used to disable this grace period and always prune
88-
unreachable objects immediately, or "never" may be used to
89-
suppress pruning. This feature helps prevent corruption when
90-
'git gc' runs concurrently with another process writing to the
91-
repository; see the "NOTES" section of linkgit:git-gc[1].
90+
When 'git gc' is run, it will call 'prune --expire 2.weeks.ago'
91+
(and 'repack --cruft --cruft-expiration 2.weeks.ago' if using
92+
cruft packs via `gc.cruftPacks` or `--cruft`). Override the
93+
grace period with this config variable. The value "now" may be
94+
used to disable this grace period and always prune unreachable
95+
objects immediately, or "never" may be used to suppress pruning.
96+
This feature helps prevent corruption when 'git gc' runs
97+
concurrently with another process writing to the repository; see
98+
the "NOTES" section of linkgit:git-gc[1].
9299

93100
gc.worktreePruneExpire::
94101
When 'git gc' is run, it calls

Documentation/config/repack.txt

Lines changed: 9 additions & 0 deletions
@@ -30,3 +30,12 @@ repack.updateServerInfo::
3030
If set to false, linkgit:git-repack[1] will not run
3131
linkgit:git-update-server-info[1]. Defaults to true. Can be overridden
3232
when true by the `-n` option of linkgit:git-repack[1].
33+
34+
repack.cruftWindow::
35+
repack.cruftWindowMemory::
36+
repack.cruftDepth::
37+
repack.cruftThreads::
38+
Parameters used by linkgit:git-pack-objects[1] when generating
39+
a cruft pack and the respective parameters are not given over
40+
the command line. See similarly named `pack.*` configuration
41+
variables for defaults and meaning.

Documentation/git-gc.txt

Lines changed: 5 additions & 0 deletions
@@ -54,6 +54,11 @@ other housekeeping tasks (e.g. rerere, working trees, reflog...) will
5454
be performed as well.
5555

5656

57+
--cruft::
58+
When expiring unreachable objects, pack them separately into a
59+
cruft pack instead of storing them as loose
60+
objects.
61+
5762
--prune=<date>::
5863
Prune loose objects older than date (default is 2 weeks ago,
5964
overridable by the config variable `gc.pruneExpire`).

Documentation/git-pack-objects.txt

Lines changed: 30 additions & 0 deletions
@@ -13,6 +13,7 @@ SYNOPSIS
1313
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
1414
[--local] [--incremental] [--window=<n>] [--depth=<n>]
1515
[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
16+
[--cruft] [--cruft-expiration=<time>]
1617
[--stdout [--filter=<filter-spec>] | <base-name>]
1718
[--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>
1819

@@ -95,6 +96,35 @@ base-name::
9596
Incompatible with `--revs`, or options that imply `--revs` (such as
9697
`--all`), with the exception of `--unpacked`, which is compatible.
9798

99+
--cruft::
100+
Packs unreachable objects into a separate "cruft" pack, denoted
101+
by the existence of a `.mtimes` file. Typically used by `git
102+
repack --cruft`. Callers provide a list of pack names and
103+
indicate which packs will remain in the repository, along with
104+
which packs will be deleted (indicated by the `-` prefix). The
105+
contents of the cruft pack are all objects not contained in the
106+
surviving packs which have not exceeded the grace period (see
107+
`--cruft-expiration` below), or which have exceeded the grace
108+
period, but are reachable from another object which hasn't.
109+
+
110+
When the input lists a pack containing all reachable objects (and lists
111+
all other packs as pending deletion), the corresponding cruft pack will
112+
contain all unreachable objects (with mtime newer than the
113+
`--cruft-expiration`) along with any unreachable objects whose mtime is
114+
older than the `--cruft-expiration`, but are reachable from an
115+
unreachable object whose mtime is newer than the `--cruft-expiration`.
116+
+
117+
Incompatible with `--unpack-unreachable`, `--keep-unreachable`,
118+
`--pack-loose-unreachable`, `--stdin-packs`, as well as any other
119+
options which imply `--revs`. Also incompatible with `--max-pack-size`;
120+
when this option is set, the maximum pack size is not inferred from
121+
`pack.packSizeLimit`.
122+
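As a rough illustration of the input convention described for `--cruft` above, the sketch below classifies one line of the pack list: a leading `-` marks a pack that will be deleted, and any other line names a pack that will remain. The names `pack_fate` and `classify_pack_line` are hypothetical and are not taken from `builtin/pack-objects.c`; the real parsing may differ in detail.

```c
#include <stddef.h>

/* Hypothetical names for illustration; not Git's internal API. */
enum pack_fate { PACK_KEEP, PACK_DISCARD };

/*
 * Classify one input line. A leading '-' marks a pack that will be
 * deleted; any other line names a pack that stays in the repository.
 * On return, *name points at the pack name without the prefix.
 */
enum pack_fate classify_pack_line(const char *line, const char **name)
{
	if (line[0] == '-') {
		*name = line + 1;
		return PACK_DISCARD;
	}
	*name = line;
	return PACK_KEEP;
}
```

Under this reading, a caller such as `git repack --cruft` would list the freshly written all-into-one pack without a prefix and every pack it is about to delete with one.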
123+
--cruft-expiration=<approxidate>::
124+
If specified, objects are eliminated from the cruft pack if they
125+
have an mtime older than `<approxidate>`. If unspecified (and
126+
given `--cruft`), then no objects are eliminated.
127+
98128
--window=<n>::
99129
--depth=<n>::
100130
These two options affect how the objects contained in

Documentation/git-repack.txt

Lines changed: 11 additions & 0 deletions
@@ -63,6 +63,17 @@ to the new separate pack will be written.
6363
Also run 'git prune-packed' to remove redundant
6464
loose object files.
6565

66+
--cruft::
67+
Same as `-a`, unless `-d` is used. Then any unreachable objects
68+
are packed into a separate cruft pack. Unreachable objects can
69+
be pruned using the normal expiry rules with the next `git gc`
70+
invocation (see linkgit:git-gc[1]). Incompatible with `-k`.
71+
72+
--cruft-expiration=<approxidate>::
73+
Expire unreachable objects older than `<approxidate>`
74+
immediately instead of waiting for the next `git gc` invocation.
75+
Only useful with `--cruft -d`.
76+
6677
-l::
6778
Pass the `--local` option to 'git pack-objects'. See
6879
linkgit:git-pack-objects[1].

Documentation/technical/cruft-packs.txt

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
1+
= Cruft packs
2+
3+
The cruft packs feature offers an alternative to Git's traditional mechanism of
4+
removing unreachable objects. This document provides an overview of Git's
5+
pruning mechanism, and how a cruft pack can be used instead to accomplish the
6+
same.
7+
8+
== Background
9+
10+
To remove unreachable objects from your repository, Git offers `git repack -Ad`
11+
(see linkgit:git-repack[1]). Quoting from the documentation:
12+
13+
[quote]
14+
[...] unreachable objects in a previous pack become loose, unpacked objects,
15+
instead of being left in the old pack. [...] loose unreachable objects will be
16+
pruned according to normal expiry rules with the next 'git gc' invocation.
17+
18+
Unreachable objects aren't removed immediately, since doing so could race with
19+
an incoming push which may reference an object which is about to be deleted.
20+
Instead, those unreachable objects are stored as loose objects and stay that way
21+
until they are older than the expiration window, at which point they are removed
22+
by linkgit:git-prune[1].
23+
24+
Git must store these unreachable objects loose in order to keep track of their
25+
per-object mtimes. If these unreachable objects were written into one big pack,
26+
then either freshening that pack (because an object contained within it was
27+
re-written) or creating a new pack of unreachable objects would cause the pack's
28+
mtime to get updated, and the objects within it would never leave the expiration
29+
window. Instead, objects are stored loose in order to keep track of the
30+
individual object mtimes and avoid a situation where all cruft objects are
31+
freshened at once.
32+
33+
This can lead to undesirable situations when a repository contains many
34+
unreachable objects which have not yet left the grace period. Having large
35+
directories in the shards of `.git/objects` can lead to decreased performance in
36+
the repository. But given enough unreachable objects, this can lead to inode
37+
starvation and degrade the performance of the whole system. Since we
38+
can never pack those objects, these repositories often take up a large amount of
39+
disk space, since we can only zlib compress them, but not store them in delta
40+
chains.
41+
42+
== Cruft packs
43+
44+
A cruft pack eliminates the need for storing unreachable objects in a loose
45+
state by including the per-object mtimes in a separate file alongside a single
46+
pack containing all unreachable objects.
47+
48+
A cruft pack is written by `git repack --cruft` when generating a new pack, using
49+
linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft`
50+
is a classic all-into-one repack, meaning that everything in the resulting pack is
51+
reachable, and everything else is unreachable. Once written, the `--cruft`
52+
option instructs `git repack` to generate another pack containing only objects
53+
not packed in the previous step (which equates to packing all unreachable
54+
objects together). This progresses as follows:
55+
56+
1. Enumerate every object, marking any object which is (a) not contained in a
57+
kept-pack, and (b) whose mtime is within the grace period as a traversal
58+
tip.
59+
60+
2. Perform a reachability traversal based on the tips gathered in the previous
61+
step, adding every object along the way to the pack.
62+
63+
3. Write the pack out, along with a `.mtimes` file that records the per-object
64+
timestamps.
65+
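The first two steps above can be sketched with a toy object graph. This is a minimal model under stated assumptions, not the code in `builtin/pack-objects.c`: `struct toy_object`, `toy_traverse` and `toy_select_cruft` are hypothetical names, "within the grace period" is taken to mean `mtime >= cutoff`, and objects already stored in a kept pack are assumed to be skipped rather than duplicated. Step 3 (writing the pack and its `.mtimes` file) is not shown.

```c
#include <stddef.h>
#include <time.h>

#define MAX_REFS 4

/* Toy object model; hypothetical, not Git's 'struct object'. */
struct toy_object {
	time_t mtime;          /* last-modified time of the object */
	int in_kept_pack;      /* nonzero if stored in a surviving pack */
	int nr_refs;           /* number of valid entries in refs[] */
	size_t refs[MAX_REFS]; /* indexes of objects this one references */
	int in_cruft;          /* output: selected for the cruft pack */
};

/* Step 2: add 'idx' and everything reachable from it. */
void toy_traverse(struct toy_object *objs, size_t idx)
{
	int i;

	/* Assumption: objects in kept packs are never duplicated. */
	if (objs[idx].in_kept_pack || objs[idx].in_cruft)
		return;
	objs[idx].in_cruft = 1;
	for (i = 0; i < objs[idx].nr_refs; i++)
		toy_traverse(objs, objs[idx].refs[i]);
}

/* Steps 1 and 2: pick tips inside the grace period, then traverse. */
size_t toy_select_cruft(struct toy_object *objs, size_t nr, time_t cutoff)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++)
		if (!objs[i].in_kept_pack && objs[i].mtime >= cutoff)
			toy_traverse(objs, i);
	for (i = 0; i < nr; i++)
		count += objs[i].in_cruft ? 1 : 0;
	return count;
}
```

In this model an old object survives only if some recent unreachable object still refers to it, which matches the rule that expired objects are kept when they are reachable from an object that has not yet expired.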
66+
This mode is invoked internally by linkgit:git-repack[1] when instructed to
67+
write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
68+
of packs which will not be deleted by the repack; in other words, they contain
69+
all of the repository's reachable objects.
70+
71+
When a repository already has a cruft pack, `git repack --cruft` typically only
72+
adds objects to it. An exception to this is when `git repack` is given the
73+
`--cruft-expiration` option, which allows the generated cruft pack to omit
74+
expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
75+
later on.
76+
77+
It is linkgit:git-gc[1] that is typically responsible for removing expired
78+
unreachable objects.
79+
80+
== Caution for mixed-version environments
81+
82+
Repositories that have cruft packs in them will continue to work with any older
83+
version of Git. Note, however, that previous versions of Git which do not
84+
understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
85+
all of the objects in it. In other words, do not expect older (pre-cruft pack)
86+
versions of Git to interpret or even read the contents of the `.mtimes` file.
87+
88+
Note that having mixed versions of Git GC-ing the same repository can lead to
89+
unreachable objects never being completely pruned. This can happen under the
90+
following circumstances:
91+
92+
- An older version of Git running GC explodes the contents of an existing
93+
cruft pack loose, using the cruft pack's mtime.
94+
- A newer version running GC collects those loose objects into a cruft pack,
95+
where the `.mtimes` file reflects the loose objects' actual mtimes, but the
96+
cruft pack mtime is "now".
97+
98+
Repeating this process will lead to unreachable objects not getting pruned as a
99+
result of repeatedly resetting the objects' mtimes to the present time.
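The loop described above can be made concrete with a small simulation. This is only a toy model of the two bullet points: the function name `toy_mixed_gc` and its parameters are hypothetical, and it assumes the old and new versions alternate strictly, with `interval` seconds between successive repacks by the newer version.

```c
#include <time.h>

/*
 * Toy model of the mixed-version loop (all names hypothetical).
 * Each round, an old Git explodes the cruft pack and stamps every loose
 * object with the pack's mtime; a new Git then repacks, copying that
 * loose mtime into .mtimes and setting the pack's mtime to "now".
 * Returns the object's recorded mtime after 'rounds' rounds.
 */
time_t toy_mixed_gc(time_t object_mtime, time_t pack_mtime,
		    time_t now, time_t interval, int rounds)
{
	int i;

	for (i = 0; i < rounds; i++) {
		/* Old Git: the per-object mtime is lost. */
		object_mtime = pack_mtime;
		/* New Git: repack later; the pack itself is brand new. */
		now += interval;
		pack_mtime = now;
	}
	return object_mtime;
}
```

In this model the recorded mtime always trails the current time by exactly one `interval`, regardless of how old the object really is, so an object never leaves the grace period as long as the interval is shorter than that period.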
100+
101+
If you are GC-ing repositories in a mixed version environment, consider omitting
102+
the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
103+
leaving the `gc.cruftPacks` configuration unset until all writers understand
104+
cruft packs.
105+
106+
== Alternatives
107+
108+
Notable alternatives to this design include:
109+
110+
- The location of the per-object mtime data, and
111+
- Storing unreachable objects in multiple cruft packs.
112+
113+
On the location of mtime data, a new auxiliary file tied to the pack was chosen
114+
to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
115+
support for optional chunks of data, it may make sense to consolidate the
116+
`.mtimes` format into the `.idx` itself.
117+
118+
Storing unreachable objects among multiple cruft packs (e.g., creating a new
119+
cruft pack during each repacking operation including only unreachable objects
120+
which aren't already stored in an earlier cruft pack) is significantly more
121+
complicated to construct, and so isn't pursued here. The obvious drawback to
122+
the current implementation is that the entire cruft pack must be re-written from
123+
scratch.

Documentation/technical/pack-format.txt

Lines changed: 19 additions & 0 deletions
@@ -294,6 +294,25 @@ Pack file entry: <+
294294

295295
All 4-byte numbers are in network order.
296296

297+
== pack-*.mtimes files have the format:
298+
299+
All 4-byte numbers are in network byte order.
300+
301+
- A 4-byte magic number '0x4d544d45' ('MTME').
302+
303+
- A 4-byte version identifier (= 1).
304+
305+
- A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256).
306+
307+
- A table of 4-byte unsigned integers. The ith value is the
308+
modification time (mtime) of the ith object in the corresponding
309+
pack by lexicographic (index) order. The mtimes count standard
310+
epoch seconds.
311+
312+
- A trailer, containing a checksum of the corresponding packfile,
313+
and a checksum of all of the above (each having length according
314+
to the specified hash function).
315+
297316
== multi-pack-index (MIDX) files have the following format:
298317

299318
The multi-pack-index files refer to multiple pack-files and loose objects.
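To make the `pack-*.mtimes` layout added in this hunk concrete, here is a minimal reader sketch for an in-memory buffer. It follows only the field list given above (magic, version, hash function identifier, table of 4-byte mtimes, two trailing checksums). The names `read_be32` and `mtimes_lookup` are hypothetical and this is not the code in `pack-mtimes.c`. The sketch assumes the object count is supplied by the caller (for example from the pack's `.idx`), since the format described here carries no count of its own, and it does not verify the trailing checksums.

```c
#include <stddef.h>
#include <stdint.h>

#define MTIMES_MAGIC 0x4d544d45u /* "MTME" */
#define MTIMES_HEADER_SIZE 12

/* Read a 4-byte big-endian (network order) unsigned integer. */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: validate the header of an in-memory .mtimes
 * buffer and return the mtime of object 'pos' (in index order) via
 * *mtime. Returns 0 on success, -1 on error.
 */
int mtimes_lookup(const unsigned char *buf, size_t len, uint32_t nr_objects,
		  uint32_t pos, uint32_t *mtime)
{
	uint32_t hash_id;
	size_t hash_len, expected;

	if (len < MTIMES_HEADER_SIZE || read_be32(buf) != MTIMES_MAGIC)
		return -1;
	if (read_be32(buf + 4) != 1) /* only version 1 is defined */
		return -1;
	hash_id = read_be32(buf + 8);
	if (hash_id == 1)
		hash_len = 20; /* SHA-1 */
	else if (hash_id == 2)
		hash_len = 32; /* SHA-256 */
	else
		return -1;
	/* header + one entry per object + two trailing checksums */
	expected = MTIMES_HEADER_SIZE + (size_t)nr_objects * 4 + 2 * hash_len;
	if (len != expected || pos >= nr_objects)
		return -1;
	*mtime = read_be32(buf + MTIMES_HEADER_SIZE + (size_t)pos * 4);
	return 0;
}
```

Because the table is ordered like the pack index, a caller would first look up an object's position in the `.idx` and then pass that position as `pos`.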

Makefile

Lines changed: 2 additions & 0 deletions
@@ -740,6 +740,7 @@ TEST_BUILTINS_OBJS += test-oid-array.o
740740
TEST_BUILTINS_OBJS += test-oidmap.o
741741
TEST_BUILTINS_OBJS += test-oidtree.o
742742
TEST_BUILTINS_OBJS += test-online-cpus.o
743+
TEST_BUILTINS_OBJS += test-pack-mtimes.o
743744
TEST_BUILTINS_OBJS += test-parse-options.o
744745
TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
745746
TEST_BUILTINS_OBJS += test-partial-clone.o
@@ -996,6 +997,7 @@ LIB_OBJS += oidtree.o
996997
LIB_OBJS += pack-bitmap-write.o
997998
LIB_OBJS += pack-bitmap.o
998999
LIB_OBJS += pack-check.o
1000+
LIB_OBJS += pack-mtimes.o
9991001
LIB_OBJS += pack-objects.o
10001002
LIB_OBJS += pack-revindex.o
10011003
LIB_OBJS += pack-write.o

builtin/gc.c

Lines changed: 9 additions & 1 deletion
@@ -42,6 +42,7 @@ static const char * const builtin_gc_usage[] = {
4242

4343
static int pack_refs = 1;
4444
static int prune_reflogs = 1;
45+
static int cruft_packs = 0;
4546
static int aggressive_depth = 50;
4647
static int aggressive_window = 250;
4748
static int gc_auto_threshold = 6700;
@@ -152,6 +153,7 @@ static void gc_config(void)
152153
git_config_get_int("gc.auto", &gc_auto_threshold);
153154
git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
154155
git_config_get_bool("gc.autodetach", &detach_auto);
156+
git_config_get_bool("gc.cruftpacks", &cruft_packs);
155157
git_config_get_expiry("gc.pruneexpire", &prune_expire);
156158
git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
157159
git_config_get_expiry("gc.logexpiry", &gc_log_expire);
@@ -331,7 +333,11 @@ static void add_repack_all_option(struct string_list *keep_pack)
331333
{
332334
if (prune_expire && !strcmp(prune_expire, "now"))
333335
strvec_push(&repack, "-a");
334-
else {
336+
else if (cruft_packs) {
337+
strvec_push(&repack, "--cruft");
338+
if (prune_expire)
339+
strvec_pushf(&repack, "--cruft-expiration=%s", prune_expire);
340+
} else {
335341
strvec_push(&repack, "-A");
336342
if (prune_expire)
337343
strvec_pushf(&repack, "--unpack-unreachable=%s", prune_expire);
@@ -551,6 +557,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
551557
{ OPTION_STRING, 0, "prune", &prune_expire, N_("date"),
552558
N_("prune unreferenced objects"),
553559
PARSE_OPT_OPTARG, NULL, (intptr_t)prune_expire },
560+
OPT_BOOL(0, "cruft", &cruft_packs, N_("pack unreferenced objects separately")),
554561
OPT_BOOL(0, "aggressive", &aggressive, N_("be more thorough (increased runtime)")),
555562
OPT_BOOL_F(0, "auto", &auto_gc, N_("enable auto-gc mode"),
556563
PARSE_OPT_NOCOMPLETE),
@@ -670,6 +677,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
670677
die(FAILED_RUN, repack.v[0]);
671678

672679
if (prune_expire) {
680+
/* run `git prune` even if using cruft packs */
673681
strvec_push(&prune, prune_expire);
674682
if (quiet)
675683
strvec_push(&prune, "--no-progress");
