[verse]
$GIT_DIR/objects/pack/pack-*.{pack,idx}
$GIT_DIR/objects/pack/pack-*.rev
$GIT_DIR/objects/pack/pack-*.mtimes
$GIT_DIR/objects/pack/multi-pack-index

DESCRIPTION
The MIDX's reverse index is stored in the optional 'RIDX' chunk within
the MIDX itself.

== cruft packs

The cruft packs feature offers an alternative to Git's traditional mechanism of
removing unreachable objects. This section provides an overview of Git's
pruning mechanism, and how a cruft pack can be used instead to accomplish the
same.

=== Background

To remove unreachable objects from your repository, Git offers `git repack -Ad`
(see linkgit:git-repack[1]). Quoting from the documentation:

----
[...] unreachable objects in a previous pack become loose, unpacked objects,
instead of being left in the old pack. [...] loose unreachable objects will be
pruned according to normal expiry rules with the next 'git gc' invocation.
----

Unreachable objects aren't removed immediately, since doing so could race with
an incoming push which may reference an object which is about to be deleted.
Instead, those unreachable objects are stored as loose objects and stay that way
until they are older than the expiration window, at which point they are removed
by linkgit:git-prune[1].

Git must store these unreachable objects loose in order to keep track of their
per-object mtimes. If these unreachable objects were written into one big pack,
then either freshening that pack (because an object contained within it was
rewritten) or creating a new pack of unreachable objects would cause the pack's
mtime to get updated, and the objects within it would never leave the expiration
window. Storing each object loose preserves its individual mtime and avoids a
situation where all cruft objects are freshened at once.
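To make the freshening problem concrete, here is a minimal Python model (not Git code) of the two expiry rules. The function names, the `mtimes` dictionary and the two-week `GRACE_PERIOD` are hypothetical stand-ins chosen for illustration; the real window comes from the prune expiration used by linkgit:git-gc[1] and linkgit:git-prune[1].

```python
DAY = 24 * 60 * 60
GRACE_PERIOD = 14 * DAY  # hypothetical expiration window


def expired_when_loose(mtimes, now, grace=GRACE_PERIOD):
    """Loose storage: each object expires according to its own mtime.

    `mtimes` maps an object ID to that object's mtime in seconds.
    """
    return {oid for oid, mtime in mtimes.items() if now - mtime > grace}


def expired_in_one_pack(mtimes, pack_mtime, now, grace=GRACE_PERIOD):
    """One big pack with no per-object data: every object inherits the
    pack's mtime, so the objects expire all together or not at all."""
    if now - pack_mtime > grace:
        return set(mtimes)
    return set()


now = 100 * DAY
mtimes = {"old": now - 30 * DAY, "recent": now - 1 * DAY}

# Loose: only the 30-day-old object has left the window.
print(sorted(expired_when_loose(mtimes, now)))
# → ['old']
# Packed and freshened "now": nothing can expire, not even "old".
print(sorted(expired_in_one_pack(mtimes, pack_mtime=now, now=now)))
# → []
```

Every time such a pack is freshened, `pack_mtime` moves forward again, which is why the loose representation was needed before cruft packs.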

This can lead to undesirable situations when a repository contains many
unreachable objects which have not yet left the grace period. Large directories
in the shards of `.git/objects` can degrade performance within the repository,
and given enough unreachable objects, the resulting inode starvation can degrade
the performance of the whole system. And because those objects can never be
packed, such repositories often take up a large amount of disk space: each
object is zlib-compressed individually but cannot be stored in a delta chain.

=== Cruft packs

A cruft pack eliminates the need for storing unreachable objects in a loose
state by including the per-object mtimes in a separate file alongside a single
pack containing all unreachable objects.

A cruft pack is written by `git repack --cruft` when generating a new pack,
using linkgit:git-pack-objects[1]'s `--cruft` option. Note that
`git repack --cruft` is a classic all-into-one repack, meaning that everything
in the resulting pack is reachable, and everything else is unreachable. Once
that pack is written, the `--cruft` option instructs `git repack` to generate
another pack containing only objects not packed in the previous step (which
equates to packing all unreachable objects together). This progresses as
follows:

1. Enumerate every object, marking as a traversal tip any object which is
   (a) not contained in a kept pack and (b) has an mtime within the grace
   period.

2. Perform a reachability traversal based on the tips gathered in the previous
   step, adding every object along the way to the pack.

3. Write the pack out, along with a `.mtimes` file that records the per-object
   timestamps.
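The three steps above can be sketched in Python as follows. This is a simplified model, not Git's implementation: the object graph is a plain dictionary (`edges`), kept packs are reduced to a set of object IDs (`kept`), and `build_cruft_pack` is a hypothetical name. One detail goes beyond the text: objects reached during the traversal that already live in a kept pack are skipped, on the assumption that the repack retains them anyway.

```python
from collections import deque


def build_cruft_pack(edges, mtimes, kept, now, grace):
    """Return {oid: mtime}: the objects for the cruft pack plus the
    timestamps a `.mtimes` file would record for them.

    edges:  object ID -> IDs of the objects it references (hypothetical)
    mtimes: object ID -> mtime in seconds
    kept:   IDs of objects stored in kept packs
    """
    # Step 1: enumerate every object; non-kept objects whose mtime falls
    # within the grace period become traversal tips.
    tips = [oid for oid in edges
            if oid not in kept and now - mtimes[oid] <= grace]

    # Step 2: breadth-first reachability walk from those tips.
    seen = set(tips)
    queue = deque(tips)
    packed = {}
    while queue:
        oid = queue.popleft()
        if oid not in kept:  # assumption: kept-pack objects are retained anyway
            packed[oid] = mtimes[oid]
        for child in edges.get(oid, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)

    # Step 3: the caller would write these objects out as a pack, together
    # with a `.mtimes` file holding the recorded timestamps.
    return packed


DAY = 24 * 60 * 60
now = 100 * DAY
edges = {
    "commit": ["tree", "reachable-blob"],  # recent unreachable commit
    "tree": ["blob"],
    "blob": [],
    "stale": [],                           # old, referenced by nothing recent
    "reachable-blob": [],                  # lives in a kept pack
}
mtimes = {"commit": now - DAY, "tree": now - 40 * DAY, "blob": now - 40 * DAY,
          "stale": now - 40 * DAY, "reachable-blob": now - 40 * DAY}
print(sorted(build_cruft_pack(edges, mtimes, {"reachable-blob"}, now, 14 * DAY)))
# → ['blob', 'commit', 'tree']
```

Note that `tree` and `blob` are kept despite being old: they are reachable from a recent unreachable commit, so they stay around for as long as that commit does, while `stale` is left out.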

The `--cruft` mode of linkgit:git-pack-objects[1] is invoked internally by
linkgit:git-repack[1] when instructed to write a cruft pack. Crucially, the set
of in-core kept packs is exactly the set of packs which will not be deleted by
the repack; in other words, they contain all of the repository's reachable
objects.

When a repository already has a cruft pack, `git repack --cruft` typically only
adds objects to it. An exception to this is when `git repack` is given the
`--cruft-expiration` option, which allows the generated cruft pack to omit
expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
later on.
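The following Python sketch models how the contents of an existing cruft pack might be carried forward into the next one. It is a hypothetical helper, not Git's code: `expire_before` stands in for the cutoff derived from `--cruft-expiration`, `reachable` for the objects written to the new all-into-one pack, and keeping the newest mtime when an object appears in more than one source is an assumption of this sketch.

```python
def carry_forward(existing, loose_unreachable, reachable, expire_before=None):
    """Compute {oid: mtime} for the next cruft pack.

    existing:          entries read from the current cruft pack's `.mtimes`
    loose_unreachable: loose unreachable objects and their mtimes
    reachable:         IDs now stored in the new all-into-one pack
    expire_before:     cutoff standing in for `--cruft-expiration`;
                       None means nothing is expired during this repack
    """
    merged = {}
    for source in (existing, loose_unreachable):
        for oid, mtime in source.items():
            if oid in reachable:
                continue  # packed with the reachable objects instead
            # An object seen twice keeps its most recent mtime.
            merged[oid] = max(mtime, merged.get(oid, mtime))
    if expire_before is not None:
        # Omit expired objects now rather than waiting for git gc.
        merged = {oid: m for oid, m in merged.items() if m >= expire_before}
    return merged


old_cruft = {"a": 100, "b": 200}
loose = {"b": 300, "c": 400}
print(carry_forward(old_cruft, loose, reachable=set()))
# → {'a': 100, 'b': 300, 'c': 400}
print(carry_forward(old_cruft, loose, reachable=set(), expire_before=250))
# → {'b': 300, 'c': 400}
```

Without `expire_before`, the result only grows (apart from objects that became reachable again), which is the common case; with it, expired entries are dropped during the repack itself.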

linkgit:git-gc[1] is typically responsible for removing expired unreachable
objects.

=== Caution for mixed-version environments

Repositories that have cruft packs in them will continue to work with any older
version of Git. Note, however, that previous versions of Git which do not
understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
all of the objects in it. In other words, do not expect older (pre-cruft pack)
versions of Git to interpret or even read the contents of the `.mtimes` file.

Note that having mixed versions of Git GC-ing the same repository can lead to
unreachable objects never being completely pruned. This can happen under the
following circumstances:

- An older version of Git running GC explodes the contents of an existing
  cruft pack loose, using the cruft pack's mtime.
- A newer version running GC collects those loose objects into a cruft pack,
  where the `.mtimes` file reflects the loose objects' actual mtimes, but the
  cruft pack mtime is "now".

Repeating this process will lead to unreachable objects not getting pruned as a
result of repeatedly resetting the objects' mtimes to the present time.
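The cycle can be illustrated with a small Python simulation under simplifying assumptions: a single unreachable object, a hypothetical 14-day window, older Git pruning exploded objects by the cruft pack's mtime, and newer Git applying the same window to the loose mtimes it packs. None of these function names come from Git.

```python
DAY = 24 * 60 * 60
GRACE = 14 * DAY  # hypothetical expiration window


def old_git_gc(pack_mtime, objects, now):
    """Pre-cruft Git ignores `.mtimes`: it explodes the cruft pack loose,
    giving every object the pack's mtime, and drops expired objects.
    Returns {oid: loose mtime}."""
    if now - pack_mtime > GRACE:
        return {}
    return {oid: pack_mtime for oid in objects}


def new_git_gc(loose, now):
    """Cruft-aware Git: unexpired loose objects go into a cruft pack whose
    `.mtimes` keeps their loose mtimes, but the pack itself is written now.
    Returns (pack mtime, {oid: mtime recorded in .mtimes})."""
    survivors = {oid: m for oid, m in loose.items() if now - m <= GRACE}
    return now, survivors


def alternate(real_mtime, interval, rounds):
    """Start from a cruft pack written at time 0 that holds one object
    whose real mtime is `real_mtime`; older and newer Git then take turns
    running GC every `interval` seconds. Returns the surviving IDs."""
    now, pack_mtime, objects = 0, 0, {"obj": real_mtime}
    for _ in range(rounds):
        now += interval
        loose = old_git_gc(pack_mtime, objects, now)
        now += interval
        pack_mtime, objects = new_git_gc(loose, now)
    return set(objects)


# The object is already 30 days old, well past the window, yet it survives
# 10 rounds (100 days) because every cycle resets its mtime.
print(alternate(real_mtime=-30 * DAY, interval=5 * DAY, rounds=10))
# → {'obj'}
# A cruft-aware Git alone would have expired it on its first run.
print(new_git_gc({"obj": -30 * DAY}, now=5 * DAY))
# → (432000, {})
```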

If you are GC-ing repositories in a mixed-version environment, consider omitting
the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
leaving the `gc.cruftPacks` configuration unset until all writers understand
cruft packs.

=== Alternatives

Notable alternatives to this design include:

- A different location for the per-object mtime data, and
- Storing unreachable objects in multiple cruft packs.

On the location of mtime data, a new auxiliary file tied to the pack was chosen
to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
support for optional chunks of data, it may make sense to consolidate the
`.mtimes` format into the `.idx` itself.

Storing unreachable objects among multiple cruft packs (e.g., creating a new
cruft pack during each repacking operation including only unreachable objects
which aren't already stored in an earlier cruft pack) is significantly more
complicated to construct, and so is not pursued here. The obvious drawback to
the current implementation is that the entire cruft pack must be rewritten from
scratch.

GIT
---
Part of the linkgit:git[1] suite