@@ -11,6 +11,7 @@ SYNOPSIS
[verse]
$GIT_DIR/objects/pack/pack-*.{pack,idx}
$GIT_DIR/objects/pack/pack-*.rev
+ $GIT_DIR/objects/pack/pack-*.mtimes
$GIT_DIR/objects/pack/multi-pack-index
DESCRIPTION
@@ -507,6 +508,131 @@ packs arranged in MIDX order (with the preferred pack coming first).
The MIDX's reverse index is stored in the optional 'RIDX' chunk within
the MIDX itself.
+ == cruft packs
+
+ The cruft packs feature offers an alternative to Git's traditional mechanism of
+ removing unreachable objects. This document provides an overview of Git's
+ pruning mechanism, and how a cruft pack can be used instead to accomplish the
+ same.
+
+ === Background
+
+ To remove unreachable objects from your repository, Git offers `git repack -Ad`
+ (see linkgit:git-repack[1]). Quoting from the documentation:
+
+ ----
+ [...] unreachable objects in a previous pack become loose, unpacked objects,
+ instead of being left in the old pack. [...] loose unreachable objects will be
+ pruned according to normal expiry rules with the next 'git gc' invocation.
+ ----
+
+ Unreachable objects aren't removed immediately, since doing so could race with
+ an incoming push which may reference an object which is about to be deleted.
+ Instead, those unreachable objects are stored as loose objects and stay that way
+ until they are older than the expiration window, at which point they are removed
+ by linkgit:git-prune[1].
+
+ Git must store these unreachable objects loose in order to keep track of their
+ per-object mtimes. If these unreachable objects were written into one big pack,
+ then either freshening that pack (because an object contained within it was
+ re-written) or creating a new pack of unreachable objects would cause the pack's
+ mtime to get updated, and the objects within it would never leave the expiration
+ window. Instead, objects are stored loose in order to keep track of the
+ individual object mtimes and avoid a situation where all cruft objects are
+ freshened at once.
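
As a rough illustration of the paragraph above, the following Python sketch models why a single pack mtime is too coarse for expiry decisions. The grace period, object names, and timestamps are hypothetical, and the two helper functions are not part of Git; they only mimic the "is this object older than the grace period?" check.

```python
DAY = 24 * 3600
GRACE = 14 * DAY  # hypothetical two-week grace period, in seconds


def expired_loose(mtimes, now):
    """Loose storage: each object is judged by its own mtime."""
    return sorted(oid for oid, t in mtimes.items() if now - t > GRACE)


def expired_single_pack(pack_mtime, oids, now):
    """One big pack without per-object data: every object inherits the pack mtime."""
    return sorted(oids) if now - pack_mtime > GRACE else []


now = 100 * DAY
# Hypothetical unreachable objects and the time each was last written.
mtimes = {"a": now - 30 * DAY, "b": now - 20 * DAY, "c": now - 1 * DAY}

# Freshening "c" (or rewriting the pack) moves the pack mtime to c's time.
pack_mtime = max(mtimes.values())

loose_result = expired_loose(mtimes, now)
packed_result = expired_single_pack(pack_mtime, mtimes, now)
print(loose_result, packed_result)
```

With loose storage, "a" and "b" expire while "c" stays. With one shared pack mtime, nothing expires, because the most recently written object keeps the whole pack inside the window.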
+
+ This can lead to undesirable situations when a repository contains many
+ unreachable objects which have not yet left the grace period. Having large
+ directories in the shards of `.git/objects` can lead to decreased performance in
+ the repository. Worse, given enough unreachable objects, this can lead to inode
+ starvation and degrade the performance of the whole system. Since we
+ can never pack those objects, these repositories often take up a large amount of
+ disk space: each object can only be zlib-compressed, not stored in delta
+ chains.
+
+ === Cruft packs
+
+ A cruft pack eliminates the need for storing unreachable objects in a loose
+ state by including the per-object mtimes in a separate file alongside a single
+ pack containing all of the unreachable objects.
+
+ A cruft pack is written by `git repack --cruft` when generating a new pack, using
+ linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft`
+ is a classic all-into-one repack, meaning that everything in the resulting pack is
+ reachable, and everything else is unreachable. Once that pack is written, the `--cruft`
+ option instructs `git repack` to generate another pack containing only objects
+ not packed in the previous step (which equates to packing all unreachable
+ objects together). This progresses as follows:
+
+ 1. Enumerate every object, marking any object which (a) is not contained in a
+ kept pack, and (b) has an mtime within the grace period as a traversal
+ tip.
+
+ 2. Perform a reachability traversal based on the tips gathered in the previous
+ step, adding every object along the way to the pack.
+
+ 3. Write the pack out, along with a `.mtimes` file that records the per-object
+ timestamps.
+
+ This mode is invoked internally by linkgit:git-repack[1] when instructed to
+ write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
+ of packs which will not be deleted by the repack; in other words, they contain
+ all of the repository's reachable objects.
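
The three steps listed above can be sketched in Python as follows. This is a simplified model, not Git's implementation: the `Obj` class, the `build_cruft_pack` function, and the sample objects are hypothetical. The sketch also assumes that objects already stored in a kept pack are skipped during the traversal, and that the `.mtimes` table is written in object-ID order.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    oid: str
    mtime: int
    in_kept_pack: bool = False
    refs: list = field(default_factory=list)  # oids this object points at


def build_cruft_pack(objects, cutoff):
    by_oid = {o.oid: o for o in objects}

    # Step 1: tips are objects outside any kept pack whose mtime is
    # within the grace period (modelled here as mtime >= cutoff).
    tips = [o for o in objects if not o.in_kept_pack and o.mtime >= cutoff]

    # Step 2: walk from the tips, collecting every object along the way.
    packed = {}
    stack = list(tips)
    while stack:
        o = stack.pop()
        if o.oid in packed or o.in_kept_pack:  # assumption: kept objects are skipped
            continue
        packed[o.oid] = o.mtime
        stack.extend(by_oid[r] for r in o.refs if r in by_oid)

    # Step 3: "write" the pack plus a parallel table of per-object mtimes.
    order = sorted(packed)
    return order, [packed[oid] for oid in order]


objects = [
    Obj("r1", 100, in_kept_pack=True),   # reachable, lives in a kept pack
    Obj("u1", 90, refs=["u2", "r1"]),    # recent unreachable object
    Obj("u2", 10),                       # old, but reachable from u1
    Obj("u3", 5),                        # old and not referenced by anything recent
]
pack_order, pack_mtimes = build_cruft_pack(objects, cutoff=50)
print(pack_order, pack_mtimes)
```

In this model `u2` is carried into the cruft pack because a recent object still refers to it, while `u3` is left out.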
+
+ When a repository already has a cruft pack, `git repack --cruft` typically only
+ adds objects to it. An exception to this is when `git repack` is given the
+ `--cruft-expiration` option, which allows the generated cruft pack to omit
+ expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
+ later on.
+
+ It is linkgit:git-gc[1] that is typically responsible for removing expired
+ unreachable objects.
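
The difference between a plain `git repack --cruft` and one given `--cruft-expiration` can be modelled roughly as below. The `repack_cruft` function and its arguments are hypothetical. The sketch assumes that when an object is seen more than once, the newest mtime wins. It deliberately ignores the reachability traversal described earlier.

```python
def repack_cruft(existing, new_unreachable, expire_before=None):
    """Return the oid -> mtime table of the rewritten cruft pack.

    existing:        mtimes recorded in the current cruft pack
    new_unreachable: mtimes of unreachable objects found outside it
    expire_before:   optional cutoff; older objects are omitted
    """
    merged = dict(existing)
    for oid, mtime in new_unreachable.items():
        # Assumption: keep the most recent mtime seen for an object.
        merged[oid] = max(mtime, merged.get(oid, mtime))

    if expire_before is not None:
        merged = {oid: t for oid, t in merged.items() if t >= expire_before}
    return merged


existing = {"a": 10, "b": 40}
new_unreachable = {"b": 60, "c": 70}

without_expiry = repack_cruft(existing, new_unreachable)
with_expiry = repack_cruft(existing, new_unreachable, expire_before=50)
print(without_expiry, with_expiry)
```

Without a cutoff the pack only grows. With one, object "a" is omitted from the new cruft pack.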
+
+ === Caution for mixed-version environments
+
+ Repositories that have cruft packs in them will continue to work with any older
+ version of Git. Note, however, that previous versions of Git which do not
+ understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
+ all of the objects in it. In other words, do not expect older (pre-cruft pack)
+ versions of Git to interpret or even read the contents of the `.mtimes` file.
+
+ Note that having mixed versions of Git GC-ing the same repository can lead to
+ unreachable objects never being completely pruned. This can happen under the
+ following circumstances:
+
+ - An older version of Git running GC explodes the contents of an existing
+ cruft pack loose, using the cruft pack's mtime.
+ - A newer version running GC collects those loose objects into a cruft pack,
+ where the `.mtimes` file reflects the loose objects' actual mtimes, but the
+ cruft pack mtime is "now".
+
+ Repeating this process will lead to unreachable objects not getting pruned as a
+ result of repeatedly resetting the objects' mtimes to the present time.
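
The cycle described above can be simulated with a small Python model. Everything here is hypothetical: the grace period, the GC interval, and the `old_gc` and `new_gc` functions, which stand in for an older Git that ignores `.mtimes` and a newer Git that honors it. Time is counted in days for readability.

```python
GRACE = 14  # hypothetical grace period, in days


def old_gc(cruft, now):
    """Older Git: explodes the cruft pack loose, stamping every object
    with the pack's mtime, then prunes loose objects past the grace period."""
    loose = {oid: cruft["pack_mtime"] for oid in cruft["mtimes"]}
    return {oid: t for oid, t in loose.items() if now - t <= GRACE}


def new_gc(loose, now):
    """Newer Git: drops expired objects and writes the rest into a cruft
    pack whose .mtimes keeps each object's mtime; the pack mtime is 'now'."""
    kept = {oid: t for oid, t in loose.items() if now - t <= GRACE}
    return {"pack_mtime": now, "mtimes": kept}


def object_survives(mixed, days=100, interval=5):
    cruft = {"pack_mtime": 0, "mtimes": {"x": 0}}
    now = 0
    while now < days:
        now += interval
        if mixed:
            loose = old_gc(cruft, now)   # old version runs first...
            now += interval
            cruft = new_gc(loose, now)   # ...then the new version
        else:
            cruft = new_gc(cruft["mtimes"], now)
    return "x" in cruft["mtimes"]


mixed_result = object_survives(mixed=True)
single_result = object_survives(mixed=False)
print(mixed_result, single_result)
```

In the mixed run the object's recorded mtime keeps jumping forward to the previous pack's write time, so it never ages past the grace period. When only the newer version runs, the object is dropped at the first GC after day 14.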
+
+ If you are GC-ing repositories in a mixed version environment, consider omitting
+ the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
+ leaving the `gc.cruftPacks` configuration unset until all writers understand
+ cruft packs.
+
+ === Alternatives
+
+ Notable alternatives to this design include:
+
+ - The location of the per-object mtime data, and
+ - Storing unreachable objects in multiple cruft packs.
+
+ On the location of mtime data, a new auxiliary file tied to the pack was chosen
+ to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
+ support for optional chunks of data, it may make sense to consolidate the
+ `.mtimes` format into the `.idx` itself.
+
+ Storing unreachable objects among multiple cruft packs (e.g., creating a new
+ cruft pack during each repacking operation including only unreachable objects
+ which aren't already stored in an earlier cruft pack) is significantly more
+ complicated to construct, and so isn't pursued here. The obvious drawback to
+ the current implementation is that the entire cruft pack must be re-written from
+ scratch.
+
GIT
---
Part of the linkgit:git[1] suite