Commit 6b6029d

avar authored and gitster committed
docs: move cruft pack docs to gitformat-pack
Integrate the cruft packs documentation initially added in 3d89a8c
(Documentation/technical: add cruft-packs.txt, 2022-05-20) to the newly
created "gitformat-pack" documentation.

Like the "bitmap-format" added before it in 0d4455a (documentation: add
documentation for the bitmap format, 2013-11-14) the "cruft-packs" were
documented in their own file.

As the diff move detection will show there is no change to
"Documentation/technical/cruft-packs.txt" here except to move it, and to
"indent" the existing sections by adding an extra "=" to them.

We could similarly convert the "bitmap-format.txt", but let's leave it
for now due to a conflict with the in-flight ac/bitmap-lookup-table
series.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 977c47b commit 6b6029d

3 files changed: +126 -124 lines changed

Documentation/Makefile

Lines changed: 0 additions & 1 deletion
@@ -105,7 +105,6 @@ TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
 TECH_DOCS += ToolsForGit
 TECH_DOCS += technical/bitmap-format
-TECH_DOCS += technical/cruft-packs
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/long-running-process-protocol

Documentation/gitformat-pack.txt

Lines changed: 126 additions & 0 deletions
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 $GIT_DIR/objects/pack/pack-*.{pack,idx}
 $GIT_DIR/objects/pack/pack-*.rev
+$GIT_DIR/objects/pack/pack-*.mtimes
 $GIT_DIR/objects/pack/multi-pack-index
 
 DESCRIPTION
@@ -507,6 +508,131 @@ packs arranged in MIDX order (with the preferred pack coming first).
 The MIDX's reverse index is stored in the optional 'RIDX' chunk within
 the MIDX itself.
 
+== cruft packs
+
+The cruft packs feature offers an alternative to Git's traditional mechanism of
+removing unreachable objects. This document provides an overview of Git's
+pruning mechanism, and how a cruft pack can be used instead to accomplish the
+same.
+
+=== Background
+
+To remove unreachable objects from your repository, Git offers `git repack -Ad`
+(see linkgit:git-repack[1]). Quoting from the documentation:
+
+----
+[...] unreachable objects in a previous pack become loose, unpacked objects,
+instead of being left in the old pack. [...] loose unreachable objects will be
+pruned according to normal expiry rules with the next 'git gc' invocation.
+----
+
+Unreachable objects aren't removed immediately, since doing so could race with
+an incoming push which may reference an object which is about to be deleted.
+Instead, those unreachable objects are stored as loose objects and stay that way
+until they are older than the expiration window, at which point they are removed
+by linkgit:git-prune[1].
+
+Git must store these unreachable objects loose in order to keep track of their
+per-object mtimes. If these unreachable objects were written into one big pack,
+then either freshening that pack (because an object contained within it was
+re-written) or creating a new pack of unreachable objects would cause the pack's
+mtime to get updated, and the objects within it would never leave the expiration
+window. Instead, objects are stored loose in order to keep track of the
+individual object mtimes and avoid a situation where all cruft objects are
+freshened at once.
+
+This can lead to undesirable situations when a repository contains many
+unreachable objects which have not yet left the grace period. Having large
+directories in the shards of `.git/objects` can lead to decreased performance in
+the repository. But given enough unreachable objects, this can lead to inode
+starvation and degrade the performance of the whole system. Since we
+can never pack those objects, these repositories often take up a large amount of
+disk space, since we can only zlib compress them, but not store them in delta
+chains.
+
+=== Cruft packs
+
+A cruft pack eliminates the need for storing unreachable objects in a loose
+state by including the per-object mtimes in a separate file alongside a single
+pack containing all loose objects.
+
+A cruft pack is written by `git repack --cruft` when generating a new pack, using
+linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft`
+is a classic all-into-one repack, meaning that everything in the resulting pack is
+reachable, and everything else is unreachable. Once written, the `--cruft`
+option instructs `git repack` to generate another pack containing only objects
+not packed in the previous step (which equates to packing all unreachable
+objects together). This progresses as follows:
+
+1. Enumerate every object, marking any object which is (a) not contained in a
+kept-pack, and (b) whose mtime is within the grace period as a traversal
+tip.
+
+2. Perform a reachability traversal based on the tips gathered in the previous
+step, adding every object along the way to the pack.
+
+3. Write the pack out, along with a `.mtimes` file that records the per-object
+timestamps.
+
+This mode is invoked internally by linkgit:git-repack[1] when instructed to
+write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
+of packs which will not be deleted by the repack; in other words, they contain
+all of the repository's reachable objects.
+
+When a repository already has a cruft pack, `git repack --cruft` typically only
+adds objects to it. An exception to this is when `git repack` is given the
+`--cruft-expiration` option, which allows the generated cruft pack to omit
+expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
+later on.
+
+It is linkgit:git-gc[1] that is typically responsible for removing expired
+unreachable objects.
+
+=== Caution for mixed-version environments
+
+Repositories that have cruft packs in them will continue to work with any older
+version of Git. Note, however, that previous versions of Git which do not
+understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
+all of the objects in it. In other words, do not expect older (pre-cruft pack)
+versions of Git to interpret or even read the contents of the `.mtimes` file.
+
+Note that having mixed versions of Git GC-ing the same repository can lead to
+unreachable objects never being completely pruned. This can happen under the
+following circumstances:
+
+- An older version of Git running GC explodes the contents of an existing
+cruft pack loose, using the cruft pack's mtime.
+- A newer version running GC collects those loose objects into a cruft pack,
+where the `.mtimes` file reflects the loose objects' actual mtimes, but the
+cruft pack mtime is "now".
+
+Repeating this process will lead to unreachable objects not getting pruned as a
+result of repeatedly resetting the objects' mtimes to the present time.
+
+If you are GC-ing repositories in a mixed version environment, consider omitting
+the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
+leaving the `gc.cruftPacks` configuration unset until all writers understand
+cruft packs.
+
+=== Alternatives
+
+Notable alternatives to this design include:
+
+- The location of the per-object mtime data, and
+- Storing unreachable objects in multiple cruft packs.
+
+On the location of mtime data, a new auxiliary file tied to the pack was chosen
+to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
+support for optional chunks of data, it may make sense to consolidate the
+`.mtimes` format into the `.idx` itself.
+
+Storing unreachable objects among multiple cruft packs (e.g., creating a new
+cruft pack during each repacking operation including only unreachable objects
+which aren't already stored in an earlier cruft pack) is significantly more
+complicated to construct, and so isn't pursued here. The obvious drawback to
+the current implementation is that the entire cruft pack must be re-written from
+scratch.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
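
Editorial illustration (not part of the commit): the three steps listed under "Cruft packs" in the hunk above can be sketched as a toy model in Python. This is only a sketch under stated assumptions, not Git's implementation. The `Obj` record, its field names, and `select_cruft_objects` are hypothetical, and skipping kept-pack objects during the walk is an assumption made here rather than something the moved text states.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Toy object record; every field name here is hypothetical."""
    oid: str
    mtime: int                      # seconds since the epoch
    in_kept_pack: bool = False      # True if a kept pack already holds it
    refs: list = field(default_factory=list)  # oids this object points to


def select_cruft_objects(objects, now, grace_period):
    """Return {oid: mtime} for the objects a cruft pack would hold."""
    by_oid = {o.oid: o for o in objects}

    # Step 1: objects outside any kept pack whose mtime is still within
    # the grace period become traversal tips.
    tips = [o for o in objects
            if not o.in_kept_pack and now - o.mtime <= grace_period]

    # Step 2: walk from the tips, adding every object reached on the way.
    # Assumption of this sketch: objects held by a kept pack are skipped.
    selected = {}
    stack = list(tips)
    while stack:
        obj = stack.pop()
        if obj.oid in selected or obj.in_kept_pack:
            continue
        selected[obj.oid] = obj.mtime
        stack.extend(by_oid[r] for r in obj.refs if r in by_oid)

    # Step 3: the pack and its per-object timestamps (.mtimes) would be
    # written from this mapping.
    return selected


# "a" is recent and points at the old object "b"; "c" is old and unreferenced;
# "k" lives in a kept pack.
objects = [
    Obj("a", mtime=100, refs=["b"]),
    Obj("b", mtime=10),
    Obj("c", mtime=10),
    Obj("k", mtime=100, in_kept_pack=True),
]
selected = select_cruft_objects(objects, now=110, grace_period=20)
```

In this model the old object "b" is kept because a recent tip still reaches it, and it is recorded with its own mtime, while "c" is left out as expired.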

Documentation/technical/cruft-packs.txt

Lines changed: 0 additions & 123 deletions
This file was deleted.
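
Editorial illustration (not part of the commit): the mixed-version caution in the moved text can be modelled with a short Python simulation. This is a sketch under simplifying assumptions: a single unreachable object, GC runs at a fixed interval, and no pruning step. The function names and the dictionary layout are hypothetical.

```python
def old_git_gc(cruft_pack):
    """Older Git: explode the cruft pack loose, ignoring the .mtimes data.

    Every object inherits the cruft pack's own mtime.
    """
    return {oid: cruft_pack["pack_mtime"] for oid in cruft_pack["mtimes"]}


def new_git_gc(loose, now):
    """Newer Git: collect loose objects into a cruft pack written 'now'."""
    return {"pack_mtime": now, "mtimes": dict(loose)}


def apparent_age_after_rounds(rounds, interval, start_mtime=0):
    """Alternate old and new GC runs; return the object's apparent age."""
    now = start_mtime
    pack = {"pack_mtime": now, "mtimes": {"obj": start_mtime}}
    for _ in range(rounds):
        now += interval
        loose = old_git_gc(pack)       # object mtime becomes the pack's mtime
        now += interval
        pack = new_git_gc(loose, now)  # pack mtime becomes "now"
    return now - pack["mtimes"]["obj"]
```

However many rounds run, the apparent age in this model stays at two intervals, so an expiration window longer than that is never exceeded even though the object's true age keeps growing.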
