Commit ffa47b7

Merge branch 'tb/pseudo-merge-reachability-bitmap'
The pseudo-merge reachability bitmap to help more efficient storage of
the reachability bitmap in a repository with too many refs has been
added.

* tb/pseudo-merge-reachability-bitmap: (26 commits)
  pack-bitmap.c: ensure pseudo-merge offset reads are bounded
  Documentation/technical/bitmap-format.txt: add missing position table
  t/perf: implement performance tests for pseudo-merge bitmaps
  pseudo-merge: implement support for finding existing merges
  ewah: `bitmap_equals_ewah()`
  pack-bitmap: extra trace2 information
  pack-bitmap.c: use pseudo-merges during traversal
  t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()`
  pack-bitmap: implement test helpers for pseudo-merge
  ewah: implement `ewah_bitmap_popcount()`
  pseudo-merge: implement support for reading pseudo-merge commits
  pack-bitmap.c: read pseudo-merge extension
  pseudo-merge: scaffolding for reads
  pack-bitmap: extract `read_bitmap()` function
  pack-bitmap-write.c: write pseudo-merge table
  pseudo-merge: implement support for selecting pseudo-merge commits
  config: introduce `git_config_double()`
  pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  pack-bitmap-write: support storing pseudo-merge commits
  ...
2 parents 9005149 + a83e21d commit ffa47b7

24 files changed: +2605 -55 lines changed

Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -51,6 +51,7 @@ MAN7_TXT += gitdiffcore.txt
 MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitfaq.txt
 MAN7_TXT += gitglossary.txt
+MAN7_TXT += gitpacking.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitremote-helpers.txt
 MAN7_TXT += gitrevisions.txt

Documentation/config.txt

Lines changed: 2 additions & 0 deletions
@@ -384,6 +384,8 @@ include::config/apply.txt[]
 
 include::config/attr.txt[]
 
+include::config/bitmap-pseudo-merge.txt[]
+
 include::config/blame.txt[]
 
 include::config/branch.txt[]

Documentation/config/bitmap-pseudo-merge.txt

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
NOTE: The configuration options in `bitmapPseudoMerge.*` are considered
EXPERIMENTAL and may be subject to change or be removed entirely in the
future. For more information about the pseudo-merge bitmap feature, see
the "Pseudo-merge bitmaps" section of linkgit:gitpacking[7].

bitmapPseudoMerge.<name>.pattern::
Regular expression used to match reference names. Commits
pointed to by references matching this pattern (and meeting
the below criteria, like `bitmapPseudoMerge.<name>.sampleRate`
and `bitmapPseudoMerge.<name>.threshold`) will be considered
for inclusion in a pseudo-merge bitmap.
+
Commits are grouped into pseudo-merge groups based on whether or not
any reference(s) that point at a given commit match the pattern, which
is an extended regular expression.
+
Within a pseudo-merge group, commits may be further grouped into
sub-groups based on the capture groups in the pattern. Each sub-group
is identified by concatenating the text matched by the capture groups
of the regular expression, with a '-' dash in between.
+
For example, if the pattern is `refs/tags/`, then all tags (provided
they meet the below criteria) will be considered candidates for the
same pseudo-merge group. However, if the pattern is instead
`refs/remotes/([0-9]+)/tags/`, then tags from different remotes will
be grouped into separate pseudo-merge groups, based on the remote
number.

bitmapPseudoMerge.<name>.decay::
31+
Determines the rate at which consecutive pseudo-merge bitmap
32+
groups decrease in size. Must be non-negative. This parameter
33+
can be thought of as `k` in the function `f(n) = C * n^-k`,
34+
where `f(n)` is the size of the `n`th group.
35+
+
36+
Setting the decay rate equal to `0` will cause all groups to be the
37+
same size. Setting the decay rate equal to `1` will cause the `n`th
38+
group to be `1/n` the size of the initial group. Higher values of the
39+
decay rate cause consecutive groups to shrink at an increasing rate.
40+
The default is `1`.
41+
+
42+
If all groups are the same size, it is possible that groups containing
43+
newer commits will be able to be used less often than earlier groups,
44+
since it is more likely that the references pointing at newer commits
45+
will be updated more often than a reference pointing at an old commit.
46+
47+
bitmapPseudoMerge.<name>.sampleRate::
48+
Determines the proportion of non-bitmapped commits (among
49+
reference tips) which are selected for inclusion in an
50+
unstable pseudo-merge bitmap. Must be between `0` and `1`
51+
(inclusive). The default is `1`.
52+
53+
bitmapPseudoMerge.<name>.threshold::
54+
Determines the minimum age of non-bitmapped commits (among
55+
reference tips, as above) which are candidates for inclusion
56+
in an unstable pseudo-merge bitmap. The default is
57+
`1.week.ago`.
58+
bitmapPseudoMerge.<name>.maxMerges::
Determines the maximum number of pseudo-merge commits among
which commits may be distributed.
+
For pseudo-merge groups whose pattern does not contain any capture
groups, this setting is applied for all commits matching the regular
expression. For patterns that have one or more capture groups, this
setting is applied for each distinct capture group.
+
For example, if your pattern is `refs/tags/`, then this setting
will distribute all tags into a maximum of `maxMerges` pseudo-merge
commits. However, if your pattern is, say,
`refs/remotes/([0-9]+)/tags/`, then this setting will be applied to
each remote's set of tags individually.
+
Must be non-negative. The default value is 64.

bitmapPseudoMerge.<name>.stableThreshold::
Determines the minimum age of commits (among reference tips,
as above, however stable commits are still considered
candidates even when they have been covered by a bitmap) which
are candidates for a stable pseudo-merge bitmap. The default
is `1.month.ago`.
+
Setting this threshold to a smaller value (e.g., 1.week.ago) will cause
more stable groups to be generated (which impose a one-time generation
cost) but those groups will likely become stale over time. Using a
larger value incurs the opposite penalty (fewer stable groups which are
more useful).

bitmapPseudoMerge.<name>.stableSize::
Determines the size (in number of commits) of a stable
pseudo-merge bitmap. The default is `512`.
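
Read together, `threshold` and `sampleRate` above act as a filter over reference tips: only non-bitmapped tips at least as old as the threshold are eligible, and a fraction of those is kept. The following is a minimal sketch of that reading, not Git's implementation. The `Tip` type, the `unstable_candidates` function, and the use of a random draw for sampling are all assumptions made for illustration; how Git actually picks the sample is not described in this text.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Tip:
    """Hypothetical record for one reference tip (not a Git data structure)."""
    ref: str
    commit_time: int  # committer timestamp, seconds since the epoch
    bitmapped: bool   # True if the commit already has its own bitmap


def unstable_candidates(tips, threshold_time, sample_rate, rng):
    """Keep non-bitmapped tips no newer than the threshold, then sample them.

    Assumption: sampling is modelled as an independent random draw per tip.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1 (inclusive)")
    eligible = [
        t for t in tips
        if not t.bitmapped and t.commit_time <= threshold_time
    ]
    # rng.random() is in [0, 1), so a rate of 1 keeps everything and 0 keeps nothing.
    return [t for t in eligible if rng.random() < sample_rate]
```

With a sample rate of `1` (the documented default) every eligible tip is kept, so the threshold alone decides membership.
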

Documentation/gitpacking.txt

Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
gitpacking(7)
=============

NAME
----
gitpacking - Advanced concepts related to packing in Git

SYNOPSIS
--------
gitpacking

DESCRIPTION
-----------

This document aims to describe some advanced concepts related to packing
in Git.

Many concepts are currently described scattered between manual pages of
various Git commands, including linkgit:git-pack-objects[1],
linkgit:git-repack[1], and others, as well as linkgit:gitformat-pack[5],
and parts of the `Documentation/technical` tree.

There are many aspects of packing in Git that are not covered in this
document that instead live in the aforementioned areas. Over time, those
scattered bits may coalesce into this document.

== Pseudo-merge bitmaps

NOTE: Pseudo-merge bitmaps are considered an experimental feature, so
the configuration and many of the ideas are subject to change.

=== Background

Reachability bitmaps are most efficient when we have on-disk stored
bitmaps for one or more of the starting points of a traversal. For this
reason, Git prefers storing bitmaps for commits at the tips of refs,
because traversals tend to start with those points.

But if you have a large number of refs, it's not feasible to store a
bitmap for _every_ ref tip. It takes up space, and just OR-ing all of
those bitmaps together is expensive.

One way we can deal with that is to create bitmaps that represent
_groups_ of refs. When a traversal asks about the entire group, then we
can use this single bitmap instead of considering each ref individually.
Because these bitmaps represent the set of objects which would be
reachable in a hypothetical merge of all of the commits, we call them
pseudo-merge bitmaps.

=== Overview

A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
follows:

Commit bitmap::

A bitmap whose set bits describe the set of commits included in the
pseudo-merge's "merge" bitmap (as below).

Merge bitmap::

A bitmap whose set bits describe the reachability closure over the set
of commits in the pseudo-merge's "commits" bitmap (as above). An
identical bitmap would be generated for an octopus merge with the same
set of parents as described in the commits bitmap.

Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
for a given pseudo-merge are listed on either side of the traversal,
either directly (by explicitly asking for them as part of the `HAVES`
or `WANTS`) or indirectly (by encountering them during a fill-in
traversal).

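The pair of bitmaps described above can be modelled on a toy history. The sketch below is illustrative only: it uses Python sets in place of EWAH-compressed bitmaps, tracks commits only (real reachability bitmaps also cover trees and blobs), and the function names `reachable`, `make_pseudo_merge`, and `usable` are hypothetical.

```python
def reachable(parents, tips):
    """Return every commit reachable from `tips` by following parent links."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def make_pseudo_merge(parents, tips):
    """Build the (commits, merge) pair for a set of tips.

    `commits` plays the role of the commit bitmap; `merge` is the
    reachability closure, as an octopus merge of `tips` would have.
    """
    commits = frozenset(tips)
    return commits, frozenset(reachable(parents, commits))


def usable(pseudo_merge, traversal_side):
    """A pseudo-merge applies only if all of its commits are on this side."""
    commits, _merge = pseudo_merge
    return commits <= set(traversal_side)


# Toy history: B and C are children of A, and D is a child of B.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"]}
pseudo_merge = make_pseudo_merge(parents, ["C", "D"])
```

In this model, a traversal whose wanted side contains both `C` and `D` can take the merge set in one step, while a traversal that wants only `C` cannot use the pseudo-merge at all.
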
=== Use-cases

For example, suppose there exists a pseudo-merge bitmap with a large
number of commits, all of which are listed in the `WANTS` section of
some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
bitmap machinery can quickly determine there is a pseudo-merge which
satisfies some subset of the wanted objects on either side of the query.
Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
have to repeat the decompression and `OR`-ing step over a potentially
large number of individual bitmaps, which can take proportionally more
time.

Another benefit of pseudo-merges arises when there is some combination
of (a) a large number of references, with (b) poor bitmap coverage, and
(c) deep, nested trees, making fill-in traversal relatively expensive.
For example, suppose that there are a large enough number of tags where
bitmapping each of the tags individually is infeasible. Without
pseudo-merge bitmaps, computing the result of, say, `git rev-list
--use-bitmap-index --count --objects --tags` would likely require a
large amount of fill-in traversal. But when a large quantity of those
tags are stored together in a pseudo-merge bitmap, the bitmap machinery
can take advantage of the fact that we only care about the union of
objects reachable from all of those tags, and answer the query much
faster.

=== Configuration

Reference tips are grouped into different pseudo-merge groups according
to two criteria: the reference name must match one or more of the
defined pseudo-merge patterns, and, optionally, one or more capture
groups within that pattern further partition the group.

Within a group, commits may be considered "stable", or "unstable"
depending on their age. These are adjusted by setting the
`bitmapPseudoMerge.<name>.stableThreshold` and
`bitmapPseudoMerge.<name>.threshold` configuration values, respectively.

All stable commits are grouped into pseudo-merges of equal size
(`bitmapPseudoMerge.<name>.stableSize`). If the `stableSize`
configuration is set to, say, 100, then the first 100 commits (ordered
by committer date) which are older than the `stableThreshold` value will
form one group, the next 100 commits will form another group, and so on.

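The fixed-size chunking of stable commits can be sketched as follows. This is a rough model under stated assumptions rather than Git's code: `stable_groups` is a hypothetical name, commits are plain `(oid, committer_time)` pairs, and "ordered by committer date" is assumed here to mean oldest first.

```python
def stable_groups(commits, stable_threshold, stable_size):
    """Chunk commits older than the threshold into groups of `stable_size`.

    commits: iterable of (oid, committer_time) pairs.
    Assumption: commits are ordered oldest first before chunking.
    """
    if stable_size <= 0:
        raise ValueError("stable_size must be positive")
    old = sorted(
        (c for c in commits if c[1] <= stable_threshold),
        key=lambda c: c[1],
    )
    return [old[i:i + stable_size] for i in range(0, len(old), stable_size)]
```

With 230 commits older than the threshold and a `stable_size` of 100, this yields two full groups of 100 and a final group of 30.
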
Among unstable commits, the pseudo-merge machinery will attempt to
combine older commits into large groups as opposed to newer commits
which will appear in smaller groups. This is based on the heuristic that
references whose tip commit is older are less likely to be modified to
point at a different commit than a reference whose tip commit is newer.

The size of groups is determined by a power-law decay function, and the
decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k)`,
where `f(n)` describes the size of the `n`-th pseudo-merge group. The
sample rate controls what proportion of eligible commits are considered
as candidates. The threshold parameter indicates the minimum age (so as
to avoid including too-recent commits in a pseudo-merge group, which
would make it less likely to remain valid). The "maxMerges" parameter
sets an upper-bound on the number of pseudo-merge commits among which an
individual group's commits may be distributed.

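To make the power-law decay concrete, the sketch below evaluates the form `f(n) = C * n^-k` given for `bitmapPseudoMerge.<name>.decay`, where a decay of `1` makes the `n`th group `1/n` the size of the first. It is only an illustration: `group_sizes` is a hypothetical helper, and choosing `C` so that the sizes sum to the number of commits is an assumption made here, since this text does not say how `C` is picked.

```python
def group_sizes(total_commits, max_merges, decay):
    """Return fractional group sizes f(n) = C * n**-decay for n = 1..max_merges.

    Assumption: C is chosen so that the sizes add up to `total_commits`.
    """
    if decay < 0:
        raise ValueError("decay must be non-negative")
    if max_merges <= 0 or total_commits <= 0:
        return []
    weights = [n ** -decay for n in range(1, max_merges + 1)]
    c = total_commits / sum(weights)
    return [c * w for w in weights]
```

A decay of `0` gives equal sizes, and larger decay values concentrate more commits in the earliest groups.
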
The "stable"-related parameters control "stable" pseudo-merge groups,
comprised of a fixed number of commits which are older than the
configured "stable threshold" value and may be grouped together in
chunks of "stableSize" in order of age.

The exact configuration for pseudo-merges is as follows:

include::config/bitmap-pseudo-merge.txt[]

=== Examples

Suppose that you have a repository with a large number of references,
and you want a bare-bones configuration of pseudo-merge bitmaps that
will enhance bitmap coverage of the `refs/` namespace. You may start
with a configuration like so:

    [bitmapPseudoMerge "all"]
        pattern = "refs/"
        threshold = now
        stableThreshold = never
        sampleRate = 1
        maxMerges = 64

This will create pseudo-merge bitmaps for all references, regardless of
their age, and group them into at most 64 pseudo-merge commits.

If you wanted to separate tags from branches when generating
pseudo-merge commits, you would instead define the pattern with a
capture group, like so:

    [bitmapPseudoMerge "all"]
        pattern = "refs/(heads|tags)/"

Suppose instead that you are working in a fork-network repository, with
each fork specified by some numeric ID, and whose refs reside in
`refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some
fork) in the network. In this instance, you may instead write something
like:

    [bitmapPseudoMerge "all"]
        pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
        threshold = now
        stableThreshold = never
        sampleRate = 1
        maxMerges = 64

This would generate pseudo-merge group identifiers like "1234-heads",
and "5678-tags" (for branches in fork "1234", and tags in fork "5678",
respectively).

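The way capture groups yield identifiers such as "1234-heads" can be tried out with a short sketch. It is a sketch under assumptions, not Git's matching code: `group_refs` is a hypothetical helper, Python's `re` module stands in for the extended regular expressions Git uses (the two agree for simple patterns like the ones above), and the pattern is assumed to match anywhere in the reference name.

```python
import re
from collections import defaultdict


def group_refs(pattern, refnames):
    """Map each group identifier to the list of matching reference names.

    The identifier joins the text of all capture groups with '-'.
    A pattern without capture groups puts every match under ''.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for ref in refnames:
        match = regex.search(ref)  # assumption: unanchored match
        if match is None:
            continue  # reference is not part of this pseudo-merge group
        key = "-".join(g for g in match.groups() if g is not None)
        groups[key].append(ref)
    return dict(groups)


refs = [
    "refs/virtual/1234/heads/main",
    "refs/virtual/1234/heads/dev",
    "refs/virtual/5678/tags/v1.0",
    "refs/heads/main",
]
by_fork = group_refs(r"refs/virtual/([0-9]+)/(heads|tags)/", refs)
```

Here `by_fork` has one entry per fork and reference kind, and `refs/heads/main` is left out because it does not match the pattern.
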
SEE ALSO
--------
linkgit:git-pack-objects[1]
linkgit:git-repack[1]

GIT
---
Part of the linkgit:git[1] suite
