Commit 6eb1a7d

ttaylorr authored and gitster committed
Documentation: describe incremental MIDX format
Prepare to implement incremental multi-pack indexes (MIDXs) over the next several commits by first describing the relevant prerequisites (like a new chunk in the MIDX format, the directory structure for incremental MIDXs, etc.)

The format is described in detail in the patch contents below, but the high-level description is as follows.

Incremental MIDXs live in $GIT_DIR/objects/pack/multi-pack-index.d, and each `*.midx` within that directory has a single "parent" MIDX, which is the MIDX layer immediately before it in the MIDX chain. The chain order resides in a file 'multi-pack-index-chain' in the same directory.

Signed-off-by: Taylor Blau <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 04f5a52 commit 6eb1a7d

Documentation/technical/multi-pack-index.txt

Lines changed: 103 additions & 0 deletions
@@ -61,6 +61,109 @@ Design Details
 - The MIDX file format uses a chunk-based approach (similar to the
   commit-graph file) that allows optional data to be added.

+Incremental multi-pack indexes
+------------------------------
+
+As repositories grow in size, it becomes more expensive to write a
+multi-pack index (MIDX) that includes all packfiles. To accommodate
+this, the "incremental multi-pack indexes" feature allows for combining
+a "chain" of multi-pack indexes.
+
+Each individual component of the chain need only contain a small number
+of packfiles. Appending to the chain does not invalidate earlier parts
+of the chain, so repositories can control how much time is spent
+updating the MIDX chain by determining the number of packs in each layer
+of the MIDX chain.
+
+=== Design state
+
+At present, the incremental multi-pack indexes feature is missing two
+important components:
+
+- The ability to rewrite earlier portions of the MIDX chain (i.e., to
+  "compact" some collection of adjacent MIDX layers into a single
+  MIDX). At present the only supported way of shrinking a MIDX chain
+  is to rewrite the entire chain from scratch without the `--split`
+  flag.
++
+There are no fundamental limitations that stand in the way of being able
+to implement this feature. It is omitted from the initial implementation
+in order to reduce the complexity, but will be added later.
+
+- Support for reachability bitmaps. The classic single MIDX
+  implementation does support reachability bitmaps (see the section
+  titled "multi-pack-index reverse indexes" in
+  linkgit:gitformat-pack[5] for more details).
++
+As above, there are no fundamental limitations that stand in the way of
+extending the incremental MIDX format to support reachability bitmaps.
+The design below specifically takes this into account, and support for
+reachability bitmaps will be added in a future patch series. It is
+omitted from the current implementation for the same reason as above.
++
+In brief, to support reachability bitmaps with the incremental MIDX
+feature, the concept of the pseudo-pack order is extended across each
+layer of the incremental MIDX chain to form a concatenated pseudo-pack
+order. This concatenation takes place in the same order as the chain
+itself (in other words, the concatenated pseudo-pack order for a chain
+`{$H1, $H2, $H3}` would be the pseudo-pack order for `$H1`, followed by
+the pseudo-pack order for `$H2`, followed by the pseudo-pack order for
+`$H3`).
++
+The layout will then be extended so that each layer of the incremental
+MIDX chain can write a `*.bitmap`. The objects in each layer's bitmap
+are offset by the number of objects in the previous layers of the chain.
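
The concatenated pseudo-pack order and the per-layer bitmap offsets described above can be illustrated with a small sketch. It is not taken from Git: the function names are hypothetical, and each layer's pseudo-pack order is modelled as a plain list of object names rather than the real on-disk data.

```python
def concatenated_pseudo_pack_order(layer_orders):
    """Join per-layer pseudo-pack orders in chain order ($H1, $H2, ...).

    `layer_orders` is an assumed in-memory stand-in: one list of object
    names per layer, each already in that layer's pseudo-pack order.
    """
    return [obj for order in layer_orders for obj in order]


def layer_bit_offsets(layer_orders):
    """Return, for each layer, the number of objects in all earlier layers.

    This is the offset the text says each layer's bitmap positions are
    shifted by.
    """
    offsets = []
    total = 0
    for order in layer_orders:
        offsets.append(total)
        total += len(order)
    return offsets
```

With three layers holding 2, 1, and 3 objects, the offsets are 0, 2, and 3, so the second object of the third layer sits at position 4 of the concatenated order.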
+
+=== File layout
+
+Instead of storing a single `multi-pack-index` file (with an optional
+`.rev` and `.bitmap` extension) in `$GIT_DIR/objects/pack`, incremental
+MIDXs are stored in the following layout:
+
+----
+$GIT_DIR/objects/pack/multi-pack-index.d/
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-chain
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H1.midx
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H2.midx
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H3.midx
+----
+
+The `multi-pack-index-chain` file contains a list of the incremental
+MIDX files in the chain, in order. The above example shows a chain whose
+`multi-pack-index-chain` file would contain the following lines:
+
+----
+$H1
+$H2
+$H3
+----
+
+The `multi-pack-index-$H1.midx` file contains the first layer of the
+multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
+the second layer of the chain, and so on.
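
As a rough illustration of the layout above, the following sketch resolves a chain file into the ordered list of layer paths. It assumes one layer hash per line, as in the example, and does no validation of the hashes or of the `.midx` files themselves. The helper name `read_midx_chain` is hypothetical and is not part of Git.

```python
from pathlib import Path


def read_midx_chain(pack_dir):
    """Return the layer paths named by multi-pack-index-chain, in chain order.

    `pack_dir` stands for $GIT_DIR/objects/pack. Returns an empty list
    when no chain file exists. Assumes one layer hash per line.
    """
    midx_dir = Path(pack_dir) / "multi-pack-index.d"
    chain_file = midx_dir / "multi-pack-index-chain"
    if not chain_file.is_file():
        return []
    hashes = [line.strip() for line in chain_file.read_text().splitlines()]
    # Skip blank lines; keep the order given by the chain file.
    return [midx_dir / f"multi-pack-index-{h}.midx" for h in hashes if h]
```

The first returned path is the base layer, and each later path is the layer whose parent is the one before it.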
+
+When both an incremental- and non-incremental MIDX are present, the
+non-incremental MIDX is always read first.
148+
=== Object positions for incremental MIDXs
149+
150+
In the original multi-pack-index design, we refer to objects via their
151+
lexicographic position (by object IDs) within the repository's singular
152+
multi-pack-index. In the incremental multi-pack-index design, we refer
153+
to objects via their index into a concatenated lexicographic ordering
154+
among each component in the MIDX chain.
155+
156+
If `objects_nr()` is a function that returns the number of objects in a
157+
given MIDX layer, then the index of an object at lexicographic position
158+
`i` within, say, $H3 is defined as:
159+
160+
----
161+
objects_nr($H2) + objects_nr($H1) + i
162+
----
163+
164+
(in the C implementation, this is often computed as `i +
165+
m->num_objects_in_base`).
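
The position arithmetic above, and its inverse, can be sketched as follows. This is an illustrative model only: the per-layer object counts are passed in as a plain list in chain order (standing in for `objects_nr()`), layers are numbered from 0, and both function names are hypothetical.

```python
def global_position(layer_object_counts, layer, i):
    """Map position `i` within `layer` to its position in the whole chain.

    `layer_object_counts[k]` plays the role of objects_nr() for the k-th
    layer (0-based, chain order). The sum over earlier layers corresponds
    to what the text calls `m->num_objects_in_base`.
    """
    if not 0 <= layer < len(layer_object_counts):
        raise IndexError("no such layer")
    if not 0 <= i < layer_object_counts[layer]:
        raise IndexError("position out of range for layer")
    return sum(layer_object_counts[:layer]) + i


def locate(layer_object_counts, pos):
    """Inverse mapping: chain-wide position -> (layer, position in layer)."""
    if pos < 0:
        raise IndexError("negative position")
    for layer, count in enumerate(layer_object_counts):
        if pos < count:
            return layer, pos
        pos -= count
    raise IndexError("position beyond end of chain")
```

For a chain whose layers hold 10, 5, and 7 objects, position 3 in the third layer maps to 10 + 5 + 3 = 18, and `locate` maps 18 back to the third layer at position 3.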
+
 Future Work
 -----------
