@@ -61,6 +61,109 @@ Design Details
- The MIDX file format uses a chunk-based approach (similar to the
commit-graph file) that allows optional data to be added.

+ Incremental multi-pack indexes
+ ------------------------------
+
+ As repositories grow in size, it becomes more expensive to write a
+ multi-pack index (MIDX) that includes all packfiles. To accommodate
+ this, the "incremental multi-pack indexes" feature allows several
+ multi-pack indexes to be combined into a "chain".
+
+ Each individual component of the chain need only contain a small number
+ of packfiles. Appending to the chain does not invalidate earlier parts
+ of the chain, so repositories can control how much time is spent
+ updating the MIDX chain by determining the number of packs in each layer
+ of the MIDX chain.
+
+ === Design state
+
+ At present, the incremental multi-pack indexes feature is missing two
+ important components:
+
+ - The ability to rewrite earlier portions of the MIDX chain (i.e., to
+ "compact" some collection of adjacent MIDX layers into a single
+ MIDX). At present, the only supported way of shrinking a MIDX chain
+ is to rewrite the entire chain from scratch without the `--split`
+ flag.
+ +
+ There are no fundamental limitations that stand in the way of
+ implementing this feature. It is omitted from the initial implementation
+ to reduce complexity, but will be added later.
+
+ - Support for reachability bitmaps. The classic single MIDX
+ implementation does support reachability bitmaps (see the section
+ titled "multi-pack-index reverse indexes" in
+ linkgit:gitformat-pack[5] for more details).
+ +
+ As above, there are no fundamental limitations that stand in the way of
+ extending the incremental MIDX format to support reachability bitmaps.
+ The design below specifically takes this into account, and support for
+ reachability bitmaps will be added in a future patch series. It is
+ omitted from the current implementation for the same reason as above.
+ +
+ In brief, to support reachability bitmaps with the incremental MIDX
+ feature, the concept of the pseudo-pack order is extended across each
+ layer of the incremental MIDX chain to form a concatenated pseudo-pack
+ order. This concatenation takes place in the same order as the chain
+ itself (in other words, the concatenated pseudo-pack order for a chain
+ `{$H1, $H2, $H3}` would be the pseudo-pack order for `$H1`, followed by
+ the pseudo-pack order for `$H2`, followed by the pseudo-pack order for
+ `$H3`).
+ +
+ The layout will then be extended so that each layer of the incremental
+ MIDX chain can write a `*.bitmap`. The objects in each layer's bitmap
+ are offset by the number of objects in the previous layers of the chain.
+
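The concatenated pseudo-pack order and the per-layer bitmap offsets
described above can be illustrated with a small Python sketch. This is
not Git's implementation: the function names are hypothetical, and each
layer's pseudo-pack order is assumed to be available as a plain list of
object names.

```python
def concatenated_pseudo_pack_order(layers):
    """Concatenate per-layer pseudo-pack orders in chain order.

    `layers` is a list of lists (an assumed representation): the
    pseudo-pack order of $H1, then $H2, and so on.
    """
    order = []
    for layer_order in layers:
        order.extend(layer_order)
    return order


def bitmap_offsets(layers):
    """Return, for each layer, the number of objects in earlier layers.

    A layer's bitmap positions would be shifted by this amount so that
    they line up with the concatenated pseudo-pack order.
    """
    offsets = []
    total = 0
    for layer_order in layers:
        offsets.append(total)
        total += len(layer_order)
    return offsets
```

Under these assumptions, an object at position `j` in the pseudo-pack
order of layer `k` sits at position `bitmap_offsets(layers)[k] + j` in
the concatenated order.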
+ === File layout
+
+ Instead of storing a single `multi-pack-index` file (with optional
+ `.rev` and `.bitmap` extensions) in `$GIT_DIR/objects/pack`, incremental
+ MIDXs are stored in the following layout:
+
+ ----
+ $GIT_DIR/objects/pack/multi-pack-index.d/
+ $GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-chain
+ $GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H1.midx
+ $GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H2.midx
+ $GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H3.midx
+ ----
+
+ The `multi-pack-index-chain` file lists the incremental MIDX files in
+ the chain, one per line and in order, identifying each by the `$H`
+ component of its file name. The above example shows a chain whose
+ `multi-pack-index-chain` file would contain the following lines:
+
+ ----
+ $H1
+ $H2
+ $H3
+ ----
+
+ The `multi-pack-index-$H1.midx` file contains the first layer of the
+ multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
+ the second layer of the chain, and so on.
+
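As an illustration of the layout above, the following Python sketch maps
the contents of a `multi-pack-index-chain` file to the layer files it
names, in chain order. The function names are hypothetical, and the
sketch assumes each non-empty line holds only the `$H` value; skipping
blank lines is an assumption, not documented behavior.

```python
import os


def layer_paths_from_chain(chain_text, chain_dir):
    """Map the chain file's contents to layer file paths, in order."""
    # Assumption: one $H value per line; blank lines are ignored.
    hashes = [line.strip() for line in chain_text.splitlines() if line.strip()]
    return [
        os.path.join(chain_dir, f"multi-pack-index-{h}.midx") for h in hashes
    ]


def read_midx_chain(pack_dir):
    """Read the chain file under `pack_dir` (e.g. $GIT_DIR/objects/pack)."""
    chain_dir = os.path.join(pack_dir, "multi-pack-index.d")
    chain_file = os.path.join(chain_dir, "multi-pack-index-chain")
    with open(chain_file, encoding="ascii") as f:
        return layer_paths_from_chain(f.read(), chain_dir)
```

The first path returned corresponds to the first layer of the chain, the
second path to the second layer, and so on.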
+ When both an incremental and a non-incremental MIDX are present, the
+ non-incremental MIDX is always read first.
+
+ === Object positions for incremental MIDXs
+
+ In the original multi-pack-index design, we refer to objects via their
+ lexicographic position (by object IDs) within the repository's singular
+ multi-pack-index. In the incremental multi-pack-index design, we refer
+ to objects via their index into a concatenated lexicographic ordering
+ across the components of the MIDX chain.
+
+ If `objects_nr()` is a function that returns the number of objects in a
+ given MIDX layer, then the index of an object at lexicographic position
+ `i` within, say, `$H3` is defined as:
+
+ ----
+ objects_nr($H2) + objects_nr($H1) + i
+ ----
+
+ (In the C implementation, this is often computed as `i +
+ m->num_objects_in_base`.)
+
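The position arithmetic above can be sketched in Python as follows. The
names `global_position` and `locate` are hypothetical, and a chain is
assumed to be represented simply as a list of per-layer object counts
(the values `objects_nr()` would return), first layer first.

```python
from bisect import bisect_right


def global_position(layer_sizes, layer, i):
    """Index of the object at position `i` within layer number `layer`."""
    if not 0 <= i < layer_sizes[layer]:
        raise IndexError("position out of range for this layer")
    # Sum of the sizes of all earlier layers, plus the local position.
    return sum(layer_sizes[:layer]) + i


def locate(layer_sizes, pos):
    """Inverse mapping: return (layer, i) for a chain-wide position."""
    if not 0 <= pos < sum(layer_sizes):
        raise IndexError("position out of range for this chain")
    # bases[k] is the number of objects in layers before layer k.
    bases = []
    total = 0
    for size in layer_sizes:
        bases.append(total)
        total += size
    layer = bisect_right(bases, pos) - 1
    return layer, pos - bases[layer]
```

For a chain whose layers hold 10, 4, and 7 objects, position 3 within
the third layer maps to `10 + 4 + 3 = 17`. In this sketch the sum over
earlier layers plays the role that the text attributes to
`m->num_objects_in_base`.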
Future Work
-----------
