@@ -25,9 +25,9 @@ An object is uniquely described by its bit position within a bitmap:
  is defined as follows:

  o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
-
- The ordering between packs is done according to the MIDX's .rev file.
- Notably, the preferred pack sorts ahead of all other packs.
+ +
+ The ordering between packs is done according to the MIDX's .rev file.
+ Notably, the preferred pack sorts ahead of all other packs.

 The on-disk representation (described below) of a bitmap is the same regardless
 of whether or not that bitmap belongs to a packfile or a MIDX. The only
@@ -39,97 +39,108 @@ MIDXs, both the bit-cache and rev-cache extensions are required.

 == On-disk format

- - A header appears at the beginning:
-
- 4-byte signature: {'B', 'I', 'T', 'M'}
-
- 2-byte version number (network byte order)
- The current implementation only supports version 1
- of the bitmap index (the same one as JGit).
-
- 2-byte flags (network byte order)
-
- The following flags are supported:
-
- - BITMAP_OPT_FULL_DAG (0x1) REQUIRED
- This flag must always be present. It implies that the
- bitmap index has been generated for a packfile or
- multi-pack index (MIDX) with full closure (i.e. where
- every single object in the packfile/MIDX can find its
- parent links inside the same packfile/MIDX). This is a
- requirement for the bitmap index format, also present in
- JGit, that greatly reduces the complexity of the
- implementation.
-
- - BITMAP_OPT_HASH_CACHE (0x4)
- If present, the end of the bitmap file contains
- `N` 32-bit name-hash values, one per object in the
- pack/MIDX. The format and meaning of the name-hash is
- described below.
-
- 4-byte entry count (network byte order)
-
- The total count of entries (bitmapped commits) in this bitmap index.
-
- 20-byte checksum
-
- The SHA1 checksum of the pack/MIDX this bitmap index
- belongs to.
-
- - 4 EWAH bitmaps that act as type indexes
-
- Type indexes are serialized after the hash cache in the shape
- of four EWAH bitmaps stored consecutively (see Appendix A for
- the serialization format of an EWAH bitmap).
-
- There is a bitmap for each Git object type, stored in the following
- order:
-
- - Commits
- - Trees
- - Blobs
- - Tags
-
- In each bitmap, the `n`th bit is set to true if the `n`th object
- in the packfile or multi-pack index is of that type.
-
- The obvious consequence is that the OR of all 4 bitmaps will result
- in a full set (all bits set), and the AND of all 4 bitmaps will
- result in an empty bitmap (no bits set).
-
- - N entries with compressed bitmaps, one for each indexed commit
-
- Where `N` is the total amount of entries in this bitmap index.
- Each entry contains the following:
-
- - 4-byte object position (network byte order)
- The position **in the index for the packfile or
- multi-pack index** where the bitmap for this commit is
- found.
-
- - 1-byte XOR-offset
- The xor offset used to compress this bitmap. For an entry
- in position `x`, a XOR offset of `y` means that the actual
- bitmap representing this commit is composed by XORing the
- bitmap for this entry with the bitmap in entry `x-y` (i.e.
- the bitmap `y` entries before this one).
-
- Note that this compression can be recursive. In order to
- XOR this entry with a previous one, the previous entry needs
- to be decompressed first, and so on.
-
- The hard-limit for this offset is 160 (an entry can only be
- xor'ed against one of the 160 entries preceding it). This
- number is always positive, and hence entries are always xor'ed
- with **previous** bitmaps, not bitmaps that will come afterwards
- in the index.
-
- - 1-byte flags for this bitmap
- At the moment the only available flag is `0x1`, which hints
- that this bitmap can be re-used when rebuilding bitmap indexes
- for the repository.
-
- - The compressed bitmap itself, see Appendix A.
+ * A header appears at the beginning:
+
+ 4-byte signature: :: {'B', 'I', 'T', 'M'}
+
+ 2-byte version number (network byte order): ::
+
+ The current implementation only supports version 1
+ of the bitmap index (the same one as JGit).
+
+ 2-byte flags (network byte order): ::
+
+ The following flags are supported:
+
+ ** {empty}
+ BITMAP_OPT_FULL_DAG (0x1) REQUIRED: :::
+
+ This flag must always be present. It implies that the
+ bitmap index has been generated for a packfile or
+ multi-pack index (MIDX) with full closure (i.e. where
+ every single object in the packfile/MIDX can find its
+ parent links inside the same packfile/MIDX). This is a
+ requirement for the bitmap index format, also present in
+ JGit, that greatly reduces the complexity of the
+ implementation.
+
+ ** {empty}
+ BITMAP_OPT_HASH_CACHE (0x4): :::
+
+ If present, the end of the bitmap file contains
+ `N` 32-bit name-hash values, one per object in the
+ pack/MIDX. The format and meaning of the name-hash is
+ described below.
+
+ 4-byte entry count (network byte order): ::
+ The total count of entries (bitmapped commits) in this bitmap index.
+
+ 20-byte checksum: ::
+ The SHA1 checksum of the pack/MIDX this bitmap index
+ belongs to.
+
+ * 4 EWAH bitmaps that act as type indexes
+ +
+ Type indexes are serialized after the hash cache in the shape
+ of four EWAH bitmaps stored consecutively (see Appendix A for
+ the serialization format of an EWAH bitmap).
+ +
+ There is a bitmap for each Git object type, stored in the following
+ order:
+ +
+ - Commits
+ - Trees
+ - Blobs
+ - Tags
+
+ +
+ In each bitmap, the `n`th bit is set to true if the `n`th object
+ in the packfile or multi-pack index is of that type.
+ +
+ The obvious consequence is that the OR of all 4 bitmaps will result
+ in a full set (all bits set), and the AND of all 4 bitmaps will
+ result in an empty bitmap (no bits set).
+
+ * N entries with compressed bitmaps, one for each indexed commit
+ +
+ Where `N` is the total amount of entries in this bitmap index.
+ Each entry contains the following:
+
+ ** {empty}
+ 4-byte object position (network byte order): ::
+ The position **in the index for the packfile or
+ multi-pack index** where the bitmap for this commit is
+ found.
+
+ ** {empty}
+ 1-byte XOR-offset: ::
+ The xor offset used to compress this bitmap. For an entry
+ in position `x`, a XOR offset of `y` means that the actual
+ bitmap representing this commit is composed by XORing the
+ bitmap for this entry with the bitmap in entry `x-y` (i.e.
+ the bitmap `y` entries before this one).
+ +
+ NOTE: This compression can be recursive. In order to
+ XOR this entry with a previous one, the previous entry needs
+ to be decompressed first, and so on.
+ +
+ The hard-limit for this offset is 160 (an entry can only be
+ xor'ed against one of the 160 entries preceding it). This
+ number is always positive, and hence entries are always xor'ed
+ with **previous** bitmaps, not bitmaps that will come afterwards
+ in the index.
+
+ ** {empty}
+ 1-byte flags for this bitmap: ::
+ At the moment the only available flag is `0x1`, which hints
+ that this bitmap can be re-used when rebuilding bitmap indexes
+ for the repository.
+
+ ** The compressed bitmap itself, see Appendix A.
+
+ * {empty}
+ TRAILER: ::
+ Trailing checksum of the preceding contents.

 == Appendix A: Serialization format for an EWAH bitmap

@@ -142,8 +153,8 @@ implementation:
  - 4-byte number of words of the COMPRESSED bitmap, when stored

  - N x 8-byte words, as specified by the previous field
-
- This is the actual content of the compressed bitmap.
+ +
+ This is the actual content of the compressed bitmap.

  - 4-byte position of the current RLW for the compressed
  bitmap
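
For readers checking the header layout described in the "On-disk format" hunk, here is a minimal Python sketch of how those fields could be decoded. It assumes only what the documentation text states: a 4-byte `BITM` signature, a 2-byte version, 2-byte flags, a 4-byte entry count (all in network byte order) and a 20-byte SHA1 checksum. The names `BitmapHeader` and `parse_bitmap_header` are hypothetical and are not part of Git or JGit.

```python
import struct
from typing import NamedTuple

# Flag values as listed in the documentation hunk above.
BITMAP_OPT_FULL_DAG = 0x1
BITMAP_OPT_HASH_CACHE = 0x4

# ">" selects network (big-endian) byte order with no padding:
# 4-byte signature, 2-byte version, 2-byte flags,
# 4-byte entry count, 20-byte checksum.
HEADER_FORMAT = ">4sHHI20s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 2 + 2 + 4 + 20 bytes


class BitmapHeader(NamedTuple):
    """Hypothetical container for the decoded header fields."""
    version: int
    flags: int
    entry_count: int
    checksum: bytes


def parse_bitmap_header(data: bytes) -> BitmapHeader:
    """Decode and sanity-check the header at the start of `data`."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer is too short to hold a bitmap header")
    signature, version, flags, entry_count, checksum = struct.unpack_from(
        HEADER_FORMAT, data
    )
    if signature != b"BITM":
        raise ValueError("bad signature")
    if version != 1:
        # The text says only version 1 is supported.
        raise ValueError("unsupported bitmap index version")
    if not flags & BITMAP_OPT_FULL_DAG:
        # The text marks BITMAP_OPT_FULL_DAG as REQUIRED.
        raise ValueError("BITMAP_OPT_FULL_DAG flag is missing")
    return BitmapHeader(version, flags, entry_count, checksum)
```

A caller could then test `header.flags & BITMAP_OPT_HASH_CACHE` to decide whether to look for the name-hash values. The sketch deliberately stops at the header and does not attempt to decode the EWAH bitmaps or the trailer.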
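
The XOR-offset rule in the entry description can be illustrated in the same way. The sketch below is not Git's implementation. It assumes each entry's EWAH bitmap has already been decompressed into a plain Python integer used as a bit set, and it assumes that an offset of 0 means the entry is stored without XOR compression (the text only describes positive offsets). The function name `resolve_xor_chain` is hypothetical.

```python
from typing import List, Tuple

# Hard limit on the XOR offset, as stated in the documentation hunk.
MAX_XOR_OFFSET = 160


def resolve_xor_chain(entries: List[Tuple[int, int]]) -> List[int]:
    """Turn (stored_bits, xor_offset) pairs into actual bitmaps.

    Entries are handled in index order, so the entry at `x - y` is
    always fully resolved before entry `x` needs it. This covers the
    recursive case without explicit recursion.
    """
    resolved: List[int] = []
    for x, (stored_bits, y) in enumerate(entries):
        if not 0 <= y <= MAX_XOR_OFFSET:
            raise ValueError(f"entry {x}: XOR offset {y} out of range")
        if y > x:
            raise ValueError(f"entry {x}: XOR offset {y} points before entry 0")
        if y == 0:
            # Assumption: offset 0 means "not XOR-compressed".
            resolved.append(stored_bits)
        else:
            # Actual bitmap = stored bitmap XOR the bitmap y entries back.
            resolved.append(stored_bits ^ resolved[x - y])
    return resolved
```

For example, with entries `(0b1010, 0)`, `(0b0110, 1)` and `(0b0001, 2)`, the second entry is XORed with the first, and the third is XORed with the entry two positions back, which is again the first.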