@@ -25,9 +25,9 @@ An object is uniquely described by its bit position within a bitmap:
  is defined as follows:

  o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
-
- The ordering between packs is done according to the MIDX's .rev file.
- Notably, the preferred pack sorts ahead of all other packs.
+ +
+ The ordering between packs is done according to the MIDX's .rev file.
+ Notably, the preferred pack sorts ahead of all other packs.

  The on-disk representation (described below) of a bitmap is the same regardless
  of whether or not that bitmap belongs to a packfile or a MIDX. The only
@@ -39,97 +39,108 @@ MIDXs, both the bit-cache and rev-cache extensions are required.

  == On-disk format

- - A header appears at the beginning:
-
- 4-byte signature: {'B', 'I', 'T', 'M'}
-
- 2-byte version number (network byte order)
- The current implementation only supports version 1
- of the bitmap index (the same one as JGit).
-
- 2-byte flags (network byte order)
-
- The following flags are supported:
-
- - BITMAP_OPT_FULL_DAG (0x1) REQUIRED
- This flag must always be present. It implies that the
- bitmap index has been generated for a packfile or
- multi-pack index (MIDX) with full closure (i.e. where
- every single object in the packfile/MIDX can find its
- parent links inside the same packfile/MIDX). This is a
- requirement for the bitmap index format, also present in
- JGit, that greatly reduces the complexity of the
- implementation.
-
- - BITMAP_OPT_HASH_CACHE (0x4)
- If present, the end of the bitmap file contains
- `N` 32-bit name-hash values, one per object in the
- pack/MIDX. The format and meaning of the name-hash is
- described below.
-
- 4-byte entry count (network byte order)
-
- The total count of entries (bitmapped commits) in this bitmap index.
-
- 20-byte checksum
-
- The SHA1 checksum of the pack/MIDX this bitmap index
- belongs to.
-
- - 4 EWAH bitmaps that act as type indexes
-
- Type indexes are serialized after the hash cache in the shape
- of four EWAH bitmaps stored consecutively (see Appendix A for
- the serialization format of an EWAH bitmap).
-
- There is a bitmap for each Git object type, stored in the following
- order:
-
- - Commits
- - Trees
- - Blobs
- - Tags
-
- In each bitmap, the `n`th bit is set to true if the `n`th object
- in the packfile or multi-pack index is of that type.
-
- The obvious consequence is that the OR of all 4 bitmaps will result
- in a full set (all bits set), and the AND of all 4 bitmaps will
- result in an empty bitmap (no bits set).
-
- - N entries with compressed bitmaps, one for each indexed commit
-
- Where `N` is the total amount of entries in this bitmap index.
- Each entry contains the following:
-
- - 4-byte object position (network byte order)
- The position **in the index for the packfile or
- multi-pack index** where the bitmap for this commit is
- found.
-
- - 1-byte XOR-offset
- The xor offset used to compress this bitmap. For an entry
- in position `x`, a XOR offset of `y` means that the actual
- bitmap representing this commit is composed by XORing the
- bitmap for this entry with the bitmap in entry `x-y` (i.e.
- the bitmap `y` entries before this one).
-
- Note that this compression can be recursive. In order to
- XOR this entry with a previous one, the previous entry needs
- to be decompressed first, and so on.
-
- The hard-limit for this offset is 160 (an entry can only be
- xor'ed against one of the 160 entries preceding it). This
- number is always positive, and hence entries are always xor'ed
- with **previous** bitmaps, not bitmaps that will come afterwards
- in the index.
-
- - 1-byte flags for this bitmap
- At the moment the only available flag is `0x1`, which hints
- that this bitmap can be re-used when rebuilding bitmap indexes
- for the repository.
-
- - The compressed bitmap itself, see Appendix A.
+ * A header appears at the beginning:
+
+ 4-byte signature: :: {'B', 'I', 'T', 'M'}
+
+ 2-byte version number (network byte order): ::
+
+ The current implementation only supports version 1
+ of the bitmap index (the same one as JGit).
+
+ 2-byte flags (network byte order): ::
+
+ The following flags are supported:
+
+ ** {empty}
+ BITMAP_OPT_FULL_DAG (0x1) REQUIRED: :::
+
+ This flag must always be present. It implies that the
+ bitmap index has been generated for a packfile or
+ multi-pack index (MIDX) with full closure (i.e. where
+ every single object in the packfile/MIDX can find its
+ parent links inside the same packfile/MIDX). This is a
+ requirement for the bitmap index format, also present in
+ JGit, that greatly reduces the complexity of the
+ implementation.
+
+ ** {empty}
+ BITMAP_OPT_HASH_CACHE (0x4): :::
+
+ If present, the end of the bitmap file contains
+ `N` 32-bit name-hash values, one per object in the
+ pack/MIDX. The format and meaning of the name-hash is
+ described below.
+
+ 4-byte entry count (network byte order): ::
+ The total count of entries (bitmapped commits) in this bitmap index.
+
+ 20-byte checksum: ::
+ The SHA1 checksum of the pack/MIDX this bitmap index
+ belongs to.
+
+ * 4 EWAH bitmaps that act as type indexes
+ +
+ Type indexes are serialized after the hash cache in the shape
+ of four EWAH bitmaps stored consecutively (see Appendix A for
+ the serialization format of an EWAH bitmap).
+ +
+ There is a bitmap for each Git object type, stored in the following
+ order:
+ +
+ - Commits
+ - Trees
+ - Blobs
+ - Tags
+
+ +
+ In each bitmap, the `n`th bit is set to true if the `n`th object
+ in the packfile or multi-pack index is of that type.
+ +
+ The obvious consequence is that the OR of all 4 bitmaps will result
+ in a full set (all bits set), and the AND of all 4 bitmaps will
+ result in an empty bitmap (no bits set).
+
+ * N entries with compressed bitmaps, one for each indexed commit
+ +
+ Where `N` is the total amount of entries in this bitmap index.
+ Each entry contains the following:
+
+ ** {empty}
+ 4-byte object position (network byte order): ::
+ The position **in the index for the packfile or
+ multi-pack index** where the bitmap for this commit is
+ found.
+
+ ** {empty}
+ 1-byte XOR-offset: ::
+ The xor offset used to compress this bitmap. For an entry
+ in position `x`, a XOR offset of `y` means that the actual
+ bitmap representing this commit is composed by XORing the
+ bitmap for this entry with the bitmap in entry `x-y` (i.e.
+ the bitmap `y` entries before this one).
+ +
+ NOTE: This compression can be recursive. In order to
+ XOR this entry with a previous one, the previous entry needs
+ to be decompressed first, and so on.
+ +
+ The hard-limit for this offset is 160 (an entry can only be
+ xor'ed against one of the 160 entries preceding it). This
+ number is always positive, and hence entries are always xor'ed
+ with **previous** bitmaps, not bitmaps that will come afterwards
+ in the index.
+
+ ** {empty}
+ 1-byte flags for this bitmap: ::
+ At the moment the only available flag is `0x1`, which hints
+ that this bitmap can be re-used when rebuilding bitmap indexes
+ for the repository.
+
+ ** The compressed bitmap itself, see Appendix A.
+
+ * {empty}
+ TRAILER: ::
+ Trailing checksum of the preceding contents.

  == Appendix A: Serialization format for an EWAH bitmap

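Reviewer note on the header layout in the hunk above: the fixed-size header (signature, version, flags, entry count, checksum) can be read with a single big-endian unpack. The sketch below is only an illustration under stated assumptions. `parse_bitmap_header` and `BitmapHeader` are hypothetical names, not Git or JGit APIs, and the 20-byte checksum field assumes the SHA1 case described in the text.

```python
import struct
from typing import NamedTuple

# Flag values as listed in the documented header.
BITMAP_OPT_FULL_DAG = 0x1
BITMAP_OPT_HASH_CACHE = 0x4

# ">" selects network byte order with no padding:
# 4-byte signature, 2-byte version, 2-byte flags,
# 4-byte entry count, 20-byte checksum.
HEADER_FORMAT = ">4sHHI20s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


class BitmapHeader(NamedTuple):
    """Hypothetical container for the parsed header fields."""
    version: int
    flags: int
    entry_count: int
    checksum: bytes


def parse_bitmap_header(data: bytes) -> BitmapHeader:
    """Parse and validate the header at the start of a bitmap file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated bitmap header")
    signature, version, flags, entry_count, checksum = struct.unpack_from(
        HEADER_FORMAT, data
    )
    if signature != b"BITM":
        raise ValueError("bad signature")
    if version != 1:
        raise ValueError("unsupported bitmap index version")
    if not flags & BITMAP_OPT_FULL_DAG:
        raise ValueError("BITMAP_OPT_FULL_DAG is required")
    return BitmapHeader(version, flags, entry_count, checksum)
```

A caller would then test `header.flags & BITMAP_OPT_HASH_CACHE` to decide whether to expect the name-hash values at the end of the file.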
@@ -142,8 +153,8 @@ implementation:
  - 4-byte number of words of the COMPRESSED bitmap, when stored

  - N x 8-byte words, as specified by the previous field
-
- This is the actual content of the compressed bitmap.
+ +
+ This is the actual content of the compressed bitmap.

  - 4-byte position of the current RLW for the compressed
  bitmap
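
Reviewer note on the XOR-offset rule in the entry format: because an entry may only reference one of the entries before it, resolving entries in file order is enough to handle the recursive case, since every referenced bitmap is already reconstructed by the time it is needed. The sketch below models each EWAH-decoded bitmap as a Python integer. `resolve_bitmaps` is a hypothetical helper, and treating an offset of 0 as "stored without XOR" is an assumption of this sketch rather than something the text above states.

```python
from typing import List, Tuple

MAX_XOR_OFFSET = 160  # hard limit given in the entry description


def resolve_bitmaps(entries: List[Tuple[int, int]]) -> List[int]:
    """Reconstruct the actual bitmap for every entry.

    entries[x] is (xor_offset, stored_bits); stored_bits stands in for
    the already EWAH-decoded bitmap of entry x, modelled as an int.
    """
    resolved: List[int] = []
    for x, (xor_offset, stored_bits) in enumerate(entries):
        if not 0 <= xor_offset <= MAX_XOR_OFFSET:
            raise ValueError(f"entry {x}: XOR offset out of range")
        if xor_offset > x:
            raise ValueError(f"entry {x}: XOR offset points before entry 0")
        if xor_offset == 0:
            # Assumption: offset 0 means the bitmap is stored as-is.
            resolved.append(stored_bits)
        else:
            # Entry x-y was resolved earlier in this loop, so chains of
            # XOR references unwind without explicit recursion.
            resolved.append(stored_bits ^ resolved[x - xor_offset])
    return resolved
```

For example, with entries `[(0, 0b1010), (1, 0b0110), (2, 0b0001)]` the second entry is XORed with the first and the third with the first as well, giving `[0b1010, 0b1100, 0b1011]`.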