Commit 64fd7b3
committed
pack-objects: add third name hash version
The '--name-hash-version=<n>' option in 'git pack-objects' was
introduced to allow for specifying an alternative name hash function
when organizing objects for delta compression. The pack_name_hash_v2()
function was designed to break some collisions while also preserving
some amount of locality for cross-path deltas.
However, in some repositories, that effort to preserve locality results
in enough collisions that it causes issues with full repacks.
Create a third name hash function and extend the '--name-hash-version'
option in 'git pack-objects' and 'git repack' to understand it. This
hash version abandons all efforts for locality and focuses on creating a
somewhat uniformly-distributed hash function to minimize collisions.
We can observe the effect of this collision avoidance in a large
internal monorepo that suffered from collisions in the previous
versions. The updates to p5314-name-hash.sh show these results:
Test this tree
--------------------------------------------------
5314.1: paths at head 227.3K
5314.2: distinct hash value: v1 72.3K
5314.3: maximum multiplicity: v1 14.4K
5314.4: distinct hash value: v2 166.5K
5314.5: maximum multiplicity: v2 138
5314.6: distinct hash value: v3 227.3K
5314.7: maximum multiplicity: v3 2
These results demonstrate that of the 227,000+ paths, nearly all of them
find distinct hash values. The maximum multiplicity is 2, improved from
138 in the v2 hash function. The v2 hash function also had only 166K
distinct values, so it had a wide spread of collisions.
A more modest improvement is available in the open source fluentui repo
[1] with these results:
Test this tree
--------------------------------------------------
5314.1: paths at head 19.5K
5314.2: distinct hash value: v1 8.2K
5314.3: maximum multiplicity: v1 279
5314.4: distinct hash value: v2 17.8K
5314.5: maximum multiplicity: v2 44
5314.6: distinct hash value: v3 19.5K
5314.7: maximum multiplicity: v3 1
[1] https://github.com/microsoft/fluentui
However, it is important to demonstrate the effectiveness of this
function in the context of compressing a repository. We can use
p5313-pack-objects.sh to measure these changes. I will use a simplified
table summarizing the output of that performance test.
| Test | V1 Time | V2 Time | V3 Time | V1 Size | V2 Size | V3 Size |
|-----------|---------|---------|---------|---------|---------|---------|
| Thin Pack | 0.37 s | 0.12 s | 0.07 s | 1.2 M | 22.0 K | 20.4 K |
| Big Pack | 2.04 s | 2.80 s | 1.40 s | 20.4 M | 25.9 M | 19.2 M |
| Shallow | 1.41 s | 1.77 s | 1.27 s | 34.4 M | 33.7 M | 34.8 M |
| Repack | 95.70 s | 33.68 s | 20.88 s | 439.3 M | 160.5 M | 169.1 M |
Here, there are some performance improvements on a time basis, and the
thin and big packs are somewhat smaller in v3. The shallow and repacked
packs are somewhat bigger, though, compared to v2.
Two repositories that have very few collisions in the v1 name hash are
the Git and Linux repositories. Here are their stats for p5313:
Git:
| Test | V1 Time | V2 Time | V3 Time | V1 Size | V2 Size | V3 Size |
|-----------|---------|---------|---------|---------|---------|---------|
| Thin Pack | 0.02 s | 0.02 s | 0.02 s | 1.1 K | 1.1 K | 15.3 K |
| Big Pack | 1.69 s | 1.95 s | 1.67 s | 13.5 M | 14.5 M | 14.9 M |
| Shallow | 1.26 s | 1.29 s | 1.16 s | 12.0 M | 12.2 M | 12.5 M |
| Repack | 29.51 s | 29.01 s | 29.08 s | 237.7 M | 238.2 M | 237.7 M |
Linux:
| Test | V1 Time | V2 Time | V3 Time | V1 Size | V2 Size | V3 Size |
|-----------|----------|----------|----------|---------|---------|---------|
| Thin Pack | 0.17 s | 0.07 s | 0.07 s | 4.6 K | 4.6 K | 6.8 K |
| Big Pack | 17.88 s | 12.35 s | 12.14 s | 201.1 M | 149.1 M | 160.4 M |
| Shallow | 11.05 s | 22.94 s | 22.16 s | 269.2 M | 273.8 M | 271.8 M |
| Repack | 727.39 s | 566.95 s | 539.33 s | 2.5 G | 2.5 G | 2.6 G |
These repositories make good use of the cross-path deltas that come
about from the v1 name hash function, so they already had mixed results
with the v2 function. The v3 function is generally worse for these
repositories.
An internal Javascript-based repository with name hash collisions
similar to the fluentui repo has these results:
| Test | V1 Time | V2 Time | V3 Time | V1 Size | V2 Size | V3 Size |
|-----------|-----------|----------|----------|---------|---------|---------|
| Thin Pack | 8.28 s | 7.28 s | 0.04 s | 16.8 K | 16.8 K | 3.2 K |
| Big Pack | 12.81 s | 11.66 s | 2.52 s | 29.1 M | 29.1 M | 30.6 M |
| Shallow | 4.86 s | 4.06 s | 3.77 s | 42.5 M | 44.1 M | 45.7 M |
| Repack | 3126.50 s | 496.33 s | 306.86 s | 6.2 G | 855.6 M | 838.2 M |
This repository is also listed as "Repo B" in the repacking size table
below, along with other Javascript repos that have many name hash
collisions with the v1 name hash:
| Repo | V1 Size | V2 Size | V3 Size |
|----------|-----------|---------|---------|
| fluentui | 440 M | 161 M | 170 M |
| Repo B | 6,248 M | 856 M | 840 M |
| Repo C | 37,278 M | 6,921 M | 6,755 M |
| Repo D | 131,204 M | 7,463 M | 7,124 M |
While the fluentui repo had an increase in size using the v3 name hash,
the others had modest improvements over the v2 name hash. But those
modest improvements are dwarfed by the difference from v1 to v2, so it
is unlikely that the regression seen in the other scenarios (packfiles
that are not from full repacks) will be worth using v3 over v2. That is,
unless there are enough collisions even with v2 that the full repack
scenario has larger improvements than these.
Signed-off-by: Derrick Stolee <[email protected]>1 parent 3885ef8 commit 64fd7b3
File tree
9 files changed
+60
-12
lines changed- Documentation
- builtin
- t
- helper
- perf
9 files changed
+60
-12
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
374 | 374 | | |
375 | 375 | | |
376 | 376 | | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
377 | 386 | | |
378 | 387 | | |
379 | 388 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
279 | 279 | | |
280 | 280 | | |
281 | 281 | | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
282 | 291 | | |
283 | 292 | | |
284 | 293 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
270 | 270 | | |
271 | 271 | | |
272 | 272 | | |
273 | | - | |
| 273 | + | |
274 | 274 | | |
275 | 275 | | |
276 | 276 | | |
| |||
292 | 292 | | |
293 | 293 | | |
294 | 294 | | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
295 | 298 | | |
296 | 299 | | |
297 | 300 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
235 | 235 | | |
236 | 236 | | |
237 | 237 | | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
238 | 264 | | |
239 | 265 | | |
240 | 266 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
| 18 | + | |
18 | 19 | | |
19 | 20 | | |
20 | 21 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
28 | | - | |
| 28 | + | |
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
17 | | - | |
| 17 | + | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
680 | 680 | | |
681 | 681 | | |
682 | 682 | | |
683 | | - | |
| 683 | + | |
684 | 684 | | |
685 | 685 | | |
686 | 686 | | |
687 | 687 | | |
688 | 688 | | |
689 | | - | |
| 689 | + | |
690 | 690 | | |
691 | 691 | | |
692 | 692 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
45 | 45 | | |
46 | 46 | | |
47 | 47 | | |
48 | | - | |
49 | | - | |
50 | | - | |
51 | | - | |
52 | | - | |
53 | | - | |
54 | | - | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
| |||
0 commit comments