Commit 4fac767
committed
gc: replace ordered sets with unordered sets for in-memory caches
During garbage collection we cache several things -- a set of known-dead
paths, a set of known-alive paths, and a map of paths to their derivers.
Currently they use STL maps and sets, which are ordered structures that
typically are backed by binary trees. Since we are putting pseudorandom
paths into these and looking them up by exact key, we don't need the
ordering, and we're paying a nontrivial cost per insertion.
The existing maps require O(n log n) memory and have O(log n) insertion
and lookup time.
We could instead use unordered maps, which are typically backed by
hashmaps. These require O(n) memory and have O(1) insertion and lookup
time.
On my system this appears to result in a dramatic speedup -- prior to
this patch I was able to delete 400k paths out of 9.5 million over the
course of 34.5 hours. After this patch the same result took 89 minutes.
This result should NOT be taken at face value because the two runs
aren't really comparable; in particular the first started when I had 9.5
million store paths and the seconcd started with 7.8 million, so we are
deleting a different set of paths starting from a much cleaner
filesystem. But I do think it's indicative.
Related: #95811 parent a44ae8b commit 4fac767
1 file changed
+2
-2
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
455 | 455 | | |
456 | 456 | | |
457 | 457 | | |
458 | | - | |
| 458 | + | |
459 | 459 | | |
460 | 460 | | |
461 | 461 | | |
| |||
661 | 661 | | |
662 | 662 | | |
663 | 663 | | |
664 | | - | |
| 664 | + | |
665 | 665 | | |
666 | 666 | | |
667 | 667 | | |
| |||
0 commit comments