Commit a949ebd

jltobler authored and gitster committed
reftable/stack: use geometric table compaction
To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables with the same binary log value of size are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.

Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table log value, no compaction occurs for the new table. It is
therefore possible for unbounded growth of the table list. This can be
demonstrated by repeating the following sequence:

    git branch -f foo
    git branch -d foo

Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.

Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order. This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.

When creating 10 thousand references, the new strategy has no
performance impact:

    Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
      Time (mean ± σ):     26.516 s ±  0.047 s    [User: 17.864 s, System: 8.491 s]
      Range (min … max):   26.447 s … 26.569 s    10 runs

    Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
      Time (mean ± σ):     26.417 s ±  0.028 s    [User: 17.738 s, System: 8.500 s]
      Range (min … max):   26.366 s … 26.444 s    10 runs

    Summary
      update-ref: create refs sequentially (revision = HEAD) ran
        1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)

Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.

Signed-off-by: Justin Tobler <[email protected]>
Acked-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 7c8eb59 commit a949ebd
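
The gap described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the code as it lives in git: it transcribes the removed strategy (as `old_suggest`) and the added one (as `new_suggest`) into standalone C, swapping `reftable_calloc()` for plain `calloc()`. The function names and the table sizes `{ 2, 4, 2, 4, 2, 4 }` are assumptions chosen so that neighbouring tables never share a binary log value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same fields as the pre-change struct; the new strategy ignores `log`. */
struct segment {
	size_t start, end;
	int log;
	uint64_t bytes;
};

/* floor(log2(sz)), with fastlog2(0) == 0, as in the removed helper. */
static int fastlog2(uint64_t sz)
{
	int l = 0;
	if (sz == 0)
		return 0;
	for (; sz; sz /= 2)
		l++;
	return l - 1;
}

/* Previous strategy (hypothetical name): group by log2, pick a multi-table group. */
static struct segment old_suggest(const uint64_t *sizes, size_t n)
{
	struct segment min_seg = { .log = 64 };
	struct segment cur = { 0 };
	struct segment *segs;
	size_t seglen = 0, i;

	if (n == 0)
		return min_seg;
	segs = calloc(n, sizeof(*segs));
	assert(segs);

	for (i = 0; i < n; i++) {
		int log = fastlog2(sizes[i]);
		if (cur.log != log && cur.bytes > 0) {
			struct segment fresh = { .start = i };
			segs[seglen++] = cur;
			cur = fresh;
		}
		cur.log = log;
		cur.end = i + 1;
		cur.bytes += sizes[i];
	}
	segs[seglen++] = cur;

	for (i = 0; i < seglen; i++) {
		/* Single-table segments are never a starting point. */
		if (segs[i].end - segs[i].start == 1)
			continue;
		if (segs[i].log < min_seg.log)
			min_seg = segs[i];
	}
	while (min_seg.start > 0) {
		size_t prev = min_seg.start - 1;
		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
			break;
		min_seg.start = prev;
		min_seg.bytes += sizes[prev];
	}
	free(segs);
	return min_seg;
}

/* New strategy (hypothetical name): restore a factor-2 geometric sequence. */
static struct segment new_suggest(const uint64_t *sizes, size_t n)
{
	struct segment seg = { 0 };
	uint64_t bytes = 0;
	size_t i;

	if (n <= 1)
		return seg;
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * 2) {
			seg.end = i + 1;
			bytes = sizes[i];
			break;
		}
	}
	for (; i > 0; i--) {
		uint64_t curr = bytes;
		bytes += sizes[i - 1];
		if (sizes[i - 1] < curr * 2) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}
	return seg;
}
```

With the assumed sizes `{ 2, 4, 2, 4, 2, 4 }`, every segment built by `old_suggest` holds a single table, so it returns an empty range and nothing is compacted. `new_suggest` selects all six tables (start 0, end 6, 18 bytes in total).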

4 files changed: +111 -131 lines changed


reftable/stack.c

Lines changed: 62 additions & 61 deletions
@@ -1216,75 +1216,76 @@ static int segment_size(struct segment *s)
 	return s->end - s->start;
 }
 
-int fastlog2(uint64_t sz)
-{
-	int l = 0;
-	if (sz == 0)
-		return 0;
-	for (; sz; sz /= 2) {
-		l++;
-	}
-	return l - 1;
-}
-
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
-{
-	struct segment *segs = reftable_calloc(n, sizeof(*segs));
-	struct segment cur = { 0 };
-	size_t next = 0, i;
-
-	if (n == 0) {
-		*seglen = 0;
-		return segs;
-	}
-	for (i = 0; i < n; i++) {
-		int log = fastlog2(sizes[i]);
-		if (cur.log != log && cur.bytes > 0) {
-			struct segment fresh = {
-				.start = i,
-			};
-
-			segs[next++] = cur;
-			cur = fresh;
-		}
-
-		cur.log = log;
-		cur.end = i + 1;
-		cur.bytes += sizes[i];
-	}
-	segs[next++] = cur;
-	*seglen = next;
-	return segs;
-}
-
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 {
-	struct segment min_seg = {
-		.log = 64,
-	};
-	struct segment *segs;
-	size_t seglen = 0, i;
-
-	segs = sizes_to_segments(&seglen, sizes, n);
-	for (i = 0; i < seglen; i++) {
-		if (segment_size(&segs[i]) == 1)
-			continue;
+	struct segment seg = { 0 };
+	uint64_t bytes;
+	size_t i;
 
-		if (segs[i].log < min_seg.log)
-			min_seg = segs[i];
-	}
+	/*
+	 * If there are no tables or only a single one then we don't have to
+	 * compact anything. The sequence is geometric by definition already.
+	 */
+	if (n <= 1)
+		return seg;
 
-	while (min_seg.start > 0) {
-		size_t prev = min_seg.start - 1;
-		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
+	/*
+	 * Find the ending table of the compaction segment needed to restore the
+	 * geometric sequence. Note that the segment end is exclusive.
+	 *
+	 * To do so, we iterate backwards starting from the most recent table
+	 * until a valid segment end is found. If the preceding table is smaller
+	 * than the current table multiplied by the geometric factor (2), the
+	 * compaction segment end has been identified.
+	 *
+	 * Tables after the ending point are not added to the byte count because
+	 * they are already valid members of the geometric sequence. Due to the
+	 * properties of a geometric sequence, it is not possible for the sum of
+	 * these tables to exceed the value of the ending point table.
+	 *
+	 * Example table size sequence requiring no compaction:
+	 * 	64, 32, 16, 8, 4, 2, 1
+	 *
+	 * Example table size sequence where compaction segment end is set to
+	 * the last table. Since the segment end is exclusive, the last table is
+	 * excluded during subsequent compaction and the table with size 3 is
+	 * the final table included:
+	 * 	64, 32, 16, 8, 4, 3, 1
+	 */
+	for (i = n - 1; i > 0; i--) {
+		if (sizes[i - 1] < sizes[i] * 2) {
+			seg.end = i + 1;
+			bytes = sizes[i];
 			break;
+		}
+	}
 
-		min_seg.start = prev;
-		min_seg.bytes += sizes[prev];
+	/*
+	 * Find the starting table of the compaction segment by iterating
+	 * through the remaining tables and keeping track of the accumulated
+	 * size of all tables seen from the segment end table. The previous
+	 * table is compared to the accumulated size because the tables from the
+	 * segment end are merged backwards recursively.
+	 *
+	 * Note that we keep iterating even after we have found the first
+	 * starting point. This is because there may be tables in the stack
+	 * preceding that first starting point which violate the geometric
+	 * sequence.
+	 *
+	 * Example compaction segment start set to table with size 32:
+	 * 	128, 32, 16, 8, 4, 3, 1
+	 */
+	for (; i > 0; i--) {
+		uint64_t curr = bytes;
+		bytes += sizes[i - 1];
+
+		if (sizes[i - 1] < curr * 2) {
+			seg.start = i - 1;
+			seg.bytes = bytes;
+		}
 	}
 
-	reftable_free(segs);
-	return min_seg;
+	return seg;
 }
 
 static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
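
The example sequences in the comments above can be traced by hand, and the sketch below records the results as assertions. It is a standalone adaptation rather than the file as committed: `segment_size()` is changed to return `size_t`, the arrays are `const`, `bytes` is zero-initialised, and `check_comment_examples()` is a hypothetical helper added only for illustration. The expected values come from stepping through the two loops for each input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct segment {
	size_t start, end; /* half-open range [start, end) of table indices */
	uint64_t bytes;    /* combined size of the tables in that range */
};

static size_t segment_size(const struct segment *s)
{
	return s->end - s->start;
}

static struct segment suggest_compaction_segment(const uint64_t *sizes, size_t n)
{
	struct segment seg = { 0 };
	uint64_t bytes = 0;
	size_t i;

	if (n <= 1)
		return seg;

	/* Walk back from the newest table to the first factor-2 violation. */
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * 2) {
			seg.end = i + 1;
			bytes = sizes[i];
			break;
		}
	}

	/* Keep accumulating; every table smaller than 2x the sum joins in. */
	for (; i > 0; i--) {
		uint64_t curr = bytes;
		bytes += sizes[i - 1];
		if (sizes[i - 1] < curr * 2) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}
	return seg;
}

/* Hypothetical helper: the comment examples, checked as assertions. */
static void check_comment_examples(void)
{
	const uint64_t geometric[] = { 64, 32, 16, 8, 4, 2, 1 };
	const uint64_t end_at_last[] = { 64, 32, 16, 8, 4, 3, 1 };
	const uint64_t start_at_32[] = { 128, 32, 16, 8, 4, 3, 1 };
	struct segment seg;

	/* Already geometric: empty segment, nothing to compact. */
	seg = suggest_compaction_segment(geometric, 7);
	assert(segment_size(&seg) == 0);

	/* The trailing 1 is excluded; tables 0..5 merge into 127 bytes. */
	seg = suggest_compaction_segment(end_at_last, 7);
	assert(seg.start == 0 && seg.end == 6 && seg.bytes == 127);

	/* 128 >= 2 * 63, so the segment starts at the table of size 32. */
	seg = suggest_compaction_segment(start_at_32, 7);
	assert(seg.start == 1 && seg.end == 6 && seg.bytes == 63);
}
```

Calling `check_comment_examples()` from a test driver should pass silently. The third case shows why the second loop keeps adding to `bytes` even for tables that do not join the segment: whether an older table belongs is always judged against the full accumulated size of everything newer than it.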

reftable/stack.h

Lines changed: 0 additions & 3 deletions
@@ -32,12 +32,9 @@ int read_lines(const char *filename, char ***lines);
 
 struct segment {
 	size_t start, end;
-	int log;
 	uint64_t bytes;
 };
 
-int fastlog2(uint64_t sz);
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
 
 #endif

reftable/stack_test.c

Lines changed: 13 additions & 53 deletions
@@ -760,59 +760,13 @@ static void test_reftable_stack_hash_id(void)
 	clear_dir(dir);
 }
 
-static void test_log2(void)
-{
-	EXPECT(1 == fastlog2(3));
-	EXPECT(2 == fastlog2(4));
-	EXPECT(2 == fastlog2(5));
-}
-
-static void test_sizes_to_segments(void)
-{
-	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
-	/* .................0 1 2 3 4 5 */
-
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(segs[2].log == 3);
-	EXPECT(segs[2].start == 5);
-	EXPECT(segs[2].end == 6);
-
-	EXPECT(segs[1].log == 2);
-	EXPECT(segs[1].start == 2);
-	EXPECT(segs[1].end == 5);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_empty(void)
-{
-	size_t seglen = 0;
-	struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
-	EXPECT(seglen == 0);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_all_equal(void)
-{
-	uint64_t sizes[] = { 5, 5 };
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(seglen == 1);
-	EXPECT(segs[0].start == 0);
-	EXPECT(segs[0].end == 2);
-	reftable_free(segs);
-}
-
 static void test_suggest_compaction_segment(void)
 {
-	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
-	/* .................0 1 2 3 4 5 6 */
+	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
 	struct segment min =
 		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
-	EXPECT(min.start == 2);
-	EXPECT(min.end == 7);
+	EXPECT(min.start == 1);
+	EXPECT(min.end == 10);
 }
 
 static void test_suggest_compaction_segment_nothing(void)
@@ -923,6 +877,16 @@ static void test_empty_add(void)
 	reftable_stack_destroy(st2);
 }
 
+static int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2)
+		l++;
+	return l - 1;
+}
+
 static void test_reftable_stack_auto_compaction(void)
 {
 	struct reftable_write_options cfg = {
@@ -1112,7 +1076,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
 int stack_test_main(int argc, const char *argv[])
 {
 	RUN_TEST(test_empty_add);
-	RUN_TEST(test_log2);
 	RUN_TEST(test_names_equal);
 	RUN_TEST(test_parse_names);
 	RUN_TEST(test_read_file);
@@ -1133,9 +1096,6 @@ int stack_test_main(int argc, const char *argv[])
 	RUN_TEST(test_reftable_stack_update_index_check);
 	RUN_TEST(test_reftable_stack_uptodate);
 	RUN_TEST(test_reftable_stack_validate_refname);
-	RUN_TEST(test_sizes_to_segments);
-	RUN_TEST(test_sizes_to_segments_all_equal);
-	RUN_TEST(test_sizes_to_segments_empty);
 	RUN_TEST(test_suggest_compaction_segment);
 	RUN_TEST(test_suggest_compaction_segment_nothing);
 	return 0;

t/t0610-reftable-basics.sh

Lines changed: 36 additions & 14 deletions
@@ -293,7 +293,7 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
 	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag A &&
-	test_line_count = 2 repo/.git/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag B &&
 	test_line_count = 1 repo/.git/reftable/tables.list
@@ -320,6 +320,19 @@ test_expect_success 'ref transaction: env var disables compaction' '
 	test_line_count -lt $expected repo/.git/reftable/tables.list
 '
 
+test_expect_success 'ref transaction: alternating table sizes are compacted' '
+	test_when_finished "rm -rf repo" &&
+
+	git init repo &&
+	test_commit -C repo A &&
+	for i in $(test_seq 5)
+	do
+		git -C repo branch -f foo &&
+		git -C repo branch -d foo || return 1
+	done &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
 check_fsync_events () {
 	local trace="$1" &&
 	shift &&
@@ -345,7 +358,7 @@ test_expect_success 'ref transaction: writes are synced' '
 	git -C repo -c core.fsync=reference \
 		-c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
 	check_fsync_events trace2.txt <<-EOF
-	"name":"hardware-flush","count":2
+	"name":"hardware-flush","count":4
 	EOF
 '
 
@@ -377,7 +390,7 @@ test_expect_success 'ref transaction: fails gracefully when auto compaction fail
 			done ||
 			exit 1
 		done &&
-		test_line_count = 13 .git/reftable/tables.list
+		test_line_count = 10 .git/reftable/tables.list
 	)
 '
 
@@ -387,8 +400,8 @@ test_expect_success 'pack-refs: compacts tables' '
 
 	test_commit -C repo A &&
 	ls -1 repo/.git/reftable >table-files &&
-	test_line_count = 4 table-files &&
-	test_line_count = 3 repo/.git/reftable/tables.list &&
+	test_line_count = 3 table-files &&
+	test_line_count = 2 repo/.git/reftable/tables.list &&
 
 	git -C repo pack-refs &&
 	ls -1 repo/.git/reftable >table-files &&
@@ -429,7 +442,7 @@ test_expect_success "$command: auto compaction" '
 		# The tables should have been auto-compacted, and thus auto
 		# compaction should not have to do anything.
 		ls -1 .git/reftable >tables-expect &&
-		test_line_count = 4 tables-expect &&
+		test_line_count = 3 tables-expect &&
 		git $command --auto &&
 		ls -1 .git/reftable >tables-actual &&
 		test_cmp tables-expect tables-actual &&
@@ -447,7 +460,7 @@ test_expect_success "$command: auto compaction" '
 		git branch B &&
 		git branch C &&
 		rm .git/reftable/*.lock &&
-		test_line_count = 5 .git/reftable/tables.list &&
+		test_line_count = 4 .git/reftable/tables.list &&
 
 		git $command --auto &&
 		test_line_count = 1 .git/reftable/tables.list
@@ -479,7 +492,7 @@ do
 			umask $umask &&
 			git init --shared=true repo &&
 			test_commit -C repo A &&
-			test_line_count = 3 repo/.git/reftable/tables.list
+			test_line_count = 2 repo/.git/reftable/tables.list
 		) &&
 		git -C repo pack-refs &&
 		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
@@ -847,26 +860,34 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' '
 	test_when_finished "rm -rf repo worktree" &&
 	git init repo &&
 	test_commit -C repo A &&
+
+	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
 	git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
+	git -C worktree update-ref refs/worktree/per-worktree HEAD &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
 	git -C repo pack-refs &&
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
 test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
 	test_when_finished "rm -rf repo worktree" &&
 	git init repo &&
 	test_commit -C repo A &&
+
+	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
 	git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
+	git -C worktree update-ref refs/worktree/per-worktree HEAD &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
 	git -C worktree pack-refs &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list
+	test_line_count = 3 repo/.git/reftable/tables.list
 '
 
 test_expect_success 'worktree: creating shared ref updates main stack' '
@@ -880,6 +901,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' '
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 1 repo/.git/reftable/tables.list &&
 
+	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
 	git -C worktree update-ref refs/heads/shared HEAD &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 2 repo/.git/reftable/tables.list
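
The bounded growth that the new `alternating table sizes are compacted` test relies on can be modelled without a repository. The sketch below is a simulation under stated assumptions, not a reproduction of the shell test: each `git branch -f foo` is modelled as appending a table of size 2 and each `git branch -d foo` as appending a table of size 1, and a compacted table is assumed to be exactly as large as the sum of its inputs. `append_and_compact()`, `simulate_branch_churn()` and `MAX_TABLES` are hypothetical names; only `suggest_compaction_segment()` follows the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TABLES 64 /* hypothetical capacity of the simulated table list */

struct segment {
	size_t start, end;
	uint64_t bytes;
};

static struct segment suggest_compaction_segment(const uint64_t *sizes, size_t n)
{
	struct segment seg = { 0 };
	uint64_t bytes = 0;
	size_t i;

	if (n <= 1)
		return seg;
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * 2) {
			seg.end = i + 1;
			bytes = sizes[i];
			break;
		}
	}
	for (; i > 0; i--) {
		uint64_t curr = bytes;
		bytes += sizes[i - 1];
		if (sizes[i - 1] < curr * 2) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}
	return seg;
}

/* Append one table, then replace the suggested range by a single table. */
static size_t append_and_compact(uint64_t *sizes, size_t n, uint64_t new_size)
{
	struct segment seg;

	assert(n < MAX_TABLES);
	sizes[n++] = new_size;

	seg = suggest_compaction_segment(sizes, n);
	if (seg.end > seg.start) {
		/* Assumption: merged size equals the sum of the merged tables. */
		sizes[seg.start] = seg.bytes;
		memmove(&sizes[seg.start + 1], &sizes[seg.end],
			(n - seg.end) * sizeof(*sizes));
		n -= seg.end - seg.start - 1;
	}
	return n;
}

/* Model `rounds` iterations of "branch -f foo" followed by "branch -d foo". */
static size_t simulate_branch_churn(uint64_t *sizes, size_t rounds)
{
	size_t n = 0, r;

	for (r = 0; r < rounds; r++) {
		n = append_and_compact(sizes, n, 2); /* assumed "create" table size */
		n = append_and_compact(sizes, n, 1); /* assumed "delete" table size */
	}
	return n;
}
```

Under these assumptions, five rounds (ten appended tables) leave two tables of sizes 14 and 1. Because every step restores the factor-2 sequence, the number of tables grows at most logarithmically with the total size rather than linearly with the number of operations.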
