Commit ed1ad6b

pks-t authored and gitster committed
reftable/stack: fix corruption on concurrent compaction
The locking employed by compaction uses the following schema:

1. Lock "tables.list" and verify that it matches the version we have loaded in core.

2. Lock each of the tables in the user-supplied range of tables that we are supposed to compact. These locks prohibit any concurrent process from compacting those tables while we are doing so.

3. Unlock "tables.list". This enables concurrent processes to add new tables to the stack, but also allows them to compact tables outside of the range of tables that we have locked.

4. Perform the compaction.

5. Lock "tables.list" again.

6. Move the compacted table into place.

7. Write the new order of tables, including the compacted table, into the lockfile.

8. Commit the lockfile into place.

Letting concurrent processes modify the "tables.list" file while we are performing the compaction is very much part of the design and thus expected. After all, compaction may take some time when we are compacting a lot of very large tables.

But there is a bug in the code. Suppose we have two processes which are compacting two slices of the table stack. Given that we lock each of the tables before compacting them, we know that the slices must be disjoint from each other. But regardless of that, compaction performed by one process will always change what the other process needs to write to the "tables.list" file. Right now, we do not check whether "tables.list" has been changed after we have locked it for the second time in (5). As a consequence, we always commit the old, cached in-core tables to disk without respecting what the other process has written. This scenario leads to data loss and corruption.

This can even happen in the simpler case of one compacting process and one writing process: the table newly appended by the writing process gets discarded by the compacting process, which never sees the new table.

Fix this bug by re-checking whether our stack is still up to date after locking for the second time. If it isn't, we adjust the indices of the tables to replace in the updated stack.

Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 128b9aa commit ed1ad6b

File tree

1 file changed (+102, -5 lines)


reftable/stack.c

Lines changed: 102 additions & 5 deletions
@@ -1020,7 +1020,9 @@ static int stack_compact_range(struct reftable_stack *st,
 	struct lock_file *table_locks = NULL;
 	struct tempfile *new_table = NULL;
 	int is_empty_table = 0, err = 0;
+	size_t first_to_replace, last_to_replace;
 	size_t i, nlocks = 0;
+	char **names = NULL;
 
 	if (first > last || (!expiry && first == last)) {
 		err = 0;
@@ -1123,6 +1125,100 @@ static int stack_compact_range(struct reftable_stack *st,
 		}
 	}
 
+	/*
+	 * As we have unlocked the stack while compacting our slice of tables
+	 * it may have happened that a concurrently running process has updated
+	 * the stack while we were compacting. In that case, we need to check
+	 * whether the tables that we have just compacted still exist in the
+	 * stack in the exact same order as we have compacted them.
+	 *
+	 * If they do exist, then it is fine to continue and replace those
+	 * tables with our compacted version. If they don't, then we need to
+	 * abort.
+	 */
+	err = stack_uptodate(st);
+	if (err < 0)
+		goto done;
+	if (err > 0) {
+		ssize_t new_offset = -1;
+		int fd;
+
+		fd = open(st->list_file, O_RDONLY);
+		if (fd < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto done;
+		}
+
+		err = fd_read_lines(fd, &names);
+		close(fd);
+		if (err < 0)
+			goto done;
+
+		/*
+		 * Search for the offset of the first table that we have
+		 * compacted in the updated "tables.list" file.
+		 */
+		for (size_t i = 0; names[i]; i++) {
+			if (strcmp(names[i], st->readers[first]->name))
+				continue;
+
+			/*
+			 * We have found the first entry. Verify that all the
+			 * subsequent tables we have compacted still exist in
+			 * the modified stack in the exact same order as we
+			 * have compacted them.
+			 */
+			for (size_t j = 1; j < last - first + 1; j++) {
+				const char *old = first + j < st->merged->stack_len ?
+					st->readers[first + j]->name : NULL;
+				const char *new = names[i + j];
+
+				/*
+				 * If some entries are missing or in case the tables
+				 * have changed then we need to bail out. Again, this
+				 * shouldn't ever happen because we have locked the
+				 * tables we are compacting.
+				 */
+				if (!old || !new || strcmp(old, new)) {
+					err = REFTABLE_OUTDATED_ERROR;
+					goto done;
+				}
+			}
+
+			new_offset = i;
+			break;
+		}
+
+		/*
+		 * In case we didn't find our compacted tables in the stack we
+		 * need to bail out. In theory, this should have never happened
+		 * because we locked the tables we are compacting.
+		 */
+		if (new_offset < 0) {
+			err = REFTABLE_OUTDATED_ERROR;
+			goto done;
+		}
+
+		/*
+		 * We have found the new range that we want to replace, so
+		 * let's update the range of tables that we want to replace.
+		 */
+		first_to_replace = new_offset;
+		last_to_replace = last + (new_offset - first);
+	} else {
+		/*
+		 * `fd_read_lines()` uses a `NULL` sentinel to indicate that
+		 * the array is at its end. As we use `free_names()` to free
+		 * the array, we need to include this sentinel value here and
+		 * thus have to allocate `stack_len + 1` many entries.
+		 */
+		REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);
+		for (size_t i = 0; i < st->merged->stack_len; i++)
+			names[i] = xstrdup(st->readers[i]->name);
+		first_to_replace = first;
+		last_to_replace = last;
+	}
+
 	/*
 	 * If the resulting compacted table is not empty, then we need to move
 	 * it into place now.
@@ -1145,12 +1241,12 @@ static int stack_compact_range(struct reftable_stack *st,
 	 * have just written. In case the compacted table became empty we
 	 * simply skip writing it.
 	 */
-	for (i = 0; i < first; i++)
-		strbuf_addf(&tables_list_buf, "%s\n", st->readers[i]->name);
+	for (i = 0; i < first_to_replace; i++)
+		strbuf_addf(&tables_list_buf, "%s\n", names[i]);
 	if (!is_empty_table)
 		strbuf_addf(&tables_list_buf, "%s\n", new_table_name.buf);
-	for (i = last + 1; i < st->merged->stack_len; i++)
-		strbuf_addf(&tables_list_buf, "%s\n", st->readers[i]->name);
+	for (i = last_to_replace + 1; names[i]; i++)
+		strbuf_addf(&tables_list_buf, "%s\n", names[i]);
 
 	err = write_in_full(get_lock_file_fd(&tables_list_lock),
 			    tables_list_buf.buf, tables_list_buf.len);
@@ -1203,9 +1299,10 @@ static int stack_compact_range(struct reftable_stack *st,
 	delete_tempfile(&new_table);
 	strbuf_release(&new_table_name);
 	strbuf_release(&new_table_path);
-
 	strbuf_release(&tables_list_buf);
 	strbuf_release(&table_name);
+	free_names(names);
+
 	return err;
 }
 