Commit d7290d7
Fix some missing durable writes in CheckpointBuilder (#5080)
This hopefully fixes a crash we're seeing in the field, with a signature
like: `std::runtime_error("Corrupt checkpoint file
/var/lib/stellar/buckets/history/ledger/ledger-0398c97f.xdr.dirty, ends
on ledger 60344665, LCL is 60344667")`
The fix is to change a couple of `writeOne` calls to `durableWriteOne`
calls in the `CheckpointBuilder`. As far as I can tell their absence was
just an oversight during the multiple iterations of development of the
original `CheckpointBuilder` PR #4446 -- initially the PR didn't have
`durableWriteOne`, and then later after some discussion about durability
guarantees it gained that path, but only 2 of the 4 cases of `writeOne`
in the code got updated to use it.
By not doing durable writes here, it's possible for core to lose a
suffix of a dirty checkpoint file, which in turn can violate the safety
invariant that the dirty checkpoint files are always _ahead_ of the
sqlite-committed LCL. The `CheckpointBuilder` detects this invariant
violation on startup and crashes with the message above.
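
The startup check can be pictured roughly as follows. This is a sketch, not the actual `CheckpointBuilder` code: the function name and parameters are hypothetical, and it assumes that "ahead" allows the dirty file to end exactly on the LCL. Only the message format is taken from the crash signature above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical check (names are illustrative): a dirty checkpoint file
// must end at or beyond the last closed ledger (LCL) committed to sqlite.
inline void
checkDirtyFileCoversLcl(std::string const& path,
                        std::uint32_t lastLedgerInFile, std::uint32_t lcl)
{
    if (lastLedgerInFile < lcl)
    {
        // The file lost a suffix: sqlite committed ledgers that the file
        // never durably recorded.
        throw std::runtime_error(
            "Corrupt checkpoint file " + path + ", ends on ledger " +
            std::to_string(lastLedgerInFile) + ", LCL is " +
            std::to_string(lcl));
    }
}
```

With the values from the field report (file ends on ledger 60344665, LCL is 60344667) the check throws; with a file ending on 60344667 or later it passes.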
Why this started to manifest only in v25 is a _bit_ unclear, but there
are a few possibilities. As a statistical sort of thing, the machine
that failed in the field might just have got unlucky: its OS failed to
write a buffer either on a crash or other non-graceful shutdown
condition. Alternatively it _might_ be because we recently removed some
other, unrelated fsync calls, for example in #4952, which might
previously have carried the flushed-but-not-fsync'ed `CheckpointBuilder`
data to disk along with them. It's hard to be sure about anything with
fsync! But I think this change ought to at least improve the odds of
`CheckpointBuilder` maintaining its invariants.
(The removed `flush` call after the previous `writeOne` is intentional
-- flushing the stream buffer happens as part of `durableWriteOne`.)

1 file changed, +2 -3 lines changed. Hunk 1: original lines 173-174 replaced by new line 173. Hunk 2: original line 292 replaced by new line 291.