Reorder blueprint read to the end of loading #9603
// - Successful reads are validated inside the reader loop (must match
//   the original blueprint exactly, or the assert_eq! fails)
// - "Not found" errors are expected after deletion
// - No other errors should occur
I'm mildly (but pleasantly) surprised the "no other errors should occur" condition passes. I could imagine a case where we load from two tables then do some operation to match up rows, and bail out if we don't find a match. I think we do do that for pending MGS updates, but we're matching up against the effectively-immutable hw_baseboard_id table in that case, which doesn't get torn by blueprint deletes.
Would it be a problem if we got other kinds of errors? I suspect the error message would imply the blueprint was invalid in some way, which is technically true if it's a torn read but not very useful to the caller.
I've been thinking about this - I think that we can roughly summarize the task of loading a blueprint as:
- Read database rows (within the blueprint)
- Read database row (top-level blueprint). Done last to avoid "tearing".
- Parse blueprint from database rows, validate if it's correct (e.g., the "row-matching" logic you describe)
I dunno if we can do this in practice, but I'd really like to do those steps in that order. I think it's possible that we're doing some of the "parsing" work before we read the final top-level row.
If we can identify "the data from the database is invalid, skip all parsing", that would basically split the world into "possibly-deleted" and "known-not-deleted" partitions, and we could do all the parsing after we've determined that the rows don't belong to a deleted blueprint.
Not sure this PR is doing this perfectly, but the TL;DR of my push is:
- Move database reads earlier
- Move blueprint parsing later
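To make the ordering concrete, here is a minimal sketch of "read child rows, read the top-level row last, parse afterwards". Everything in it is hypothetical: `FakeDb`, `LoadError`, `Blueprint`, and `load_blueprint` are in-memory stand-ins, not the real datastore API, and it assumes a delete removes all of a blueprint's rows atomically, so seeing the top-level row at the end means the earlier reads were not torn.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the database tables.
#[derive(Default)]
struct FakeDb {
    blueprints: HashMap<u32, String>, // top-level row: blueprint id -> comment
    sleds: Vec<(u32, u32)>,           // (blueprint id, sled id)
    zones: Vec<(u32, u32, String)>,   // (blueprint id, sled id, zone name)
}

#[derive(Debug, PartialEq)]
enum LoadError {
    /// The top-level row is gone: the blueprint was deleted, possibly mid-read.
    NotFound,
    /// The rows were read consistently but do not form a valid blueprint.
    Invalid(String),
}

#[derive(Debug, PartialEq)]
struct Blueprint {
    id: u32,
    comment: String,
    zones_by_sled: HashMap<u32, Vec<String>>,
}

fn load_blueprint(db: &FakeDb, id: u32) -> Result<Blueprint, LoadError> {
    // Step 1: read the child rows, without interpreting them yet.
    let sled_rows: Vec<u32> = db
        .sleds
        .iter()
        .filter(|(bp, _)| *bp == id)
        .map(|(_, sled)| *sled)
        .collect();
    let zone_rows: Vec<(u32, String)> = db
        .zones
        .iter()
        .filter(|(bp, _, _)| *bp == id)
        .map(|(_, sled, zone)| (*sled, zone.clone()))
        .collect();

    // Step 2: read the top-level row last. If it is missing, the rows above
    // may be torn, so discard them and report "not found" before any parsing.
    let comment = db.blueprints.get(&id).cloned().ok_or(LoadError::NotFound)?;

    // Step 3: parse and validate (the "row-matching" logic). This only runs
    // for rows known not to belong to a deleted blueprint.
    let mut zones_by_sled: HashMap<u32, Vec<String>> =
        sled_rows.iter().map(|sled| (*sled, Vec::new())).collect();
    for (sled, zone) in zone_rows {
        match zones_by_sled.get_mut(&sled) {
            Some(zones) => zones.push(zone),
            None => {
                return Err(LoadError::Invalid(format!(
                    "zone {zone} references unknown sled {sled}"
                )))
            }
        }
    }
    Ok(Blueprint { id, comment, zones_by_sled })
}
```

With this ordering, a reader that observes zone rows but no matching sled rows because of a concurrent delete gets `LoadError::NotFound` rather than a misleading `LoadError::Invalid`; the validation error is reserved for blueprints whose top-level row still exists.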
This commit fixes #9594 with regards to the sitrep load/delete operations.[^1] It makes the following changes:

1. Reorder the `fm_sitrep_read` query so that the sitrep metadata record is loaded _last_, and any loaded records are discarded should the metadata record no longer exist. This allows us to detect whether we have read a torn sitrep due to a concurrent delete (cda4f4d)
2. Change the `fm_sitrep_delete_all` query to use a transaction. The query is still a batched delete of multiple sitrep IDs, but this should be fine as the query does not `SELECT` the IDs to delete itself, and should therefore create a CRDB "write intent" only on the deleted rows. (1f0d6d9)
3. Some additional improvements to the `fm_sitrep_delete_all` query, adding a guard against deleting the current sitrep and changing the log level to INFO to match the similar blueprint/inventory delete queries. This isn't strictly necessary to fix #9594, but seemed worthwhile to do while I was here (0251720)
4. Add a knockoff version of @smklein's test for concurrent inventory deletes from PR #9604 that does the same thing except for sitreps (33547ed)
5. Change the `fm_sitrep_read_current` to correctly handle situations where it loads the current sitrep ID, and then, before it loads the sitrep records, a new sitrep is made current and the previous one is deleted (5f9b831)

[^1]: Which should be sufficient to close that issue, as #9603 and #9604 already fixed the blueprint and inventory collection sides of the issue.
Partial fix of #9594