Conversation

@smklein smklein commented Jan 7, 2026

Partial fix of #9594

// - Successful reads are validated inside the reader loop (must match
// the original blueprint exactly, or the assert_eq! fails)
// - "Not found" errors are expected after deletion
// - No other errors should occur
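The three conditions in the quoted comment can be sketched roughly as follows. All names here (`Blueprint`, `ReadError`, `validate_read`) are hypothetical stand-ins for illustration, not the actual omicron types:

```rust
// Illustrative types standing in for the real blueprint machinery.
#[derive(Debug, Clone, PartialEq)]
struct Blueprint {
    id: u64,
    comment: String,
}

#[derive(Debug)]
enum ReadError {
    NotFound,
    Other(String),
}

/// Returns true if the blueprint was read back intact, false if it had
/// already been deleted; panics on a mismatch or any unexpected error.
fn validate_read(original: &Blueprint, result: Result<Blueprint, ReadError>) -> bool {
    match result {
        // Successful reads must match the original blueprint exactly.
        Ok(read) => {
            assert_eq!(read, *original, "torn read: blueprint mismatch");
            true
        }
        // "Not found" is expected once the writer deletes the blueprint.
        Err(ReadError::NotFound) => false,
        // No other errors should occur.
        Err(other) => panic!("unexpected read error: {other:?}"),
    }
}
```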
Contributor
I'm mildly (but pleasantly) surprised the "no other errors should occur" condition passes. I could imagine a case where we load from two tables then do some operation to match up rows, and bail out if we don't find a match. I think we do do that for pending MGS updates, but we're matching up against the effectively-immutable hw_baseboard_id table in that case, which doesn't get torn by blueprint deletes.

Would it be a problem if we got other kinds of errors? I suspect the error message would imply the blueprint was invalid in some way, which is technically true if it's a torn read but not very useful to the caller.

Collaborator Author

I've been thinking about this - I think that we can roughly summarize the task of loading a blueprint as:

  • Read database rows (within the blueprint)
  • Read database row (top-level blueprint). Done last to avoid "tearing".
  • Parse blueprint from database rows, validate if it's correct (e.g., the "row-matching" logic you describe)

I dunno if we can do this in practice, but I'd really like to do those steps in that order. I think it's possible that we're doing some of the "parsing" work before we read the final top-level row.

If we can identify "the data from the database is invalid, skip all parsing", that would basically split the world into "possibly-deleted" and "known-not-deleted" partitions - and we could do all the parsing after we determined that the rows don't belong to a deleted blueprint.

Not sure this PR is doing this perfectly, but the TL;DR of my push is:

  • Move database reads earlier
  • Move blueprint parsing later
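The ordering described above could look roughly like this. This is a sketch under assumptions: `FakeDb`, `LoadError`, and `load_blueprint` are hypothetical names standing in for the real datastore code:

```rust
use std::collections::HashMap;

// Illustrative stand-in for the database: a top-level metadata row per
// blueprint, plus the component rows within each blueprint.
struct FakeDb {
    metadata: HashMap<u64, String>,
    components: HashMap<u64, Vec<String>>,
}

#[derive(Debug, PartialEq)]
enum LoadError {
    /// The top-level row is gone: any component rows we read may be torn.
    Deleted,
}

fn load_blueprint(db: &FakeDb, id: u64) -> Result<(String, Vec<String>), LoadError> {
    // Step 1: read the database rows within the blueprint.
    let rows = db.components.get(&id).cloned().unwrap_or_default();
    // Step 2: read the top-level blueprint row *last*, to detect tearing.
    // If it no longer exists, discard everything without parsing it.
    let meta = db.metadata.get(&id).cloned().ok_or(LoadError::Deleted)?;
    // Step 3: only now do parsing / validation (row-matching would go here),
    // knowing the rows don't belong to a deleted blueprint.
    Ok((meta, rows))
}
```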

@smklein smklein merged commit f4e6ee3 into main Jan 8, 2026
16 checks passed
@smklein smklein deleted the blueprint-read-order branch January 8, 2026 17:28
hawkw added a commit that referenced this pull request Jan 16, 2026
This commit fixes #9594 with regard to the sitrep load/delete
operations.[^1]

It makes the following changes:

1. Reorder the `fm_sitrep_read` query so that the sitrep metadata record
   is loaded _last_, and any loaded records are discarded should the
   metadata record no longer exist. This allows us to detect whether we
   have read a torn sitrep due to a concurrent delete
   (cda4f4d)
2. Change the `fm_sitrep_delete_all` query to use a transaction. The
   query is still a batched delete of multiple sitrep IDs, but this
   should be fine as the query does not `SELECT` the IDs to delete
   itself, and should therefore create a CRDB "write intent" only on
   the deleted rows. (1f0d6d9)
3. Some additional improvements to the `fm_sitrep_delete_all` query,
   adding a guard against deleting the current sitrep and changing the
   log level to INFO to match the similar blueprint/inventory delete
   queries.

   This isn't strictly necessary to fix #9594, but seemed worthwhile
   to do while I was here (0251720)
4. Add a knockoff version of @smklein's test for concurrent inventory
   deletes from PR #9604 that does the same thing except for sitreps
   (33547ed)
5. Change the `fm_sitrep_read_current` to correctly handle situations
   where it loads the current sitrep ID, and then, before it loads the
   sitrep records, a new sitrep is made current and the previous one is
   deleted (5f9b831)

[^1]: Which should be sufficient to close that issue, as #9603 and 
      #9604 already fixed the blueprint and inventory collection sides
      of the issue.
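The guard described in point 3 amounts to excluding the current sitrep from the batch of IDs to delete. A minimal sketch, where the function name and signature are hypothetical rather than the actual query code:

```rust
/// Hypothetical sketch of the delete-all guard: never include the
/// current sitrep's ID in the batch of IDs to delete.
fn deletable_ids(all_ids: &[u64], current: Option<u64>) -> Vec<u64> {
    all_ids
        .iter()
        .copied()
        .filter(|id| Some(*id) != current)
        .collect()
}
```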
