
Conversation

@davepacheco davepacheco (Collaborator) commented Aug 20, 2025

Depends on #8863. Fixes #8855 and #8856.

Note: since this branches from 8863, if that branch gets force-pushed, this one will also need to be force-pushed. This may break your incremental review flow.

This PR makes the initial changes required for Nexus quiesce described by RFD 588:

  • Refactors the mechanics of Nexus quiesce to use a new NexusQuiesceHandle. This was done so that it can be invoked from a background task.
  • Triggers quiesce from the blueprint_execution background task based on whether the current target blueprint says we should be handing off. (This is what fixes #8855 and #8856.) It does this even if blueprint execution is disabled.
  • Disallows the creation of new sagas until we figure out if we're quiesced or not. In practice, this should happen soon after Nexus startup. It depends on the blueprint_loader and then blueprint_execution background tasks running.
  • Changes the quiesce states to reflect what they will be when we finish RFD 588. "WaitingFor" is changed to "Draining" (e.g., WaitingForSagas -> DrainingSagas) and there's a new RecordingQuiesce state that will cover the period after we've determined we're quiesced and before we've written the database record saying so. (A rough sketch of these states follows this list.)
  • Makes it legal to re-assign sagas after becoming "locally drained" (as defined in RFD 588). #8796 explains why this is desirable. RFD 588 explains how this will eventually result in a safe handoff. However, this PR does not implement #8796 -- more below.
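As a rough illustration of the renamed states, here is a hypothetical sketch of the progression described above. Only DrainingSagas, RecordingQuiesce, and Quiesced are named in this PR; the other variant names and the doc comments are assumptions for illustration, not the actual omicron definitions.

// Hypothetical sketch of the quiesce state progression described above, not
// the actual omicron definition.
enum QuiesceState {
    /// Not quiescing; new sagas may be created.
    Running,
    /// Quiesce requested: no new sagas; waiting for in-flight sagas to drain.
    DrainingSagas,
    /// Sagas locally drained; waiting for database activity to wind down.
    DrainingDb,
    /// Locally drained; writing the database record saying we're quiesced.
    RecordingQuiesce,
    /// Fully quiesced.
    Quiesced,
}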

This PR does not change the quiesce process to coordinate among the Nexus instances (as described in RFD 588) nor update the db_metadata_nexus table. Both of these are blocked on #8845.

The net result is a little awkward:

  • There's this new RecordingQuiesce state that's basically unused (it is technically used, but we transition through it immediately)
  • Since this PR now allows saga reassignment after becoming locally drained, but does not wait for other Nexus instances to finish draining before entering db-quiesce, it is now possible for this Nexus to drain its sagas, enter db-quiesce, and then wind up assigning sagas to itself, which would then become implicitly abandoned.

I think this is okay as an intermediate state, even on "main", since this cannot be triggered except by an online update, and we know we're going to fix these as part of shipping that.

@davepacheco davepacheco requested a review from jgallagher August 20, 2025 23:16
@davepacheco (Collaborator, Author)

I think this is almost ready for review, except that I think I'm going to need to have the test suite wait for the saga quiesce determination to be made before running tests (a lot of tests probably assume they can just go ahead and run sagas right away), and I'm working through some issues trying to do that.

@smklein smklein force-pushed the nexus_generation branch 5 times, most recently from 8235551 to ecd6a00 on August 21, 2025 19:53
@davepacheco davepacheco force-pushed the dap/handoff-quiesce-1 branch from 161491a to 127d5a8 on August 21, 2025 22:38
@davepacheco (Collaborator, Author)

As expected, I force-pushed this to sync up with #8863. I gather that branch will not be force-pushed again so I think this one should be stable, too.

I'm still working through testing.

@@ -5016,7 +5016,7 @@ async fn cmd_db_dns_diff(
// Load the added and removed items.
use nexus_db_schema::schema::dns_name::dsl;

-    let added = dsl::dns_name
+    let mut added = dsl::dns_name
Collaborator Author

The changes in this file are so that omdb output doesn't change as a result of unspecified sort order (in this case, of DNS records). This became a problem after I added the "second" Nexus zone to the blueprint (see other comment).
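A minimal, self-contained sketch of the idea (the data and formatting below are made up, not the actual omdb DNS-diff code): make the binding mutable so the rows can be sorted in memory before printing, so the output doesn't depend on the database's unspecified row order.

// Illustration only: sort results before printing for deterministic output.
fn main() {
    let mut added = vec![
        ("test-suite-silo.sys".to_string(), "AAAA 100::1".to_string()),
        ("test-suite-silo.sys".to_string(), "A 127.0.0.1".to_string()),
    ];
    // Sort in memory so the printed order is stable regardless of the order
    // the database happened to return the rows in.
    added.sort();
    for (name, record) in &added {
        println!("+ {name} {record}");
    }
}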

+ test-suite-silo.sys A 127.0.0.1
+ test-suite-silo.sys (records: 2)
+ A 127.0.0.1
+ AAAA 100::1
Collaborator Author

The extra records here and in other outputs below result from having added a "second" Nexus to the test suite / omicron-dev blueprint (see other comment).

@@ -489,15 +493,21 @@ task: "nat_garbage_collector"

task: "blueprint_loader"
configured period: every <REDACTED_DURATION>m <REDACTED_DURATION>s
-  last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
+  last completed activation: <REDACTED ITERATIONS>, triggered by an explicit signal
Collaborator Author

A bunch of the background tasks' output changed because previously there was no target blueprint when they had run. The changes in this PR make test suite startup block on the first blueprint having been loaded, which made a lot of these tasks find something to do.

started at <REDACTED_TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
last completion reported error: no blueprint
warning: unknown background task: "crdb_node_id_collector" (don't know how to interpret details: Object {"errors": Array [Object {"err": String("failed to fetch node ID for zone ..........<REDACTED_UUID>........... at http://[::1]:REDACTED_PORT: Communication Error: error sending request for url (http://[::1]:REDACTED_PORT/node/id): error sending request for url (http://[::1]:REDACTED_PORT/node/id): client error (Connect): tcp connect error: Connection refused (os error 146)"), "zone_id": String("..........<REDACTED_UUID>...........")}], "nsuccess": Number(0)})
Collaborator Author

This curious line seems to be more fallout from having a target blueprint loaded by the time omdb queried background task status. This presumably always produced this error in the test suite because the test suite doesn't start a cockroach admin server, but we didn't notice this here because there wasn't a target blueprint loaded yet so we didn't try to do anything.

@@ -251,6 +230,27 @@ async fn test_omdb_success_cases(cptestctx: &ControlPlaneTestContext) {
],
// This one should fail because it has no parent.
&["nexus", "blueprints", "diff", &initial_blueprint_id],
// chicken switches: show and set
&["nexus", "chicken-switches", "show", "current"],
Collaborator Author

These got moved later in the sequence because they actually changed the output of omdb nexus blueprints show, which is also tested in this sequence. That had changed because setting this chicken switch enabled the planner, which then went and made a new blueprint.

for task in &[
&self.background_tasks.task_internal_dns_config,
&self.background_tasks.task_internal_dns_servers,
&self.background_tasks.task_external_dns_config,
&self.background_tasks.task_external_dns_servers,
&self.background_tasks.task_external_endpoints,
&self.background_tasks.task_inventory_collection,
&self.background_tasks.task_blueprint_loader,
Collaborator Author

This is appropriate because we just inserted the first blueprint, so it makes sense to activate the blueprint loader so that it loads it.

This is important because we're now blocking saga enablement, and thus Nexus startup, on having loaded and started executing the first blueprint. So without this, it could take quite a while for Nexus to notice there was a blueprint, enable sagas, and complete startup.

Comment on lines -306 to -309
// - each Nexus created for testing gets its own id so they don't see each
// others sagas and try to recover them
Collaborator Author

I think this comment is ancient and (I hope) wrong. I think we always create a fresh context with a fresh database and don't need to generate a new id every time.

Fixing this was important to ensure that the sort order for omdb nexus blueprints show was consistent when run in the context of the test suite. Otherwise, now that there are two Nexus zones, one of which has a fixed uuid, they could flip-flop in their order based on the uuid that got assigned here.

Comment on lines +843 to +850
// Besides the Nexus that we just started, add an entry in the blueprint
// for the Nexus that developers can start using
// nexus/examples/config-second.toml.
Collaborator Author

This is the "second" Nexus instance that I mentioned in other comments here.

We have a useful dev flow for running a second Nexus instance against omicron-dev run-all. I use this a lot, including for this change. It's broken by this PR without this change because the new Nexus doesn't find itself in the blueprint and doesn't know if it should come up quiesced or not.

This fix is janky, but I think not more so than the rest of the test suite's startup behavior.

@davepacheco (Collaborator, Author)

Besides CI, I've tested that the manual quiesce process works basically the same way that it did before, the same way I tested it at demo day (using omdb nexus sagas demo-create / demo-complete to control sagas and omdb nexus quiesce show / start to observe and control quiesce state). It's hard to do a fuller test until more of this is done but let me know if there are other tests folks think I should do.

I also tested that if Nexus can't figure out if it's quiesced, it reports that and disallows sagas. Before I fixed the test suite to include this second Nexus in its blueprint, the new Nexus would report:

Aug 21 19:04:01.375 ERRO blueprint execution: failed to determine if this Nexus is quiescing, error: zone a4ef738a-1fb0-47b1-9da2-4919c7ec7c7f does not exist in blueprint, background_task: blueprint_executor, component: BackgroundTasks, component: nexus, component: ServerContext, name: a4ef738a-1fb0-47b1-9da2-4919c7ec7c7f, file: nexus/src/app/background/tasks/blueprint_execution.rs:123

Trying to query its quiesce state reports:

$ ./target/debug/omdb nexus --nexus-internal-url http://[::1]:12223/ quiesce show
note: using Nexus URL http://[::1]:12223/
has not yet determined if it is quiescing
sagas running: 0
database connections held: 0

$ ./target/debug/omdb -w nexus --nexus-internal-url http://[::1]:12223/ sagas demo-create
note: using Nexus URL http://[::1]:12223/
Error: creating demo saga

Caused by:
    Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "0a2e1c34-0b99-4f95-8dd7-d8ead852501a", "content-length": "133", "date": "Thu, 21 Aug 2025 19:03:24 GMT"}; value: Error { error_code: Some("ServiceNotAvailable"), message: "Service Unavailable", request_id: "0a2e1c34-0b99-4f95-8dd7-d8ead852501a" }

which was this:

Aug 21 19:03:24.104 INFO request completed, error_message_external: Service Unavailable, error_message_internal: saga creation is disallowed (unknown yet if we're quiescing), latency_us: 338, response_code: 503, uri: //demo-saga, method: POST, req_id: 0a2e1c34-0b99-4f95-8dd7-d8ead852501a, remote_addr: [::1]:42433, local_addr: [::1]:12223, component: dropshot_internal, name: a4ef738a-1fb0-47b1-9da2-4919c7ec7c7f, file: /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.3/src/server.rs:855

So creating sagas fails as it should, as shown in the demo-create output above.


Other tests that would be good to add:

  • it quiesces when the blueprint says so
  • it comes up quiesced when the blueprint says so

@davepacheco davepacheco marked this pull request as ready for review August 21, 2025 23:13
@jgallagher jgallagher (Contributor) left a comment

I'm a little nervous about landing this on main without the followup work to coordinate Nexuses because we do want to test online update off of main. But maybe in tests we're not likely to have all that many sagas anyway, and if we end up with implicitly abandoned ones it's similarly not that big of a deal?

Ok(saga_quiesce
.reassign_sagas(async || {
// For any expunged Nexus zones, re-assign in-progress
// sagas to some other Nexus. If this fails for some
Contributor

I realize this comment was here before, but can we refine "some other Nexus" to "ourself"?

*q = new_state;
true
}
_ => {
Contributor

Can we expand _ to list all the cases? I find it hard to reason about // All other cases are ... when they aren't listed.

Collaborator Author

I had it in a previous iteration but I found it less clear. I can put that back and see what it looks like.

/// cannot then re-enable sagas.
pub fn set_quiescing(&self, quiescing: bool) {
self.inner.send_if_modified(|q| {
let new_state = if quiescing {
Contributor

Nit - this is only used in the SagasAllowed::DisallowedUnknown arm below - can we move it into that arm?

);
false
}
_ => {
Contributor

I wonder if this match would be clearer if we nested the ifs in the two branches above, which (I think?) would let us drop this _ entirely; e.g.,

SagasAllowed::Allowed => { 
    // If sagas are currently allowed but we need to quiesce, switch to disallowing them.
    if quiescing {
        // ... all the stuff from above ...
    }
}

Collaborator Author

I think I know what you mean. I think that would make it clearer what the current _ covers (though I tried to explain that with the comment), but I think it would be less clear what the bigger picture is. I think of this like: there are three interesting cases and they're represented by the non-_ arms here. With your change, two of the "interesting" cases are actually sub-branches of two of the arms, and the other two branches within those arms are the very same "uninteresting" case (the quiesce state already matches what we're being asked to do). I don't feel strongly!
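For concreteness, here's a hypothetical sketch of the three "interesting" transitions being discussed, written as a pure function over assumed variant names (Allowed, DisallowedUnknown, DisallowedQuiesce); the real set_quiescing operates on a watch channel and may differ.

// Hypothetical sketch, not the actual omicron code: the three "interesting"
// transitions get explicit arms; everything else leaves the state alone.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum SagasAllowed {
    /// We haven't yet determined whether we're quiescing (Nexus just started).
    DisallowedUnknown,
    /// New sagas may be created.
    Allowed,
    /// We're quiescing; new sagas are disallowed and cannot be re-enabled.
    DisallowedQuiesce,
}

fn next_state(current: SagasAllowed, quiescing: bool) -> SagasAllowed {
    match (current, quiescing) {
        // Startup determination: we're not quiescing, so allow sagas.
        (SagasAllowed::DisallowedUnknown, false) => SagasAllowed::Allowed,
        // Startup determination: we are quiescing, so keep sagas disallowed.
        (SagasAllowed::DisallowedUnknown, true) => SagasAllowed::DisallowedQuiesce,
        // Running normally and asked to quiesce: stop allowing new sagas.
        (SagasAllowed::Allowed, true) => SagasAllowed::DisallowedQuiesce,
        // Everything else: the state already matches what we're being asked
        // to do (including DisallowedQuiesce, which never goes back).
        (other, _) => other,
    }
}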

Contributor

I think I just really dislike _ and will take nearly any excuse to drop it. 😂

@davepacheco (Collaborator, Author)

> I'm a little nervous about landing this on main without the followup work to coordinate Nexuses because we do want to test online update off of main. But maybe in tests we're not likely to have all that many sagas anyway, and if we end up with implicitly abandoned ones it's similarly not that big of a deal?

I think it's even less bad than that. The problem this PR introduces for testing online update is that if we get far enough into the update to start the handoff process, and at some point before the handoff completes there's an expunged Nexus zone with sagas assigned to it, then we'd wind up with implicitly abandoned sagas. For this to happen, we must have expunged a Nexus outside the upgrade process with sagas assigned to it. That seems pretty unlikely in our testing, right?

@jgallagher (Contributor)

> For this to happen, we must have expunged a Nexus outside the upgrade process with sagas assigned to it. That seems pretty unlikely in our testing, right?

Ahh yeah great point; thanks. I'm much less concerned now.

@davepacheco (Collaborator, Author)

@jgallagher I applied the fix that I think you were suggesting for the failing test. Now we only add the "second" Nexus when running omicron-dev run-all. This did fix the test locally and I'm also able to run the "second Nexus" flow with omicron-dev run-all.

We might run into new problems soon? I'm not sure. With #8845 the new Nexus won't find its db_metadata_nexus record and so won't start up, but maybe that'll work okay because blueprint execution in the first Nexus will create that record. If this becomes a problem, I think we may want to look at creating a more explicit step in the flow where we properly add the Nexus to the system.

At the very least, this change allows this PR to not break things.

@davepacheco davepacheco force-pushed the dap/handoff-quiesce-1 branch from dc85999 to 6c84ded on September 2, 2025 16:23
@davepacheco (Collaborator, Author)

I've just synced this up and I think this is ready for final review (which I think should be trivial relative to the previous state). Unfortunately, though, I've had to force-push this branch. That's because this PR was previously based on #8863. Since then, most of what it depended on was split out and landed on "main", while other parts are no longer needed and aren't on "main". To sync this up, I've rebased this change onto "main" and force-pushed this branch. Sorry, I know that sucks. But to mitigate the impact of this, I have diff'd the diffs:

  • diff 1 ("before"): from the merge base of this PR branch and 8863's to the tip of this branch before rebasing. (More specifically: git diff 4d7235b54b97d51ac520bf661e6c2806b684eef8 dc85999ec41ceb93d879867b51a554fc96ead3e9)
  • diff 2 ("after"): from the merge base of this PR branch and main to the tip of this branch now. (More specifically: git diff cd38f25aafc93560877f7ec4aa8f21ec81488bd6 3ce8d59b0ca9cdd0723d135d8e764a82a3695f1a.)

I believe the only diffs here are what we'd expect: changes to "main" adjacent to the changes I was already making. Click below if you want to see the full diff:

$ diff diffs_{before,after}.txt
184c184
< index 82577f26f..a5cdb0c79 100644
---
> index f0a22c50a..2bd070b5d 100644
187,189c187
< @@ -489,15 +489,21 @@ task: "nat_garbage_collector"
<  
<  task: "blueprint_loader"
---
> @@ -491,13 +491,19 @@ task: "blueprint_loader"
191,192c189
< -  last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
< +  last completed activation: <REDACTED ITERATIONS>, triggered by an explicit signal
---
>    last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
202,203c199
< -  last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
< +  last completed activation: <REDACTED ITERATIONS>, triggered by a dependent task completing
---
>    last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
215c211
<    last completed activation: <REDACTED ITERATIONS>, triggered by a dependent task completing
---
>    last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
233,235c229
< @@ -541,9 +558,9 @@ warning: unknown background task: "chicken_switches_watcher" (don't know how to
<  
<  task: "crdb_node_id_collector"
---
> @@ -543,7 +560,7 @@ task: "crdb_node_id_collector"
237,238c231
< -  last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
< +  last completed activation: <REDACTED ITERATIONS>, triggered by a dependent task completing
---
>    last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
245,246c238
< @@ -844,16 +861,22 @@ stdout:
<  task: "blueprint_loader"
---
> @@ -844,13 +861,19 @@ task: "blueprint_loader"
248,250c240
<    currently executing: no
< -  last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
< +  last completed activation: <REDACTED ITERATIONS>, triggered by an explicit signal
---
>    last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
260,262c250
<    currently executing: no
< -  last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
< +  last completed activation: <REDACTED ITERATIONS>, triggered by a dependent task completing
---
>    last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
272,273c260
< @@ -1001,16 +1024,22 @@ task: "nat_garbage_collector"
<  task: "blueprint_loader"
---
> @@ -986,13 +1009,19 @@ task: "blueprint_loader"
275,277c262
<    currently executing: no
< -  last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
< +  last completed activation: <REDACTED ITERATIONS>, triggered by an explicit signal
---
>    last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
287,289c272
<    currently executing: no
< -  last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
< +  last completed activation: <REDACTED ITERATIONS>, triggered by a dependent task completing
---
>    last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
299,301c282,284
< @@ -1049,7 +1078,18 @@ task: "blueprint_rendezvous"
<    currently executing: no
<    last completed activation: <REDACTED ITERATIONS>, triggered by a dependent task completing
---
> @@ -1026,7 +1055,18 @@ task: "blueprint_rendezvous"
>    configured period: every <REDACTED_DURATION>m
>    last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
319,320c302
< @@ -1061,9 +1101,9 @@ warning: unknown background task: "chicken_switches_watcher" (don't know how to
<  task: "crdb_node_id_collector"
---
> @@ -1038,7 +1078,7 @@ task: "crdb_node_id_collector"
322,324c304
<    currently executing: no
< -  last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
< +  last completed activation: <REDACTED ITERATIONS>, triggered by a dependent task completing
---
>    last completed activation: <REDACTED ITERATIONS>, triggered by <TRIGGERED_BY_REDACTED>
331c311
< @@ -1355,53 +1395,6 @@ task: "webhook_deliverator"
---
> @@ -1303,53 +1343,6 @@ task: "webhook_deliverator"
385c365
< @@ -1739,6 +1732,53 @@ stderr:
---
> @@ -1687,6 +1680,53 @@ stderr:
440c420
< index a46f21847..11db63156 100644
---
> index 8dcadc1cc..03b2f79a3 100644
443,446c423,426
< @@ -208,27 +208,6 @@ async fn test_omdb_success_cases(cptestctx: &ControlPlaneTestContext) {
<          &["nexus", "background-tasks", "show", "dns_internal"],
<          &["nexus", "background-tasks", "show", "dns_external"],
<          &["nexus", "background-tasks", "show", "all"],
---
> @@ -224,27 +224,6 @@ async fn test_omdb_success_cases(cptestctx: &ControlPlaneTestContext) {
>              "--no-executing-info",
>          ],
>          &["nexus", "background-tasks", "show", "all", "--no-executing-info"],
471c451
< @@ -251,6 +230,27 @@ async fn test_omdb_success_cases(cptestctx: &ControlPlaneTestContext) {
---
> @@ -267,6 +246,27 @@ async fn test_omdb_success_cases(cptestctx: &ControlPlaneTestContext) {
499c479
< @@ -264,7 +264,9 @@ async fn test_omdb_success_cases(cptestctx: &ControlPlaneTestContext) {
---
> @@ -280,7 +280,9 @@ async fn test_omdb_success_cases(cptestctx: &ControlPlaneTestContext) {
511c491
< index 7eb569ba3..22e84ae3f 100644
---
> index a72074887..fb62f1dd4 100644
526c506
< index 43c094855..a4f802b3c 100644
---
> index adfda2e59..e099f0c46 100644
529c509
< @@ -611,50 +611,37 @@ fn register_reassign_sagas_step<'a>(
---
> @@ -646,50 +646,37 @@ fn register_reassign_sagas_step<'a>(
611c591
< index 78f1ef566..495930b0b 100644
---
> index 51b6d1d86..afaf57f82 100644
631c611
< @@ -1029,6 +1030,8 @@ pub struct BackgroundTasksData {
---
> @@ -1028,6 +1029,8 @@ pub struct BackgroundTasksData {
784c764
< index 9ae27a227..752632e47 100644
---
> index d6519fdad..c33766e1e 100644
787c767
< @@ -278,18 +278,15 @@ impl BackgroundTask for BlueprintPlanner {
---
> @@ -273,18 +273,15 @@ impl BackgroundTask for BlueprintPlanner {
810c790
< @@ -429,7 +426,7 @@ mod test {
---
> @@ -423,7 +420,7 @@ mod test {
1282c1262
< index f84367946..be8d4762e 100644
---
> index 1a5a2847c..84b2bce45 100644
1307c1287
< @@ -840,47 +836,101 @@ impl<'a, N: NexusServer> ControlPlaneTestContextBuilder<'a, N> {
---
> @@ -835,47 +831,101 @@ impl<'a, N: NexusServer> ControlPlaneTestContextBuilder<'a, N> {
1427c1407
< @@ -891,11 +941,6 @@ impl<'a, N: NexusServer> ControlPlaneTestContextBuilder<'a, N> {
---
> @@ -886,11 +936,6 @@ impl<'a, N: NexusServer> ControlPlaneTestContextBuilder<'a, N> {
1439,1440c1419
< @@ -1621,6 +1666,7 @@ pub async fn omicron_dev_setup_with_config<N: NexusServer>(
<          sim::SimMode::Auto,
---
> @@ -1617,6 +1662,7 @@ pub async fn omicron_dev_setup_with_config<N: NexusServer>(
1443d1421
< +        true,
1444a1423
> +        true,
1447,1448c1426,1427
< @@ -1642,6 +1688,7 @@ pub async fn test_setup_with_config<N: NexusServer>(
<          sim_mode,
---
>  }
> @@ -1638,6 +1684,7 @@ pub async fn test_setup_with_config<N: NexusServer>(
1451d1429
< +        false,
1452a1431
> +        false,
1455,1456c1434,1435
< @@ -1653,6 +1700,7 @@ async fn setup_with_config_impl<N: NexusServer>(
<      sim_mode: sim::SimMode,
---
>  }
> @@ -1649,6 +1696,7 @@ async fn setup_with_config_impl<N: NexusServer>(
1459d1437
< +    second_nexus: bool,
1460a1439
> +    second_nexus: bool,
1463c1442,1443
< @@ -1796,6 +1844,20 @@ async fn setup_with_config_impl<N: NexusServer>(
---
>  
> @@ -1791,6 +1839,20 @@ async fn setup_with_config_impl<N: NexusServer>(
1642c1622
< index 7a2cca518..7f1abbb29 100644
---
> index 2c66a5f85..7d9c40b64 100644
1654c1634
< @@ -384,6 +386,26 @@ impl Blueprint {
---
> @@ -383,6 +385,26 @@ impl Blueprint {
1682c1662
< index f7db6d866..972a0b92d 100644
---
> index 49d744fc7..ab9a5922d 100644
1685c1665
< @@ -739,19 +739,28 @@ pub struct QuiesceStatus {
---
> @@ -996,19 +996,28 @@ pub struct QuiesceStatus {
1717c1697
< @@ -762,58 +771,51 @@ pub struct QuiesceStatus {
---
> @@ -1019,58 +1028,51 @@ pub struct QuiesceStatus {
2545c2525
< index bb98ae7c1..dc31e3d3c 100644
---
> index bc1cd3df5..caac20422 100644
2548c2528
< @@ -7524,8 +7524,23 @@
---
> @@ -7576,8 +7576,23 @@
2573c2553
< @@ -7542,7 +7557,7 @@
---
> @@ -7594,7 +7609,7 @@
2582c2562
< @@ -7560,7 +7575,7 @@
---
> @@ -7612,7 +7627,7 @@
2591c2571
< @@ -7570,13 +7585,13 @@
---
> @@ -7622,13 +7637,13 @@
2607c2587
< @@ -7585,14 +7600,50 @@
---
> @@ -7637,14 +7652,50 @@
2660c2640
< @@ -7608,13 +7659,16 @@
---
> @@ -7660,13 +7711,16 @@
2680c2660
< @@ -7627,9 +7681,10 @@
---
> @@ -7679,9 +7733,10 @@

After rebasing on "main", I had to make two additional changes related to the delta between what was in #8863 and what has landed on "main" so far:

  • pulling in BlueprintBuilder::set_nexus_generation, which was in #8863 but not in the part that landed on "main". That's commit 6c84ded.
  • undoing a change to the caller of PlannerInput::assemble, since the change to that function didn't go into "main"

@davepacheco davepacheco changed the base branch from nexus_generation to main September 2, 2025 16:32
@jgallagher jgallagher self-requested a review September 2, 2025 16:59

/// Given the current value of `nexus_generation`, set the new value for
/// this blueprint.
pub fn set_nexus_generation(&mut self, new_generation: Generation) {
Collaborator

This API would presumably allow a caller to move the generation backwards, correct?

I'm working through adding a similar API to #8936 right now, and wanted to make this hard to mis-use -- is there really any value we want to accept here other than self.nexus_generation.next()?

Collaborator

Also, to confirm - aside from tests, we're still waiting on #8936 to actually change the nexus generations, right?

Contributor

> I'm working through adding a similar API to #8936 right now, and wanted to make this hard to mis-use -- is there really any value we want to accept here other than self.nexus_generation.next()?

A thing that has come up in the past is that reconfigurator-cli or unit tests may want to use the builder to intentionally set things up in unexpected / invalid ways to exercise test paths, and we have some methods that permit changes they could restrict to allow for that. We've also said if the change you want to make is really egregious, it's okay for the builder not to support that (which requires you to muck with blueprints by hand, usually).

I'm not sure which bucket this falls into. Is it worth testing what happens if the planner generation is behind one of its Nexus zones or more than one ahead, or are those impossible enough that they fall into the "really egregious" camp?

Collaborator Author

> Also, to confirm - aside from tests, we're still waiting on #8936 to actually change the nexus generations, right?

Correct, in shipping code. The only thing in this PR that uses this is the new integration test in nexus/tests/integration_tests/quiesce.rs.

> This API would presumably allow a caller to move the generation backwards, correct?

That's correct.

> I'm working through adding a similar API to #8936 right now, and wanted to make this hard to mis-use -- is there really any value we want to accept here other than self.nexus_generation.next()?

There are a bunch of options here and none seems particularly better to me than the others. The planner is the only caller that we ever expect to exist in shipping code. It will always bump it by exactly one, and only in a very specific situation in which it has probably already checked the current nexus_generation. It'd be fine with any of these interfaces (set that lets you set it to anything; set that constrains it; or bump).

In general, I do think it's essential to be able to construct blueprints in reconfigurator-cli and for testing that do something different than the planner does. In this case, I'm not sure it would ever do something different than bump. I can imagine a support activity in which the handoff has gone badly and we want to disable the autoplanner and install a custom blueprint with the previous Nexus generation so that things can come back up. I wouldn't go out of our way for this case (it's a support activity; we can always do it by hand) but it's an advantage of a simple set.

It's more that I don't really see harm in allowing it to be set to a specific value at this low level. It's hard for me to see this being easily misused. I say this because it's not a required value in a struct, where you might try to fill in some default. You only call this function if you intend to set it to a specific value. I'm not sure what other value someone might think they need to set this to, or where they would have gotten it from. The only other plausible values I can think of are Generation::new() as a default (but then, why would you be calling this function only to supply a default value?) or the one from the parent blueprint (which would be a no-op).

I'm going to leave this for now but I'm fine with evolving this as we get more experience with callers!
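To illustrate the interface options mentioned above, here is a hypothetical sketch contrasting an unconstrained setter with a bump-only method. The field name, the stand-in Generation type, and the bump variant are assumptions for illustration; the real BlueprintBuilder may differ.

// Hypothetical sketch of the two API shapes discussed above, not the actual
// BlueprintBuilder. `Generation` here is a stand-in type.
#[derive(Clone, Copy, PartialEq, PartialOrd)]
struct Generation(u64);

impl Generation {
    fn next(self) -> Generation {
        Generation(self.0 + 1)
    }
}

struct BlueprintBuilder {
    nexus_generation: Generation,
}

impl BlueprintBuilder {
    /// Option discussed: set the generation to an arbitrary value (allows
    /// moving backwards, which can be useful for tests and support work).
    fn set_nexus_generation(&mut self, new_generation: Generation) {
        self.nexus_generation = new_generation;
    }

    /// Alternative discussed: only allow advancing by exactly one, which is
    /// all the planner would ever do in shipping code.
    fn bump_nexus_generation(&mut self) {
        self.nexus_generation = self.nexus_generation.next();
    }
}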

Collaborator

Sounds good to me. I've tweaked #8936 to match this part of the builder. Matching this current interface seems fine.

.await
.expect("setting new target");

// Wait for Nexus to quiesce.
Contributor

This looks like it's waiting for it to start quiescing; is that good enough, or should we be waiting for explicitly QuiesceState::Quiesced?

Collaborator Author

This test is specifically just for the behavior introduced in this PR that Nexus starts quiescing when the blueprint tells it to. For that, I don't think it's important to wait.

@davepacheco davepacheco enabled auto-merge (squash) September 3, 2025 00:19
@davepacheco davepacheco merged commit b112669 into main Sep 3, 2025
17 checks passed
@davepacheco davepacheco deleted the dap/handoff-quiesce-1 branch September 3, 2025 04:20