refactor(indexer-alt): CancellationToken -> Services #24503

amnn · 2025-12-03T12:17:29Z

Description

Make use of the new Services abstraction to avoid handling cancellation using CancellationToken, which often requires threading the token deep into each service.

This change also improves graceful shutdown support (for example when there is a metrics service running alongside the indexer, and the indexer triggers a shutdown, the metrics service is allowed time to drain connections before shutting down itself).

Test plan

CI

Stack

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

Protocol:
Nodes (Validators and Full nodes):
gRPC:
JSON-RPC:
GraphQL:
CLI:
Rust SDK:
Indexing Framework: The indexer, ingestion service, and metrics service all now return a Service instead of a JoinHandle<()> when run. Use Service::main to wait for the service to exit cleanly or with an error, or respond to a termination signal with a graceful shutdown. Service also exposes run, run_with_grace, join, and shutdown functions to customise various aspects of the shutdown process.

vercel · 2025-12-03T12:17:34Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
sui-docs	Ready	Preview	Comment	Dec 3, 2025 1:19pm

2 Skipped Deployments

Project	Deployment	Preview	Comments	Updated (UTC)
multisig-toolkit	Ignored	Preview		Dec 3, 2025 1:19pm
sui-kiosk	Ignored	Preview		Dec 3, 2025 1:19pm

amnn · 2025-12-03T12:41:00Z

crates/sui-checkpoint-blob-indexer/src/main.rs

-            // Job finished all work successfully
-            Ok(())
-        }
-        ExitReason::UserInterrupt => {


This behaviour has changed slightly -- Service::main doesn't distinguish between a user interruption and a termination signal -- they both come across as a termination, which will produce an exit code of 1 if the job was bounded.

@nickvikeras -- want to check that this is ok?

amnn · 2025-12-03T12:44:55Z

crates/sui-indexer-alt-consistent-store/src/restore/mod.rs

-
-    #[error(transparent)]
-    Err(#[from] anyhow::Error),
+pub struct Finalizer {


Not entirely happy with this -- the restore task needs to do some clean-up work at the end assuming the main service exits completely cleanly. I implemented this by having the run task output a Service for the main service, and then this Finalizer which can be run as its own service, when the main service completes successfully.

Originally, I tried to add finalization to the Service abstraction, but it doesn't compose well -- in particular with finalization usually you want to say things like "once this set of tasks completes successfully, run this task", but the way we merge services together can cause a finalizer task to end up waiting arbitrarily long after the tasks it cares about have completed to run, which is usually not desired.

amnn · 2025-12-03T12:45:43Z

crates/sui-indexer-alt-framework/src/ingestion/regulator.rs

Straggler from before.

amnn · 2025-12-03T12:50:13Z

crates/sui-indexer-alt-framework/src/pipeline/concurrent/commit_watermark.rs

It's not clear from this change, but previously the commit watermark task would run one more time after it was told to exit, to ensure that any pending watermark was flushed.

Now, that will only happen if the service is shutting down gracefully (e.g. not because of a termination signal, error, or interrupt).

@nickvikeras, I'm assuming that's still okay for the checkpoint indexer? It means that if the indexer fails or is told to shutdown prematurely, it won't try to flush any pending watermarks, but if it is winding down naturally after having processed a finite range of checkpoints, it will (previously when we were using cancellation tokens, we couldn't distinguish between these cases, so we sometimes received a cancellation signal as part of a normal shutdown).

The reason I had to make this change is that sometimes the indexer is running in the same service as the database, which means that when it has been told to shutdown, the database has also been told to shutdown, which means that if it tries to connect to the database, it will end up hanging, and will eventually timeout (this happens when starting a local network as part of the CLI).

## Description Avoid using `CancellationToken` to detect when the service is being shutdown while yielded. When using the `Service` abstraction, explicitly waiting on a cancellation token is not necessary, as the yield will be within the scope of a task maintained by the service which can be aborted. Also make use of `Service` for the system package task. ## Test plan This change breaks the build overall, but the crate builds and tests: ``` sui-indexer-alt-reader$ cargo check sui-indexer-alt-reader$ cargo nextest run ```

## Description Switch the metrics service to using the new `Service` abstraction, and away from using `CancellationToken`. ## Test plan Build, Test, Lints for `sui-indexer-alt-metrics` crate all succeed (but this change does break the build across the mono-repo.

## Description Switch JSONRPC-alt to using the new `Service` abstraction, and away from using `CancellationToken`. ## Test plan Build, Test, Lints for `sui-indexer-alt-jsonrpc` crate all succeed (but this change dooes break the build across the mono-repo).

## Description Switch GraphQL-alt to using the new `Service` abstraction, and away from using `CancellationToken`. As part of this change, Chain Identifier initialization was adapted. Previously, the chain identifier task ran before RPC started, which complicated lifecycles, because we needed to run that task to completion to get a value before the service could start, which also meant that we needed to handle termination signals for this initialization task as well as for the main service. Now, the chain identifier task is a secondary service that writes the chain identifier to a `SetOnce` when it's done. This can be started alongside the RPC, and the requests that need this information will wait for it to be set before they can continue (but the RPC as a whole can start and serve other requests before then, and termination handling can remain in one place). ## Test plan Build, test and lint `sui-indexer-alt-graphql`. Also run the service locally and ensure that it exits cleanly when asked to shutdown, or when forcefully aborted.

## Description Use the `Service` abstraction for composing parts of the Indexing framework together. Most constituent services are abortable services, with the exception of the concurrent committer which uses graceful shutdown to ensure there's one more opportunity to write the a committer watermark, when shutting down. ## Test plan Existing tests.

## Description In a recent change the commit watermark was made to perform an extra iteration before shutting down, however this can cause stalls during shutdown if the task fails to connect to its store. This situation occurs in calls to `sui start` where the CLI is responsible for running the database process as a child as well as the RPC services. The database is wound down by the `Ctrl-C` signal, while the RPC services are gracefully shutting down. The change was originally made to cater to the checkpoint indexer (or any use of the object store) where watermark updates are infrequent, so it was useful to flush the watermarks before shutting down in response to a cancellation. However, at the time of that change, the cancellation token was responsible for signaling "normal shutdown" (e.g. after having processed all checkpoints), as well as interrupts (unexpected termination). The hope is that by using separate methods to communicate these two forms of exit, we can avoid adding this graceful shutdown support. ## Test plan The `sui` CLI no longer hangs when made to wait for the RPC services to gracefully shutdown.

## Description Migrate the consistent store crate to using the `Service` abstraction, away from using cancellation tokens. This is mostly a straightforward conversion, with the exception of the finalization process after restoration, which needs to run after the main restoration service has exited successfully. Initially, I tried adding finalization as a concept to the `Service` abstraction, but it was not a good fit, because an individual service's finalizers would end up dependent on the success of unrelated service's primary tasks after merging. Instead, it was easier to have the restoration task return a finalizer that can be run to get its own service, which will run if the main service finishes successfully. ## Test plan Crate's own build, tests, lints.

## Description Make use of the `Service` abstraction to support graceful shutdown for RPC services in localnet. ## Test plan The following exits promptly: ``` $ cargo run --bin sui -- start --force-regenesis --with-graphql ^C ```

amnn self-assigned this Dec 3, 2025

amnn requested a review from a team as a code owner December 3, 2025 12:17

amnn requested review from emmazzz, evan-wall-mysten, henryachen, nickvikeras, tpham-mysten and wlmyng December 3, 2025 12:17

amnn temporarily deployed to sui-typescript-aws-kms-test-env December 3, 2025 12:17 — with GitHub Actions Inactive

This was referenced Dec 3, 2025

chore(reader): demote SystemPackageTask error! logs to warn!-s #24504

Open

refactor(indexer-alt): update examples to use new Service abstraction #24505

Open

amnn commented Dec 3, 2025

View reviewed changes

amnn added 12 commits December 3, 2025 13:14

wip(refactor(indexer-alt)): CancellationToken -> Service

c5fe911

wip(refactor(indexer-e2e)): CancellationToken -> Service

3f9ea99

wip(refactor(sui)): sui start RPC graceful shutdown

de5a0ae

## Description Make use of the `Service` abstraction to support graceful shutdown for RPC services in localnet. ## Test plan The following exits promptly: ``` $ cargo run --bin sui -- start --force-regenesis --with-graphql ^C ```

wip(refactor(checkpoint-indexer)): CancellationToken -> Service

f0d3f11

wip(refactor(bridge-indexer)): CancellationToken -> Service

465b6a2

amnn force-pushed the amnn/futures branch from 37a52e1 to a2c1b4e Compare December 3, 2025 13:17

amnn force-pushed the amnn/idx-services branch from 3f0ca29 to 465b6a2 Compare December 3, 2025 13:17

amnn temporarily deployed to sui-typescript-aws-kms-test-env December 3, 2025 13:17 — with GitHub Actions Inactive

vercel bot deployed to Preview – sui-docs December 3, 2025 13:19 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(indexer-alt): CancellationToken -> Services #24503

refactor(indexer-alt): CancellationToken -> Services #24503

amnn commented Dec 3, 2025 •

edited

Loading

Uh oh!

vercel bot commented Dec 3, 2025 •

edited

Loading

Uh oh!

amnn Dec 3, 2025

Uh oh!

amnn Dec 3, 2025

Uh oh!

amnn Dec 3, 2025

Uh oh!

amnn Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

refactor(indexer-alt): CancellationToken -> Services #24503

Are you sure you want to change the base?

refactor(indexer-alt): CancellationToken -> Services #24503

Conversation

amnn commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Test plan

Stack

Release notes

Uh oh!

vercel bot commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amnn Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

amnn Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

amnn Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

amnn Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

amnn commented Dec 3, 2025 •

edited

Loading

vercel bot commented Dec 3, 2025 •

edited

Loading