
docs: Stdcoroutine-0: Boost.Coroutine to C++20 std::coroutine migration plan#6643

Open
pratikmankawde wants to merge 1 commit into develop from pratik/Swtich-to-std-coroutines

Conversation


@pratikmankawde pratikmankawde commented Mar 25, 2026

GitHub's PR view doesn't format this doc very well, so feel free to read it on the GitHub branch page.

High Level Overview of Change

Adds a comprehensive migration plan document (BoostToStdCoroutineSwitchPlan.md) for switching rippled from Boost.Coroutine2 (stackful) to C++20 standard coroutines (stackless).

This is PR 0 in the StdCoroutineSwitch chain — it contains only the plan document, no code changes.

PR Chain

Implementation is in draft mode. Still evolving and being reviewed.

| PR | Branch | Base | Description |
| --- | --- | --- | --- |
| #6643 (this) | pratik/Swtich-to-std-coroutines | develop | Migration plan document |
| #6421 | pratik/std-coro/add-coroutine-primitives | develop | CoroTask, CoroTaskRunner, JobQueueAwaiter primitives |
| #6423 | pratik/std-coro/migrate-entry-points | add-coroutine-primitives | Migrate HTTP/WS/gRPC entry points to postCoroTask() |
| #6428 | pratik/std-coro/migrate-test-code | migrate-entry-points | Migrate coroutine tests to C++20 API |
| #6429 | pratik/std-coro/cleanup-boost-coroutine | migrate-test-code | Remove Boost.Coroutine dependency and old Coro API |
| #6525 | pratik/std-coro/tsan-fixes | cleanup-boost-coroutine | TSAN data-race fixes for CoroTaskRunner |

Context of Change

The plan covers:

  • Research & viability analysis — why C++20 stackless coroutines work for rippled's shallow yield pattern
  • Current state audit — all coroutine touchpoints, JobQueue::Coro internals, entry points, handlers
  • Migration strategy — incremental 4-phase approach with coexistence period
  • Implementation design — CoroTask<T>, JobQueueAwaiter, CoroTaskRunner, API mapping
  • Testing & validation — unit tests, sanitizer testing (ASAN/TSAN), benchmarks, regression methodology
  • Risks & mitigation — risk matrix, rollback strategy, stackful→stackless limitation analysis
  • Standards & guidelines — coroutine design rules, thread safety, naming conventions, code review checklist

Known Plan-vs-Implementation Divergences

The following aspects evolved during implementation and differ from the plan:

  1. RipplePathFind handler — plan proposed co_await-based migration; actual implementation uses std::condition_variable synchronous blocking (simpler, no coroutine needed since only one handler suspends)
  2. RPC::Context::coro field — plan proposed replacing the type; actual implementation removed it entirely
  3. CoroTaskRunner — not in original plan; emerged as needed lifecycle manager wrapping CoroTask<void>
  4. yieldAndPost() API — added to work around GCC-12 compiler bug with external awaiters at multiple co_await points
  5. Boost::context retained — Section 1.7 claims the migration will "Remove external dependency on Boost.Coroutine (and transitively Boost.Context)". This is incorrect: Boost::context cannot be removed because boost::asio::spawn (used by yield_to.h test infra and Spawn.h server code) still depends on it. The cleanup branch correctly replaced Boost::coroutine with Boost::context in CMake and added BOOST_USE_ASAN/BOOST_USE_TSAN/BOOST_USE_UCONTEXT defines for sanitizer fiber-switching annotations — none of which is discussed in the plan.
  6. GCC-14 reference-capture bug — discovered during implementation, documented in code
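
For divergence 1, the synchronous-blocking pattern presumably looks something like this minimal sketch. All names here are hypothetical stand-ins, not the actual rippled code:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Sketch of the condition_variable pattern from divergence 1: the handler
// blocks its worker thread until the async pathfinding callback delivers
// a result. Simpler than a coroutine when only one handler suspends.
struct SyncWaiter
{
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    int result = 0;

    void deliver(int r)  // called from the async callback
    {
        { std::lock_guard lk(m); result = r; done = true; }
        cv.notify_one();
    }

    int wait()  // called by the handler; blocks the calling thread
    {
        std::unique_lock lk(m);
        cv.wait(lk, [this] { return done; });
        return result;
    }
};

int runHandler()
{
    SyncWaiter w;
    std::thread async([&] { w.deliver(7); });  // stands in for the callback
    int r = w.wait();
    async.join();
    return r;
}
```

The trade-off (a worker thread held for the full wait) is exactly what the later review comment about the 30-second FAQ wait flags.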

API Impact

  • Public API: New feature (new methods and/or new fields)
  • Public API: Breaking change (in general, breaking changes should only impact the next api_version)
  • libxrpl change (any change that may affect libxrpl or dependents of libxrpl)
  • Peer protocol change (must be backward compatible or bump the peer protocol version)

@pratikmankawde pratikmankawde added the StdCoroutineSwitch Boost to Std Coroutine Switch label Mar 25, 2026
@pratikmankawde pratikmankawde changed the title Pratik/swtich to std coroutines docs: Pratik/swtich to std coroutines Mar 25, 2026
@pratikmankawde pratikmankawde changed the title docs: Pratik/swtich to std coroutines docs: Stdcoroutine-0: Boost.Coroutine to C++20 std::coroutine migration plan Mar 25, 2026
@pratikmankawde pratikmankawde marked this pull request as ready for review March 25, 2026 15:02

@xrplf-ai-reviewer xrplf-ai-reviewer bot left a comment


Three issues flagged inline: a year typo in the document header, and two design gaps in the gRPC migration path — CallData lifetime analysis is missing (potential use-after-free), and ServerContext cancellation propagation is unaddressed for suspended coroutines.

Review by Claude Opus 4.6 · Prompt: V12


> **Status:** Implementation Complete
> **Author:** Pratik Mankawde
> **Created:** 2026-02-25

Typo in year — should be 2025-02-25:

Suggested change
> **Created:** 2026-02-25
> **Created:** 2025-02-25

Contributor Author


I work in the future, not in the past!


- A client (e.g., a wallet app) sends an RPC request to the rippled server.
- The server wraps the request in a coroutine and schedules it on a worker thread from the JobQueue.
- The handler processes the request. Most handlers finish immediately and return a response.

Plan gap: CallData ownership chain under C++20 not analyzed — potential use-after-free risk.

In the Boost model, shared_ptr<Coro> inside the lambda ensures CallData outlives the coroutine. With C++20, if the gRPC completion queue fires and destroys CallData while the coroutine frame still holds a reference (via RPC::Context), this is a use-after-free — the exact dangling reference risk from Concern 5, but unaddressed for the gRPC code path.

Suggested addition in Milestone 2, task 2.3: Explicitly audit CallData object lifetime relative to the CoroTaskRunner frame. Ensure CallData is kept alive (e.g., via shared_from_this() or explicit capture) for the full coroutine duration. Add a TSAN/ASAN test specifically for gRPC request lifetime.

See: gRPC
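
The suggested mitigation can be sketched with standard-library stand-ins (CallData here is a stub, not the real gRPC type): the coroutine frame captures a shared_ptr so CallData cannot be destroyed while the frame is suspended.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Sketch: the coroutine frame co-owns CallData via shared_from_this(),
// so a completion-queue teardown cannot leave the frame with a dangling
// reference. CallData and schedule() are hypothetical stand-ins.
struct CallData : std::enable_shared_from_this<CallData>
{
    bool alive = true;
    ~CallData() { alive = false; }
};

// stands in for the suspended coroutine frame holding RPC::Context
std::function<bool()> suspendedFrame;

void schedule(CallData& cd)
{
    auto self = cd.shared_from_this();                // extend lifetime
    suspendedFrame = [self] { return self->alive; };  // frame co-owns CallData
}
```

Dropping the completion queue's reference then leaves the object alive for as long as the frame exists — the property the comment asks task 2.3 to verify.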

e.g. doRipplePathFind`"]
YIELD["`**coro.yield()**
Suspends execution`"]
RESUME["`**coro.post()**

Plan gap: gRPC ServerContext cancellation propagation not addressed in the migration.

When CallData::process() is migrated to postCoroTask() in Phase 2 (task 2.3), there is no discussion of what happens if the gRPC client disconnects or times out while the coroutine is suspended (e.g., during pathfinding). The coroutine will resume on the JobQueue with no awareness of cancellation — wasting resources and potentially writing to a dead stream.

Suggested addition in Phase 2, task 2.3: Document whether grpc::ServerContext* is threaded through RPC::Context. If so, add a cancellation check in JobQueueAwaiter::await_suspend() or at the co_await resume point: if grpc_context->IsCancelled(), return codes.Canceled rather than continuing into the handler body.

See: gRPC

@pratikmankawde pratikmankawde force-pushed the pratik/Swtich-to-std-coroutines branch from 255ecc1 to 956c105 Compare March 25, 2026 15:09

@xrplf-ai-reviewer xrplf-ai-reviewer bot left a comment


Two off-by-one bugs in BUILD.md bash loops and a likely year typo in the plan doc — see inline comments.

Review by Claude Opus 4.6 · Prompt: V12

BUILD.md Outdated
for recipe in "${recipes[@]}"; do
echo "Checking out recipe '${recipe}'..."
git sparse-checkout add recipes/${recipe}
for ((index = 1; index <= ${#recipes[@]}; index++)); do

Off-by-one: Bash arrays are 0-indexed, so this skips recipes[0] (ed25519) and reads an out-of-bounds element at the end. Fix the loop bounds:

Suggested change
for ((index = 1; index <= ${#recipes[@]}; index++)); do
for ((index = 0; index < ${#recipes[@]}; index++)); do

BUILD.md Outdated
cd ../..

# Export the recipes into the local cache.
for ((index = 1; index <= ${#recipes[@]}; index++)); do

Same off-by-one bug as line 161 — skips first recipe and reads past the end of the array.

Suggested change
for ((index = 1; index <= ${#recipes[@]}; index++)); do
for ((index = 0; index < ${#recipes[@]}; index++)); do


> **Status:** Implementation Complete
> **Author:** Pratik Mankawde
> **Created:** 2026-02-25

Year looks like a typo — should this be 2025-02-25?

Suggested change
> **Created:** 2026-02-25
> **Created:** 2025-02-25

Comprehensive migration plan documenting the switch from
Boost.Coroutine2 to C++20 standard coroutines in rippled, including
research analysis, implementation phases, risk assessment, and
testing strategy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pratikmankawde pratikmankawde force-pushed the pratik/Swtich-to-std-coroutines branch from 956c105 to b78202a Compare March 25, 2026 15:48

@xrplf-ai-reviewer xrplf-ai-reviewer bot left a comment


Five issues flagged inline: a year typo in the document header, and four architectural gaps in the gRPC migration coverage — missing streaming RPC audit, missing CompletionQueue lifecycle analysis, missing shutdown handling in task 2.3, and a high-severity regression where the FAQ documents a 30-second thread-blocking synchronous wait that contradicts the migration's core goals.

Review by Claude Opus 4.6 · Prompt: V12


> **Status:** Implementation Complete
> **Author:** Pratik Mankawde
> **Created:** 2026-02-25

Typo in year — should be 2025, not 2026:

Suggested change
> **Created:** 2026-02-25
> **Created:** 2025-02-25

| `coroutine<void>::push_type` | `JobQueue.h:53` | Yield function type |
| `boost::context::protected_fixedsize_stack(1536 * 1024)` | `Coro.ipp:14` | Stack size configuration |
| `#include <boost/coroutine2/all.hpp>` | `JobQueue.h:11` | Header inclusion |


Section 5.4 lists only the unary gRPC entry point — streaming RPC handlers are not audited. Add a paragraph to Section 5.4 confirming: (a) which rippled proto methods are unary vs streaming, (b) whether any streaming handler calls postCoro() or yield(), and (c) whether streaming handlers use a separate code path unaffected by this migration. Without this, a streaming RPC could silently retain the old Boost path after Phase 4 cleanup removes Coro.

See: context | gRPC

(parallel to postCoro)`"]
P1D["Unit tests for new primitives"]
P1A --> P1B --> P1C --> P1D
end

No design notes on CallData lifecycle with gRPC's CompletionQueue. The plan identifies GRPCServer.cpp:102 as an entry point but doesn't verify that CoroTaskRunner lifetime outlives all CompletionQueue callbacks that reference it, or that coroutine frame ownership is safe across tag firings. Add an analysis tracing: CQ tag posted → process() called → coroutine suspended → CQ tag fires again → coroutine resumed, and confirm no raw coroutine_handle<> is stored in CQ tags without RAII ownership.

See: gRPC

- Replace `m_jobQueue.postCoro(jtCLIENT_RPC, ...)` with `postCoroTask()`
- Update lambda to return `CoroTask<void>` (add `co_return`)
- Update `processSession` to accept new coroutine type


Task 2.3 is missing gRPC shutdown handling. The old Coro::post() returned false when the JobQueue was stopping, letting the CallData handler detect shutdown and call Finish() with an appropriate status. Add a sub-task: Verify that when addJob() returns false during shutdown, the awaiter causes the coroutine to terminate and the gRPC call is finished with grpc::StatusCode::UNAVAILABLE. Write a test that shuts down the JobQueue while a gRPC coroutine is suspended and confirms no RPC hangs indefinitely.

See: gRPC


| # | File | Phase | Purpose |
| --- | ------------------------------------- | ----- | ---------------------------------------- |
| 1 | `include/xrpl/core/CoroTask.h` | 1 | `CoroTask<T>` return type + promise_type |

The FAQ admits blocking a worker thread for up to 30 seconds via std::condition_variable, directly contradicting the migration's goal of freeing threads during suspension and voiding the performance gains claimed in Section 4.4 for this code path. Either implement PathFindAwaiter (task 3.2) to properly suspend the coroutine, or at minimum document this as a known regression and ensure the pathfinding timeout is capped below the gRPC deadline so the thread is guaranteed to be released before the client times out.

See: std::condition_variable | gRPC
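
The minimal mitigation — capping the wait below the client's deadline — is a one-line change from `wait` to `wait_for`. A sketch (the 25-second figure is illustrative, not from the plan):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

using namespace std::chrono_literals;

// Sketch: cap the synchronous pathfinding wait so the worker thread is
// guaranteed to be released before the client's deadline expires.
bool waitForPathfinding(
    std::mutex& m,
    std::condition_variable& cv,
    bool& done,
    std::chrono::milliseconds cap = 25s)  // must stay below the gRPC deadline
{
    std::unique_lock lk(m);
    // false on timeout: the caller reports an error instead of hanging
    return cv.wait_for(lk, cap, [&] { return done; });
}
```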


a1q123456 commented Mar 27, 2026

Given that our coroutine use case isn't different from others at all, I think we can use the existing coroutine implementation in boost.asio instead of reinventing the wheel and implementing our own promise and future types.

This approach gives us some benefits:

  1. Properly tested coroutine implementation — boost.asio is used everywhere, and by outsourcing this part to Boost we don't need to scratch our heads over coroutine-related and concurrency-related bugs
  2. We get less code and it shortens development time
  3. Crystal-clear design - co_spawn schedules a coroutine on an executor; boost::asio::detached means fire and forget, boost::asio::use_future returns a std::future so that you can wait for it synchronously

To make it work, we'll need to implement an executor that meets asio's requirement, refactor JobQueue and Coro to use the executor. As the second phase, we refactor to use boost.asio coroutine, and then we can replace Workers with boost::thread_pool.

I propose this plan:

Phase 1:

  1. Implement JobQueueExecutor — Custom Asio executor with execute() that calls addJob(), carrying a JobType for priority. Wraps the function to save/restore LocalValues pointer around invocation.
  2. Refactor JobQueue and Coro to use JobQueueExecutor
  3. Unit tests for JobQueueExecutor. The public API of JobQueue shouldn't change at this moment, we only need to ensure the current JobQueue tests pass.

Phase 2: Replace Coro with C++20 coroutines

  1. Remove Coro class and replace postCoro — Delete Coro, Coro.ipp, Coro_create_t, nSuspend_. New method uses co_spawn + JobQueueExecutor
  2. Update RPC::Context — Replace the std::shared_ptr<JobQueue::Coro> coro member with a new mechanism (e.g. the executor, or a yield awaitable)
  3. Update ServerHandler (HTTP RPC) — 3 call sites: onRequest(), processSession(Session), processRequest()
  4. Update ServerHandler (WebSocket) — 1 call site: onWSMessage() + processSession(WSSession)
  5. Update GRPCServer — 1 call site: CallData::onDone() + CallData::process()
  6. Update RipplePathFind — Most complex: yield()/post()/resume() pattern with async callback needs replacing with co_await on a promise/event
  7. Update existing tests — Coroutine_test.cpp, JobQueue_test.cpp
  8. We may want to rename Coro to something like CoroutineHandle or CoroutineFrame or whatever
  9. Unit tests for the new JobQueue and Coro

Phase 3: Replace Workers with boost::asio::thread_pool

  1. Replace Workers internals — Remove Workers class, replace with thread_pool. Remove Workers::Callback inheritance from JobQueue. Drop setNumberOfThreads.
  2. Clean up Workers tests
