Skip to content

Commit 84ebccf

Browse files
more fixes to doc
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
1 parent c126f7e commit 84ebccf

File tree

1 file changed

+56
-53
lines changed

1 file changed

+56
-53
lines changed

BoostToStdCoroutineSwitchPlan.md

Lines changed: 56 additions & 53 deletions
Original file line numberDiff line numberDiff line change
@@ -34,8 +34,6 @@ This document describes the plan for migrating rippled's coroutine implementatio
3434

3535
Coroutines in rippled are used to handle long-running RPC requests — such as pathfinding — without blocking server threads. When a request needs to wait for an external event, the coroutine **suspends** (freeing the thread for other work) and **resumes** later when the event completes.
3636

37-
**In simple terms:** Think of a restaurant kitchen. The current system (Boost.Coroutine) is like giving each order its own dedicated chef who stands idle while waiting for the oven — expensive in staff. The new system (C++20 coroutines) is like giving each order a ticket: the chef puts the ticket aside when waiting for the oven, picks up another order, and returns to the first ticket when the oven beeps. Same kitchen, far fewer idle chefs.
38-
3937
---
4038

4139
## 2. Why Is This Needed?
@@ -145,14 +143,14 @@ graph LR
145143

146144
### 4.3 Performance Characteristics
147145

148-
| Metric | Boost.Coroutine2 | C++20 Coroutines |
149-
| ------------------------------ | -------------------------------------------------------------------- | ------------------------------------ |
150-
| **Memory per coroutine** | ~1.5 MB (fixed stack) | ~200-500 bytes (frame only) |
151-
| **1000 concurrent coroutines** | ~1.5 GB | ~0.5 MB |
152-
| **Context switch cost** | ~19 CPU cycles (fcontext save/restore, per Boost.Context benchmarks) | ~20-50 CPU cycles (function call) |
153-
| **Allocation** | Stack allocated at creation | Heap allocation (compiler may elide) |
154-
| **Cache behavior** | Poor (large stack rarely fully used) | Good (small frame, hot data close) |
155-
| **Compiler optimization** | Opaque to compiler | Inlinable, optimizable |
146+
| Metric | Boost.Coroutine2 | C++20 Coroutines |
147+
| ------------------------------ | ---------------------------------------------------------------------------------------- | ------------------------------------ |
148+
| **Memory per coroutine** | ~1.5 MB (fixed stack) | ~200-500 bytes (frame only) |
149+
| **1000 concurrent coroutines** | ~1.5 GB | ~0.5 MB |
150+
| **Context switch cost** | ~19 cycles / 9 ns with fcontext; ~1,130 cycles / 547 ns with ucontext (ASAN/TSAN builds) | ~20-50 CPU cycles (function call) |
151+
| **Allocation** | Stack allocated at creation | Heap allocation (compiler may elide) |
152+
| **Cache behavior** | Poor (large stack rarely fully used) | Good (small frame, hot data close) |
153+
| **Compiler optimization** | Opaque to compiler | Inlinable, optimizable |
156154

157155
### 4.4 Feature Parity Analysis
158156

@@ -233,9 +231,17 @@ However, these types are small, well-understood, and have extensive reference im
233231

234232
**Claim**: If coroutine A `co_await`s coroutine B, and B completes synchronously, B's `final_suspend` resumes A on the same stack, potentially building up unbounded stack depth.
235233

236-
**Analysis**: This is addressed by **symmetric transfer** via `FinalAwaiter::await_suspend()` returning a `coroutine_handle<>` instead of `void`. The compiler transforms this into a tail-call, preventing stack growth. This is the standard solution used by all major coroutine libraries and is implemented in our `FinalAwaiter` design (Section 7.1).
234+
**Why this is a real problem without symmetric transfer**: When `await_suspend()` returns `void`, the coroutine unconditionally suspends and returns from `.resume()`. If the awaited coroutine completes synchronously and calls `.resume()` on the awaiter, each such call adds a stack frame. In a loop that repeatedly `co_await`s short-lived coroutines (e.g., a generator producing millions of values), the stack grows with each iteration until it overflows — typically after ~1M iterations.
235+
236+
**How symmetric transfer solves it**: When `await_suspend()` returns a `coroutine_handle<>` instead of `void`, the compiler destroys the current coroutine's stack frame _before_ jumping to the returned handle. This is effectively a tail-call: `resume()` becomes a `jmp` instead of a `call`, so each chained resumption consumes **zero additional stack space**.
237+
238+
The C++ standard (P0913R0) mandates this by requiring: _"Implementations shall not impose any limits on how many coroutines can be resumed in this fashion."_ This effectively requires compilers to implement tail-call-like behavior — any finite stack would impose a limit otherwise.
237239

238-
**Verdict**: Solved by symmetric transfer (already in our design).
240+
**Returning `std::noop_coroutine()`** from `await_suspend()` signals "suspend and return to caller" without resuming another coroutine, serving the role that `void` return used to play.
241+
242+
**Applicability to rippled**: rippled does not chain coroutines (coroutine A awaiting coroutine B). The `co_await` points in rippled await `JobQueueAwaiter` (reschedules on the thread pool) and `yieldAndPost()` (suspend + re-post), both of which always suspend asynchronously. However, symmetric transfer is still implemented in our `FinalAwaiter` ([Section 7.1](#71-new-type-design)) as a best practice — it costs nothing and prevents stack overflow if the usage pattern ever changes.
243+
244+
**Verdict**: Real concern for coroutine chains, but does not affect rippled's current usage. Solved by symmetric transfer in our design regardless.
239245

240246
#### **Concern 5: Dangling Reference Risk**
241247

@@ -271,7 +277,7 @@ However, these types are small, well-understood, and have extensive reference im
271277

272278
**Verdict**: Separate system. Out of scope for this migration.
273279

274-
**Consequence — `Boost::context` dependency is retained**: Because `boost::asio::spawn` depends on `Boost.Context` for its stackful fiber implementation, the `Boost::context` library **cannot be removed** as part of this migration. The CMake cleanup (Phase 4) replaces `Boost::coroutine` with `Boost::context` — it does not eliminate the Boost fiber dependency entirely.
280+
**Consequence — `Boost::context` dependency is retained**: Because `boost::asio::spawn` depends on `Boost.Context` for its stackful fiber implementation, the `Boost::context` library **cannot be removed** as part of this migration. The CMake cleanup ([Phase 4](#phase-4-cleanup)) replaces `Boost::coroutine` with `Boost::context` — it does not eliminate the Boost fiber dependency entirely.
275281

276282
Additionally, when running under ASAN or TSAN, `Boost.Context` must be built with the `ucontext` backend (not the default `fcontext`) so that it emits `__sanitizer_start_switch_fiber` / `__sanitizer_finish_switch_fiber` annotations during fiber context switches. Without these annotations, the sanitizers cannot track memory ownership across fiber stack switches and will report false positives (stack-use-after-scope under ASAN, data races under TSAN) for the `boost::asio::spawn` call sites listed above. This requires:
277283

@@ -342,7 +348,7 @@ graph TD
342348
end
343349
344350
subgraph "Coroutine Layer"
345-
POST["JobQueue::postCoro()<br/>Creates Coro + schedules job"]
351+
POST["JobQueue::postCoro()<br/>Creates Coro<br/>+ schedules job"]
346352
CORO["JobQueue::Coro<br/>boost::coroutines2::coroutine<void>::pull_type<br/>1.5 MB stack per instance"]
347353
end
348354
@@ -563,8 +569,6 @@ rippled **already uses C++20 coroutines** in test code:
563569

564570
## 6. Migration Strategy
565571

566-
> **RPC** = Remote Procedure Call | **API** = Application Programming Interface
567-
568572
### 6.1 Incremental vs Atomic Migration
569573

570574
**Decision: Incremental (multi-phase) migration.**
@@ -580,15 +584,15 @@ Rationale:
580584

581585
```mermaid
582586
graph TD
583-
subgraph "Phase 1: Foundation"
587+
subgraph PH1 ["Phase 1: Foundation"]
584588
P1A["Create CoroTask#lt;T#gt; type<br/>(promise_type, awaiter)"]
585589
P1B["Create JobQueueAwaiter<br/>(schedules resume on JobQueue)"]
586590
P1C["Add postCoroTask() to JobQueue<br/>(parallel to postCoro)"]
587591
P1D["Unit tests for new primitives"]
588592
P1A --> P1B --> P1C --> P1D
589593
end
590594
591-
subgraph "Phase 2: Entry Point Migration"
595+
subgraph PH2 ["Phase 2: Entry Point Migration"]
592596
P2A["Migrate ServerHandler::onRequest()"]
593597
P2B["Migrate ServerHandler::onWSMessage()"]
594598
P2C["Migrate GRPCServer::CallData::process()"]
@@ -598,32 +602,31 @@ graph TD
598602
P2C --> P2D
599603
end
600604
601-
subgraph "Phase 3: Handler Migration"
605+
subgraph PH3 ["Phase 3: Handler Migration"]
602606
P3A["Migrate RipplePathFind handler"]
603607
P3B["Verify all other handlers<br/>(no active yield usage)"]
604608
end
605609
606-
subgraph "Phase 4: Cleanup"
610+
subgraph PH4 ["Phase 4: Cleanup"]
607611
P4A["Remove old Coro class"]
608612
P4B["Remove Boost.Coroutine from CMake"]
609613
P4C["Remove deprecation warning suppression"]
610614
P4D["Final benchmarks & validation"]
615+
P4A --> P4B --> P4C --> P4D
611616
end
612617
613-
P1D --> P2A
614-
P2D --> P3A
615-
P3B --> P4A
616-
P3A --> P4A
617-
P4A --> P4B --> P4C --> P4D
618+
PH1 --> PH2
619+
PH2 --> PH3
620+
PH3 --> PH4
618621
```
619622

620623
**Reading the diagram:**
621624

622-
- **Phase 1** builds the new coroutine primitives (`CoroTask`, `JobQueueAwaiter`, `postCoroTask()`) alongside the existing Boost code. No production code changes.
623-
- **Phase 2** migrates the three entry points (HTTP, WebSocket, gRPC) to use `postCoroTask()` and updates `RPC::Context`.
624-
- **Phase 3** migrates the `RipplePathFind` handler and verifies no other handlers use `yield()`.
625-
- **Phase 4** removes the old `Coro` class, `Coro.ipp`, `Boost::coroutine` from CMake, and runs final benchmarks.
626-
- Each phase depends on the previous one completing. The old code is not deleted until Phase 4, so rollback is safe through Phases 1–3.
625+
- **[Phase 1](#phase-1-new-coroutine-primitives)** builds the new coroutine primitives (`CoroTask`, `JobQueueAwaiter`, `postCoroTask()`) alongside the existing Boost code. No production code changes.
626+
- **[Phase 2](#phase-2-entry-point-migration)** migrates the three entry points (HTTP, WebSocket, gRPC) to use `postCoroTask()` and updates `RPC::Context`.
627+
- **[Phase 3](#phase-3-handler-migration)** migrates the `RipplePathFind` handler and verifies no other handlers use `yield()`.
628+
- **[Phase 4](#phase-4-cleanup)** removes the old `Coro` class, `Coro.ipp`, `Boost::coroutine` from CMake, and runs final benchmarks.
629+
- Each phase depends on the previous one completing. The old code is not deleted until [Phase 4](#phase-4-cleanup), so rollback is safe through Phases 1–3.
627630

628631
### 6.3 Coexistence Strategy
629632

@@ -651,12 +654,12 @@ graph LR
651654

652655
### 6.4 Breaking Changes & Compatibility
653656

654-
| Concern | Impact | Mitigation |
655-
| -------------------------------- | ----------------------------------------- | -------------------------------------------------------- |
656-
| `RPC::Context::coro` type change | All RPC handlers receive context | Migrate context field last, after all consumers updated |
657-
| `postCoro()` removal | 3 callers | Replace with `postCoroTask()`, remove old API in Phase 4 |
658-
| `LocalValue` integration | Thread-local storage must work | New implementation must swap LocalValues identically |
659-
| Shutdown behavior | `expectEarlyExit()`, `nSuspend_` tracking | Replicate in new CoroTask |
657+
| Concern | Impact | Mitigation |
658+
| -------------------------------- | ----------------------------------------- | ---------------------------------------------------------------------------- |
659+
| `RPC::Context::coro` type change | All RPC handlers receive context | Migrate context field last, after all consumers updated |
660+
| `postCoro()` removal | 3 callers | Replace with `postCoroTask()`, remove old API in [Phase 4](#phase-4-cleanup) |
661+
| `LocalValue` integration | Thread-local storage must work | New implementation must swap LocalValues identically |
662+
| Shutdown behavior | `expectEarlyExit()`, `nSuspend_` tracking | Replicate in new CoroTask |
660663

661664
---
662665

@@ -1060,21 +1063,21 @@ quadrantChart
10601063
Third-party Boost dep: [0.10, 0.30]
10611064
```
10621065

1063-
| Risk | Probability | Impact | Mitigation |
1064-
| ----------------------------------------------------- | ----------- | ------ | --------------------------------------------------------------------------- |
1065-
| **Performance regression** in context switching | Low | High | Benchmark before/after; C++20 should be faster |
1066-
| **Coroutine frame lifetime bugs** (use-after-destroy) | Medium | High | ASAN testing, RAII wrapper for handle, code review |
1067-
| **Data races on resume** | Medium | High | TSAN testing, careful await_suspend() implementation |
1068-
| **LocalValue corruption** across threads | Low | High | Dedicated test with 4+ concurrent coroutines |
1069-
| **Shutdown race conditions** | Medium | Medium | Replicate existing mutex/cv pattern in new design |
1070-
| **Missed coroutine consumer** during migration | Low | Medium | Exhaustive grep audit (Section 5.4 is complete) |
1071-
| **Compiler bugs** in coroutine codegen | Low | Medium | Test on all three compilers (GCC, Clang, MSVC) |
1072-
| **Exception loss** across suspension points | Medium | Medium | Test exception propagation in every phase |
1073-
| **Third-party code depending on Boost.Coroutine** | Very Low | Low | Grep confirms only internal usage |
1074-
| **Dangling references in coroutine frames** | Medium | High | ASAN testing, avoid reference params in coroutine functions, use shared_ptr |
1075-
| **Colored function infection spreading** | Low | Medium | Only 4 call sites need co_await; no nested handlers suspend |
1076-
| **Symmetric transfer not available** | Very Low | High | All target compilers (GCC 12+, Clang 16+) support symmetric transfer |
1077-
| **Future handler adding deep yield** | Low | Medium | Code review + CI: static analysis flag any yield from nested depth |
1066+
| Risk | Probability | Impact | Mitigation |
1067+
| ----------------------------------------------------- | ----------- | ------ | -------------------------------------------------------------------------------- |
1068+
| **Performance regression** in context switching | Low | High | Benchmark before/after; C++20 should be faster |
1069+
| **Coroutine frame lifetime bugs** (use-after-destroy) | Medium | High | ASAN testing, RAII wrapper for handle, code review |
1070+
| **Data races on resume** | Medium | High | TSAN testing, careful await_suspend() implementation |
1071+
| **LocalValue corruption** across threads | Low | High | Dedicated test with 4+ concurrent coroutines |
1072+
| **Shutdown race conditions** | Medium | Medium | Replicate existing mutex/cv pattern in new design |
1073+
| **Missed coroutine consumer** during migration | Low | Medium | Exhaustive grep audit ([Section 5.4](#54-all-coroutine-touchpoints) is complete) |
1074+
| **Compiler bugs** in coroutine codegen | Low | Medium | Test on all three compilers (GCC, Clang, MSVC) |
1075+
| **Exception loss** across suspension points | Medium | Medium | Test exception propagation in every phase |
1076+
| **Third-party code depending on Boost.Coroutine** | Very Low | Low | Grep confirms only internal usage |
1077+
| **Dangling references in coroutine frames** | Medium | High | ASAN testing, avoid reference params in coroutine functions, use shared_ptr |
1078+
| **Colored function infection spreading** | Low | Medium | Only 4 call sites need co_await; no nested handlers suspend |
1079+
| **Symmetric transfer not available** | Very Low | High | All target compilers (GCC 12+, Clang 16+) support symmetric transfer |
1080+
| **Future handler adding deep yield** | Low | Medium | Code review + CI: static analysis flag any yield from nested depth |
10781081

10791082
### 9.2 Rollback Strategy
10801083

@@ -1101,7 +1104,7 @@ graph TD
11011104
P4 --> PREVENT
11021105
```
11031106

1104-
**Key principle**: Old `Coro` class and `postCoro()` remain in the codebase through Phases 1-3. They are only removed in Phase 4, after all migration is validated. Each phase is independently revertible via `git revert`.
1107+
**Key principle**: Old `Coro` class and `postCoro()` remain in the codebase through Phases 1-3. They are only removed in [Phase 4](#phase-4-cleanup), after all migration is validated. Each phase is independently revertible via `git revert`.
11051108

11061109
### 9.3 Specific Risk: Stackful → Stackless Limitation
11071110

@@ -1496,7 +1499,7 @@ ASAN annotations (`__sanitizer_start_switch_fiber` / `__sanitizer_finish_switch_
14961499
The migration only removes `Boost::coroutine`. rippled's production server code (`BaseHTTPPeer.h`) and test infrastructure (`yield_to.h`) still use `boost::asio::spawn`, which depends on `Boost.Context` for stackful fiber execution. Migrating those call sites to `boost::asio::co_spawn` / `boost::asio::awaitable` is a separate initiative. See [Concern 6](#concern-6-yield_toh--boostasiospawn) for details.
14971500

14981501
**Can C++20 stackless coroutines yield from deeply nested function calls?**
1499-
No — `co_await` can only appear in the immediate coroutine function body. However, an exhaustive audit confirmed that all `yield()` calls in rippled are at the top level of their lambda or handler function. No deep nesting exists. See Section 4.6, Concern 1.
1502+
No — `co_await` can only appear in the immediate coroutine function body. However, an exhaustive audit confirmed that all `yield()` calls in rippled are at the top level of their lambda or handler function. No deep nesting exists. See [Section 4.6, Concern 1](#concern-1-cannot-suspend-from-nested-call-stacks).
15001503

15011504
**Why was `RipplePathFind` not migrated to use `co_await` as the plan originally proposed?**
15021505
During implementation, it was simpler and more robust to replace the coroutine-based yield/post pattern with a `std::condition_variable` synchronous wait. Since `RipplePathFind` is the only handler that suspends, and it already runs on a JobQueue worker thread, blocking that thread for up to 30 seconds is acceptable. This eliminates coroutine complexity from the handler entirely.

0 commit comments

Comments
 (0)