
Commit e6cae5a

bart-linera and claude authored
Adjust service chain ownership when reassigning services between workers (#5435)
## Motivation

If a service is reassigned from worker A to worker B by the controller application, worker A should cease to be an owner of the service's chain, and worker B should become an owner.

## Proposal

Upon reassignment, when the controller tells worker A to stop running the service, worker A adds worker B as an owner of the service chain and notifies the main controller chain. The main controller then tells worker B to take over, including the height on the service chain at which worker B became an owner. Worker B waits until it has synchronized the service chain up to that height; once it is ready, it starts listening to the chain in `FullChain` mode and removes worker A as an owner.

## Test Plan

`test_controller` was extended to reassign a service from one worker to another, and then check that the second worker properly handles a task request.

## Release Plan

- These changes should be backported to the latest `testnet` branch, then
- be released in a new SDK.

## Links

- #5270
- [reviewer checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
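The two ownership changes in the Proposal can be modelled in a few lines of Rust. This is a minimal in-memory sketch, not the application's code. The service chain is reduced to a set of owners plus a block counter. `ServiceChain`, `execute` and `handoff` are hypothetical names, and `Owner` and `BlockHeight` are simplified stand-ins for Linera's own types.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-ins; the real application uses Linera's own types.
type Owner = &'static str;
type BlockHeight = u64;

/// Simplified service chain: current owners and the height of the next block.
#[derive(Debug, Default)]
struct ServiceChain {
    owners: BTreeSet<Owner>,
    next_height: BlockHeight,
}

impl ServiceChain {
    /// Executes one block that adds and removes owners; returns its height.
    fn execute(&mut self, add: &[Owner], remove: &[Owner]) -> BlockHeight {
        for owner in add {
            self.owners.insert(*owner);
        }
        for owner in remove {
            self.owners.remove(owner);
        }
        let height = self.next_height;
        self.next_height += 1;
        height
    }
}

/// Moves ownership from `old` to `new` in two blocks, returning the height at
/// which `new` was added and the owner set after each block.
fn handoff(chain: &mut ServiceChain, old: Owner, new: Owner) -> (BlockHeight, Vec<BTreeSet<Owner>>) {
    let mut history = Vec::new();
    // Phase 1 (old worker, on Stop): add the new owner first.
    let added_at = chain.execute(&[new], &[]);
    history.push(chain.owners.clone());
    // Phase 2 (new worker, once synced past `added_at`): drop the old owner.
    chain.execute(&[], &[old]);
    history.push(chain.owners.clone());
    (added_at, history)
}
```

The new owner is added in an earlier block than the one that removes the old owner. As a result, every block leaves the chain with at least one owner. Doing the removal first would briefly leave a single-owner chain with no owner at all.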
1 parent f66b510 commit e6cae5a


8 files changed (+611, -42 lines)


examples/controller/README.md

Lines changed: 61 additions & 6 deletions
@@ -92,26 +92,77 @@ sequenceDiagram
 participant W2 as Worker 2
 A->>C: ExecuteControllerCommand<br/>{UpdateService}
 C->>C: Update services registry
-C->>W1: Message::Start{service_id}
-C->>W2: Message::Start{service_id}
+C->>W1: Start{service_id,<br/>owners_to_remove: {},<br/>start_height: None}
+C->>W2: Start{service_id,<br/>owners_to_remove: {},<br/>start_height: None}
 W1->>W1: Add to local_services
 W2->>W2: Add to local_services
 ```

+### Service Handoff (Reassignment)
+
+When a service is moved from one worker to another, all three chain types
+participate in a two-phase handoff protocol. The new worker's owners are added
+to the service chain first, then the old worker's owners are removed, ensuring
+there is no gap in ownership.
+
+```mermaid
+sequenceDiagram
+participant A as Admin
+participant C as Controller Chain
+participant WA as Old Worker Chain
+participant S as Service Chain
+participant WB as New Worker Chain
+
+A->>C: ExecuteControllerCommand<br/>{UpdateService{service, [WB]}}
+C->>C: Record pending handoff
+C->>WA: Stop{service, new_owners: [WB]}
+
+note over WA,S: Phase 1: Add new owners
+WA->>S: AddOwners{service, [WB]}
+S->>S: Add WB as chain owner
+S->>WA: OwnersAdded{service, block_height}
+
+WA->>WA: Remove from local_services
+WA->>C: HandoffStarted{service, block_height}
+
+C->>C: Resolve pending handoff
+C->>WB: Start{service,<br/>owners_to_remove: [WA],<br/>start_height: block_height}
+WB->>WB: Add to local_pending_services
+
+note over WB,S: Phase 2: Remove old owners
+WB->>WB: StartLocalService at start_height
+WB->>S: RemoveOwners{[WA]}
+S->>S: Remove WA as chain owner
+WB->>WB: Move to local_services
+```
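On the new worker's side, phase 2 amounts to holding the service as pending until the service chain has been synchronized to `start_height`. Below is a minimal sketch of that bookkeeping. `WorkerState`, `PendingStart`, `on_start` and `on_synced` are hypothetical, and the type aliases are simplified stand-ins. The sketch assumes `start_height` is the height of the block that added the new owners. Under that assumption, the worker must have already executed that block (its next height is greater than `start_height`) before it takes over.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins; the real application uses Linera's own types.
type ServiceId = u32;
type Owner = &'static str;
type BlockHeight = u64;

/// Handoff data from a `Start` message that has not been acted upon yet.
struct PendingStart {
    owners_to_remove: Vec<Owner>,
    start_height: BlockHeight,
}

#[derive(Default)]
struct WorkerState {
    local_pending_services: BTreeMap<ServiceId, PendingStart>,
    local_services: BTreeSet<ServiceId>,
}

impl WorkerState {
    /// Handles `Start { service_id, owners_to_remove, start_height }`.
    fn on_start(&mut self, service_id: ServiceId, owners_to_remove: Vec<Owner>, start_height: Option<BlockHeight>) {
        match start_height {
            // Plain start (no handoff): run the service right away.
            None => {
                self.local_services.insert(service_id);
            }
            // Handoff: park the service until the chain is synchronized far enough.
            Some(start_height) => {
                self.local_pending_services
                    .insert(service_id, PendingStart { owners_to_remove, start_height });
            }
        }
    }

    /// Called when the local view of a service chain advances so that blocks
    /// `0..next_height` are executed. If a pending service becomes ready, it is
    /// moved to `local_services` and the owners to remove are returned.
    fn on_synced(&mut self, service_id: ServiceId, next_height: BlockHeight) -> Option<Vec<Owner>> {
        let start_height = self.local_pending_services.get(&service_id)?.start_height;
        if next_height <= start_height {
            // The block that added this worker as an owner is not executed yet.
            return None;
        }
        let pending = self.local_pending_services.remove(&service_id)?;
        self.local_services.insert(service_id);
        Some(pending.owners_to_remove)
    }
}
```

The returned owners are what the worker would put in `RemoveOwners`. A `Start` with `start_height: None`, as in the initial assignment diagram, skips the pending stage entirely.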
+
 ### Service Removal

+When a service is removed, Stop messages are sent to all workers. Each worker
+initiates ownership cleanup through the service chain before removing the
+service locally, following the same handoff protocol but with empty new owners.
+
 ```mermaid
 sequenceDiagram
 participant A as Admin
 participant C as Controller Chain
 participant W1 as Worker 1
 participant W2 as Worker 2
+participant S as Service Chain
 A->>C: ExecuteControllerCommand<br/>{RemoveService}
 C->>C: Remove from services registry
-C->>W1: Message::Stop{service_id}
-C->>W2: Message::Stop{service_id}
+C->>W1: Stop{service_id, new_owners: {}}
+C->>W2: Stop{service_id, new_owners: {}}
+
+W1->>S: AddOwners{service_id, {}}
+S->>W1: OwnersAdded{service_id, block_height}
 W1->>W1: Remove from local_services
+W1->>C: HandoffStarted{service_id, block_height}
+
+W2->>S: AddOwners{service_id, {}}
+S->>W2: OwnersAdded{service_id, block_height}
 W2->>W2: Remove from local_services
+W2->>C: HandoffStarted{service_id, block_height}
 ```
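In both the reassignment and the removal flows, the old worker ends by sending `HandoffStarted`. The controller only has something to resolve in the reassignment case. Below is a minimal sketch of that controller-side logic, assuming a single new worker per reassignment. `Controller`, `pending_handoffs`, `reassign`, `remove` and `on_handoff_started` are hypothetical names. The `Start`/`Stop` field names follow this README's message table.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the application's real types.
type ServiceId = u32;
type Owner = &'static str;
type BlockHeight = u64;

#[derive(Debug, PartialEq)]
enum Message {
    Start { service_id: ServiceId, owners_to_remove: Vec<Owner>, start_height: Option<BlockHeight> },
    Stop { service_id: ServiceId, new_owners: Vec<Owner> },
}

#[derive(Default)]
struct Controller {
    /// Reassignments waiting for the old worker's `HandoffStarted`,
    /// keyed by service and holding the new worker.
    pending_handoffs: HashMap<ServiceId, Owner>,
}

impl Controller {
    /// Reassignment: record the pending handoff, then tell the old worker to
    /// stop and add the new worker as an owner.
    fn reassign(&mut self, service_id: ServiceId, new_worker: Owner) -> Message {
        self.pending_handoffs.insert(service_id, new_worker);
        Message::Stop { service_id, new_owners: vec![new_worker] }
    }

    /// Removal: nothing to resolve later, and the `Stop` carries no new owners.
    fn remove(&mut self, service_id: ServiceId) -> Message {
        self.pending_handoffs.remove(&service_id);
        Message::Stop { service_id, new_owners: Vec::new() }
    }

    /// `HandoffStarted` from `old_worker`. For a reassignment, returns the new
    /// worker and the `Start` it should receive; for a removal, returns `None`.
    fn on_handoff_started(
        &mut self,
        old_worker: Owner,
        service_id: ServiceId,
        block_height: BlockHeight,
    ) -> Option<(Owner, Message)> {
        let new_worker = self.pending_handoffs.remove(&service_id)?;
        let start = Message::Start {
            service_id,
            owners_to_remove: vec![old_worker],
            start_height: Some(block_height),
        };
        Some((new_worker, start))
    }
}
```

A removal leaves no entry in `pending_handoffs`, so a `HandoffStarted` for a removed service produces no `Start`. Per the removal diagram, the same `Stop` is sent to every worker running the service.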

 ## Operations
@@ -156,7 +207,11 @@ Messages are sent between chains to coordinate state:
 | `ExecuteWorkerCommand` | Worker -> Controller | Register/deregister worker |
 | `ExecuteControllerCommand` | Any -> Controller | Admin commands |
 | `Reset` | Controller -> Worker | Clear worker state |
-| `Start { service_id }` | Controller -> Worker | Start a service |
-| `Stop { service_id }` | Controller -> Worker | Stop a service |
+| `Start { service_id, owners_to_remove, start_height }` | Controller -> Worker | Start a service, optionally with handoff info |
+| `Stop { service_id, new_owners }` | Controller -> Worker | Stop a service, initiating ownership handoff via the service chain |
 | `FollowChain { chain_id }` | Controller -> Worker | Follow a chain |
 | `ForgetChain { chain_id }` | Controller -> Worker | Stop following a chain |
+| `AddOwners { service_id, new_owners }` | Worker -> Service Chain | Add new owners to a service chain during handoff |
+| `RemoveOwners { owners_to_remove }` | Worker -> Service Chain | Remove old owners from a service chain after handoff |
+| `OwnersAdded { service_id, added_at }` | Service Chain -> Worker | Confirm new owners were added at a given block height |
+| `HandoffStarted { service_id, target_block_height }` | Worker -> Controller | Notify controller that handoff phase 1 is complete |
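Taken together, the rows added or changed by this commit describe a small message protocol. Below is a sketch of them as a Rust enum, with a `direction` method that returns the table's Direction column. The variant and field names come from the table. The field types, the `Chain` enum and `direction` itself are hypothetical simplifications, not the application's definitions.

```rust
// Hypothetical simplified types; not the application's real definitions.
type ServiceId = u32;
type Owner = &'static str;
type BlockHeight = u64;

/// Kinds of chain taking part in the protocol.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Chain {
    Controller,
    Worker,
    Service,
}

/// Messages added or changed by this commit, as listed in the table.
#[derive(Debug)]
enum Message {
    Start { service_id: ServiceId, owners_to_remove: Vec<Owner>, start_height: Option<BlockHeight> },
    Stop { service_id: ServiceId, new_owners: Vec<Owner> },
    AddOwners { service_id: ServiceId, new_owners: Vec<Owner> },
    RemoveOwners { owners_to_remove: Vec<Owner> },
    OwnersAdded { service_id: ServiceId, added_at: BlockHeight },
    HandoffStarted { service_id: ServiceId, target_block_height: BlockHeight },
}

impl Message {
    /// Returns (sender, receiver), matching the table's Direction column.
    fn direction(&self) -> (Chain, Chain) {
        match self {
            Message::Start { .. } | Message::Stop { .. } => (Chain::Controller, Chain::Worker),
            Message::AddOwners { .. } | Message::RemoveOwners { .. } => (Chain::Worker, Chain::Service),
            Message::OwnersAdded { .. } => (Chain::Service, Chain::Worker),
            Message::HandoffStarted { .. } => (Chain::Worker, Chain::Controller),
        }
    }
}
```

Keeping the direction next to each variant makes it easy to check, for example, that only the service chain ever sends `OwnersAdded`.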
