Commit 62aae5a (1 parent: 796cbb1)

server modification to collect metrics and updated docs

File tree: 2 files changed (+261 −53 lines)

documentation/backend_documentation/runtime_and_resources.md

Lines changed: 118 additions & 40 deletions

-----

### **5.3 `ServerRuntime` — The Workhorse 📦 (2025 edition)**

`ServerRuntime` emulates an application server that owns **finite CPU / RAM containers** and executes an ordered chain of **Step** objects for every incoming request.
The 2025 refactor keeps the classic **dispatcher / handler** pattern, adds **live metric counters** (ready‑queue length, I/O‑queue length, RAM‑in‑use), and implements the **lazy‑CPU lock** algorithm described earlier.

| `__init__` parameter | Meaning |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`env`** | Shared `simpy.Environment`. Every timeout or resource operation is scheduled here. |
| **`server_resources`** | A `ServerContainers` mapping `{"CPU": Container, "RAM": Container}` created by `ResourcesRuntime`. Containers start **full** (`level == capacity`) so a server can immediately pull tokens. |
| **`server_config`** | Validated Pydantic `Server` model: server ID, resource spec, and `endpoints: list[Endpoint]` (each endpoint is an ordered list of `Step`s). |
| **`out_edge`** | `EdgeRuntime` (or stub) that receives the `RequestState` once processing finishes. |
| **`server_box`** | `simpy.Store` acting as the server's inbox. Up‑stream actors drop `RequestState`s here. |
| **`rng`** | `numpy.random.Generator`; defaults to `default_rng()`. Used to pick a random endpoint. |
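
For orientation, here is a minimal sketch of how such pre‑filled containers can be built with plain SimPy; the capacities (`cpu_cores`, `ram_mb`) are illustrative, and the dict shape mirrors the `ServerContainers` mapping above:

```python
import simpy

env = simpy.Environment()
cpu_cores, ram_mb = 4, 2048  # illustrative capacities

# Containers start full (init == capacity), so the first
# CPU.get(1) / RAM.get(n) can succeed immediately.
server_resources = {
    "CPU": simpy.Container(env, capacity=cpu_cores, init=cpu_cores),
    "RAM": simpy.Container(env, capacity=ram_mb, init=ram_mb),
}
```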

---

#### **Live metric fields**

| Field | Unit | Updated when… | Used for… |
| --------------------- | -------- | ------------------------------------------------ | ------------------------------------------------------------------------ |
| `_el_ready_queue_len` | requests | a CPU step acquires / releases a core | **Ready‑queue length** (how many coroutines wait for the GIL / a worker) |
| `_el_io_queue_len` | requests | an I/O step enters / leaves the socket wait list | **I/O‑queue length** (awaits in progress) |
| `_ram_in_use` | MB | RAM `get` / `put` | Instant **RAM usage** per server |

Accessor properties expose them read‑only:

```python
@property
def ready_queue_len(self) -> int:
    return self._el_ready_queue_len

@property
def io_queue_len(self) -> int:
    return self._el_io_queue_len

@property
def ram_in_use(self) -> int:
    return self._ram_in_use
```
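
Because the counters are plain integers, an external sampling process can poll them. A minimal sketch, assuming a `server` object exposing the properties above (the `monitor` helper is illustrative, not part of the runtime):

```python
import simpy

def monitor(env: simpy.Environment, server, interval: float = 1.0):
    """Periodically sample the server's live metric properties."""
    while True:
        print(
            f"t={env.now:7.2f}  ready={server.ready_queue_len}"
            f"  io={server.io_queue_len}  ram={server.ram_in_use} MB"
        )
        yield env.timeout(interval)

# registered alongside the dispatcher, e.g.:
# env.process(monitor(env, server_runtime))
```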

---

Registers the **dispatcher** coroutine in the environment and returns the created process.

```
RAM get → CPU/IO steps → RAM put → out_edge.transport()
              ▲    │
              │    └── metric counters updated here
              └── lazy CPU lock (get once, put on first I/O)
```
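
One plausible shape for this registration hook (the method and coroutine names here are assumptions; the doc only fixes the behaviour):

```python
def start(self) -> simpy.Process:
    # Register the dispatcher coroutine and return the created
    # process so callers can hold on to it.
    return self.env.process(self._dispatcher())
```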
1. **Dispatcher loop**

   ```python
   while True:
       raw_state = yield self.server_box.get()        # blocks until a request arrives
       state = cast(RequestState, raw_state)
       self.env.process(self._handle_request(state))  # fire‑and‑forget
   ```

2. **Handler coroutine (`_handle_request`)**

   | Stage | Implementation detail |
   | ------------------------------- | ----------------------------------------------------------------------------------------- |
   | **Record arrival** | `state.record_hop(SystemNodes.SERVER, self.server_config.id, env.now)` – leaves a breadcrumb for tracing |
   | **Endpoint selection** | Uniform random index `rng.integers(0, len(endpoints))` (plug‑in point for custom routing) |
   | **Reserve RAM (back‑pressure)** | compute `total_ram` (sum of all `StepOperation.NECESSARY_RAM`) → `yield RAM.get(total_ram)` → `_ram_in_use += total_ram`; if not enough RAM is free, the coroutine blocks |
   | **Execute steps** | handled in a loop with *lazy CPU lock* and metric updates (see edge‑case notes below) |
   | **Release RAM** | `_ram_in_use -= total_ram` → `yield RAM.put(total_ram)` |
   | **Forward** | `out_edge.transport(state)` – send to the next hop without awaiting network latency |
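
Putting the stages together, a condensed sketch of the handler. The `_execute_steps` helper and the exact `NECESSARY_RAM` attribute path are illustrative stand‑ins; the real step loop with metric updates is shown below:

```python
def _handle_request(self, state: RequestState):
    state.record_hop(SystemNodes.SERVER, self.server_config.id, self.env.now)

    endpoints = self.server_config.endpoints
    endpoint = endpoints[int(self.rng.integers(0, len(endpoints)))]

    # Reserve all RAM the endpoint will need up front (back-pressure).
    total_ram = sum(step.operation.NECESSARY_RAM for step in endpoint.steps)
    yield self.server_resources["RAM"].get(total_ram)
    self._ram_in_use += total_ram

    yield from self._execute_steps(endpoint)   # lazy-CPU / metrics loop below

    self._ram_in_use -= total_ram
    yield self.server_resources["RAM"].put(total_ram)

    self.out_edge.transport(state)             # forward without awaiting latency
```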

---

#### **CPU / I/O loop details**

* **Lazy‑CPU lock** – the first CPU step acquires one core; all following contiguous CPU steps reuse it.
* **Release on I/O** – on the first I/O step the core is released; it remains free until the next CPU step.
* **Metric updates** – counters are modified only on a **state transition** (CPU→I/O, I/O→CPU), so there is never double‑counting.

```python
if isinstance(step.kind, EndpointStepCPU):
    if not core_locked:
        yield CPU.get(1)
        core_locked = True
        self._el_ready_queue_len += 1   # entered ready queue
        if is_in_io_queue:
            self._el_io_queue_len -= 1  # left I/O queue on I/O→CPU transition
            is_in_io_queue = False
    yield env.timeout(cpu_time)

elif isinstance(step.kind, EndpointStepIO):
    if core_locked:
        yield CPU.put(1)                # release the core on CPU→I/O transition
        core_locked = False
        self._el_ready_queue_len -= 1
    if not is_in_io_queue:
        self._el_io_queue_len += 1      # entered I/O queue
        is_in_io_queue = True
    yield env.timeout(io_time)
```

**Handler epilogue**

```python
# At exit, remove ourselves from whichever queue we are still in.
if core_locked:            # finished while holding a core → still in ready queue
    self._el_ready_queue_len -= 1
    yield CPU.put(1)
elif is_in_io_queue:       # finished while awaiting I/O
    self._el_io_queue_len -= 1
```

> This guarantees both queues always balance back to 0 after the last request completes.
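
In a test, that invariant can be checked directly once the environment has drained; `server` here stands for any `ServerRuntime` instance:

```python
env.run()  # run until every request has completed

# Each handler removed itself from its final queue in the epilogue,
# so all live counters must balance back to zero.
assert server.ready_queue_len == 0
assert server.io_queue_len == 0
assert server.ram_in_use == 0
```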

---

#### **Concurrency Guarantees**

* **CPU contention** – the `CPU` container is a token bucket; the maximum number of concurrent CPU‑bound steps equals `cpu_cores` (see the sketch after this list).
* **RAM contention** – requests block at `RAM.get()` until memory is free (models cgroup / OOM throttling).
* **Non‑blocking I/O** – while in `env.timeout(io_wait)` no core token is held, so other handlers can run; this mirrors an async server where workers return to the event loop on each `await`.
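
The token‑bucket cap is easy to demonstrate in isolation; a self‑contained sketch with made‑up capacities and timings:

```python
import simpy

env = simpy.Environment()
cpu = simpy.Container(env, capacity=2, init=2)  # 2 cores, pre-filled

def cpu_step(name: str):
    yield cpu.get(1)       # blocks while both cores are busy
    yield env.timeout(5)   # CPU-bound work holds the core
    yield cpu.put(1)
    print(f"{name} done at t={env.now}")

for i in range(4):
    env.process(cpu_step(f"req{i}"))
env.run()
# req0/req1 finish at t=5; req2/req3 must wait and finish at t=10.
```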

---

#### **Edge‑case handling (metrics)**

* **First‑step I/O** – counted only in the I/O queue (`+1`); never touches the ready queue.
* **Consecutive I/O steps** – the second I/O step sees `is_in_io_queue == True`, so no extra increment (no double count).
* **CPU → I/O → CPU**
  – CPU step: `core_locked = True`, `+1` ready queue
  – I/O step: core released, `−1` ready queue, `+1` I/O queue
  – next CPU: core reacquired, `−1` I/O queue, `+1` ready queue
* **Endpoint finishes** – the epilogue removes the request from whichever queue it still occupies, avoiding “ghost” entries.

---

#### **Real‑World Analogy**

| Runtime concept | Real server analogue |
| --------------------------------------- | ----------------------------------------------------------------------------------------- |
| `server_box` | Web server accept queue (e.g., the `accept()` backlog). |
| `CPU.get(1)` / `CPU.put(1)` | Claiming / releasing a worker thread or GIL slot (Gunicorn, uWSGI, the Node.js event loop). |
| `env.timeout(io_wait)` (without a core) | `await redis.get()` – the coroutine is parked while the kernel handles the socket. |
| RAM token bucket | cgroup memory limit / container hard RSS; requests block when the heap is exhausted. |

Thus a **CPU‑bound step** models tight Python code holding the GIL, while an **I/O‑bound step** models an `await` that yields control back to the event loop, freeing the core.

---

### **5.4. ClientRuntime: The Destination**

This actor typically represents the end-user or system that initiated the request, serving as the final destination.
