
Commit 586f40a

updated docs (contents, formatting), added spec file
1 parent a543ee6 commit 586f40a

22 files changed: +1580 / -1329 lines

.github/workflows/docs.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -44,7 +44,7 @@ jobs:
 restore-keys: mkdocs-

 - name: Install MkDocs
-run: pip install mkdocs-material mkdocs-mermaid2-plugin
+run: pip install mkdocs-material mkdocs-mermaid2-plugin mkdocs-swagger-ui-tag

 - name: Build documentation
 run: mkdocs build --strict
```

docs/architecture/kafka-topic-architecture.md

Lines changed: 96 additions & 54 deletions

docs/architecture/lifecycle.md

Lines changed: 24 additions & 23 deletions
```diff
@@ -1,40 +1,41 @@
-Lifecycle of Long‑Running Services
+# Service lifecycle

-We used to manage service lifecycles (start/stop) in a very ad‑hoc way. Some services started themselves, others were started by DI providers, some runners remembered to stop things, some didnt. A few places tried to clean up with locals() checks and besteffort branching. It worked until it didnt: shutdowns were inconsistent, readiness was unclear, and tricky bugs hid in error paths.
+Service lifecycles (start/stop) used to be managed in an ad-hoc way. Some services started themselves, others were started by DI providers, some runners remembered to stop things, some didn't. A few places tried to clean up with `locals()` checks and best-effort branching. It worked until it didn't: shutdowns were inconsistent, readiness was unclear, and tricky bugs hid in error paths.

-We briefly considered a destructorstyle approach (start in __init__, stop in __del__). That idea looks simple on paper, but it’s the wrong fit for asyncio. You cant await from a destructor, and destructors may run after the event loop is already gone. That prevents clean cancellation and flush of background tasks and network clients. It also swallows errors. In short, it’s unreliable.
+A destructor-style approach (start in `__init__`, stop in `__del__`) looks simple on paper but is the wrong fit for asyncio. You can't await from a destructor, and destructors may run after the event loop is already gone. That prevents clean cancellation and flush of background tasks and network clients.

-The pattern that actually fits Python and asyncio is the languages own RAII: async context managers. Each service implements start/stop, and also implements __aenter__/__aexit__ that call start/stop. Runners and the FastAPI lifespan manage the lifetime of multiple services with an AsyncExitStack so the code stays flat and readable. Startup is deterministic, and shutdown always happens while the loop is still alive.
+The pattern that actually fits Python and asyncio is the language's own RAII: async context managers. Each service implements `start`/`stop`, and also implements `__aenter__`/`__aexit__` that call them. Runners and the FastAPI lifespan manage the lifetime of multiple services with an `AsyncExitStack` so the code stays flat and readable. Startup is deterministic, and shutdown always happens while the loop is still alive.
```
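The updated paragraph says startup is deterministic and shutdown always runs while the loop is alive. One consequence worth spelling out is partial startup failure: if a service's `start()` raises while the stack is being built, `AsyncExitStack` still stops every service that was already entered, in reverse order, and the failing service gets no `stop()` call because its `__aenter__` never completed. A minimal sketch with toy `Service` and `Broken` classes (hypothetical, not the project's real services):

```python
import asyncio
from contextlib import AsyncExitStack

# Shared log so the order of start/stop calls can be inspected.
events: list[str] = []


class Service:
    """Hypothetical service: start/stop plus the async context manager protocol."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def start(self) -> None:
        events.append(f"start {self.name}")

    async def stop(self) -> None:
        events.append(f"stop {self.name}")

    async def __aenter__(self) -> "Service":
        await self.start()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.stop()


class Broken(Service):
    """Hypothetical service whose start() fails."""

    async def start(self) -> None:
        raise RuntimeError(f"{self.name} failed to start")


async def main() -> None:
    try:
        async with AsyncExitStack() as stack:
            await stack.enter_async_context(Service("producer"))
            await stack.enter_async_context(Service("coordinator"))
            await stack.enter_async_context(Broken("monitor"))  # raises here
    except RuntimeError as exc:
        # By now the stack has already stopped coordinator, then producer.
        events.append(f"error: {exc}")


asyncio.run(main())
print(events)
```

No `if 'x' in locals()` bookkeeping is needed: the stack only knows about what was actually started, so it only stops that.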
```diff
-What changed
+## What changed

-Services with longrunning background work now implement the async context manager protocol. Coordinator, KubernetesWorker, PodMonitor, SSE Kafka→Redis bridge, EventStoreConsumer, ResultProcessor, DLQManager, EventBus, and the Kafka producer all expose __aenter__/__aexit__ that call start/stop.
+Services with long-running background work now implement the async context manager protocol. Coordinator, KubernetesWorker, PodMonitor, SSE Kafka→Redis bridge, EventStoreConsumer, ResultProcessor, DLQManager, EventBus, and the Kafka producer all expose `__aenter__`/`__aexit__` that call `start`/`stop`.

-DI providers return unstarted instances for these services. The FastAPI lifespan acquires them and uses an AsyncExitStack to start/stop them in a single place. That removed scattered start/stop logic from providers and made shutdown order explicit.
+DI providers return unstarted instances for these services. The FastAPI lifespan acquires them and uses an `AsyncExitStack` to start/stop them in a single place. That removed scattered start/stop logic from providers and made shutdown order explicit.
```
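A minimal sketch of the lifespan pattern described in the updated paragraph, under stated assumptions: `FakeService` and `build_services()` are hypothetical stand-ins for the unstarted instances the DI providers return, and the service names are only labels. In a real app the `lifespan` function would be passed to FastAPI as `FastAPI(lifespan=lifespan)`; here `simulate_app()` enters it directly so the sketch runs without FastAPI installed.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import AsyncExitStack, asynccontextmanager
from typing import Any

log: list[str] = []


class FakeService:
    """Hypothetical stand-in for an unstarted service handed out by a DI provider."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def start(self) -> None:
        log.append(f"start {self.name}")

    async def stop(self) -> None:
        log.append(f"stop {self.name}")

    async def __aenter__(self) -> "FakeService":
        await self.start()
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        await self.stop()


def build_services() -> list[FakeService]:
    # Hypothetical: the real app resolves these from its DI container, unstarted.
    return [FakeService("kafka-producer"), FakeService("event-bus"), FakeService("sse-bridge")]


@asynccontextmanager
async def lifespan(app: Any) -> AsyncIterator[None]:
    async with AsyncExitStack() as stack:
        # Start everything in one place, in dependency order.
        for service in build_services():
            await stack.enter_async_context(service)
        yield  # the application serves requests while suspended here
    # Leaving the stack has stopped every service in reverse order.


async def simulate_app() -> None:
    # FastAPI enters the lifespan itself when constructed as FastAPI(lifespan=lifespan).
    async with lifespan(app=None):
        log.append("serving")


asyncio.run(simulate_app())
print(log)
```

The providers stay free of side effects; the only place that knows about start and stop order is `lifespan`.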
```diff
-Worker entrypoints (coordinator, k8sworker, podmonitor, eventreplay, resultprocessor, dlqprocessor) use AsyncExitStack as well. No more if 'x' in locals() cleanups or nested with statements. Each runner acquires the services it needs, enters them in the stack, and blocks. When its time to exit, everything stops in reverse order.
+Worker entrypoints (coordinator, k8s-worker, pod-monitor, event-replay, result-processor, dlq-processor) use `AsyncExitStack` as well. No more `if 'x' in locals()` cleanups or nested with statements. Each runner acquires the services it needs, enters them in the stack, and blocks. When it's time to exit, everything stops in reverse order.
```
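The worker entrypoints acquire, enter, and block. How the blocking ends is not spelled out in the document; one common approach, assumed here rather than taken from the project, is to wait on an `asyncio.Event` that a SIGINT/SIGTERM handler sets, so that leaving the `async with` is what stops the services. `Service`, `run_worker`, `main`, and `demo` are hypothetical names.

```python
import asyncio
import signal
from contextlib import AsyncExitStack

log: list[str] = []


class Service:
    """Hypothetical stand-in for a worker's services (producer, coordinator, ...)."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def start(self) -> None:
        log.append(f"start {self.name}")

    async def stop(self) -> None:
        log.append(f"stop {self.name}")

    async def __aenter__(self) -> "Service":
        await self.start()
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        await self.stop()


async def run_worker(services: list[Service], stop_event: asyncio.Event) -> None:
    async with AsyncExitStack() as stack:
        for service in services:
            await stack.enter_async_context(service)
        # Block until asked to exit; leaving the block stops services in reverse.
        await stop_event.wait()


async def main() -> None:
    """Worker entrypoint shape; a worker would run it with asyncio.run(main())."""
    stop_event = asyncio.Event()
    loop = asyncio.get_running_loop()
    # Unix only: turn SIGINT/SIGTERM into a graceful shutdown.
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop_event.set)
    await run_worker([Service("producer"), Service("coordinator")], stop_event)


async def demo() -> None:
    # Same flow as main(), with a timer standing in for SIGTERM.
    stop_event = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, stop_event.set)
    await run_worker([Service("producer"), Service("coordinator")], stop_event)


asyncio.run(demo())
print(log)
```

`loop.add_signal_handler` is only available on Unix event loops, which is why the runnable part drives `demo()` with a timer instead of calling `main()`.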
```diff
-Why this is better
+## Why this is better

-Its deterministic: stop() runs while the loop is alive, in the right order. Its explicit without being noisy: lifecycle sits in one place (lifespan or runner) instead of being sprinkled everywhere. It avoids Python destructors and other hidden magic. It also makes tests a lot less flaky: you can spin up a service in an async with block and know it always gets torn down.
+It's deterministic: `stop()` runs while the loop is alive, in the right order. It's explicit without being noisy: lifecycle sits in one place (lifespan or runner) instead of being sprinkled everywhere. It avoids Python destructors and other hidden magic. It also makes tests less flaky: you can spin up a service in an `async with` block and know it always gets torn down.
```
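To illustrate the testing claim, here is a sketch using the standard library's `unittest.IsolatedAsyncioTestCase`; `RecordingService` is a hypothetical service that only tracks whether it is running. The second test shows the property that makes teardown reliable: `__aexit__` runs even when the body of the `async with` raises.

```python
import unittest


class RecordingService:
    """Hypothetical service that only records whether it is running."""

    def __init__(self) -> None:
        self.running = False
        self.stop_calls = 0

    async def start(self) -> None:
        self.running = True

    async def stop(self) -> None:
        self.running = False
        self.stop_calls += 1

    async def __aenter__(self) -> "RecordingService":
        await self.start()
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        await self.stop()  # returns None, so any exception keeps propagating


class RecordingServiceTest(unittest.IsolatedAsyncioTestCase):
    async def test_running_inside_block_and_stopped_after(self) -> None:
        async with RecordingService() as service:
            self.assertTrue(service.running)
        self.assertFalse(service.running)

    async def test_stopped_even_if_test_body_raises(self) -> None:
        service = RecordingService()
        with self.assertRaises(RuntimeError):
            async with service:
                raise RuntimeError("simulated failure mid-test")
        # Teardown still happened, exactly once.
        self.assertFalse(service.running)
        self.assertEqual(service.stop_calls, 1)
```

Run it with `python -m unittest`; each test gets a fresh event loop, and no service outlives the block that created it.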
```diff
-How to build new services
+## Building new services

-Keep it simple: implement async start() and stop(). If your service owns a background task, start it in start() and cancel/await it in stop(). Add __aenter__/__aexit__ that await start/stop. Dont start in __init__, and dont rely on __del__. Callers will manage lifetime with an async with or an AsyncExitStack.
+Keep it simple: implement async `start()` and `stop()`. If your service owns a background task, start it in `start()` and cancel/await it in `stop()`. Add `__aenter__`/`__aexit__` that await start/stop. Don't start in `__init__`, and don't rely on `__del__`. Callers will manage lifetime with an `async with` or an `AsyncExitStack`.
```
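Following that recipe (async `start()`/`stop()`, nothing started in `__init__`, a background task created in `start()` and cancelled and awaited in `stop()`), a minimal service might look like this. `HeartbeatService` and its `beats` counter are hypothetical. `stop()` waits for the cancelled task via `asyncio.gather(..., return_exceptions=True)`, which absorbs the task's own `CancelledError` while a cancellation of the caller still propagates.

```python
import asyncio
from typing import Optional


class HeartbeatService:
    """Hypothetical service that owns a single background task."""

    def __init__(self, interval: float = 0.01) -> None:
        # Construction only stores configuration; nothing starts here.
        self._interval = interval
        self._task: Optional[asyncio.Task] = None
        self.beats = 0

    async def start(self) -> None:
        if self._task is None:  # calling start() twice is harmless
            self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        task, self._task = self._task, None
        if task is None:
            return  # never started, or already stopped
        task.cancel()
        # Wait until the task has really finished. Any error it died with is
        # returned rather than raised; a real service would probably log it.
        await asyncio.gather(task, return_exceptions=True)

    async def _run(self) -> None:
        while True:
            self.beats += 1
            await asyncio.sleep(self._interval)

    @property
    def running(self) -> bool:
        return self._task is not None

    async def __aenter__(self) -> "HeartbeatService":
        await self.start()
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        await self.stop()


async def demo() -> HeartbeatService:
    async with HeartbeatService() as service:
        await asyncio.sleep(0.05)  # let the background task tick a few times
        assert service.running
    return service


service = asyncio.run(demo())
print(service.beats > 0, service.running)  # True False
```

Because the task is awaited inside `stop()`, it has finished before `asyncio.run()` closes the loop, which is exactly what a `__del__`-based design cannot guarantee.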
```diff
-How to use many services together
+## Using multiple services

-Use an AsyncExitStack at the call site:
+Use an `AsyncExitStack` at the call site:

-async with AsyncExitStack() as stack:
-    await stack.enter_async_context(producer)
-    await stack.enter_async_context(coordinator)
-    # add more services as needed
-    await asyncio.Event().wait()
+```python
+async with AsyncExitStack() as stack:
+    await stack.enter_async_context(producer)
+    await stack.enter_async_context(coordinator)
+    # add more services as needed
+    await asyncio.Event().wait()
+```

-The stack starts services in the order theyre added and stops them in reverse. Thats usually what we want: consumers stop before producers flush, monitors stop before their publishers, etc.
+The stack starts services in the order they're added and stops them in reverse. That's usually what you want: consumers stop before producers flush, monitors stop before their publishers.
```
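The ordering rule is what makes "consumers stop before producers flush" work. In the sketch below (hypothetical `Producer` and `Consumer` classes; the real producer's API is not shown in the document), the consumer publishes one last message from its own `stop()`. Because the producer was entered first, it is still running at that point and flushes both messages when it stops last. Entering them in the opposite order would make the consumer's `start()` fail, since the producer would not be running yet. The original snippet's `await asyncio.Event().wait()` is replaced with `asyncio.sleep(0)` so the sketch terminates.

```python
import asyncio
from contextlib import AsyncExitStack


class Producer:
    """Hypothetical producer: buffers messages and flushes them in stop()."""

    def __init__(self) -> None:
        self.running = False
        self.buffer: list[str] = []
        self.flushed: list[str] = []

    async def send(self, message: str) -> None:
        if not self.running:
            raise RuntimeError("producer is not running")
        self.buffer.append(message)

    async def start(self) -> None:
        self.running = True

    async def stop(self) -> None:
        # Flush whatever is still buffered, then refuse further sends.
        self.flushed.extend(self.buffer)
        self.buffer.clear()
        self.running = False

    async def __aenter__(self) -> "Producer":
        await self.start()
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        await self.stop()


class Consumer:
    """Hypothetical consumer that publishes through the producer, even while stopping."""

    def __init__(self, producer: Producer) -> None:
        self.producer = producer

    async def start(self) -> None:
        await self.producer.send("consumer started")

    async def stop(self) -> None:
        # Runs before the producer's stop() because the consumer was entered later.
        await self.producer.send("consumer stopped")

    async def __aenter__(self) -> "Consumer":
        await self.start()
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        await self.stop()


async def main() -> Producer:
    producer = Producer()
    consumer = Consumer(producer)
    async with AsyncExitStack() as stack:
        await stack.enter_async_context(producer)  # entered first, stopped last
        await stack.enter_async_context(consumer)  # entered last, stopped first
        await asyncio.sleep(0)  # stands in for the long-running wait
    return producer


producer = asyncio.run(main())
print(producer.flushed)  # ['consumer started', 'consumer stopped']
```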
```diff
-Trade‑offs
-
-We gave up the “invisible” start‑in‑constructor convenience in return for correctness and clarity. The payoff is fewer shutdown bugs, fewer hidden dependencies, and far less boilerplate. The code is simpler to read and reason about because the lifetime of each component is explicit and managed by the language tools built for it.
+## Trade-offs

+The "invisible" start-in-constructor convenience is gone in return for correctness and clarity. The payoff is fewer shutdown bugs, fewer hidden dependencies, and far less boilerplate. The code is simpler to read and reason about because the lifetime of each component is explicit and managed by the language tools built for it.
```
