
Commit ed8762b

chore: better documentation about workers (#61)
* added more docs
1 parent 47f6716 commit ed8762b

23 files changed: +1072 −44 lines

docs/architecture/authentication.md

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ JWTs are signed with HS256 using a secret key from settings:
 ```
 
 The token payload contains the username in the `sub` claim and an expiration time. Token lifetime is configured via
-`ACCESS_TOKEN_EXPIRE_MINUTES` (default: 30 minutes).
+`ACCESS_TOKEN_EXPIRE_MINUTES` (default: 24 hours / 1440 minutes).
 
 ### CSRF Validation
 
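The change above only touches the default lifetime. For context, a minimal sketch of HS256 token issuance matching the description, assuming PyJWT; the function name and secret below are illustrative, not the project's actual code:

```python
# Illustrative only: issue an HS256-signed JWT whose `sub` claim carries the
# username and whose lifetime follows ACCESS_TOKEN_EXPIRE_MINUTES (new default: 1440).
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT; the project may wire this differently

SECRET_KEY = "change-me"             # placeholder for the settings secret
ACCESS_TOKEN_EXPIRE_MINUTES = 1440   # 24 hours, per this commit

def create_access_token(username: str) -> str:
    expires = datetime.now(timezone.utc) + timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
    return jwt.encode({"sub": username, "exp": expires}, SECRET_KEY, algorithm="HS256")

print(create_access_token("alice"))
```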

docs/architecture/overview.md

Lines changed: 10 additions & 2 deletions
@@ -1,8 +1,16 @@
 # Architecture overview
 
 In this file, you can find broad description of main components of project architecture.
-Preciser info about peculiarities of separate components (SSE, Kafka topics, DLQ, ..) are in
-the [Components](../components/dead-letter-queue.md) section.
+For details on specific components, see:
+
+- [SSE Architecture](../components/sse/sse-architecture.md)
+- [Dead Letter Queue](../components/dead-letter-queue.md)
+- [Kafka Topics](kafka-topic-architecture.md)
+- [Workers](../components/workers/pod_monitor.md)
+
+!!! note "Event Streaming"
+    Kafka event streaming is disabled by default (`ENABLE_EVENT_STREAMING=false`). Set this to `true` in your
+    environment to enable the full event-driven architecture.
 
 ## System overview
 
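The note above introduces the `ENABLE_EVENT_STREAMING` flag. A minimal sketch of reading it from the environment; the project's own settings layer may do this differently:

```python
# Sketch only: interpret the documented flag, defaulting to disabled.
import os

enable_event_streaming = os.environ.get("ENABLE_EVENT_STREAMING", "false").lower() == "true"

if enable_event_streaming:
    print("Event streaming on: the full event-driven architecture is active.")
else:
    print("Event streaming off (default).")
```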

docs/architecture/rate-limiting.md

Lines changed: 12 additions & 4 deletions
@@ -66,6 +66,10 @@ The platform ships with default rate limits organized by endpoint group. Higher
 Execution endpoints have the strictest limits since they spawn Kubernetes pods. The catch-all API rule (priority 1)
 applies to any endpoint not matching a more specific pattern.
 
+!!! note "WebSocket rule"
+    The `/api/v1/ws` pattern is reserved for future WebSocket support. The platform currently uses Server-Sent Events
+    (SSE) for real-time updates via `/api/v1/events/*`.
+
 ## Middleware Integration
 
 The `RateLimitMiddleware` intercepts all HTTP requests, extracts the user identifier, and checks against the configured
@@ -135,10 +139,14 @@ Configuration is cached in Redis for 5 minutes to reduce database load while all
 
 Rate limiting is controlled by environment variables:
 
-| Variable                  | Default      | Description                           |
-|---------------------------|--------------|---------------------------------------|
-| `RATE_LIMIT_ENABLED`      | `true`       | Enable/disable rate limiting globally |
-| `RATE_LIMIT_REDIS_PREFIX` | `ratelimit:` | Redis key prefix for isolation        |
+| Variable                      | Default          | Description                                            |
+|-------------------------------|------------------|--------------------------------------------------------|
+| `RATE_LIMIT_ENABLED`          | `true`           | Enable/disable rate limiting globally                  |
+| `RATE_LIMIT_REDIS_PREFIX`     | `rate_limit:`    | Redis key prefix for isolation                         |
+| `RATE_LIMIT_ALGORITHM`        | `sliding_window` | Algorithm to use (`sliding_window` or `token_bucket`)  |
+| `RATE_LIMIT_DEFAULT_REQUESTS` | `100`            | Default request limit                                  |
+| `RATE_LIMIT_DEFAULT_WINDOW`   | `60`             | Default window in seconds                              |
+| `RATE_LIMIT_BURST_MULTIPLIER` | `1.5`            | Burst multiplier for token bucket                      |
 
 The system gracefully degrades when Redis is unavailable—requests are allowed through rather than failing closed.
 
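The table above lists the knobs. As a rough illustration of how a sliding-window check with the documented defaults (100 requests per 60 s, keys under the `rate_limit:` prefix) could be expressed against Redis; this is a hedged sketch, not the project's `RateLimitMiddleware`:

```python
# Illustrative only: sliding-window rate check using a Redis sorted set per user.
import time

import redis

r = redis.Redis()

def allow_request(user_id: str, limit: int = 100, window: int = 60) -> bool:
    key = f"rate_limit:{user_id}"                 # prefix matches RATE_LIMIT_REDIS_PREFIX
    now = time.time()
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)   # drop entries outside the window
    pipe.zadd(key, {str(now): now})               # record this request
    pipe.zcard(key)                               # count requests inside the window
    pipe.expire(key, window)
    _, _, count, _ = pipe.execute()
    return count <= limit
```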

docs/architecture/runtime-registry.md

Lines changed: 38 additions & 11 deletions
@@ -77,23 +77,50 @@ The example scripts intentionally use features that may not work on older versio
 compatibility. For instance, Python's match statement (3.10+), Node's `Promise.withResolvers()` (22+), and Go's
 `clear()` function (1.21+).
 
-## API Endpoint
+## API Endpoints
 
-The `/api/v1/languages` endpoint returns the available runtimes:
+The runtime information is available via two endpoints:
+
+### GET /api/v1/k8s-limits
+
+Returns resource limits and supported runtimes:
+
+```json
+{
+  "cpu_limit": "1000m",
+  "memory_limit": "128Mi",
+  "cpu_request": "1000m",
+  "memory_request": "128Mi",
+  "execution_timeout": 300,
+  "supported_runtimes": {
+    "python": {"versions": ["3.7", "3.8", "3.9", "3.10", "3.11", "3.12"], "file_ext": "py"},
+    "node": {"versions": ["18", "20", "22"], "file_ext": "js"},
+    "ruby": {"versions": ["3.1", "3.2", "3.3"], "file_ext": "rb"},
+    "go": {"versions": ["1.20", "1.21", "1.22"], "file_ext": "go"},
+    "bash": {"versions": ["5.1", "5.2", "5.3"], "file_ext": "sh"}
+  }
+}
+```
+
+### GET /api/v1/example-scripts
+
+Returns example scripts for each language:
 
 ```json
 {
-  "python": {"versions": ["3.7", "3.8", "3.9", "3.10", "3.11", "3.12"], "file_ext": "py"},
-  "node": {"versions": ["18", "20", "22"], "file_ext": "js"},
-  "ruby": {"versions": ["3.1", "3.2", "3.3"], "file_ext": "rb"},
-  "go": {"versions": ["1.20", "1.21", "1.22"], "file_ext": "go"},
-  "bash": {"versions": ["5.1", "5.2", "5.3"], "file_ext": "sh"}
+  "scripts": {
+    "python": "# Python example script...",
+    "node": "// Node.js example script...",
+    "ruby": "# Ruby example script...",
+    "go": "package main...",
+    "bash": "#!/bin/bash..."
+  }
 }
 ```
 
 ## Key Files
 
-| File | Purpose |
-|----------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|
-| [`runtime_registry.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/runtime_registry.py)          | Language specifications and runtime config generation |
-| [`api/routes/languages.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/api/routes/languages.py)  | API endpoint for available languages                  |
+| File | Purpose |
+|----------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------|
+| [`runtime_registry.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/runtime_registry.py)          | Language specifications and runtime config generation  |
+| [`api/routes/execution.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/api/routes/execution.py)  | API endpoints including k8s-limits and example-scripts |
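A possible way to query the two endpoints above with `requests`; the base URL is a placeholder and authentication, if required, is omitted:

```python
# Hypothetical client usage of the documented endpoints; adjust BASE_URL to your deployment.
import requests

BASE_URL = "http://localhost:8000"  # placeholder

limits = requests.get(f"{BASE_URL}/api/v1/k8s-limits").json()
print(limits["execution_timeout"], limits["supported_runtimes"]["python"]["versions"])

examples = requests.get(f"{BASE_URL}/api/v1/example-scripts").json()
print(examples["scripts"]["go"][:30])  # first characters of the Go example script
```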

docs/components/dead-letter-queue.md

Lines changed: 8 additions & 0 deletions
@@ -88,3 +88,11 @@ If sending to DLQ fails (extremely rare - would mean Kafka is down), the produce
 
 The system is designed to be resilient but not perfect. In catastrophic scenarios, you still have Kafka's built-in durability and the ability to replay topics from the beginning if needed.
 
+## Key files
+
+| File | Purpose |
+|--------------------------------------------------------------------------------------------------------------------------|------------------------|
+| [`dlq_processor.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/workers/dlq_processor.py)                | DLQ processor worker   |
+| [`manager.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/dlq/manager.py)                            | DLQ management logic   |
+| [`unified_producer.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/events/unified_producer.py)       | `send_to_dlq()` method |
+| [`dlq.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/api/routes/dlq.py)                             | Admin API routes       |
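As a rough illustration of the `send_to_dlq()` idea referenced in the table (a failed event plus failure metadata republished to a DLQ topic), here is a hedged sketch with `kafka-python`; the topic name, payload shape, and field names are assumptions, not the project's actual implementation:

```python
# Illustrative only: wrap a failed event with failure metadata and publish it to a DLQ topic.
import json
import time

from kafka import KafkaProducer  # kafka-python; the project may use a different client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def send_to_dlq(original_topic: str, event: dict, error: str) -> None:
    producer.send("dlq", {                 # "dlq" topic name is a placeholder
        "original_topic": original_topic,
        "event": event,
        "error": error,
        "failed_at": time.time(),
    })
    producer.flush()
```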

docs/components/saga/resource-allocation.md

Lines changed: 7 additions & 0 deletions
@@ -104,3 +104,10 @@ if active_count >= 100: # <- adjust this value
 ```
 
 Future improvements could make this configurable per-language or dynamically adjustable based on cluster capacity.
+
+## Key files
+
+| File | Purpose |
+|-----------------------------------------------------------------------------------------------------------------------------------------------|---------------------------|
+| [`execution_saga.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/services/saga/execution_saga.py)                         | Saga with allocation step |
+| [`resource_allocation_repository.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/db/repositories/resource_allocation_repository.py) | MongoDB operations |
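The `active_count >= 100` cap quoted in the hunk header is hard-coded today. A hedged sketch of the "configurable per-language" direction mentioned above; the environment variable, override values, and function name are invented for illustration:

```python
# Hypothetical sketch: resolve the allocation cap per language, falling back to a global default.
import os

DEFAULT_MAX_ACTIVE = int(os.environ.get("SAGA_MAX_ACTIVE_EXECUTIONS", "100"))  # assumed variable
PER_LANGUAGE_MAX = {"python": 60, "go": 20}  # illustrative overrides only

def allocation_allowed(active_count: int, language: str) -> bool:
    limit = PER_LANGUAGE_MAX.get(language, DEFAULT_MAX_ACTIVE)
    return active_count < limit

assert allocation_allowed(50, "python")
assert not allocation_allowed(60, "python")
```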

docs/components/schema-manager.md

Lines changed: 8 additions & 0 deletions
@@ -33,3 +33,11 @@ During API startup, the `lifespan` function in `dishka_lifespan.py` gets the dat
 To force a specific MongoDB migration to run again, delete its document from `schema_versions`. To start fresh, point the app at a new database. Migrations are designed to be additive; the system doesn't support automatic rollbacks. If you need to undo a migration in production, you'll have to drop indexes or modify validators manually.
 
 For Kafka schemas, the registry keeps all versions. If you break compatibility and need to start over, delete the subject from the registry (either via REST API or the registry's UI if available) and let the app re-register on next startup.
+
+## Key files
+
+| File | Purpose |
+|--------------------------------------------------------------------------------------------------------------------------------|--------------------------|
+| [`schema_manager.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/db/schema/schema_manager.py)              | MongoDB migrations       |
+| [`schema_registry.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/events/schema/schema_registry.py)        | Kafka Avro serialization |
+| [`dishka_lifespan.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/dishka_lifespan.py)                      | Startup initialization   |
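To make the "delete its document from `schema_versions`" tip concrete, a hedged `pymongo` example; the database name and the version field are assumptions, so inspect the actual collection contents before deleting anything:

```python
# Illustrative only: remove one migration record so it re-runs on next startup.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["integr8scode"]                      # database name is a guess

for doc in db["schema_versions"].find():         # check what is actually recorded first
    print(doc)

db["schema_versions"].delete_one({"version": "003"})  # field name and value are assumed
```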

docs/components/sse/execution-sse-flow.md

Lines changed: 7 additions & 0 deletions
@@ -9,3 +9,10 @@ The SSE router maintains a small pool of Kafka consumers and routes only the eve
 Using `result_stored` as the terminal signal removes artificial waiting. Earlier iterations ended the SSE stream on `execution_completed`/`failed`/`timeout` and slept on the server to "give Mongo time" to commit. That pause is unnecessary once the stream ends only after the result processor confirms persistence.
 
 This approach preserves clean attribution and ordering. The coordinator enriches pod creation commands with user information so pods are labeled correctly. The pod monitor converts Kubernetes phases into domain events. Timeout classification is deterministic: any pod finishing with `reason=DeadlineExceeded` results in an `execution_timeout` event. The result processor is the single writer of terminal state, so the UI never races the database — when the browser sees `result_stored`, the result is already present.
+
+## Related docs
+
+- [SSE Architecture](sse-architecture.md) — overall SSE design and components
+- [SSE Partitioned Router](sse-partitioned-architecture.md) — consumer pool and scaling
+- [Result Processor](../workers/result_processor.md) — terminal event handling
+- [Pod Monitor](../workers/pod_monitor.md) — Kubernetes event translation
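A hedged client-side sketch of the flow above: stream SSE frames and stop at the terminal `result_stored` event. The endpoint path and payload shape are assumptions; only the terminal-event semantics come from the docs:

```python
# Illustrative only: consume the execution event stream until `result_stored` arrives.
import json

import requests

execution_id = "<execution-id>"                                          # placeholder
url = f"http://localhost:8000/api/v1/events/executions/{execution_id}"   # path is assumed

with requests.get(url, stream=True) as resp:
    event_type = None
    for line in resp.iter_lines(decode_unicode=True):
        if line.startswith("event:"):
            event_type = line.split(":", 1)[1].strip()
        elif line.startswith("data:") and event_type == "result_stored":
            print("result persisted:", json.loads(line.split(":", 1)[1]))
            break  # terminal signal: the result is already in the database
```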

docs/components/sse/sse-architecture.md

Lines changed: 9 additions & 0 deletions
@@ -82,3 +82,12 @@ Key metrics: `sse.connections.active`, `sse.messages.sent.total`, `sse.connectio
 ## Why not WebSockets?
 
 WebSockets were initially implemented but removed because SSE is sufficient for server-to-client communication, simpler connection management, better proxy compatibility (many corporate proxies block WebSockets), excellent browser support with automatic reconnection, and works great with HTTP/2 multiplexing.
+
+## Key files
+
+| File | Purpose |
+|-----------------------------------------------------------------------------------------------------------------------------------|------------------------|
+| [`sse_service.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/services/sse/sse_service.py)                    | Client connections     |
+| [`kafka_redis_bridge.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/services/sse/kafka_redis_bridge.py)      | Kafka-to-Redis routing |
+| [`redis_bus.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/services/sse/redis_bus.py)                        | Redis pub/sub wrapper  |
+| [`sse_shutdown_manager.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/services/sse/sse_shutdown_manager.py)  | Graceful shutdown      |
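The bridge and bus in the table route Kafka events into Redis pub/sub channels that SSE connections then read. A hedged sketch of that hand-off with `redis-py`; the channel naming and payload are invented for illustration:

```python
# Illustrative only: publish an event to a per-execution channel and read it back.
import json

import redis

r = redis.Redis()
channel = "sse:execution:<execution-id>"          # channel naming is an assumption

# SSE connection side: subscribe before messages for this execution arrive.
sub = r.pubsub()
sub.subscribe(channel)

# Bridge side: push an event received from Kafka into Redis pub/sub.
r.publish(channel, json.dumps({"event": "execution_completed"}))

for message in sub.listen():
    if message["type"] == "message":
        print(json.loads(message["data"]))
        break
```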

docs/components/sse/sse-partitioned-architecture.md

Lines changed: 7 additions & 0 deletions
@@ -33,3 +33,10 @@ Memory management uses configurable buffer limits: max size, TTL for expiration,
 The `SSEShutdownManager` complements the router. The router handles the data plane (Kafka consumers, event routing to Redis). The shutdown manager handles the control plane (tracking connections, coordinating graceful shutdown, notifying clients).
 
 When the server shuts down, SSE clients receive a shutdown event so they can display messages and attempt reconnection. The shutdown manager implements phased shutdown: notify clients, wait for graceful disconnection, force-close remaining connections. SSE connections register with the shutdown manager and monitor a shutdown event while streaming from Redis. When shutdown triggers, connections send shutdown messages and close gracefully.
+
+## Key files
+
+| File | Purpose |
+|-----------------------------------------------------------------------------------------------------------------------------------|----------------------|
+| [`kafka_redis_bridge.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/services/sse/kafka_redis_bridge.py)      | Consumer pool router |
+| [`sse_shutdown_manager.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/services/sse/sse_shutdown_manager.py)  | Connection tracking  |
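A hedged asyncio sketch of the phased shutdown described above (notify clients, wait out a grace period, then force-close stragglers); the class and attribute names are illustrative, not the real `SSEShutdownManager`:

```python
# Illustrative only: three-phase shutdown for tracked SSE connections.
import asyncio

class ShutdownSketch:
    def __init__(self) -> None:
        self.shutdown_event = asyncio.Event()   # streams watch this while reading from Redis
        self.connections: set[str] = set()      # connection ids registered by SSE handlers

    async def shutdown(self, grace_seconds: float = 5.0) -> None:
        self.shutdown_event.set()               # phase 1: connections send a shutdown message
        deadline = asyncio.get_running_loop().time() + grace_seconds
        while self.connections and asyncio.get_running_loop().time() < deadline:
            await asyncio.sleep(0.1)            # phase 2: wait for graceful disconnects
        self.connections.clear()                # phase 3: force-close whatever remains

asyncio.run(ShutdownSketch().shutdown(grace_seconds=0.5))
```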
