`docs/architecture/kafka-topic-architecture.md`
We might also introduce more sophisticated processing stages. Perhaps certain executions need security scanning before processing, or we want to batch similar executions for efficiency. These additional stages can be inserted between execution_events and execution_tasks without disrupting existing consumers.
The pattern we've established here - separating event streams from task queues - can be applied to other domains in our system. If we add support for scheduled executions, we might have schedule_events for audit and schedule_tasks for the actual scheduling work. If we implement distributed training jobs, we might have training_events and training_tasks.
Sagas coordinate multi-step workflows where each step publishes commands to Kafka and the orchestrator tracks progress in MongoDB. If a step fails, compensation actions roll back previous steps by publishing compensating events. The saga pattern keeps long-running operations reliable without distributed transactions. Dependencies like producers and repositories are injected explicitly rather than pulled from context, and only serializable data gets persisted so sagas can resume after restarts.
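A minimal sketch of what such an orchestrator might look like. The class and field names (`SagaStep`, the `publish` method on the producer) are illustrative, not the actual implementation; the key properties from the text are shown: explicit injection of the producer and repository, only serializable state persisted, and compensation published in reverse order when a step fails.

```python
from dataclasses import dataclass, field


@dataclass
class SagaStep:
    name: str
    command_topic: str     # topic this step's command is published to
    compensate_topic: str  # topic for the compensating (rollback) command


@dataclass
class SagaState:
    """Only serializable data, so the saga can resume after a restart."""
    saga_id: str
    completed: list = field(default_factory=list)
    status: str = "running"


class SagaOrchestrator:
    """Dependencies are injected explicitly rather than pulled from context."""

    def __init__(self, producer, repository, steps):
        self.producer = producer      # Kafka producer-like object
        self.repository = repository  # persists SagaState to MongoDB
        self.steps = steps

    def run(self, saga_id: str, payload: dict) -> SagaState:
        state = SagaState(saga_id)
        for step in self.steps:
            try:
                self.producer.publish(step.command_topic,
                                      {"saga_id": saga_id, **payload})
                state.completed.append(step.name)
                self.repository.save(state)  # checkpoint progress
            except Exception:
                self._compensate(state)
                return state
        state.status = "completed"
        self.repository.save(state)
        return state

    def _compensate(self, state: SagaState) -> None:
        # Roll back completed steps in reverse order via compensating events
        for name in reversed(state.completed):
            step = next(s for s in self.steps if s.name == name)
            self.producer.publish(step.compensate_topic,
                                  {"saga_id": state.saga_id})
        state.status = "compensated"
        self.repository.save(state)
```

Because the producer and repository arrive through the constructor, the orchestrator can be exercised with in-memory fakes in tests, with no broker or database required.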
## Event replay
```
/api/v1/replay/sessions (admin) --> ReplayService
        |                               |
        |                               |-- ReplayRepository (Mongo) for sessions
        |                               |-- EventStore queries filters/time ranges
        |                               |-- UnifiedProducer to Kafka (target topic)
        v                               v
  JSON summaries                  Kafka topics (private)
```
The replay system lets admins re-emit historical events from the EventStore back to Kafka. This is useful for rebuilding projections, testing new consumers, or recovering from data issues. You create a replay session with filters like time range or event type, and the ReplayService reads matching events from MongoDB and publishes them to the target topic. The session tracks progress so you can pause and resume long replays.
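A sketch of the flow described above, under assumed names: the session fields (`event_type`, time range, `target_topic`) and the `query`/`publish`/`save` method signatures are illustrative, not the real API. The essential behavior is that matching events are read from the store, re-emitted to the target topic, and progress is checkpointed so a long replay can resume.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReplaySession:
    session_id: str
    event_type: str       # filter: which events to replay
    start: datetime       # filter: time range
    end: datetime
    target_topic: str     # where re-emitted events go
    replayed: int = 0     # progress counter, persisted for resume
    status: str = "created"


class ReplayService:
    def __init__(self, event_store, producer, repository):
        self.event_store = event_store  # queries MongoDB by filters/time range
        self.producer = producer        # publishes back to Kafka
        self.repository = repository    # persists session progress

    def run(self, session: ReplaySession) -> ReplaySession:
        session.status = "running"
        matches = self.event_store.query(session.event_type,
                                         session.start, session.end)
        for event in matches:
            self.producer.publish(session.target_topic, event)
            session.replayed += 1
            self.repository.save(session)  # checkpoint after each event
        session.status = "completed"
        self.repository.save(session)
        return session
```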
```
/api/v1/admin/events/* -> admin repos (Mongo) for events query/delete
```
When a consumer fails to process an event after multiple retries, the event lands in the dead letter queue topic. The DLQ manager handles retry logic with exponential backoff and configurable thresholds. Admins can inspect failed events through the API, fix the underlying issue, and replay them back to the original topic. Events that repeatedly fail can be manually deleted or archived after investigation.
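The retry policy above can be sketched as follows. The `retry_count` field, topic name `dead_letter_queue`, and the `DLQManager` interface are assumptions for illustration; the backoff formula (doubling delay up to a cap) and the configurable retry threshold are the behaviors the text describes.

```python
def next_retry_delay(attempt: int, base_seconds: float = 1.0,
                     cap_seconds: float = 300.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at cap_seconds."""
    return min(base_seconds * (2 ** attempt), cap_seconds)


class DLQManager:
    def __init__(self, producer, max_retries: int = 5):
        self.producer = producer
        self.max_retries = max_retries  # configurable threshold

    def handle(self, event: dict) -> str:
        retries = event.get("retry_count", 0)
        if retries >= self.max_retries:
            # Retries exhausted: park the event for admin inspection
            self.producer.publish("dead_letter_queue", event)
            return "dlq"
        event["retry_count"] = retries + 1
        event["retry_after_seconds"] = next_retry_delay(retries)
        # Re-publish to the original topic for another attempt
        self.producer.publish(event["original_topic"], event)
        return "retried"
```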
All events are defined as Pydantic models with strict typing. The mappings module routes each event type to its destination topic. Schema Registry integration ensures producers and consumers agree on event structure, catching incompatible changes before they cause runtime failures. The unified producer and consumer classes handle serialization, error handling, and observability so individual services don't reinvent the wheel.
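A minimal illustration of the event-model-plus-mappings pattern. The event classes and their fields are hypothetical, and routing `ExecutionRequested` to `execution_tasks` versus `ExecutionCompleted` to `execution_events` is an assumed assignment consistent with the event-stream/task-queue split described earlier; only the topic names come from this document.

```python
from pydantic import BaseModel


class ExecutionRequested(BaseModel):
    # Hypothetical event shape; field names are illustrative
    execution_id: str
    script: str


class ExecutionCompleted(BaseModel):
    execution_id: str
    exit_code: int


# mappings module: routes each event type to its destination topic
TOPIC_MAPPINGS = {
    ExecutionRequested: "execution_tasks",   # task queue
    ExecutionCompleted: "execution_events",  # event stream (audit)
}


def topic_for(event: BaseModel) -> str:
    return TOPIC_MAPPINGS[type(event)]
```

Because the models enforce strict typing, a malformed payload fails validation in the producer, before it ever reaches a topic, which is the same class of error Schema Registry catches between services.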