
Commit a543ee6 (1 parent: 3747af3)

- moved all docs to /docs
- updated jobs and such - fixed arch overview page

29 files changed: +461 / -477 lines

.github/workflows/docs.yml

Lines changed: 5 additions & 83 deletions
@@ -5,29 +5,21 @@ on:
     branches: [ main ]
     paths:
       - 'docs/**'
-      - 'backend/docs/**'
-      - 'files_for_readme/ARCHITECTURE_IN_DETAILS.md'
-      - 'backend/tests/load/README.md'
       - 'mkdocs.yml'
       - '.github/workflows/docs.yml'
   pull_request:
-    branches:
-      - main
+    branches: [ main ]
     paths:
       - 'docs/**'
-      - 'backend/docs/**'
-      - 'files_for_readme/ARCHITECTURE_IN_DETAILS.md'
-      - 'backend/tests/load/README.md'
       - 'mkdocs.yml'
       - '.github/workflows/docs.yml'
-  workflow_dispatch: # Allow manual trigger
+  workflow_dispatch:
 
 permissions:
   contents: write
   pages: write
   id-token: write
 
-# Allow only one concurrent deployment
 concurrency:
   group: pages
   cancel-in-progress: false
@@ -38,8 +30,6 @@ jobs:
     steps:
       - name: Checkout repository
         uses: actions/checkout@v4
-        with:
-          fetch-depth: 0 # Full history for git info
 
       - name: Set up Python
         uses: actions/setup-python@v5
@@ -51,77 +41,10 @@ jobs:
         with:
           key: mkdocs-${{ hashFiles('mkdocs.yml') }}
           path: ~/.cache/pip
-          restore-keys: |
-            mkdocs-
-
-      - name: Install MkDocs and dependencies
-        run: |
-          pip install --upgrade pip
-          pip install \
-            mkdocs-material \
-            mkdocs-mermaid2-plugin \
-            pymdown-extensions
-
-      - name: Prepare documentation structure
-        run: |
-          # Create directory structure for docs
-          mkdir -p docs/architecture
-          mkdir -p docs/reference
-          mkdir -p docs/components/sse
-          mkdir -p docs/components/workers
-          mkdir -p docs/operations
-          mkdir -p docs/security
-          mkdir -p docs/testing
-          mkdir -p docs/stylesheets
-          mkdir -p docs/assets/images
-
-          # Copy images and assets
-          cp files_for_readme/*.png docs/architecture/
-          cp files_for_readme/*.svg docs/architecture/ 2>/dev/null || true
-          cp files_for_readme/logo.png docs/assets/images/
-
-          # Copy architecture docs
-          cp files_for_readme/ARCHITECTURE_IN_DETAILS.md docs/architecture/overview.md
-          cp backend/docs/services-overview.md docs/architecture/
-          cp backend/docs/kafka-topic-architecture.md docs/architecture/
-          cp backend/docs/lifecycle.md docs/architecture/
-
-          # Copy API reference
-          cp backend/docs/api-reference.md docs/reference/
-
-          # Copy component docs - services overview
-          cp backend/docs/services-overview.md docs/components/
-
-          # Copy SSE docs
-          cp backend/docs/sse-partitioned-architecture.md docs/components/sse/
-          cp backend/docs/execution-sse-flow.md docs/components/sse/
-          cp backend/docs/workers/sse-architecture.md docs/components/sse/
-
-          # Copy worker docs
-          cp backend/docs/workers/pod_monitor.md docs/components/workers/
-          cp backend/docs/workers/result_processor.md docs/components/workers/
-
-          # Copy other component docs
-          cp backend/docs/dead-letter-queue.md docs/components/
-          cp backend/docs/schema-manager.md docs/components/
-
-          # Copy operations docs
-          cp backend/docs/tracing.md docs/operations/
-          cp backend/docs/metrics-contextvars.md docs/operations/
-          cp backend/docs/cpu-time-measurement.md docs/operations/
-          cp backend/docs/notification-types.md docs/operations/
-          cp backend/docs/troubleshooting-result-processor-di-crash.md docs/operations/
-
-          # Copy security docs
-          cp backend/docs/security/policies.md docs/security/
-
-          # Copy testing docs
-          cp backend/tests/load/README.md docs/testing/load-testing.md
+          restore-keys: mkdocs-
 
-          # Create minimal extra.css if not exists
-          if [ ! -f docs/stylesheets/extra.css ]; then
-            echo "/* Custom styles for Integr8sCode docs */" > docs/stylesheets/extra.css
-          fi
+      - name: Install MkDocs
+        run: pip install mkdocs-material mkdocs-mermaid2-plugin
 
       - name: Build documentation
         run: mkdocs build --strict
@@ -132,7 +55,6 @@ jobs:
           path: site/
 
   deploy:
-    # Only deploy on push to main (not PRs)
     if: github.event_name == 'push' && github.ref == 'refs/heads/main'
     needs: build
     runs-on: ubuntu-latest

README.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 <p align="center">
-  <img src="./files_for_readme/logo.png" alt="Integr8sCode Logo" width="250" height="250">
+  <img src="./docs/assets/images/logo.png" alt="Integr8sCode Logo" width="250" height="250">
   <h1 align="center"><b>Integr8sCode</b></h1>
 </p>
 <p align="center">
@@ -128,7 +128,7 @@ cause `match-case` was introduced first in `Python 3.10`.
 > [!TIP]
 > Full documentation is available at https://hardmax71.github.io/Integr8sCode/
 
-<img src="./files_for_readme/system_diagram.png" alt="system diagram">
+<img src="./docs/assets/images/system_diagram.png" alt="system diagram">
 
 The platform is built on three main pillars:

backend/docs/kafka-topic-architecture.md renamed to docs/architecture/kafka-topic-architecture.md

Lines changed: 56 additions & 0 deletions
@@ -55,3 +55,59 @@ This architectural pattern also provides flexibility for future evolution. If we

We might also introduce more sophisticated processing stages. Perhaps certain executions need security scanning before processing, or we want to batch similar executions for efficiency. These additional stages can be inserted between execution_events and execution_tasks without disrupting existing consumers.

The pattern we've established here - separating event streams from task queues - can be applied to other domains in our system. If we add support for scheduled executions, we might have schedule_events for audit and schedule_tasks for the actual scheduling work. If we implement distributed training jobs, we might have training_events and training_tasks.

The following sections are appended after this point:
## Saga orchestration

```mermaid
graph TD
    SagaService[SagaService]
    Orchestrator[SagaOrchestrator]
    ExecutionSaga["ExecutionSaga<br/>(steps/compensations)"]
    SagaRepo[(SagaRepository<br/>Mongo)]
    EventStore[(EventStore + Kafka topics)]

    SagaService -- starts --> Orchestrator
    SagaService --> SagaRepo

    Orchestrator -- "binds explicit dependencies<br/>(producers, repos, command publisher)" --> ExecutionSaga
    Orchestrator --> EventStore

    ExecutionSaga -- "step.run(...) -> publish commands (Kafka)" --> EventStore
    ExecutionSaga -- "compensation() -> publish compensations" --> EventStore
```
Sagas coordinate multi-step workflows where each step publishes commands to Kafka and the orchestrator tracks progress in MongoDB. If a step fails, compensation actions roll back previous steps by publishing compensating events. The saga pattern keeps long-running operations reliable without distributed transactions. Dependencies like producers and repositories are injected explicitly rather than pulled from context, and only serializable data gets persisted so sagas can resume after restarts.
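To make the step/compensation contract concrete, here is a minimal, self-contained sketch. The `Step` and `Saga` classes and the in-memory `published` list are hypothetical stand-ins, not the actual `ExecutionSaga`/`SagaOrchestrator` classes; in the real system each command is published to Kafka through an injected producer and progress is persisted in MongoDB.

```python
from dataclasses import dataclass, field
from typing import Callable, List

published: List[str] = []  # stand-in for commands published to Kafka


def fail() -> None:
    raise RuntimeError("step failed")  # simulate a failing step


@dataclass
class Step:
    name: str
    run: Callable[[], None]          # publishes the forward command
    compensate: Callable[[], None]   # publishes the compensating command


@dataclass
class Saga:
    steps: List[Step]
    completed: List[Step] = field(default_factory=list)

    def execute(self) -> bool:
        for step in self.steps:
            try:
                step.run()
                self.completed.append(step)
            except Exception:
                # Unwind already-completed steps in reverse order.
                for done in reversed(self.completed):
                    done.compensate()
                return False
        return True


saga = Saga(steps=[
    Step("allocate_pod",
         run=lambda: published.append("CreatePodCommand"),
         compensate=lambda: published.append("DeletePodCommand")),
    Step("start_execution", run=fail,
         compensate=lambda: published.append("CancelExecutionCommand")),
])

print(saga.execute())  # False: the second step failed
print(published)       # ['CreatePodCommand', 'DeletePodCommand']
```

The key design choice visible even in this toy version: compensations only run for steps that actually completed, and they run in reverse order of completion.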
## Event replay

```
/api/v1/replay/sessions (admin) --> ReplayService
        |                               |
        |                               |-- ReplayRepository (Mongo) for sessions
        |                               |-- EventStore queries filters/time ranges
        |                               |-- UnifiedProducer to Kafka (target topic)
        v                               v
  JSON summaries                    Kafka topics (private)
```
The replay system lets admins re-emit historical events from the EventStore back to Kafka. This is useful for rebuilding projections, testing new consumers, or recovering from data issues. You create a replay session with filters like time range or event type, and the ReplayService reads matching events from MongoDB and publishes them to the target topic. The session tracks progress so you can pause and resume long replays.
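A rough sketch of that flow, with hypothetical shapes (the `ReplaySession` fields, the dict-based event store, and the `publish` callable are illustrative, not the actual ReplayService or EventStore interfaces): filter by type and time range, re-emit to the target topic, and keep a cursor so a paused or interrupted session resumes where it left off.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List


@dataclass
class ReplaySession:
    session_id: str
    event_types: List[str]
    start: datetime
    end: datetime
    target_topic: str
    cursor: int = 0           # index of the next matching event to publish
    status: str = "created"


def run_replay(session: ReplaySession,
               event_store: List[Dict[str, Any]],
               publish: Callable[[str, Dict[str, Any]], None]) -> None:
    """Re-emit matching historical events, resuming from the saved cursor."""
    matching = [
        e for e in event_store
        if e["type"] in session.event_types and session.start <= e["ts"] <= session.end
    ]
    session.status = "running"
    for i in range(session.cursor, len(matching)):
        publish(session.target_topic, matching[i])
        session.cursor = i + 1  # persisted after each event so the session can resume
    session.status = "completed"
```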
## Dead letter queue

```
Kafka DLQ topic <-> DLQ manager (retry/backoff, thresholds)
/api/v1/admin/events/* -> admin repos (Mongo) for events query/delete
```
When a consumer fails to process an event after multiple retries, the event lands in the dead letter queue topic. The DLQ manager handles retry logic with exponential backoff and configurable thresholds. Admins can inspect failed events through the API, fix the underlying issue, and replay them back to the original topic. Events that repeatedly fail can be manually deleted or archived after investigation.
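The retry policy reads most clearly as code. This is only an illustrative sketch of exponential backoff with a retry threshold, not the DLQ manager's actual implementation:

```python
import random
import time
from typing import Any, Callable, Dict

MAX_RETRIES = 5      # threshold before the event stays parked in the DLQ
BASE_DELAY_S = 1.0   # backoff starts here and doubles each attempt


def retry_from_dlq(event: Dict[str, Any],
                   handler: Callable[[Dict[str, Any]], None]) -> bool:
    """Retry a dead-lettered event with exponential backoff and jitter.

    Returns True if the handler eventually succeeds, False once the retry
    threshold is exhausted (the event then remains available for manual
    inspection through the admin API).
    """
    for attempt in range(MAX_RETRIES):
        try:
            handler(event)
            return True
        except Exception:
            delay = BASE_DELAY_S * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
    return False
```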
## Topic and event structure

```
infrastructure/kafka/events/*                 : Pydantic event models
infrastructure/kafka/mappings.py              : event -> topic mapping
events/schema/schema_registry.py              : schema manager
events/core/{producer,consumer,dispatcher}.py : unified Kafka plumbing
```
All events are defined as Pydantic models with strict typing. The mappings module routes each event type to its destination topic. Schema Registry integration ensures producers and consumers agree on event structure, catching incompatible changes before they cause runtime failures. The unified producer and consumer classes handle serialization, error handling, and observability so individual services don't reinvent the wheel.
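For a feel of what that plumbing looks like, here is a hypothetical event model and topic mapping. The field names and the `topic_for` helper are illustrative rather than the actual definitions under `infrastructure/kafka/`; only the `execution_events` topic name comes from this document.

```python
from datetime import datetime, timezone
from typing import Dict, Type

from pydantic import BaseModel, Field


class ExecutionRequestedEvent(BaseModel):
    event_id: str
    execution_id: str
    script: str
    language: str = "python"
    occurred_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))


# Route each event type to its destination topic (the role mappings.py plays).
EVENT_TOPIC_MAPPING: Dict[Type[BaseModel], str] = {
    ExecutionRequestedEvent: "execution_events",
}


def topic_for(event: BaseModel) -> str:
    """Look up the destination topic for an event instance."""
    return EVENT_TOPIC_MAPPING[type(event)]


event = ExecutionRequestedEvent(event_id="e-1", execution_id="x-42", script="print('hi')")
print(topic_for(event), event.model_dump_json())  # Pydantic v2 API
```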
