`docs/architecture/kafka-topic-architecture.md`
We might also introduce more sophisticated processing stages. Perhaps certain executions need security scanning before processing, or we want to batch similar executions for efficiency. These additional stages can be inserted between execution_events and execution_tasks without disrupting existing consumers.
The pattern we've established here - separating event streams from task queues - can be applied to other domains in our system. If we add support for scheduled executions, we might have schedule_events for audit and schedule_tasks for the actual scheduling work. If we implement distributed training jobs, we might have training_events and training_tasks.
Sagas coordinate multi-step workflows where each step publishes commands to Kafka and the orchestrator tracks progress in MongoDB. If a step fails, compensation actions roll back previous steps by publishing compensating events. The saga pattern keeps long-running operations reliable without distributed transactions. Dependencies like producers and repositories are injected explicitly rather than pulled from context, and only serializable data gets persisted so sagas can resume after restarts.
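A minimal sketch of what such an orchestrator might look like. The class and field names (`SagaStep`, the `publish` method on the producer) are illustrative, not the actual implementation; the key properties from the text are shown: explicit injection of the producer and repository, only serializable state persisted, and compensation published in reverse order when a step fails.

```python
from dataclasses import dataclass, field


@dataclass
class SagaStep:
    name: str
    command_topic: str     # topic this step's command is published to
    compensate_topic: str  # topic for the compensating (rollback) command


@dataclass
class SagaState:
    """Only serializable data, so the saga can resume after a restart."""
    saga_id: str
    completed: list = field(default_factory=list)
    status: str = "running"


class SagaOrchestrator:
    """Dependencies are injected explicitly rather than pulled from context."""

    def __init__(self, producer, repository, steps):
        self.producer = producer      # Kafka producer-like object
        self.repository = repository  # persists SagaState to MongoDB
        self.steps = steps

    def run(self, saga_id: str, payload: dict) -> SagaState:
        state = SagaState(saga_id)
        for step in self.steps:
            try:
                self.producer.publish(step.command_topic,
                                      {"saga_id": saga_id, **payload})
                state.completed.append(step.name)
                self.repository.save(state)  # checkpoint progress
            except Exception:
                self._compensate(state)
                return state
        state.status = "completed"
        self.repository.save(state)
        return state

    def _compensate(self, state: SagaState) -> None:
        # Roll back completed steps in reverse order via compensating events
        for name in reversed(state.completed):
            step = next(s for s in self.steps if s.name == name)
            self.producer.publish(step.compensate_topic,
                                  {"saga_id": state.saga_id})
        state.status = "compensated"
        self.repository.save(state)
```

Because the producer and repository arrive through the constructor, the orchestrator can be exercised with in-memory fakes in tests, with no broker or database required.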
## Event replay
```
/api/v1/replay/sessions (admin) --> ReplayService
        |                               |
        |                               |-- ReplayRepository (Mongo) for sessions
        |                               |-- EventStore queries filters/time ranges
        |                               |-- UnifiedProducer to Kafka (target topic)
        v                               v
  JSON summaries                  Kafka topics (private)
```
The replay system lets admins re-emit historical events from the EventStore back to Kafka. This is useful for rebuilding projections, testing new consumers, or recovering from data issues. You create a replay session with filters like time range or event type, and the ReplayService reads matching events from MongoDB and publishes them to the target topic. The session tracks progress so you can pause and resume long replays.
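A sketch of the flow described above, under assumed names: the session fields (`event_type`, time range, `target_topic`) and the `query`/`publish`/`save` method signatures are illustrative, not the real API. The essential behavior is that matching events are read from the store, re-emitted to the target topic, and progress is checkpointed so a long replay can resume.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReplaySession:
    session_id: str
    event_type: str       # filter: which events to replay
    start: datetime       # filter: time range
    end: datetime
    target_topic: str     # where re-emitted events go
    replayed: int = 0     # progress counter, persisted for resume
    status: str = "created"


class ReplayService:
    def __init__(self, event_store, producer, repository):
        self.event_store = event_store  # queries MongoDB by filters/time range
        self.producer = producer        # publishes back to Kafka
        self.repository = repository    # persists session progress

    def run(self, session: ReplaySession) -> ReplaySession:
        session.status = "running"
        matches = self.event_store.query(session.event_type,
                                         session.start, session.end)
        for event in matches:
            self.producer.publish(session.target_topic, event)
            session.replayed += 1
            self.repository.save(session)  # checkpoint after each event
        session.status = "completed"
        self.repository.save(session)
        return session
```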
```
/api/v1/admin/events/* -> admin repos (Mongo) for events query/delete
```
When a consumer fails to process an event after multiple retries, the event lands in the dead letter queue topic. The DLQ manager handles retry logic with exponential backoff and configurable thresholds. Admins can inspect failed events through the API, fix the underlying issue, and replay them back to the original topic. Events that repeatedly fail can be manually deleted or archived after investigation.
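The retry policy above can be sketched as follows. The `retry_count` field, topic name `dead_letter_queue`, and the `DLQManager` interface are assumptions for illustration; the backoff formula (doubling delay up to a cap) and the configurable retry threshold are the behaviors the text describes.

```python
def next_retry_delay(attempt: int, base_seconds: float = 1.0,
                     cap_seconds: float = 300.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at cap_seconds."""
    return min(base_seconds * (2 ** attempt), cap_seconds)


class DLQManager:
    def __init__(self, producer, max_retries: int = 5):
        self.producer = producer
        self.max_retries = max_retries  # configurable threshold

    def handle(self, event: dict) -> str:
        retries = event.get("retry_count", 0)
        if retries >= self.max_retries:
            # Retries exhausted: park the event for admin inspection
            self.producer.publish("dead_letter_queue", event)
            return "dlq"
        event["retry_count"] = retries + 1
        event["retry_after_seconds"] = next_retry_delay(retries)
        # Re-publish to the original topic for another attempt
        self.producer.publish(event["original_topic"], event)
        return "retried"
```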
All events are defined as Pydantic models with strict typing. The mappings module routes each event type to its destination topic. Schema Registry integration ensures producers and consumers agree on event structure, catching incompatible changes before they cause runtime failures. The unified producer and consumer classes handle serialization, error handling, and observability so individual services don't reinvent the wheel.
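A minimal illustration of the event-model-plus-mappings pattern. The event classes and their fields are hypothetical, and routing `ExecutionRequested` to `execution_tasks` versus `ExecutionCompleted` to `execution_events` is an assumed assignment consistent with the event-stream/task-queue split described earlier; only the topic names come from this document.

```python
from pydantic import BaseModel


class ExecutionRequested(BaseModel):
    # Hypothetical event shape; field names are illustrative
    execution_id: str
    script: str


class ExecutionCompleted(BaseModel):
    execution_id: str
    exit_code: int


# mappings module: routes each event type to its destination topic
TOPIC_MAPPINGS = {
    ExecutionRequested: "execution_tasks",   # task queue
    ExecutionCompleted: "execution_events",  # event stream (audit)
}


def topic_for(event: BaseModel) -> str:
    return TOPIC_MAPPINGS[type(event)]
```

Because the models enforce strict typing, a malformed payload fails validation in the producer, before it ever reaches a topic, which is the same class of error Schema Registry catches between services.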