
The Docker backend uses the adapter pattern to provide a unified interface, making it easy to switch between Docker and Podman without code changes.
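The adapter pattern mentioned above can be illustrated with a small sketch. The class and method names here (`ContainerRuntime`, `DockerAdapter`, `PodmanAdapter`, `run`) are assumptions for illustration, not the SDK's actual types; the point is that caller code targets one interface while runtime-specific details live in adapters.

```python
# Illustration of the adapter pattern described above. Class and method
# names are assumptions for this sketch, not the SDK's actual types.
from abc import ABC, abstractmethod


class ContainerRuntime(ABC):
    """Unified interface that backend code is written against."""

    @abstractmethod
    def run(self, image: str, name: str) -> str: ...


class DockerAdapter(ContainerRuntime):
    def run(self, image: str, name: str) -> str:
        return f"docker run --name {name} {image}"


class PodmanAdapter(ContainerRuntime):
    def run(self, image: str, name: str) -> str:
        return f"podman run --name {name} {image}"


def launch(runtime: ContainerRuntime) -> str:
    # Caller code is identical regardless of the underlying runtime.
    return runtime.run("pytorch/pytorch:latest", "node-0")


print(launch(DockerAdapter()))
print(launch(PodmanAdapter()))
```

Swapping runtimes is then a one-line change at the call site, which is what allows switching between Docker and Podman without touching training code.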

## Architecture

The Container Backend with Docker uses a local orchestration layer to manage TrainJobs within Docker containers. This ensures environment parity between your local machine and production Kubernetes clusters.

```mermaid
graph LR
User([User Script]) -->|TrainerClient.train| SDK[Kubeflow SDK]

SDK -->|1. Pull| Image[Docker Image]
SDK -->|2. Net| Net[Bridge Network]
SDK -->|3. Run| Daemon[Docker Daemon]

subgraph DockerEnv [Local Docker Environment]
direction TB
Daemon -->|Spawn| C1[Node 0]
Daemon -->|Spawn| C2[Node 1]
C1 <-->|DDP| C2
end

C1 -->|4. Logs| Logs[Stream Logs]
C1 -.->|5. Clean| Remove[Auto-Remove]
```
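The numbered steps in the diagram can be sketched as a plan the SDK assembles before talking to the Docker daemon. This is a minimal illustration, not the SDK's real code: the field names and the `{job}-node-{rank}` naming scheme are assumptions that mirror the per-node hostname convention used for distributed jobs.

```python
# Illustrative sketch of the orchestration plan behind the diagram above.
# Not the SDK's real code: names and fields are assumptions.


def plan_train_job(job_name: str, image: str, num_nodes: int = 2) -> dict:
    """Build the pull/network/run plan for a local Docker TrainJob."""
    network = f"{job_name}-net"  # step 2: dedicated bridge network
    containers = [
        {
            "name": f"{job_name}-node-{rank}",  # hostname on the bridge network
            "image": image,                     # step 1: image to pull
            "network": network,
            "auto_remove": True,                # step 5: clean up on exit
        }
        for rank in range(num_nodes)
    ]
    return {"image": image, "network": network, "containers": containers}


plan = plan_train_job("mnist", "pytorch/pytorch:latest", num_nodes=2)
print(plan["containers"][0]["name"])  # mnist-node-0
```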

## Prerequisites

### Required Software

```python
finally:
    client.delete_job(job_name)
```

## Architecture

The Local Process Backend orchestrates native OS processes. It bypasses container runtimes such as Docker, instead managing the lifecycle of your training script through isolated Python virtual environments (venvs).

```mermaid
graph LR
User([User Script]) -->|TrainerClient.train| SDK[Kubeflow SDK]

SDK -->|1. Create| Venv[Python Venv]
Venv -->|2. Install| Deps[Dependencies]
SDK -->|3. Extract| Script[Training Script .py]

subgraph LocalExec [Local Execution]
direction TB
Deps --> Process[Python Process]
Script --> Process
end

Process -->|4. Logs| Logs[Stream Logs]
Process -.->|5. Clean| Cleanup[Delete Venv]
```
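The lifecycle in the diagram can be reproduced with nothing but the standard library: create a venv, write the training script to a file, run it with the venv's interpreter, capture the logs, then delete the venv. This is a minimal sketch of the same sequence of steps, not the backend's actual implementation; paths and the toy script are illustrative.

```python
# Minimal sketch of the Local Process Backend lifecycle using only the
# standard library. Paths and the toy training script are illustrative.
import os
import shutil
import subprocess
import tempfile
import venv

workdir = tempfile.mkdtemp(prefix="trainjob-")
venv_dir = os.path.join(workdir, "venv")

# Step 1: create an isolated venv (pip skipped to keep the sketch fast;
# step 2 would normally `pip install` the job's dependencies here).
venv.EnvBuilder(with_pip=False).create(venv_dir)

# Step 3: extract the training script to a .py file.
script = os.path.join(workdir, "train.py")
with open(script, "w") as f:
    f.write("print('training step done')\n")

# Locate the venv's interpreter (bin/ on POSIX, Scripts\ on Windows).
bindir = "Scripts" if os.name == "nt" else "bin"
python = os.path.join(venv_dir, bindir,
                      "python.exe" if os.name == "nt" else "python")

# Step 4: run the process and capture its logs.
result = subprocess.run([python, script], capture_output=True, text=True)
print(result.stdout.strip())

# Step 5: delete the venv and workspace.
shutil.rmtree(workdir)
```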

## How It Works

Understanding the internal workflow helps with debugging and optimization:

```python
backend_config = ContainerBackendConfig(
)
```

## Architecture

The Container Backend with Podman uses a local orchestration layer to manage TrainJobs within Podman containers. This ensures environment parity between your local machine and production Kubernetes clusters.

```mermaid
graph LR
User([User Script]) -->|TrainerClient.train| SDK[Kubeflow SDK]

SDK -->|1. Prep| PodConfig[Podman Config]
SDK -->|2. Mount| LocalDir[Local Dir Mounts]
SDK -->|3. Exec| Podman[Podman CLI/API]

subgraph PodmanEnv [Podman Container - Rootless]
direction TB
Podman --> Process[Training Process]
Process --> Security[User Namespace Isolation]
end

Process -->|4. Logs| Logs[Stream Logs]
Process -->|5. Clean| Exit[Exit & Cleanup]
```
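The "Exec" step in the diagram amounts to assembling a rootless `podman run` invocation. The sketch below builds such an argument list; `--rm`, `-v`, `-e`, and `--userns=keep-id` are real Podman flags, but the exact flag set the SDK emits is an assumption for illustration.

```python
# Sketch of assembling a rootless `podman run` invocation like the one the
# diagram's "Exec" step implies. The flag set is an assumption, not
# necessarily what the SDK emits.


def podman_run_args(image: str, name: str, mounts: dict, env: dict) -> list:
    args = ["podman", "run", "--rm", "--name", name,
            "--userns=keep-id"]          # map the host user into the container
    for host, ctr in mounts.items():     # step 2: local directory mounts
        args += ["-v", f"{host}:{ctr}"]
    for key, val in env.items():
        args += ["-e", f"{key}={val}"]
    args.append(image)
    return args


argv = podman_run_args(
    "pytorch/pytorch:latest",
    "train-node-0",
    mounts={"/home/user/data": "/workspace/data"},
    env={"RANK": "0"},
)
print(" ".join(argv))
```

Because Podman runs rootless, `--userns=keep-id` keeps file ownership on the mounted directories consistent between host and container.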

### Workflow Detail
1. **Image Management:** The SDK identifies the required training image. If `pull_policy` is set, it ensures the latest image is available.
2. **Network Creation:** A dedicated bridge network is created for the job so that containers (nodes) can communicate via hostnames (e.g., `job-node-0`).
3. **Container Spawning:** The SDK instructs Podman to start containers, injecting environment variables such as `MASTER_ADDR`, `MASTER_PORT`, `RANK`, and `WORLD_SIZE` to enable distributed frameworks (e.g., PyTorch DDP).
4. **Log Streaming:** Logs are streamed from the containers back to the SDK's `TrainerClient`.
5. **Lifecycle Management:** Once the training process exits, the SDK handles the removal of containers and the temporary network if `auto_remove=True`.
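Steps 2 and 3 above can be sketched as a small helper that derives each node's hostname and environment from the job name and node count. The variable names (`MASTER_ADDR`, `MASTER_PORT`, `RANK`, `WORLD_SIZE`) come from the text; the helper itself and its default port are illustrative, not the SDK's actual code.

```python
# Sketch of the per-node environment described in steps 2-3. The variable
# names come from the workflow above; the helper itself is illustrative.


def node_environment(job_name: str, world_size: int, master_port: int = 29500):
    """Yield (hostname, env) for each node of a distributed TrainJob."""
    master_addr = f"{job_name}-node-0"  # rank 0 is reachable by hostname
    for rank in range(world_size):
        env = {
            "MASTER_ADDR": master_addr,
            "MASTER_PORT": str(master_port),
            "RANK": str(rank),
            "WORLD_SIZE": str(world_size),
        }
        yield f"{job_name}-node-{rank}", env


for hostname, env in node_environment("train", 2):
    print(hostname, env["RANK"])
```

Every node receives the same `MASTER_ADDR` (rank 0's hostname on the job network), which is what lets PyTorch DDP rendezvous across containers.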

## Multi-Node Distributed Training

The Podman backend automatically sets up networking and environment variables for distributed training.