This repository was archived by the owner on Dec 26, 2025. It is now read-only.
22 changes: 22 additions & 0 deletions README.md
@@ -369,6 +369,28 @@ stream.prepare(

The delta has a moderating effect on the effectiveness of RCFG.

## Additional Feature Documentation

Comprehensive documentation is available in the [docs folder](src/streamdiffusion/docs/).

- [Core Concepts](src/streamdiffusion/docs/hooks.md)
- [Modules](src/streamdiffusion/docs/modules/)
- [Preprocessing](src/streamdiffusion/docs/preprocessing/)
- [Pipeline](src/streamdiffusion/docs/pipeline.md)
- [Parameter Updater](src/streamdiffusion/docs/stream_parameter_updater.md)
- [Wrapper](src/streamdiffusion/docs/wrapper.md)
- [Config](src/streamdiffusion/docs/config.md)
- [TensorRT](src/streamdiffusion/docs/acceleration/tensorrt.md)

## Diagrams

- [Architecture Overview](src/streamdiffusion/docs/diagrams/overall_architecture.md)
- [Hooks Integration](src/streamdiffusion/docs/diagrams/hooks_integration.md)
- [Orchestrator Flow](src/streamdiffusion/docs/diagrams/orchestrator_flow.md)
- [Module Integration](src/streamdiffusion/docs/diagrams/module_integration.md)
- [Parameter Updating](src/streamdiffusion/docs/diagrams/parameter_updating.md)
- [TensorRT Pipeline](src/streamdiffusion/docs/diagrams/tensorrt_pipeline.md)

## Development Team

[Aki](https://twitter.com/cumulo_autumn),
31 changes: 31 additions & 0 deletions src/streamdiffusion/docs/acceleration/tensorrt.md
@@ -0,0 +1,31 @@
# TensorRT Acceleration

## Overview

TensorRT acceleration optimizes StreamDiffusion for real-time performance by compiling PyTorch models into TensorRT engines, with support for dynamic batch sizes and resolutions (384-1024), FP16, and CUDA graphs. Engines are built for the UNet, the VAE (encoder/decoder), ControlNet, and the Safety Checker. The system automatically falls back to PyTorch on out-of-memory (OOM) errors, and ControlNet engines are pooled for reuse.

Key components:
- **EngineBuilder**: Exports ONNX, optimizes, builds TRT (static/dynamic shapes).
- **EngineManager**: Manages paths, compiles/loads engines (UNet/VAE/ControlNet).
- **Runtime Engines**: UNet2DConditionModelEngine, AutoencoderKLEngine, ControlNetModelEngine (infer with shape cache).
- **Export Wrappers**: UnifiedExportWrapper for UNet+ControlNet+IPAdapter (handles kwargs, scales).
- **Utilities**: Engine class (buffers, infer), preprocess/decode helpers.

Files: [`builder.py`](../../../acceleration/tensorrt/builder.py), [`engine_manager.py`](../../../acceleration/tensorrt/engine_manager.py), [`utilities.py`](../../../acceleration/tensorrt/utilities.py), wrappers in `export_wrappers/`.
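The OOM fallback described above can be sketched as a small wrapper (illustrative only; `infer_with_fallback` and its signature are hypothetical, not the library's actual API):

```python
# Hypothetical sketch of the documented auto-fallback pattern: run the
# TensorRT engine when one is available, and fall back to the PyTorch
# model if inference raises a CUDA out-of-memory error.
def infer_with_fallback(trt_engine, torch_model, *inputs):
    if trt_engine is not None:
        try:
            return trt_engine(*inputs)
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise  # unrelated failure: propagate instead of masking it
    return torch_model(*inputs)
```

The key design point is that only OOM errors trigger the fallback; other runtime failures still surface to the caller.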

## Usage

### Engine Building

`EngineManager` runs during wrapper initialization; pass `build_engines_if_missing=True` to compile any missing engines:

```python
from streamdiffusion import StreamDiffusionWrapper

wrapper = StreamDiffusionWrapper(
    model_id_or_path="runwayml/stable-diffusion-v1-5",
    acceleration="tensorrt",
    engine_dir="engines",  # Output directory for compiled engines
    build_engines_if_missing=True,  # Compile engines if not found on disk
)
# Builds: unet.engine, vae_encoder.engine, vae_decoder.engine
```
44 changes: 44 additions & 0 deletions src/streamdiffusion/docs/config.md
@@ -0,0 +1,44 @@
# Config Management

## Overview

Config management in StreamDiffusion uses YAML/JSON files to define model, pipeline, blending, and module settings. The `config.py` module provides `load_config`/`save_config` for file I/O, validates field types and required fields, and offers helpers such as `create_wrapper_from_config` to instantiate a `StreamDiffusionWrapper` from a dict. Both legacy single prompts and the newer blending format (`prompt_list`, `seed_list`) are supported, with optional weight normalization and configurable interpolation methods.

Key functions:
- `load_config(path)`: Loads YAML/JSON, validates.
- `save_config(config, path)`: Writes validated config.
- `create_wrapper_from_config(config)`: Builds wrapper from dict, extracts params, handles blending.
- `create_prompt_blending_config`/`create_seed_blending_config`: Helpers for blending.
- `set_normalize_weights_config`: Sets normalization flags.
- Validation: ensures `model_id` is present, `controlnets`/`ipadapters` are lists, hook processors define `type`/`enabled`/`params`, and blending lists are well-formed.

Configs are loaded at startup; runtime updates via `update_stream_params` ([doc](../stream_parameter_updater.md)). Files: [`config.py`](../../../config.py).
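As a rough illustration of the load-and-validate behavior described above, here is a minimal self-contained sketch using JSON (the real `load_config` also handles YAML and validates many more fields):

```python
import json

# Minimal illustrative sketch, not the actual config.py implementation:
# load a JSON config and apply the kind of checks the module is
# described as performing (required model_id, list-typed module configs).
def load_config(path):
    with open(path) as f:
        config = json.load(f)
    if "model_id" not in config:
        raise ValueError("config must define model_id")
    for key in ("controlnets", "ipadapters"):
        if key in config and not isinstance(config[key], list):
            raise ValueError(f"{key} must be a list")
    return config
```

Failing fast on malformed configs at load time keeps errors out of the hot per-frame path.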

## File Format (YAML Example)

```yaml
model_id: "runwayml/stable-diffusion-v1-5"
t_index_list: [0, 999]
width: 512
height: 512
mode: "img2img"
output_type: "pil"
device: "cuda"
dtype: "float16"
use_controlnet: true
controlnets:
- model_id: "lllyasviel/sd-controlnet-canny"
preprocessor: "canny"
conditioning_scale: 1.0
enabled: true
preprocessor_params:
threshold_low: 100
threshold_high: 200
use_ipadapter: true
ipadapters:
- ipadapter_model_path: "h94/IP-Adapter"
image_encoder_path: "openai/clip-vit-large-patch14"
scale: 0.8
type: "regular"
prompt_blending:
  prompt_list:
```
13 changes: 13 additions & 0 deletions src/streamdiffusion/docs/diagrams/hooks_integration.md
@@ -0,0 +1,13 @@
# Hooks Integration

```mermaid
graph LR
A[Pipeline Stages] --> B[Embedding Hooks: Prompt Blending]
B --> C[UNet Hooks: ControlNet/IPAdapter]
C --> D[Orchestrator Calls: Processors]
D --> E[Latent/Image Hooks: Pre/Post Processing]

F[StreamParameterUpdater] -.->|Update Configs| C
G[Config] -->|Register Hooks| B
G -->|Register Hooks| C
G -->|Register Hooks| E
```
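The registration-and-dispatch pattern in the diagram can be sketched as a minimal hook registry (all names here are hypothetical, not the library's actual classes):

```python
# Hypothetical sketch: a config registers processors at named pipeline
# stages (embedding/unet/latent/image), and the pipeline runs each
# stage's processors in registration order.
class HookRegistry:
    def __init__(self):
        self.hooks = {"embedding": [], "unet": [], "latent": [], "image": []}

    def register(self, stage, fn):
        self.hooks[stage].append(fn)

    def run(self, stage, value):
        for fn in self.hooks[stage]:
            value = fn(value)  # each processor transforms the stage value
        return value
```

A stage with no registered processors simply passes its value through unchanged.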
29 changes: 29 additions & 0 deletions src/streamdiffusion/docs/diagrams/module_integration.md
@@ -0,0 +1,29 @@
# Module Integration

```mermaid
graph TD
A[Input Image] --> B[Image Preprocessing Hooks]
B --> C[VAE Encode]
C --> D[Latent Preprocessing Hooks]
D --> E[UNet Forward]

E --> F{ControlNet Active?}
F -->|Yes| G[Add Residuals: Down/Mid Blocks]
F -->|No| H{IPAdapter Active?}
H -->|Yes| I[Set IPAdapter Scale Vector]
H -->|No| J[Standard UNet Call]
G --> J
I --> J

J --> K[Latent Postprocessing Hooks]
K --> L[VAE Decode]
L --> M[Image Postprocessing Hooks]
M --> N[Output Image]

O[StreamParameterUpdater] -.->|Update Scales| I
P[Config] -->|Enable Modules| F
P -->|Enable Modules| H
P -->|Enable Modules| B
P -->|Enable Modules| D
P -->|Enable Modules| K
P -->|Enable Modules| M
```
72 changes: 72 additions & 0 deletions src/streamdiffusion/docs/diagrams/orchestrator_flow.md
@@ -0,0 +1,72 @@
# Orchestrator Flow

```mermaid
graph TB
subgraph "Input Layer - Distinct Preprocessing Types"
A["ControlNet/IPAdapter Inputs: Raw Images for Module Preprocessing"]
B["Pipeline Hooks: Latent/Image Tensors for Hook Stages"]
C["Postprocessing: VAE Output Images for Enhancement"]
end

subgraph "PreprocessingOrchestrator (ControlNet/IPAdapter - Intraframe Parallelism)"
D["Raw Images: Multiple ControlNets/IPAdapters"]
E["Group by Processor Type: e.g., All Canny Processors Grouped"]
F["Intraframe Parallel: ThreadPoolExecutor per Group"]
F --> G["Process Group in Parallel: e.g., Canny for CN1 and CN2 Simultaneously"]
G --> H["Merge/Broadcast Group Results to Specific Modules e.g. Canny to CN1 and CN2"]
I["Intraframe Sequential: Unique Processors Single Thread"]
H --> J["Cache by Type: Reuse Across Modules/Frames"]
I --> J
J --> K["Output Distinct Tensors for Each ControlNet/IPAdapter"]
end

subgraph "PipelinePreprocessingOrchestrator (Hook Stages - Sequential Chain)"
L["Latent/Image Tensors from Pipeline Hooks"]
M["Sequential Chain: _execute_pipeline_chain"]
M --> N["Single Processor Application: e.g., Latent Feedback Sequential"]
N --> O["Next Processor in Order (order attr)"]
O --> P["Chain Continues: No Parallelism Within Chain"]
P --> M
Q["Output Processed Tensor to Next Pipeline Hook/Stage"]
end

subgraph "PostprocessingOrchestrator (Output - Cached Sequential)"
R["VAE Decoded Images"]
S["Sequential with Cache Check: _apply_single_postprocessor"]
S --> T{"Cache Hit for Identical Input?"}
T -->|Yes| U["Reuse Cached: e.g., Same Upscale Params"]
T -->|No| V["Process Sequential: Realesrgan_trt then Sharpen"]
U --> W["Output Enhanced Image"]
V --> W
end

subgraph "BaseOrchestrator (All Types - Interframe Pipelining)"
X{"Use Sync Processing? (Feedback/Temporal Config)"}
X -->|Yes| Y["Process Sync: Sequential/Immediate (No Lag, Low Throughput)"]
X -->|No| Z["Background Thread: Pipelined/1-Frame Lag (High Throughput)"]
Y --> AA["Apply Current Frame Results"]
Z --> AA
AA --> BB["Output to Pipeline/Next Orchestrator/Stage"]
end

subgraph "Shared Resources & Integration"
CC["OrchestratorUser Mixin: Attach Shared Orchestrators to Modules/Hooks"]
DD["StreamParameterUpdater: Runtime Param Updates to Processors"]
EE["Thread Lock: Ensure Thread-Safe Parallel & Pipelined Execution"]
end

A --> E
B --> M
C --> S
E --> X
M --> X
S --> X
CC -.->|"Shared Orchestrators"| E
CC -.->|"Shared Orchestrators"| M
CC -.->|"Shared Orchestrators"| S
DD -.->|"Dynamic Params"| E
DD -.->|"Dynamic Params"| M
DD -.->|"Dynamic Params"| S
EE -.->|"Protect"| F
EE -.->|"Protect"| M
EE -.->|"Protect"| S
```
60 changes: 60 additions & 0 deletions src/streamdiffusion/docs/diagrams/overall_architecture.md
@@ -0,0 +1,60 @@
# Overall Architecture

```mermaid
graph TB
subgraph "Input"
A["Input: Image/Prompt/Control Image"]
end

subgraph "Preprocessing"
B["Preprocessing Orchestrators"]
C["Processors: Edge Detection (Canny/HED), Pose (OpenPose), Depth (MiDaS)"]
D["Parallel Execution via ThreadPool"]
end

subgraph "Pipeline Core"
E["StreamDiffusion.prepare: Embeddings/Timesteps/Noise"]
F["UNet Steps with Hooks"]
G["ControlNet/IPAdapter Injection"]
H["Orchestrator Calls: Latent/Image Hooks"]
end

subgraph "Decoding"
I["VAE Decode"]
J["Postprocessing Orchestrators"]
end

subgraph "Output"
K["Output: Image"]
end

subgraph "Management"
L["StreamParameterUpdater: Blending/Caching"]
M["Config Loader: YAML/JSON"]
end

subgraph "Acceleration"
N["TensorRT Engines: UNet/VAE/ControlNet"]
O["Runtime Inference"]
end

A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
H --> I
I --> J
J --> K

L -.->|"Updates"| E
L -.->|"Updates"| F
M -.->|"Setup"| B
M -.->|"Setup"| J
M -.->|"Setup"| L
N -.->|"Optimized"| F
N -.->|"Optimized"| I
O -.->|"Fallback PyTorch"| F
O -.->|"Fallback PyTorch"| I
```
56 changes: 56 additions & 0 deletions src/streamdiffusion/docs/diagrams/parameter_updating.md
@@ -0,0 +1,56 @@
# Parameter Updating

```mermaid
graph TD
subgraph "Runtime Update Entry Point"
A["update_stream_params Call"]
A --> B["Thread Lock: _update_lock"]
end

subgraph "Parameter Branches"
B --> C{"Prompt List Provided?"}
C -->|Yes| D["_cache_prompt_embeddings: Cache/Encode Prompts"]
C -->|No| E{"Seed List Provided?"}
E -->|Yes| F["_cache_seed_noise: Cache/Generate Noise"]
E -->|No| G{"ControlNet Config Provided?"}
G -->|Yes| H["Diff Current vs Desired: Add/Remove/Update Scales/Enabled"]
H --> I["Update ControlNet Pipeline: reorder/add/remove/update_scale"]
G -->|No| J{"IPAdapter Config Provided?"}
J -->|Yes| K["Update Scale: Uniform or Per-Layer Vector"]
K --> L["Set Weight Type: Linear/SLERP for Layers/Steps"]
J -->|No| M{"Hook Config Provided? e.g., Image/Latent Pre/Post"}
M -->|Yes| N["Diff Current vs Desired: Modify/Add/Remove Processors In-Place"]
N --> O["Update Processor Params/Enabled/Order"]
M -->|No| P["Update Timestep/Resolution: Recalc Scalings/Batches"]
end

subgraph "Blending & Caching Layer"
D --> Q["_apply_prompt_blending: Linear/SLERP"]
F --> R["_apply_seed_blending: Linear/SLERP"]
I --> S["Cache Stats: Hits/Misses for Monitoring"]
L --> S
O --> S
P --> S
Q --> T["Update Pipeline Tensors: prompt_embeds/init_noise"]
R --> T
S --> T
end

subgraph "Pipeline Integration"
T --> U["Pipeline Uses Updated Tensors/Hooks"]
end

subgraph "Shared Utilities"
V["Normalize Weights: Sum to 1.0 (Optional)"]
W["Thread-Safe Lock: Prevent Race Conditions"]
X["Cache Reindexing: Handle Add/Remove"]
end

C -.->|"Use"| V
E -.->|"Use"| V
B -.->|"Protect"| W
D -.->|"Use"| X
F -.->|"Use"| X
H -.->|"Use"| X
J -.->|"Use"| X
M -.->|"Use"| X
```
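The Linear/SLERP blending step in the diagram can be sketched as follows (a minimal NumPy sketch; the library's `_apply_prompt_blending` operates on cached prompt embeddings and may differ in detail):

```python
import numpy as np

# Illustrative sketch of the two interpolation methods named in the
# diagram: linear interpolation and spherical linear interpolation
# (SLERP) between two embedding vectors a and b, with blend factor t.
def linear_blend(a, b, t):
    return (1.0 - t) * a + t * b

def slerp_blend(a, b, t, eps=1e-7):
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two embeddings
    if theta < eps:  # nearly parallel: linear is numerically safer
        return linear_blend(a, b, t)
    return (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
```

SLERP follows the arc between the vectors, which preserves magnitude better than linear blending when the embeddings point in different directions.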