This repository was archived by the owner on Dec 26, 2025. It is now read-only.
22 changes: 22 additions & 0 deletions README.md
@@ -369,6 +369,28 @@ stream.prepare(

The delta has a moderating effect on the effectiveness of RCFG.

## Additional Feature Documentation

Comprehensive documentation is available in the [docs folder](src/streamdiffusion/docs/).

- [Core Concepts](src/streamdiffusion/docs/hooks.md)
- [Modules](src/streamdiffusion/docs/modules/)
- [Preprocessing](src/streamdiffusion/docs/preprocessing/)
- [Pipeline](src/streamdiffusion/docs/pipeline.md)
- [Parameter Updater](src/streamdiffusion/docs/stream_parameter_updater.md)
- [Wrapper](src/streamdiffusion/docs/wrapper.md)
- [Config](src/streamdiffusion/docs/config.md)
- [TensorRT](src/streamdiffusion/docs/acceleration/tensorrt.md)

## Diagrams

- [Architecture Overview](src/streamdiffusion/docs/diagrams/overall_architecture.md)
- [Hooks Integration](src/streamdiffusion/docs/diagrams/hooks_integration.md)
- [Orchestrator Flow](src/streamdiffusion/docs/diagrams/orchestrator_flow.md)
- [Module Integration](src/streamdiffusion/docs/diagrams/module_integration.md)
- [Parameter Updating](src/streamdiffusion/docs/diagrams/parameter_updating.md)
- [TensorRT Pipeline](src/streamdiffusion/docs/diagrams/tensorrt_pipeline.md)

## Development Team

[Aki](https://twitter.com/cumulo_autumn),
31 changes: 31 additions & 0 deletions src/streamdiffusion/docs/acceleration/tensorrt.md
@@ -0,0 +1,31 @@
# TensorRT Acceleration

## Overview

TensorRT acceleration optimizes StreamDiffusion for real-time performance by compiling PyTorch models into TensorRT engines, with support for dynamic batch sizes and resolutions (384-1024), FP16, and CUDA graphs. Engines are built for the UNet, the VAE (encoder/decoder), ControlNet, and the Safety Checker. The system automatically falls back to PyTorch on out-of-memory (OOM) errors, and ControlNet engines are pooled for reuse.

Key components:
- **EngineBuilder**: Exports ONNX, optimizes, builds TRT (static/dynamic shapes).
- **EngineManager**: Manages paths, compiles/loads engines (UNet/VAE/ControlNet).
- **Runtime Engines**: UNet2DConditionModelEngine, AutoencoderKLEngine, ControlNetModelEngine (infer with shape cache).
- **Export Wrappers**: UnifiedExportWrapper for UNet+ControlNet+IPAdapter (handles kwargs, scales).
- **Utilities**: Engine class (buffers, infer), preprocess/decode helpers.

Files: [`builder.py`](../../../acceleration/tensorrt/builder.py), [`engine_manager.py`](../../../acceleration/tensorrt/engine_manager.py), [`utilities.py`](../../../acceleration/tensorrt/utilities.py), wrappers in `export_wrappers/`.
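The OOM fallback described above can be sketched as a small wrapper (illustrative only; `infer_with_fallback` and its signature are hypothetical, not the library's actual API):

```python
# Hypothetical sketch of the documented auto-fallback pattern: run the
# TensorRT engine when one is available, and fall back to the PyTorch
# model if inference raises a CUDA out-of-memory error.
def infer_with_fallback(trt_engine, torch_model, *inputs):
    if trt_engine is not None:
        try:
            return trt_engine(*inputs)
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise  # unrelated failure: propagate instead of masking it
    return torch_model(*inputs)
```

The key design point is that only OOM errors trigger the fallback; other runtime failures still surface to the caller.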

## Usage

### Engine Building

`EngineManager` runs during wrapper initialization; pass `build_engines_if_missing=True` to compile any missing engines:

```python
from streamdiffusion import StreamDiffusionWrapper

wrapper = StreamDiffusionWrapper(
    model_id_or_path="runwayml/stable-diffusion-v1-5",
    acceleration="tensorrt",
    engine_dir="engines",  # Output directory for compiled engines
    build_engines_if_missing=True,  # Compile engines if not found on disk
)
# Builds: unet.engine, vae_encoder.engine, vae_decoder.engine
```
44 changes: 44 additions & 0 deletions src/streamdiffusion/docs/config.md
@@ -0,0 +1,44 @@
# Config Management

## Overview

Config management in StreamDiffusion uses YAML/JSON files to define model, pipeline, blending, and module settings. The `config.py` module provides `load_config`/`save_config` for file I/O, validates field types and required fields, and offers helpers such as `create_wrapper_from_config` to instantiate a `StreamDiffusionWrapper` from a dict. Both legacy single prompts and the newer blending format (`prompt_list`, `seed_list`) are supported, with optional weight normalization and configurable interpolation methods.

Key functions:
- `load_config(path)`: Loads YAML/JSON, validates.
- `save_config(config, path)`: Writes validated config.
- `create_wrapper_from_config(config)`: Builds wrapper from dict, extracts params, handles blending.
- `create_prompt_blending_config`/`create_seed_blending_config`: Helpers for blending.
- `set_normalize_weights_config`: Sets normalization flags.
- Validation: ensures `model_id` is present, `controlnets`/`ipadapters` are lists, hook processors define `type`/`enabled`/`params`, and blending lists are well-formed.

Configs are loaded at startup; runtime updates via `update_stream_params` ([doc](../stream_parameter_updater.md)). Files: [`config.py`](../../../config.py).
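As a rough illustration of the load-and-validate behavior described above, here is a minimal self-contained sketch using JSON (the real `load_config` also handles YAML and validates many more fields):

```python
import json

# Minimal illustrative sketch, not the actual config.py implementation:
# load a JSON config and apply the kind of checks the module is
# described as performing (required model_id, list-typed module configs).
def load_config(path):
    with open(path) as f:
        config = json.load(f)
    if "model_id" not in config:
        raise ValueError("config must define model_id")
    for key in ("controlnets", "ipadapters"):
        if key in config and not isinstance(config[key], list):
            raise ValueError(f"{key} must be a list")
    return config
```

Failing fast on malformed configs at load time keeps errors out of the hot per-frame path.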

## File Format (YAML Example)

```yaml
model_id: "runwayml/stable-diffusion-v1-5"
t_index_list: [0, 999]
width: 512
height: 512
mode: "img2img"
output_type: "pil"
device: "cuda"
dtype: "float16"
use_controlnet: true
controlnets:
- model_id: "lllyasviel/sd-controlnet-canny"
preprocessor: "canny"
conditioning_scale: 1.0
enabled: true
preprocessor_params:
threshold_low: 100
threshold_high: 200
use_ipadapter: true
ipadapters:
- ipadapter_model_path: "h94/IP-Adapter"
image_encoder_path: "openai/clip-vit-large-patch14"
scale: 0.8
type: "regular"
prompt_blending:
  prompt_list:
```
13 changes: 13 additions & 0 deletions src/streamdiffusion/docs/diagrams/hooks_integration.md
@@ -0,0 +1,13 @@
# Hooks Integration

```mermaid
graph LR
A[Pipeline Stages] --> B[Embedding Hooks: Prompt Blending]
B --> C[UNet Hooks: ControlNet/IPAdapter]
C --> D[Orchestrator Calls: Processors]
D --> E[Latent/Image Hooks: Pre/Post Processing]

F[StreamParameterUpdater] -.->|Update Configs| C
G[Config] -->|Register Hooks| B
G -->|Register Hooks| C
G -->|Register Hooks| E
```
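The registration-and-dispatch pattern in the diagram can be sketched as a minimal hook registry (all names here are hypothetical, not the library's actual classes):

```python
# Hypothetical sketch: a config registers processors at named pipeline
# stages (embedding/unet/latent/image), and the pipeline runs each
# stage's processors in registration order.
class HookRegistry:
    def __init__(self):
        self.hooks = {"embedding": [], "unet": [], "latent": [], "image": []}

    def register(self, stage, fn):
        self.hooks[stage].append(fn)

    def run(self, stage, value):
        for fn in self.hooks[stage]:
            value = fn(value)  # each processor transforms the stage value
        return value
```

A stage with no registered processors simply passes its value through unchanged.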
29 changes: 29 additions & 0 deletions src/streamdiffusion/docs/diagrams/module_integration.md
@@ -0,0 +1,29 @@
# Module Integration

```mermaid
graph TD
A[Input Image] --> B[Image Preprocessing Hooks]
B --> C[VAE Encode]
C --> D[Latent Preprocessing Hooks]
D --> E[UNet Forward]

E --> F{ControlNet Active?}
F -->|Yes| G[Add Residuals: Down/Mid Blocks]
F -->|No| H{IPAdapter Active?}
H -->|Yes| I[Set IPAdapter Scale Vector]
H -->|No| J[Standard UNet Call]
G --> J
I --> J

J --> K[Latent Postprocessing Hooks]
K --> L[VAE Decode]
L --> M[Image Postprocessing Hooks]
M --> N[Output Image]

O[StreamParameterUpdater] -.->|Update Scales| I
P[Config] -->|Enable Modules| F
P -->|Enable Modules| H
P -->|Enable Modules| B
P -->|Enable Modules| D
P -->|Enable Modules| K
P -->|Enable Modules| M
```
72 changes: 72 additions & 0 deletions src/streamdiffusion/docs/diagrams/orchestrator_flow.md
@@ -0,0 +1,72 @@
# Orchestrator Flow

```mermaid
graph TB
subgraph "Input Layer - Distinct Preprocessing Types"
A["ControlNet/IPAdapter Inputs: Raw Images for Module Preprocessing"]
B["Pipeline Hooks: Latent/Image Tensors for Hook Stages"]
C["Postprocessing: VAE Output Images for Enhancement"]
end

subgraph "PreprocessingOrchestrator (ControlNet/IPAdapter - Intraframe Parallelism)"
D["Raw Images: Multiple ControlNets/IPAdapters"]
E["Group by Processor Type: e.g., All Canny Processors Grouped"]
F["Intraframe Parallel: ThreadPoolExecutor per Group"]
F --> G["Process Group in Parallel: e.g., Canny for CN1 and CN2 Simultaneously"]
G --> H["Merge/Broadcast Group Results to Specific Modules e.g. Canny to CN1 and CN2"]
I["Intraframe Sequential: Unique Processors Single Thread"]
H --> J["Cache by Type: Reuse Across Modules/Frames"]
I --> J
J --> K["Output Distinct Tensors for Each ControlNet/IPAdapter"]
end

subgraph "PipelinePreprocessingOrchestrator (Hook Stages - Sequential Chain)"
L["Latent/Image Tensors from Pipeline Hooks"]
M["Sequential Chain: _execute_pipeline_chain"]
M --> N["Single Processor Application: e.g., Latent Feedback Sequential"]
N --> O["Next Processor in Order (order attr)"]
O --> P["Chain Continues: No Parallelism Within Chain"]
P --> M
Q["Output Processed Tensor to Next Pipeline Hook/Stage"]
end

subgraph "PostprocessingOrchestrator (Output - Cached Sequential)"
R["VAE Decoded Images"]
S["Sequential with Cache Check: _apply_single_postprocessor"]
S --> T{"Cache Hit for Identical Input?"}
T -->|Yes| U["Reuse Cached: e.g., Same Upscale Params"]
T -->|No| V["Process Sequential: Realesrgan_trt then Sharpen"]
U --> W["Output Enhanced Image"]
V --> W
end

subgraph "BaseOrchestrator (All Types - Interframe Pipelining)"
X{"Use Sync Processing? (Feedback/Temporal Config)"}
X -->|Yes| Y["Process Sync: Sequential/Immediate (No Lag, Low Throughput)"]
X -->|No| Z["Background Thread: Pipelined/1-Frame Lag (High Throughput)"]
Y --> AA["Apply Current Frame Results"]
Z --> AA
AA --> BB["Output to Pipeline/Next Orchestrator/Stage"]
end

subgraph "Shared Resources & Integration"
CC["OrchestratorUser Mixin: Attach Shared Orchestrators to Modules/Hooks"]
DD["StreamParameterUpdater: Runtime Param Updates to Processors"]
EE["Thread Lock: Ensure Thread-Safe Parallel & Pipelined Execution"]
end

A --> E
B --> M
C --> S
E --> X
M --> X
S --> X
CC -.->|"Shared Orchestrators"| E
CC -.->|"Shared Orchestrators"| M
CC -.->|"Shared Orchestrators"| S
DD -.->|"Dynamic Params"| E
DD -.->|"Dynamic Params"| M
DD -.->|"Dynamic Params"| S
EE -.->|"Protect"| F
EE -.->|"Protect"| M
EE -.->|"Protect"| S
```
60 changes: 60 additions & 0 deletions src/streamdiffusion/docs/diagrams/overall_architecture.md
@@ -0,0 +1,60 @@
# Overall Architecture

```mermaid
graph TB
subgraph "Input"
A["Input: Image/Prompt/Control Image"]
end

subgraph "Preprocessing"
B["Preprocessing Orchestrators"]
C["Processors: Edge Detection (Canny/HED), Pose (OpenPose), Depth (MiDaS)"]
D["Parallel Execution via ThreadPool"]
end

subgraph "Pipeline Core"
E["StreamDiffusion.prepare: Embeddings/Timesteps/Noise"]
F["UNet Steps with Hooks"]
G["ControlNet/IPAdapter Injection"]
H["Orchestrator Calls: Latent/Image Hooks"]
end

subgraph "Decoding"
I["VAE Decode"]
J["Postprocessing Orchestrators"]
end

subgraph "Output"
K["Output: Image"]
end

subgraph "Management"
L["StreamParameterUpdater: Blending/Caching"]
M["Config Loader: YAML/JSON"]
end

subgraph "Acceleration"
N["TensorRT Engines: UNet/VAE/ControlNet"]
O["Runtime Inference"]
end

A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
H --> I
I --> J
J --> K

L -.->|"Updates"| E
L -.->|"Updates"| F
M -.->|"Setup"| B
M -.->|"Setup"| J
M -.->|"Setup"| L
N -.->|"Optimized"| F
N -.->|"Optimized"| I
O -.->|"Fallback PyTorch"| F
O -.->|"Fallback PyTorch"| I
```
56 changes: 56 additions & 0 deletions src/streamdiffusion/docs/diagrams/parameter_updating.md
@@ -0,0 +1,56 @@
# Parameter Updating

```mermaid
graph TD
subgraph "Runtime Update Entry Point"
A["update_stream_params Call"]
A --> B["Thread Lock: _update_lock"]
end

subgraph "Parameter Branches"
B --> C{"Prompt List Provided?"}
C -->|Yes| D["_cache_prompt_embeddings: Cache/Encode Prompts"]
C -->|No| E{"Seed List Provided?"}
E -->|Yes| F["_cache_seed_noise: Cache/Generate Noise"]
E -->|No| G{"ControlNet Config Provided?"}
G -->|Yes| H["Diff Current vs Desired: Add/Remove/Update Scales/Enabled"]
H --> I["Update ControlNet Pipeline: reorder/add/remove/update_scale"]
G -->|No| J{"IPAdapter Config Provided?"}
J -->|Yes| K["Update Scale: Uniform or Per-Layer Vector"]
K --> L["Set Weight Type: Linear/SLERP for Layers/Steps"]
J -->|No| M{"Hook Config Provided? e.g., Image/Latent Pre/Post"}
M -->|Yes| N["Diff Current vs Desired: Modify/Add/Remove Processors In-Place"]
N --> O["Update Processor Params/Enabled/Order"]
M -->|No| P["Update Timestep/Resolution: Recalc Scalings/Batches"]
end

subgraph "Blending & Caching Layer"
D --> Q["_apply_prompt_blending: Linear/SLERP"]
F --> R["_apply_seed_blending: Linear/SLERP"]
I --> S["Cache Stats: Hits/Misses for Monitoring"]
L --> S
O --> S
P --> S
Q --> T["Update Pipeline Tensors: prompt_embeds/init_noise"]
R --> T
S --> T
end

subgraph "Pipeline Integration"
T --> U["Pipeline Uses Updated Tensors/Hooks"]
end

subgraph "Shared Utilities"
V["Normalize Weights: Sum to 1.0 (Optional)"]
W["Thread-Safe Lock: Prevent Race Conditions"]
X["Cache Reindexing: Handle Add/Remove"]
end

C -.->|"Use"| V
E -.->|"Use"| V
B -.->|"Protect"| W
D -.->|"Use"| X
F -.->|"Use"| X
H -.->|"Use"| X
J -.->|"Use"| X
M -.->|"Use"| X
```
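The Linear/SLERP blending step in the diagram can be sketched as follows (a minimal NumPy sketch; the library's `_apply_prompt_blending` operates on cached prompt embeddings and may differ in detail):

```python
import numpy as np

# Illustrative sketch of the two interpolation methods named in the
# diagram: linear interpolation and spherical linear interpolation
# (SLERP) between two embedding vectors a and b, with blend factor t.
def linear_blend(a, b, t):
    return (1.0 - t) * a + t * b

def slerp_blend(a, b, t, eps=1e-7):
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two embeddings
    if theta < eps:  # nearly parallel: linear is numerically safer
        return linear_blend(a, b, t)
    return (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
```

SLERP follows the arc between the vectors, which preserves magnitude better than linear blending when the embeddings point in different directions.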