├── uvicorn_diffu.py # convenience script to start uvicorn with recommended flags
```
## What `diffusers-async` adds / Why we needed it
Core problem: a naive server that calls `pipe.__call__` concurrently can hit **race conditions** (e.g., `scheduler.set_timesteps` mutates shared state) or explode memory by deep-copying the whole pipeline per-request.
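The race is easy to reproduce with a toy scheduler (illustrative class only, not the real diffusers API):

```python
# Toy illustration of the shared-scheduler race described above.
class ToyScheduler:
    """Mimics a scheduler whose set_timesteps mutates instance state."""
    def __init__(self):
        self.timesteps = []

    def set_timesteps(self, num_inference_steps):
        # Rebinds shared state: every caller sees the most recent call.
        self.timesteps = list(range(num_inference_steps, 0, -1))

shared = ToyScheduler()
shared.set_timesteps(4)   # request A asks for 4 steps
shared.set_timesteps(2)   # request B interleaves before A's loop starts
# A now iterates the shared attribute and runs B's schedule:
print(shared.timesteps)   # [2, 1] -- A's [4, 3, 2, 1] was clobbered
```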
* **Request-scoped views**: `RequestScopedPipeline` creates a shallow copy of the pipeline per request, so heavy weights (UNet, VAE, text encoder) remain shared and *are not duplicated*.
* **Per-request mutable state**: small stateful objects (scheduler, RNG state, small lists/dicts, callbacks) are cloned per request. Where available we call `scheduler.clone_for_request(...)`; otherwise we fall back to a safe `deepcopy` or other heuristics.
* **Tokenizer concurrency safety**: `RequestScopedPipeline` now manages an internal tokenizer lock. This ensures that Rust tokenizers are safe to use under concurrency — race-condition errors like `Already borrowed` no longer occur.
* **`retrieve_timesteps(..., return_scheduler=True)`**: fully backward-compatible helper that returns `(timesteps, num_inference_steps, scheduler)` without mutating the shared scheduler. For users not passing `return_scheduler=True`, the behavior is identical to the original API.
* **Robust attribute handling**: the wrapper avoids writing to read-only properties (e.g., `components`) and auto-detects small mutable attributes to clone, while avoiding duplication of large tensors.
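The request-scoped idea above can be sketched with toy stand-in classes (illustrative only — the real `RequestScopedPipeline` is more thorough, with `clone_for_request`, read-only-property detection, and tokenizer locking):

```python
# Conceptual sketch with toy classes, NOT the real diffusers API.
import copy

class ToyScheduler:
    """Stands in for a small, mutable, per-request object."""
    def __init__(self):
        self.timesteps = []

class ToyPipeline:
    def __init__(self):
        self.unet = object()             # stands in for heavy shared weights
        self.scheduler = ToyScheduler()  # small mutable state

def request_scoped_view(pipe):
    """Hypothetical helper: shallow-copy the pipeline, clone only mutable state."""
    view = copy.copy(pipe)                          # shallow copy: weights stay shared
    view.scheduler = copy.deepcopy(pipe.scheduler)  # mutable state is isolated
    return view

base = ToyPipeline()
a = request_scoped_view(base)
b = request_scoped_view(base)
assert a.unet is base.unet and b.unet is base.unet  # no weight duplication
assert a.scheduler is not b.scheduler               # schedulers isolated
```

The design point: `copy.copy` keeps every large attribute as a shared reference, and only the handful of small mutable objects are deep-copied per request.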
## How the server works (high-level flow)
3. **Result**: inference completes, images are moved to CPU and saved (if requested), and internal buffers are freed (GC + `torch.cuda.empty_cache()`).
4. Multiple requests can run in parallel while sharing heavy weights and isolating mutable state.
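The flow above can be sketched with `asyncio` and a toy pipeline (names like `request_scoped_view`-style `infer`/`handle` are illustrative, not the library's API):

```python
# Sketch of the serving flow: shared weights, per-request state, parallel
# requests via a worker-thread pool. Toy classes, not the real API.
import asyncio
import copy

class ToyPipeline:
    def __init__(self):
        self.unet = object()            # heavy, shared across requests
        self.state = {"steps": None}    # small, must be per-request

def infer(pipe, steps):
    view = copy.copy(pipe)              # request-scoped view: weights shared
    view.state = dict(pipe.state)       # mutable state isolated
    view.state["steps"] = steps
    return view.state["steps"]          # stands in for the generated image

async def handle(pipe, steps):
    # Blocking inference runs in a worker thread, so the event loop stays
    # free to accept more requests.
    return await asyncio.to_thread(infer, pipe, steps)

async def main():
    pipe = ToyPipeline()
    results = await asyncio.gather(*(handle(pipe, s) for s in (10, 20, 30)))
    print(results)   # [10, 20, 30]

asyncio.run(main())
```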
## How to set up and run the server
### 1) Install dependencies
* `Already borrowed` — previously a Rust tokenizer concurrency error.
  * ✅ This is now fixed: `RequestScopedPipeline` manages an internal tokenizer lock, so these race conditions no longer happen.
* `can't set attribute 'components'` — the pipeline exposes a read-only `components` property.
  * `RequestScopedPipeline` now detects read-only properties and skips setting them.
* Scheduler issues:
  * If the scheduler doesn't implement `clone_for_request` and `deepcopy` fails, we log a warning and fall back — but prefer `retrieve_timesteps(..., return_scheduler=True)` to avoid mutating the shared scheduler.
  * ✅ Note: `retrieve_timesteps` is fully backward-compatible — if you don't pass `return_scheduler=True`, the behavior is unchanged.