Commit a9666b1

Update examples/server-async/README.md for changes to tokenizer locks and backward-compatible retrieve_timesteps
1 parent: 0f63f4d

File tree

1 file changed: +10 −15 lines


examples/server-async/README.md

Lines changed: 10 additions & 15 deletions
@@ -14,16 +14,14 @@
 All the components needed to create the inference server are in `DiffusersServer/`
 
 ```
-DiffusersServer/  # the example server package
-├── __init__.py
+DiffusersServer/
+├── **init**.py
 ├── create_server.py  # helper script to build/run the app programmatically
 ├── Pipelines.py  # pipeline loader classes (SD3, Flux, legacy SD, video)
-├── serverasync.py  # FastAPI app factory (create_app_fastapi)
-├── superpipeline.py  # optional custom pipeline glue code
+├── serverasync.py  # FastAPI app factory (create\_app\_fastapi)
 ├── uvicorn_diffu.py  # convenience script to start uvicorn with recommended flags
 ```
 
-
 ## What `diffusers-async` adds / Why we needed it
 
 Core problem: a naive server that calls `pipe.__call__` concurrently can hit **race conditions** (e.g., `scheduler.set_timesteps` mutates shared state) or explode memory by deep-copying the whole pipeline per-request.
@@ -32,7 +30,8 @@ Core problem: a naive server that calls `pipe.__call__` concurrently can hit **r
 
 * **Request-scoped views**: `RequestScopedPipeline` creates a shallow copy of the pipeline per request so heavy weights (UNet, VAE, text encoder) remain shared and *are not duplicated*.
 * **Per-request mutable state**: stateful small objects (scheduler, RNG state, small lists/dicts, callbacks) are cloned per request. Where available we call `scheduler.clone_for_request(...)`; otherwise we fall back to a safe `deepcopy` or other heuristics.
-* **`retrieve_timesteps(..., return_scheduler=True)`**: retro-compatible helper that returns `(timesteps, num_inference_steps, scheduler)` without mutating the shared scheduler. This is the safe path for getting a scheduler configured per-request.
+* **Tokenizer concurrency safety**: `RequestScopedPipeline` now manages an internal tokenizer lock, so Rust tokenizers are safe to use under concurrency and race-condition errors like `Already borrowed` no longer occur.
+* **`retrieve_timesteps(..., return_scheduler=True)`**: fully backward-compatible helper that returns `(timesteps, num_inference_steps, scheduler)` without mutating the shared scheduler. For callers that do not pass `return_scheduler=True`, the behavior is identical to the original API.
 * **Robust attribute handling**: the wrapper avoids writing to read-only properties (e.g., `components`) and auto-detects small mutable attributes to clone while avoiding duplication of large tensors.
 
 ## How the server works (high-level flow)
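The non-mutating scheduler pattern described in this hunk can be illustrated with a minimal, self-contained sketch. The `ToyScheduler` class and this simplified `retrieve_timesteps` are hypothetical stand-ins for illustration only, not the actual diffusers implementation:

```python
import copy

class ToyScheduler:
    """Hypothetical stand-in for a diffusers scheduler with mutable state."""
    def __init__(self):
        self.timesteps = None

    def set_timesteps(self, num_inference_steps):
        # Mutates the scheduler in place -- the shared-state hazard named above.
        self.timesteps = list(range(num_inference_steps, 0, -1))

def retrieve_timesteps(scheduler, num_inference_steps, return_scheduler=False):
    """Sketch of the backward-compatible helper: with return_scheduler=True,
    configure a per-request copy so the shared scheduler is never mutated."""
    target = copy.deepcopy(scheduler) if return_scheduler else scheduler
    target.set_timesteps(num_inference_steps)
    if return_scheduler:
        return target.timesteps, num_inference_steps, target
    return target.timesteps, num_inference_steps  # original two-tuple behavior

shared = ToyScheduler()
timesteps, steps, per_request = retrieve_timesteps(shared, 4, return_scheduler=True)
print(timesteps)         # [4, 3, 2, 1]
print(shared.timesteps)  # None -- the shared scheduler was not touched
```

Omitting `return_scheduler=True` keeps the original call shape, which is what makes the helper backward-compatible.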
@@ -51,7 +50,6 @@ Core problem: a naive server that calls `pipe.__call__` concurrently can hit **r
 3. **Result**: inference completes, images are moved to CPU & saved (if requested), internal buffers freed (GC + `torch.cuda.empty_cache()`).
 4. Multiple requests can run in parallel while sharing heavy weights and isolating mutable state.
 
-
 ## How to set up and run the server
 
 ### 1) Install dependencies
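The request-scoped flow above rests on one idea: shallow-copy the pipeline so heavy modules stay shared, while small mutable state is deep-copied per request. A minimal sketch of that idea with hypothetical toy classes (the real `RequestScopedPipeline` does more, e.g. auto-detecting which attributes to clone):

```python
import copy

class ToyPipeline:
    """Hypothetical pipeline: one heavy shared module, one small mutable object."""
    def __init__(self):
        self.unet = object()                  # stands in for heavy shared weights
        self.scheduler = {"timesteps": None}  # small per-request mutable state

def request_scoped_view(pipe):
    view = copy.copy(pipe)                          # shallow copy: weights stay shared
    view.scheduler = copy.deepcopy(pipe.scheduler)  # clone only the mutable state
    return view

base = ToyPipeline()
view = request_scoped_view(base)
print(view.unet is base.unet)            # True  -- weights are not duplicated
print(view.scheduler is base.scheduler)  # False -- scheduler state is isolated
```

Each request gets its own `view`, so concurrent requests can mutate their schedulers without racing each other while the memory cost per request stays small.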
@@ -65,7 +63,7 @@ If using the `diffusers` fork via git, either:
 ```bash
 pip install "git+https://github.com/F4k3r22/diffusers-async.git@main"
 pip install -r requirements.txt
-```
+````
 
 ### 2) Start the server
 
@@ -97,17 +95,14 @@ Response example:
 
 ## Troubleshooting (quick)
 
-* `Already borrowed` — tokenizers (Rust) error when used concurrently.
+* `Already borrowed` — previously a Rust tokenizer concurrency error.
+  ✅ This is now fixed: `RequestScopedPipeline` manages an internal tokenizer lock, so these race conditions no longer happen.
 
-* Workarounds:
-
-  * Acquire a `Lock` around tokenization or around the pipeline call (serializes that part).
-  * Use the slow tokenizer (`converter_to_slow`) for concurrency tests.
-  * Patch only the tokenization method to use a lock instead of serializing the entire forward pass.
 * `can't set attribute 'components'` — pipeline exposes a read-only `components` property.
 
   * The RequestScopedPipeline now detects read-only properties and skips setting them.
+
 * Scheduler issues:
 
   * If the scheduler doesn't implement `clone_for_request` and `deepcopy` fails, we log and fall back — but prefer `retrieve_timesteps(..., return_scheduler=True)` to avoid mutating the shared scheduler.
-
+* ✅ Note: `retrieve_timesteps` is fully backward-compatible — if you don't pass `return_scheduler=True`, the behavior is unchanged.
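The internal tokenizer lock this hunk refers to boils down to serializing calls into a tokenizer that is unsafe to enter concurrently. A self-contained sketch of the pattern, using a hypothetical toy callable in place of a real Hugging Face fast tokenizer:

```python
import threading

class LockedTokenizer:
    """Serialize access to a tokenizer that must not be called concurrently
    (fast Rust tokenizers raise "Already borrowed" on reentrant use)."""
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer
        self._lock = threading.Lock()

    def __call__(self, text):
        with self._lock:  # only one thread tokenizes at a time
            return self._tokenizer(text)

def toy_tokenizer(text):
    # Hypothetical stand-in tokenizer: one id per character.
    return [ord(c) for c in text]

locked = LockedTokenizer(toy_tokenizer)
results = {}
threads = [threading.Thread(target=lambda t=t: results.__setitem__(t, locked(t)))
           for t in ("ab", "cd")]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(results["ab"])  # [97, 98]
```

Only the tokenization step is serialized; the rest of the forward pass can still run in parallel, which is why this is cheaper than locking the whole pipeline call.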

0 commit comments
