
Commit 6b69367 (parent edd550b): Update examples/server-async/README.md

File tree: 1 file changed, +3 −3 lines


examples/server-async/README.md: 3 additions, 3 deletions

@@ -1,11 +1,11 @@
 # Asynchronous server and parallel execution of models
 
-> Example/demo server that keeps a single model in memory while safely running parallel inference requests by creating per-request lightweight views and cloning only small, stateful components (schedulers, RNG state, small mutable attrs). Works with StableDiffusion3/Flux pipelines.
+> Example/demo server that keeps a single model in memory while safely running parallel inference requests by creating per-request lightweight views and cloning only small, stateful components (schedulers, RNG state, small mutable attrs). Works with StableDiffusion3 pipelines.
 > We recommend running 10 to 50 inferences in parallel for optimal performance, averaging between 25 and 30 seconds to 1 minute and 1 minute and 30 seconds. (This is only recommended if you have a GPU with 35GB of VRAM or more; otherwise, keep it to one or two inferences in parallel to avoid decoding or saving errors due to memory shortages.)
 
 
 ## ⚠️ IMPORTANT
 
-* The example demonstrates how to run pipelines like `StableDiffusion3-3.5` and `Flux.1` concurrently while keeping a single copy of the heavy model parameters on GPU.
+* The example demonstrates how to run pipelines like `StableDiffusion3-3.5` concurrently while keeping a single copy of the heavy model parameters on GPU.
 
 ## Necessary components
 
@@ -18,7 +18,7 @@ server-async/
 ├─────── scheduler.py             # BaseAsyncScheduler wrapper and async_retrieve_timesteps for secure inferences
 ├─────── requestscopedpipeline.py # RequestScoped Pipeline for inference with a single in-memory model
 ├─────── utils.py                 # Image/video saving utilities and service configuration
-├── Pipelines.py                  # pipeline loader classes (SD3, Flux, legacy SD, video)
+├── Pipelines.py                  # pipeline loader classes (SD3)
 ├── serverasync.py                # FastAPI app with lifespan management and async inference endpoints
 ├── test.py                       # Client test script for inference requests
 ├── requirements.txt              # Dependencies
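The idea the README describes, a single heavy model kept in memory while each request gets a lightweight view that clones only small stateful components, can be sketched roughly as below. This is an illustrative assumption of how such a view might work, not the actual code in `requestscopedpipeline.py`; the class name `RequestScopedView` and the `stateful_attrs` parameter are hypothetical.

```python
import copy


class RequestScopedView:
    """Sketch of a per-request view over a shared pipeline.

    Heavy components (model weights) are shared by reference; small
    mutable state (e.g. the scheduler) is deep-copied per request so
    concurrent inferences cannot corrupt each other's state.
    """

    def __init__(self, shared_pipeline, stateful_attrs=("scheduler",)):
        self._shared = shared_pipeline
        # Clone only the small, stateful components into this view.
        for attr in stateful_attrs:
            setattr(self, attr, copy.deepcopy(getattr(shared_pipeline, attr)))

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. for
        # everything not cloned above: delegate to the shared pipeline,
        # so the heavy weights exist exactly once in memory.
        return getattr(self._shared, name)
```

Each incoming request would construct its own view over the one loaded pipeline; mutating one view's scheduler leaves every other view (and the shared original) untouched, while large attributes resolve through `__getattr__` to the single shared copy.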
