Commit ac5c9e6

Update examples/server-async/README.md
1 parent 0ecdfc3 commit ac5c9e6

3 files changed: +54 -17 lines


examples/server-async/Pipelines.py

Lines changed: 0 additions & 3 deletions
@@ -1,4 +1,3 @@
-# Pipelines.py
 from diffusers.pipelines.stable_diffusion_3.pipeline_stable_diffusion_3 import StableDiffusion3Pipeline
 from diffusers.pipelines.flux.pipeline_flux import FluxPipeline
 import torch
@@ -102,8 +101,6 @@ def initialize_pipeline(self):
             self.model_type = "SD3_5"
         elif self.model in preset_models.Flux:
             self.model_type = "Flux"
-        else:
-            self.model_type = "SD"

         # Create appropriate pipeline based on model type and type_models
         if self.type_models == 't2im':

examples/server-async/README.md

Lines changed: 49 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -5,24 +5,24 @@
55
66
## ⚠️ IMPORTANT
77

8-
* The server and inference harness live in this repo: `https://github.com/F4k3r22/DiffusersServer`.
9-
The example demonstrates how to run pipelines like `StableDiffusion3-3.5` and `Flux.1` concurrently while keeping a single copy of the heavy model parameters on GPU.
8+
* The example demonstrates how to run pipelines like `StableDiffusion3-3.5` and `Flux.1` concurrently while keeping a single copy of the heavy model parameters on GPU.
109

1110
## Necessary components
1211

13-
All the components needed to create the inference server are in `DiffusersServer/`
12+
All the components needed to create the inference server are in the current directory:
1413

1514
```
16-
DiffusersServer/
15+
server-async/
1716
├── utils/
1817
├─────── __init__.py
19-
├─────── scheduler.py # BaseAsyncScheduler wrapper and async_retrieve_timesteps for secure inferences
20-
├─────── requestscopedpipeline.py # RequestScoped Pipeline for inference with a single in-memory model
21-
├── __init__.py
22-
├── create_server.py # helper script to build/run the app programmatically
23-
├── Pipelines.py # pipeline loader classes (SD3, Flux, legacy SD, video)
24-
├── serverasync.py # FastAPI app factory (create\_app\_fastapi)
25-
├── uvicorn_diffu.py # convenience script to start uvicorn with recommended flags
18+
├─────── scheduler.py # BaseAsyncScheduler wrapper and async_retrieve_timesteps for secure inferences
19+
├─────── requestscopedpipeline.py # RequestScoped Pipeline for inference with a single in-memory model
20+
├─────── utils.py # Image/video saving utilities and service configuration
21+
├── Pipelines.py # pipeline loader classes (SD3, Flux, legacy SD, video)
22+
├── serverasync.py # FastAPI app with lifespan management and async inference endpoints
23+
├── test.py # Client test script for inference requests
24+
├── requirements.txt # Dependencies
25+
└── README.md # This documentation
2626
```
2727

2828
## What `diffusers-async` adds / Why we needed it
@@ -69,13 +69,28 @@ pip install -r requirements.txt
6969

7070
### 2) Start the server
7171

72-
Using the `server.py` file that already has everything you need:
72+
Using the `serverasync.py` file that already has everything you need:
7373

7474
```bash
75-
python server.py
75+
python serverasync.py
7676
```
7777

78-
### 3) Example request
78+
The server will start on `http://localhost:8500` by default with the following features:
79+
- FastAPI application with async lifespan management
80+
- Automatic model loading and pipeline initialization
81+
- Request counting and active inference tracking
82+
- Memory cleanup after each inference
83+
- CORS middleware for cross-origin requests
84+
85+
### 3) Test the server
86+
87+
Use the included test script:
88+
89+
```bash
90+
python test.py
91+
```
92+
93+
Or send a manual request:
7994

8095
`POST /api/diffusers/inference` with JSON body:
8196

@@ -95,6 +110,13 @@ Response example:
95110
}
96111
```
97112

113+
### 4) Server endpoints
114+
115+
- `GET /` - Welcome message
116+
- `POST /api/diffusers/inference` - Main inference endpoint
117+
- `GET /images/{filename}` - Serve generated images
118+
- `GET /api/status` - Server status and memory info
119+
98120
## Advanced Configuration
99121

100122
### RequestScopedPipeline Parameters
@@ -117,6 +139,19 @@ RequestScopedPipeline(
117139
* Enhanced debugging with `__repr__` and `__str__` methods
118140
* Full compatibility with existing scheduler APIs
119141

142+
### Server Configuration
143+
144+
The server configuration can be modified in `serverasync.py` through the `ServerConfigModels` dataclass:
145+
146+
```python
147+
@dataclass
148+
class ServerConfigModels:
149+
model: str = 'stabilityai/stable-diffusion-3-medium'
150+
type_models: str = 't2im'
151+
host: str = '0.0.0.0'
152+
port: int = 8500
153+
```
154+
120155
## Troubleshooting (quick)
121156

122157
* `Already borrowed` — previously a Rust tokenizer concurrency error.
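
For reference, a manual request to the inference endpoint described above could look like the following minimal client sketch. The `prompt` field name and the response shape are assumptions, since the exact JSON body is not shown in this diff; `test.py` in the example holds the authoritative payload.

```python
# Minimal client sketch for the async server.
# Assumptions: the request body uses a "prompt" field and the server runs
# on the default host/port; see examples/server-async/test.py for the real payload.
import requests

SERVER = "http://localhost:8500"

payload = {"prompt": "a photo of an astronaut riding a horse"}  # assumed field name
resp = requests.post(f"{SERVER}/api/diffusers/inference", json=payload, timeout=600)
resp.raise_for_status()

# Generated images are served back via GET /images/{filename};
# print the raw JSON to inspect what the server returned.
print(resp.json())
```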

examples/server-async/serverasync.py

Lines changed: 5 additions & 0 deletions
@@ -221,3 +221,8 @@ async def get_status():
     allow_methods=["*"],
     allow_headers=["*"],
 )
+
+if __name__ == "__main__":
+    import uvicorn
+
+    uvicorn.run(app, host=server_config.host, port=server_config.port)
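
The added `__main__` block launches the server with the configured host and port. As a sketch of the equivalent programmatic launch from another script, assuming `app` is created at module level in `serverasync.py` (as the middleware registration and `uvicorn.run(app, ...)` call in this commit suggest):

```python
# Sketch: run the FastAPI app from a separate script on a non-default port.
# Assumption: serverasync.py exposes a module-level `app` object.
import uvicorn

from serverasync import app

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8600)  # 8600 is an arbitrary example port
```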
