|  | 
|  | 1 | + | 
|  | 2 | +# Create a server | 
|  | 3 | + | 
|  | 4 | +Diffusers' pipelines can be used as an inference engine for a server. The server in this guide handles concurrent, multithreaded requests, so multiple users can generate images at the same time. | 
|  | 5 | + | 
|  | 6 | +This guide will show you how to use the [`StableDiffusion3Pipeline`] in a server, but feel free to use any pipeline you want. | 
|  | 7 | + | 
|  | 8 | + | 
|  | 9 | +Start by navigating to the `examples/server` folder and installing all of the dependencies. | 
|  | 10 | + | 
|  | 11 | +```bash | 
|  | 12 | +pip install . | 
|  | 13 | +pip install -r requirements.txt | 
|  | 14 | +``` | 
|  | 15 | + | 
|  | 16 | +Launch the server with the following command. | 
|  | 17 | + | 
|  | 18 | +```bash | 
|  | 19 | +python server.py | 
|  | 20 | +``` | 
|  | 21 | + | 
|  | 22 | +The server is available at http://localhost:8000. You can send a request to the model with the following `curl` command. | 
|  | 23 | +``` | 
|  | 24 | +curl -X POST -H "Content-Type: application/json" --data '{"model": "something", "prompt": "a kitten in front of a fireplace"}' http://localhost:8000/v1/images/generations | 
|  | 25 | +``` | 
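The same request can also be sent from Python. The block below is a minimal sketch using only the standard library, not part of the example server: `build_request`, `parse_image_url`, and `generate` are hypothetical helper names, and the response shape (`{"data": [{"url": ...}]}`) is assumed from the endpoint shown later in this guide.

```python
import json
from urllib import request

# URL taken from the curl command above.
SERVER_URL = "http://localhost:8000/v1/images/generations"


def build_request(prompt: str, model: str = "something") -> request.Request:
    """Build the same POST request as the curl command (hypothetical helper)."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_image_url(body: bytes) -> str:
    """Extract the image URL, assuming a {"data": [{"url": ...}]} response."""
    return json.loads(body)["data"][0]["url"]


def generate(prompt: str) -> str:
    """Send the request; this only works while the server is running."""
    with request.urlopen(build_request(prompt)) as response:
        return parse_image_url(response.read())
```

With the server running, `generate("a kitten in front of a fireplace")` should return the URL of the generated image.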
|  | 26 | + | 
|  | 27 | +If you need to upgrade some dependencies, you can use either [pip-tools](https://github.com/jazzband/pip-tools) or [uv](https://github.com/astral-sh/uv). For example, upgrade the dependencies with `uv` using the following command. | 
|  | 28 | + | 
|  | 29 | +``` | 
|  | 30 | +uv pip compile requirements.in -o requirements.txt | 
|  | 31 | +``` | 
|  | 32 | + | 
|  | 33 | + | 
|  | 34 | +The server is built with [FastAPI](https://fastapi.tiangolo.com/async/). The endpoint for `v1/images/generations` is shown below. | 
|  | 35 | +```py | 
|  | 36 | +@app.post("/v1/images/generations") | 
|  | 37 | +async def generate_image(image_input: TextToImageInput): | 
|  | 38 | +    try: | 
|  | 39 | +        loop = asyncio.get_event_loop() | 
|  | 40 | +        scheduler = shared_pipeline.pipeline.scheduler.from_config(shared_pipeline.pipeline.scheduler.config) | 
|  | 41 | +        pipeline = StableDiffusion3Pipeline.from_pipe(shared_pipeline.pipeline, scheduler=scheduler) | 
|  | 42 | +        generator = torch.Generator(device="cuda") | 
|  | 43 | +        generator.manual_seed(random.randint(0, 10000000)) | 
|  | 44 | +        output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator = generator)) | 
|  | 45 | +        logger.info(f"output: {output}") | 
|  | 46 | +        image_url = save_image(output.images[0]) | 
|  | 47 | +        return {"data": [{"url": image_url}]} | 
|  | 48 | +    except Exception as e: | 
|  | 49 | +        if isinstance(e, HTTPException): | 
|  | 50 | +            raise e | 
|  | 51 | +        elif hasattr(e, 'message'): | 
|  | 52 | +            raise HTTPException(status_code=500, detail=e.message + traceback.format_exc()) | 
|  | 53 | +        raise HTTPException(status_code=500, detail=str(e) + traceback.format_exc()) | 
|  | 54 | +``` | 
|  | 55 | +The `generate_image` function is declared with the [async](https://fastapi.tiangolo.com/async/) keyword, which tells FastAPI that the function may not return a result right away. When execution reaches a point where it must await another [Task](https://docs.python.org/3/library/asyncio-task.html#asyncio.Task), the main thread goes back to answering other HTTP requests. In the endpoint above, that point is marked by the [await](https://fastapi.tiangolo.com/async/#async-and-await) keyword, repeated below. | 
|  | 56 | +```py | 
|  | 57 | +output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator = generator)) | 
|  | 58 | +``` | 
|  | 59 | +At this point, the pipeline call is handed to a [separate thread](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor) from the event loop's default thread pool (the `None` argument selects the default executor), and the main thread handles other work until the `pipeline` returns a result. | 
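To see this behavior without a GPU, here is a minimal, self-contained sketch. `blocking_generate` is a hypothetical stand-in for the blocking pipeline call (it only sleeps), and the other function names are also illustrative. Two requests are awaited together, and because each one runs on a worker thread, the total time should be close to one call rather than two.

```python
import asyncio
import threading
import time


def blocking_generate(prompt: str) -> dict:
    """Hypothetical stand-in for a blocking pipeline call."""
    time.sleep(0.3)  # simulate slow, blocking work
    return {"prompt": prompt, "thread": threading.current_thread().name}


async def handle_request(prompt: str) -> dict:
    loop = asyncio.get_running_loop()
    # executor=None selects the event loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, lambda: blocking_generate(prompt))


async def serve_two_requests() -> tuple:
    start = time.perf_counter()
    # Both calls block in worker threads while the event loop stays free.
    results = await asyncio.gather(
        handle_request("a kitten"),
        handle_request("a puppy"),
    )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(serve_two_requests())
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Each result records the name of the worker thread that ran it, which differs from the main thread running the event loop.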
|  | 60 | + | 
|  | 61 | +Another important aspect of this implementation is creating a `pipeline` from `shared_pipeline`. This avoids loading the underlying model onto the GPU more than once, while still giving each request, which runs on its own thread, a separate generator and scheduler. The scheduler in particular is not thread-safe: sharing one scheduler across multiple threads causes errors such as `IndexError: index 21 is out of bounds for dimension 0 with size 21`. | 
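The pattern of sharing heavy, read-only state while giving each request its own small stateful object can be sketched with toy classes. Everything below is hypothetical and only an analogy: `ToySharedModel` stands in for the model weights and `ToyScheduler` for a scheduler that tracks a step index. It does not reproduce how Diffusers schedulers are implemented.

```python
from concurrent.futures import ThreadPoolExecutor


class ToySharedModel:
    """Hypothetical stand-in for heavy weights: built once, only read afterwards."""

    load_count = 0

    def __init__(self) -> None:
        ToySharedModel.load_count += 1
        self.scale = 2

    def predict(self, timestep: int) -> int:
        return timestep * self.scale  # read-only use of shared state


class ToyScheduler:
    """Hypothetical stateful scheduler that advances an index on every step."""

    def __init__(self, num_steps: int) -> None:
        self.timesteps = list(range(num_steps, 0, -1))
        self.step_index = 0

    def step(self) -> int:
        # Raises IndexError once the index runs past the end, which is what
        # happens when several requests advance the same instance.
        timestep = self.timesteps[self.step_index]
        self.step_index += 1
        return timestep


SHARED_MODEL = ToySharedModel()  # created once, reused by every request


def handle_request(num_steps: int = 20) -> list:
    scheduler = ToyScheduler(num_steps)  # fresh state for each request
    return [SHARED_MODEL.predict(scheduler.step()) for _ in range(num_steps)]


def run_concurrently(num_requests: int = 8) -> list:
    with ThreadPoolExecutor(max_workers=num_requests) as pool:
        futures = [pool.submit(handle_request) for _ in range(num_requests)]
        return [future.result() for future in futures]
```

Because every request builds its own `ToyScheduler`, concurrent requests cannot push each other's step index out of range, while `ToySharedModel` is still constructed only once.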