# Add server example #9918
          
Merged: stevhliu merged 15 commits into huggingface:main from thealmightygrant:add-server-example on Nov 18, 2024.

## Commits
All 15 commits by thealmightygrant:

- b2d1c06 Add server example.
- 040806c Minor updates to README.
- a91feec Add fixes after local testing.
- be39807 Apply suggestions from code review
- a47bae3 Merge branch 'huggingface:main' into add-server-example
- 3bf2c49 More doc updates.
- 4f8ba17 Maybe this will work to build the docs correctly?
- 82ab997 Fix style issues.
- 71f3638 Fix toc.
- 36948da Minor reformatting.
- c4733db Move docs to proper loc.
- 468bec9 Fix missing tick.
- a0bf884 Apply suggestions from code review
- 0c11101 Sync docs changes back to README.
- 31711a7 Very minor update to docs to add space.

## Files changed

**First new file: the documentation guide** (+61 lines):

# Create a server

Diffusers' pipelines can be used as an inference engine for a server. A pipeline supports concurrent and multithreaded requests to generate images that may be requested by multiple users at the same time.

This guide will show you how to use the [`StableDiffusion3Pipeline`] in a server, but feel free to use any pipeline you want.
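
Serving a different model would, in principle, just mean loading a different pipeline at startup. A minimal sketch of such a swap (the pipeline class and checkpoint here are illustrative assumptions, not part of this example):

```py
import torch
from diffusers import FluxPipeline

# Any text-to-image pipeline can take the place of StableDiffusion3Pipeline;
# the serving logic described below stays the same.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")
```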

Start by navigating to the `examples/server` folder and installing all of the dependencies.

```
pip install .
pip install -r requirements.txt
```

Launch the server with the following command.

```
python server.py
```

The server is accessible at http://localhost:8000. You can curl the model with the following command.

```
curl -X POST -H "Content-Type: application/json" --data '{"model": "something", "prompt": "a kitten in front of a fireplace"}' http://localhost:8000/v1/images/generations
```
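
The endpoint responds with JSON of the form `{"data": [{"url": ...}]}` (see the endpoint code below). A minimal Python client might look like this sketch, which assumes the server is running with the defaults above:

```py
import requests

payload = {"model": "something", "prompt": "a kitten in front of a fireplace"}
response = requests.post("http://localhost:8000/v1/images/generations", json=payload)
response.raise_for_status()

# The server returns {"data": [{"url": <image_url>}]}
print(response.json()["data"][0]["url"])
```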

If you need to upgrade some dependencies, you can use either [pip-tools](https://github.com/jazzband/pip-tools) or [uv](https://github.com/astral-sh/uv). For example, upgrade the dependencies with `uv` using the following command.

```
uv pip compile requirements.in -o requirements.txt
```

## How does this server work?

The server is built with [FastAPI](https://fastapi.tiangolo.com/async/). The endpoint for `v1/images/generations` is defined like this:
```py
@app.post("/v1/images/generations")
async def generate_image(image_input: TextToImageInput):
    try:
        loop = asyncio.get_event_loop()
        scheduler = shared_pipeline.pipeline.scheduler.from_config(shared_pipeline.pipeline.scheduler.config)
        pipeline = StableDiffusion3Pipeline.from_pipe(shared_pipeline.pipeline, scheduler=scheduler)
        generator = torch.Generator(device="cuda")
        generator.manual_seed(random.randint(0, 10000000))
        output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator=generator))
        logger.info(f"output: {output}")
        image_url = save_image(output.images[0])
        return {"data": [{"url": image_url}]}
    except Exception as e:
        if isinstance(e, HTTPException):
            raise e
        elif hasattr(e, 'message'):
            raise HTTPException(status_code=500, detail=e.message + traceback.format_exc())
        raise HTTPException(status_code=500, detail=str(e) + traceback.format_exc())
```
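
One piece not shown in this excerpt is the `TextToImageInput` request model. A minimal sketch of what it might look like, with the field names inferred from the curl payload above:

```py
from pydantic import BaseModel

class TextToImageInput(BaseModel):
    model: str
    prompt: str
```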

Above, the `generate_image` function is defined as asynchronous with the `async` keyword so that [FastAPI](https://fastapi.tiangolo.com/async/) knows the function isn't necessarily going to return a result right away. Once execution reaches a point where it has to await some other [Task](https://docs.python.org/3/library/asyncio-task.html#asyncio.Task), the main thread goes back to answering other HTTP requests. In this endpoint, that happens at this part of the function:
```py
output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator=generator))
```
At this point, execution of the pipeline call is handed off [to a worker thread](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor), and the main thread is free to handle other requests until the `pipeline` returns a result.
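
To see the mechanism in isolation, here is a minimal, self-contained sketch of the same pattern (not taken from the PR):

```py
import asyncio
import time

def blocking_work(prompt: str) -> str:
    # Stand-in for a long, blocking pipeline call.
    time.sleep(2)
    return f"generated an image for: {prompt}"

async def handle_request(prompt: str) -> str:
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the default ThreadPoolExecutor;
    # the event loop keeps serving other coroutines in the meantime.
    return await loop.run_in_executor(None, lambda: blocking_work(prompt))

async def main():
    # The two "requests" overlap: ~2 seconds total instead of ~4.
    results = await asyncio.gather(handle_request("a kitten"), handle_request("a puppy"))
    print(results)

asyncio.run(main())
```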

Another important aspect of this implementation is the portion that creates a pipeline from the `shared_pipeline`. The goal is to avoid loading the underlying model onto the GPU more than once, while still letting each request, running on its own thread, have its own generator and scheduler. The scheduler in particular, at the time of this writing (November 2024), is not thread-safe; trying to use the same scheduler across multiple threads will cause errors like `IndexError: index 21 is out of bounds for dimension 0 with size 21`.
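
The `shared_pipeline` object itself is defined elsewhere in `server.py`. A minimal sketch of the pattern it implements, loading the model once at startup so every request can clone from it with `from_pipe` (the wrapper class and checkpoint name are assumptions):

```py
import torch
from diffusers import StableDiffusion3Pipeline

class SharedPipeline:
    def __init__(self):
        # Load the model weights onto the GPU exactly once; per-request
        # copies made with from_pipe() reuse these weights instead of reloading.
        self.pipeline = StableDiffusion3Pipeline.from_pretrained(
            "stabilityai/stable-diffusion-3-medium-diffusers",
            torch_dtype=torch.float16,
        ).to("cuda")

shared_pipeline = SharedPipeline()
```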
  
    
**Second new file: the example README** (+62 lines), which mirrors the guide above (see the "Sync docs changes back to README." commit).
  
    
**Third new file: `requirements.in`** (+9 lines):

```
torch~=2.4.0
transformers==4.46.1
sentencepiece
aiohttp
py-consul
prometheus_client >= 0.18.0
prometheus-fastapi-instrumentator >= 7.0.0
fastapi
uvicorn
```
  
    
**Fourth new file: `requirements.txt`** (+124 lines), compiled from `requirements.in` by uv:

```
# This file was autogenerated by uv via the following command:
#    uv pip compile requirements.in -o requirements.txt
aiohappyeyeballs==2.4.3
    # via aiohttp
aiohttp==3.10.10
    # via -r requirements.in
aiosignal==1.3.1
    # via aiohttp
annotated-types==0.7.0
    # via pydantic
anyio==4.6.2.post1
    # via starlette
attrs==24.2.0
    # via aiohttp
certifi==2024.8.30
    # via requests
charset-normalizer==3.4.0
    # via requests
click==8.1.7
    # via uvicorn
fastapi==0.115.3
    # via -r requirements.in
filelock==3.16.1
    # via
    #   huggingface-hub
    #   torch
    #   transformers
frozenlist==1.5.0
    # via
    #   aiohttp
    #   aiosignal
fsspec==2024.10.0
    # via
    #   huggingface-hub
    #   torch
h11==0.14.0
    # via uvicorn
huggingface-hub==0.26.1
    # via
    #   tokenizers
    #   transformers
idna==3.10
    # via
    #   anyio
    #   requests
    #   yarl
jinja2==3.1.4
    # via torch
markupsafe==3.0.2
    # via jinja2
mpmath==1.3.0
    # via sympy
multidict==6.1.0
    # via
    #   aiohttp
    #   yarl
networkx==3.4.2
    # via torch
numpy==2.1.2
    # via transformers
packaging==24.1
    # via
    #   huggingface-hub
    #   transformers
prometheus-client==0.21.0
    # via
    #   -r requirements.in
    #   prometheus-fastapi-instrumentator
prometheus-fastapi-instrumentator==7.0.0
    # via -r requirements.in
propcache==0.2.0
    # via yarl
py-consul==1.5.3
    # via -r requirements.in
pydantic==2.9.2
    # via fastapi
pydantic-core==2.23.4
    # via pydantic
pyyaml==6.0.2
    # via
    #   huggingface-hub
    #   transformers
regex==2024.9.11
    # via transformers
requests==2.32.3
    # via
    #   huggingface-hub
    #   py-consul
    #   transformers
safetensors==0.4.5
    # via transformers
sentencepiece==0.2.0
    # via -r requirements.in
sniffio==1.3.1
    # via anyio
starlette==0.41.0
    # via
    #   fastapi
    #   prometheus-fastapi-instrumentator
sympy==1.13.3
    # via torch
tokenizers==0.20.1
    # via transformers
torch==2.4.1
    # via -r requirements.in
tqdm==4.66.5
    # via
    #   huggingface-hub
    #   transformers
transformers==4.46.1
    # via -r requirements.in
typing-extensions==4.12.2
    # via
    #   fastapi
    #   huggingface-hub
    #   pydantic
    #   pydantic-core
    #   torch
urllib3==2.2.3
    # via requests
uvicorn==0.32.0
    # via -r requirements.in
yarl==1.16.0
    # via aiohttp
```
      