This project is a lightweight serving engine for Qwen LLMs, built with FastAPI and PyTorch. It supports continuous batching and paged attention for efficient, scalable inference.

## Features
- FastAPI for HTTP serving
- Streaming and non-streaming responses
- Continuous batching
- Paged attention
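Continuous batching means that finished sequences leave the running batch immediately and queued requests join mid-flight, instead of the server waiting for an entire batch to drain. The following is a toy sketch of that scheduling idea only; the names and structure are illustrative and not taken from this project's code:

```python
from collections import deque

def continuous_batching(requests, max_batch_size):
    """Toy scheduler illustrating continuous batching: finished sequences
    free their batch slot immediately, and waiting requests are admitted
    as soon as a slot opens, rather than between whole batches."""
    waiting = deque(requests)  # (request_id, tokens_to_generate)
    active = {}                # request_id -> tokens still to generate
    steps = []                 # which requests ran at each decode step
    while waiting or active:
        # Admit waiting requests into any free batch slots.
        while waiting and len(active) < max_batch_size:
            rid, n = waiting.popleft()
            active[rid] = n
        steps.append(sorted(active))
        # One decode step: every active sequence emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]  # slot freed for the next request
    return steps

# With batch size 2, request "c" joins as soon as "a" finishes:
print(continuous_batching([("a", 1), ("b", 3), ("c", 2)], max_batch_size=2))
# → [['a', 'b'], ['b', 'c'], ['b', 'c']]
```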
## Requirements

- Python 3.8+

## Installation

1. Clone the repository:

   ```shell
   git clone <repository-url>
   cd qwen-serving-engine
   ```

2. Install the dependencies:

   ```shell
   pip install -r requirements.txt
   ```
## Usage

Start the server using Uvicorn:

```shell
uvicorn main:app --host 0.0.0.0 --port 8000 --log-level info
```

## API Endpoints

### POST /generate

Generate text based on a given prompt.
- Request: `GenerationRequest`
- Response: `GenerationResponse`
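The authoritative schemas live in the project's code; as a rough sketch, `GenerationRequest` presumably carries the fields shown in the curl example below. The sketch uses stdlib dataclasses purely for illustration, and the `stream` and `finish_reason` fields (and all defaults) are assumptions, not this project's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerationRequest:
    prompt: str
    max_tokens: int = 50      # default is illustrative, not the project's
    temperature: float = 0.7  # default is illustrative, not the project's
    stream: bool = False      # assumed flag for streaming vs. non-streaming

@dataclass
class GenerationResponse:
    text: str
    finish_reason: Optional[str] = None  # assumed field
```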
### Chat completions

An OpenAI-compatible endpoint for chat completions.
- Request: `ChatCompletionRequest`
- Response: `ChatCompletionResponse`
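Since the endpoint is OpenAI-compatible, a request body would presumably follow the OpenAI chat-completions schema. A sketch of such a body, where the model identifier and the exact set of optional fields are assumptions rather than this project's documented schema:

```python
import json

# Sketch of an OpenAI-style chat-completions request body.
# "qwen" as the model identifier is an assumption for illustration.
chat_request = {
    "model": "qwen",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
    "max_tokens": 50,
    "temperature": 0.7,
}
body = json.dumps(chat_request)
```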
## Example

Here's an example using curl to make a request to the /generate endpoint:

```shell
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "max_tokens": 50, "temperature": 0.7}'
```

## Testing

Run the tests using pytest to ensure everything works correctly:

```shell
pytest tests/
```
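Equivalently, the curl request above can be issued from Python using only the standard library. The endpoint path and request fields are taken from the curl example; the helper names below are illustrative:

```python
import json
import urllib.request

def build_generate_payload(prompt: str, max_tokens: int = 50,
                           temperature: float = 0.7) -> dict:
    """Request body matching the curl example above."""
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}

def generate(prompt: str, base_url: str = "http://localhost:8000", **kwargs) -> dict:
    """POST to /generate on a running server and return the decoded JSON response."""
    data = json.dumps(build_generate_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```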