
feat: added example FastAPI-based inference server for Qwen-ASR #31

Open
kyr0 wants to merge 1 commit into QwenLM:main from kyr0:feat/inference-server-example

Conversation


kyr0 commented Jan 30, 2026

Addressing #15 and a few other questions, I've implemented, tested, and thoroughly benchmarked Qwen-ASR on a single NVIDIA H200 NVL, and came up with this inference server implementation, which is both simple and fully featured. It might serve as a boilerplate for more elaborate implementations; I believe it hits a good sweet spot right now: it scales well under load and is configurable, yet still easy to understand. I've also implemented readiness checks and simple monitoring/SRE features. The Forced Aligner is supported as well, so every feature documented in the examples folder should be easy to address with this server.

Also, server.py is volume-mounted, so you don't need to rebuild the container when the app code changes (another DX improvement), and the container loads the model from the host's HF_HOME. Last but not least, I've provided local and remote reference audios that were generated with ... Qwen-TTS :)

I hope this will reduce the load of issues opened because of confusion.

Requirements: the NVIDIA Container Toolkit must be installed on the host (!).
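The volume-mount and HF_HOME behavior described above could be wired up along these lines; this is a hypothetical docker-compose sketch under my own assumptions (service name, image tag, port, and mount paths are all illustrative, not the PR's actual files):

```yaml
# Hypothetical compose sketch; names and paths are assumptions.
services:
  qwen-asr:
    image: qwen-asr-server:latest
    ports:
      - "8000:8000"
    volumes:
      # Mount server.py so app-code changes need no image rebuild
      - ./server.py:/app/server.py
      # Reuse the host's Hugging Face cache so weights persist across runs
      - ${HF_HOME:-~/.cache/huggingface}:/root/.cache/huggingface
    environment:
      - HF_HOME=/root/.cache/huggingface
    # GPU access requires the NVIDIA Container Toolkit on the host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```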


RomiVu commented Feb 3, 2026

AI-generated garbage

kyr0 (Author) commented Feb 8, 2026

@RomiVu Have you even tried it? You have 2 contributions this year, and you react like this to a working solution that has already gathered a few stars? I really wonder how bitter you must feel... https://github.com/kyr0/fast-qwen-asr-inference-vllm
