Replies: 1 comment
I solved the problem: there was a difference in the model name when running locally vs. on Hugging Face.
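For anyone hitting the same thing, a quick way to spot the mismatch is to list the models first and reuse the returned `id` verbatim, since that is the exact string the `model` field of a chat request has to match. A minimal sketch, assuming the server is reachable on `localhost:18888` as in the command below:

```
# List the models the server registered; the "id" field is the name
# that /v1/chat/completions will match against.
curl -s http://localhost:18888/v1/models

# Reuse that id verbatim as the "model" field of the request.
curl -s http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "TheBloke/openchat-3.5-0106-AWQ",
       "messages": [{"role": "user", "content": "Hello"}]}'
```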
I'm trying to start an OpenAI-compatible server using the command:

```
sudo docker run --runtime nvidia --gpus all -v /root/.cache/huggingface -p 18888:18888 vllm/vllm-openai --model TheBloke/openchat-3.5-0106-AWQ --host 0.0.0.0 --enforce-eager --port 18888
```

But when I try to make a request, I get the error:

```
"POST /v1/chat/completions HTTP/1.1" 404 Not Found
```

Doing `wget localhost:18888/v1/models` works and I get:

```
"GET /v1/models HTTP/1.1" 200 OK
```

If I run

```
/usr/bin/python3 -m ochat.serving.openai_api_server --model TheBloke/openchat-3.5-0106-AWQ --host 0.0.0.0
```

the requests work. I wonder if I'm making a mistake in how I use Docker, and whether the OpenAI endpoint is actually being set up, or some other non-compatible API is being served instead.
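For context, the failing request was presumably something like the following (a sketch; the exact payload isn't shown in the post, and the `"openchat_3.5"` name is a hypothetical stand-in for the locally used model name). vLLM's OpenAI-compatible server rejects a chat request with 404 when the `model` field doesn't match the name it registered at startup, which matches the resolution above:

```
# Hypothetical reproduction: the "model" value differs from the id
# reported by /v1/models, so the server answers 404 Not Found.
curl -s http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openchat_3.5", "messages": [{"role": "user", "content": "Hi"}]}'
```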