Issue Description:
When attempting to launch llama-server without an internet connection, it fails with a timeout error while making a GET request to Hugging Face, despite the model having been previously downloaded using the same command.
Steps to Reproduce:
- Run the following command while connected to the internet to download and use the model:
llama-server \
-hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF \
--port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
--ctx-size 0 --cache-reuse 256
- Disconnect from the internet.
- Run the same command again.
- The command fails, appearing to attempt a GET request to Hugging Face.
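For anyone who wants to reproduce this without physically disconnecting, here is a minimal sketch (assuming a Linux host with util-linux's unshare available) that runs the same command inside a namespace with no network access:
# unshare -rn runs the command in an empty network namespace,
# so the GET to Hugging Face fails with "network unreachable"
# right away instead of waiting for a timeout
unshare -rn llama-server \
-hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF \
--port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
--ctx-size 0 --cache-reuse 256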
Expected Behavior:
If the model was previously downloaded, llama-server should detect and load it from cache without requiring an internet connection.
Actual Behavior:
The command times out while trying to reach Hugging Face, preventing offline usage.
Would appreciate any guidance on how to enforce offline usage, or a workaround to bypass this issue!
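For completeness, the workaround I've been using in the meantime is to point llama-server at the already-downloaded GGUF directly with -m, which skips the Hugging Face request entirely. This is only a sketch: it assumes the default cache location (~/.cache/llama.cpp/, overridable via the LLAMA_CACHE environment variable), and the filename placeholder below is not the real name, so list the directory first to find the file the earlier -hf run produced:
# list the cache to find the file downloaded by the earlier -hf run
ls ~/.cache/llama.cpp/
# load that file directly; -m performs no network access
llama-server \
-m ~/.cache/llama.cpp/<cached-qwen2.5-coder-3b-q8_0-file>.gguf \
--port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
--ctx-size 0 --cache-reuse 256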
If this is not a known issue and you would prefer I open it in the llama.cpp repository instead, let me know.