
Offline Model Loading When Previously Downloaded via llama-server fails #23


Description

@1-ashraful-islam

Issue Description:

When llama-server is launched without an internet connection, it fails with a timeout error on a GET request to Hugging Face, even though the model was previously downloaded using the same command.

Steps to Reproduce:

  1. Run the following command while connected to the internet to download and use the model:
llama-server \
    -hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
    --ctx-size 0 --cache-reuse 256
  2. Disconnect from the internet.
  3. Run the same command again.
  4. The command fails, appearing to time out on a GET request to Hugging Face instead of loading the already-downloaded model.
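For reference, llama.cpp stores models fetched with -hf in a local cache directory (by default ~/.cache/llama.cpp/ on Linux/macOS, overridable via the LLAMA_CACHE environment variable, if I understand the docs correctly). So after step 1 the model file should still be present offline, which can be confirmed with something like:

# List the llama.cpp download cache to verify the GGUF from step 1 is there
# (falls back to the default location if LLAMA_CACHE is unset)
ls -lh "${LLAMA_CACHE:-$HOME/.cache/llama.cpp}"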

Expected Behavior:

If the model was previously downloaded, llama-server should detect and load it from cache without requiring an internet connection.

Actual Behavior:

The command times out while trying to reach Hugging Face, preventing offline usage.

I'd appreciate any guidance on how to enforce offline usage, or a workaround to bypass this issue! 🚀
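One possible workaround (a sketch only, not verified as part of this report; the filename below is a placeholder for whatever file step 1 actually left in the cache) would be to skip the -hf lookup entirely and point llama-server at the cached GGUF with -m:

# Load the cached model file directly instead of resolving it via Hugging Face
llama-server \
    -m "${LLAMA_CACHE:-$HOME/.cache/llama.cpp}"/<cached-model-file>.gguf \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
    --ctx-size 0 --cache-reuse 256

Ideally, though, -hf itself would fall back to the cached copy when the network is unreachable.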

If this is not already a solved issue and you would prefer that I open it in the llama.cpp repository instead, let me know.
