
Offline Model Loading When Previously Downloaded via llama-server fails #23


Description

@1-ashraful-islam

Issue Description:

When llama-server is launched without an internet connection, it fails with a timeout error on a GET request to Hugging Face, even though the model was previously downloaded using the same command.

Steps to Reproduce:

  1. Run the following command while connected to the internet to download and use the model:
llama-server \
    -hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
    --ctx-size 0 --cache-reuse 256
  2. Disconnect from the internet.
  3. Run the same command again.
  4. The command fails, appearing to time out on a GET request to Hugging Face instead of loading the already-downloaded model.
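For reference, llama.cpp stores models fetched with -hf in a local cache directory (by default ~/.cache/llama.cpp/ on Linux/macOS, overridable via the LLAMA_CACHE environment variable, if I understand the docs correctly). So after step 1 the model file should still be present offline, which can be confirmed with something like:

# List the llama.cpp download cache to verify the GGUF from step 1 is there
# (falls back to the default location if LLAMA_CACHE is unset)
ls -lh "${LLAMA_CACHE:-$HOME/.cache/llama.cpp}"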

Expected Behavior:

If the model was previously downloaded, llama-server should detect and load it from cache without requiring an internet connection.

Actual Behavior:

The command times out while trying to reach Hugging Face, preventing offline usage.

I'd appreciate any guidance on how to enforce offline usage, or a workaround to bypass this issue! 🚀
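One possible workaround (a sketch only, not verified as part of this report; the filename below is a placeholder for whatever file step 1 actually left in the cache) would be to skip the -hf lookup entirely and point llama-server at the cached GGUF with -m:

# Load the cached model file directly instead of resolving it via Hugging Face
llama-server \
    -m "${LLAMA_CACHE:-$HOME/.cache/llama.cpp}"/<cached-model-file>.gguf \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
    --ctx-size 0 --cache-reuse 256

Ideally, though, -hf itself would fall back to the cached copy when the network is unreachable.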

If this is not already a solved issue and you would prefer that I open it in the llama.cpp repository instead, let me know.
