Collaborator

@ngxson ngxson commented Apr 29, 2025

Motivation

Fix #13128

My residence currently has no internet (temporarily), and I'm using a slow 4G connection to upload this PR.

This PR allows using -hf and -mu without internet access, provided you have already downloaded the model.

If the model has not been downloaded yet, or the manifest file has not been generated yet (manifest files did not exist before this PR), you will see this error:

error: failed to get manifest: error: cannot make GET request: Couldn't resolve host name
try reading from cache
error: failed to get manifest (check your internet connection)
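The fallback order implied by the log above (try the network, then the cache, then give up) can be sketched as follows. This is a minimal illustration, not the code from this PR: `get_manifest` and its three callbacks are hypothetical names, and refreshing the cache after a successful request is an assumption based on the manifest file being generated by the online path.

```cpp
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper, NOT the PR's actual implementation.
// fetch_remote: returns the manifest body, or std::nullopt if the GET request failed.
// read_cache:   returns the cached manifest body, or std::nullopt if none exists.
// write_cache:  stores a freshly fetched manifest for later offline use (assumption).
std::string get_manifest(
        const std::function<std::optional<std::string>()> & fetch_remote,
        const std::function<std::optional<std::string>()> & read_cache,
        const std::function<void(const std::string &)>    & write_cache) {
    if (auto remote = fetch_remote()) {
        write_cache(*remote); // keep the cache fresh so offline runs can succeed
        return *remote;
    }
    // Network failed: "try reading from cache"
    if (auto cached = read_cache()) {
        return *cached;
    }
    // Neither source is available: the model was never downloaded on this machine
    throw std::runtime_error("failed to get manifest (check your internet connection)");
}
```

Passing the network and cache accessors as callbacks keeps the sketch independent of any particular HTTP library or cache layout.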

Behavior change

Two notable changes:

  • The HEAD request is no longer retried. Forcing the user to wait for 3 retries would be bad UX for offline usage. I'm not sure whether this will impact anyone, but I hope it won't be a big problem (see next point).
  • If the HEAD request fails but the file already exists, we won't re-download it. The argument is that if the server does not support ETag on HEAD requests, there is no point in forcing the user to re-download the file every time.
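The two points above amount to a small decision rule, sketched below. The function name `should_download` and its parameters are hypothetical, and the ETag comparison in the last branch is an assumption about how a successful HEAD response is used; only the "HEAD failed but file exists, so keep it" branch is taken directly from this description.

```cpp
#include <string>

// Hypothetical decision helper, NOT the PR's actual code.
// file_exists: the model file is already present in the local cache
// head_ok:     the single (non-retried) HEAD request succeeded
// cached_etag: ETag stored alongside the cached file (may be empty)
// remote_etag: ETag returned by the HEAD request (may be empty)
bool should_download(bool file_exists, bool head_ok,
                     const std::string & cached_etag,
                     const std::string & remote_etag) {
    if (!file_exists) {
        return true;  // nothing cached: a download is the only option
    }
    if (!head_ok) {
        return false; // offline or no HEAD support: keep using the cached file
    }
    // Assumption: a differing, non-empty ETag means the remote file changed
    return !remote_etag.empty() && remote_etag != cached_etag;
}
```

For example, `should_download(true, false, "abc", "")` returns false, which is the offline case this PR targets.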

Idea for the future

While making this PR, I intentionally added a manifest= prefix to the cached manifest file.

In the future, we could add a flag like --list-cached-models to show the list of cached models that the user can use.

Further in the future, we could also allow llama-server to swap models (not necessarily running two or more in parallel). Think of the LM Studio use case, where you load one model at a time. The manifest files introduced by this PR would allow listing the models available in the cache, ready to be loaded.

@ngxson ngxson requested a review from ggerganov April 29, 2025 22:40
@ngxson ngxson merged commit 5933e6f into ggml-org:master Apr 30, 2025
1 check passed
Development

Successfully merging this pull request may close these issues.

Feature Request: Allow -hf to be used offline
