Add resumable downloads for llama-server model loading #15963
ericcurtin commented Sep 13, 2025

- Implement resumable downloads in common_download_file_single function
- Add detection of partial download files (.downloadInProgress)
- Check server support for HTTP Range requests via Accept-Ranges header
- Implement HTTP Range request with "bytes=<start>-" header
- Open files in append mode when resuming vs create mode for new downloads
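
As a rough illustration of the last two bullets, the sketch below derives the Range value and the file open mode from the size of an existing partial file. The names `resume_plan` and `make_resume_plan` are hypothetical and are not taken from the PR; the actual code in common_download_file_single may be structured differently.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper type (not from the PR): what to send and how to open the file.
struct resume_plan {
    std::optional<std::string> range_value; // e.g. "bytes=1024-", empty for a fresh download
    std::uintmax_t             offset;      // bytes already on disk
    std::ios::openmode         mode;        // append when resuming, truncate otherwise
};

static resume_plan make_resume_plan(const std::filesystem::path & partial, bool server_accepts_ranges) {
    std::error_code ec;
    // file_size() reports an error through ec when the partial file does not exist
    const std::uintmax_t have = std::filesystem::file_size(partial, ec);
    if (!ec && have > 0 && server_accepts_ranges) {
        // resume: request everything from the first missing byte onwards
        return { "bytes=" + std::to_string(have) + "-", have, std::ios::binary | std::ios::app };
    }
    // fresh download: no Range header, start from an empty file
    return { std::nullopt, 0, std::ios::binary | std::ios::trunc };
}
```

A caller would open the output with `std::ofstream out(partial, plan.mode)` and add the Range header only when `plan.range_value` is set.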
 
@ngxson @ggerganov PTAL
    
Pull Request Overview
This PR implements resumable downloads for llama-server model loading by adding support for HTTP Range requests when downloading model files. The implementation detects partial downloads and resumes them when servers support range requests.
Key changes:
- Added HTTP Range request support with server capability detection via Accept-Ranges header
- Implemented partial download detection using .downloadInProgress temporary files
- Modified file handling to use append mode for resumable downloads vs create mode for new downloads
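
For readers unfamiliar with the capability check, here is a minimal sketch of detecting range support from the response headers of a HEAD request. It assumes the headers are already available as one string per line; `server_accepts_byte_ranges` and `lower_trim` are hypothetical names, not functions from the PR.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lowercase a string and strip surrounding whitespace (including the trailing CRLF).
static std::string lower_trim(const std::string & s) {
    const char * ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) {
        return "";
    }
    const auto e = s.find_last_not_of(ws);
    std::string out = s.substr(b, e - b + 1);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// True if any header line reads "Accept-Ranges: bytes" (header names are case-insensitive).
static bool server_accepts_byte_ranges(const std::vector<std::string> & header_lines) {
    for (const auto & line : header_lines) {
        const auto colon = line.find(':');
        if (colon == std::string::npos) {
            continue; // status line or malformed header
        }
        if (lower_trim(line.substr(0, colon)) == "accept-ranges" &&
            lower_trim(line.substr(colon + 1)) == "bytes") {
            return true;
        }
    }
    return false;
}
```

A server that answers `Accept-Ranges: none`, or omits the header, is treated here as not resumable, so the download would start over.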
 
02492e9 to 22e86f9
  
The macOS x86 build is still flaky. It seems to be random luck based on which build server you get: sometimes it has the right versions of things to compile the code.
    
2e485e5 to 10b789d
10b789d to 6749867
  
We currently check the ETag header for this, as most of the time (if not all?) ETag is the hash of the remote file to be downloaded.
    
          
Sure, but we don't check between retries, and there's a minor chance the content changes in between retries. It probably wasn't a big deal before: you pull a full file or you don't. But with resumable transfers it becomes more relevant, because you could have half of one version of the file and half of a different version.
    
2e1b3f8 to 8da2f1f
  
> Sure, but we don't check between retries, and there's a minor chance the content changes in between retries. It probably wasn't a big deal before: you pull a full file or you don't. But with resumable transfers it becomes more relevant, because you could have half of one version of the file and half of a different version.

We currently don't check between retries, but you can implement it. My idea is that we can rely on ETag instead of Last-Modified.
The overall idea is: the first time the file is downloaded, the ETag header is pulled via the HEAD request and stored somewhere. Then one of 3 cases may happen:
- The download completes --> the next time the user runs the model, the stored ETag is used to verify that the file is up to date
- The download fails --> check the ETag on the next retry. I think this is not yet implemented, so you can try adding this
- The download is half-completed --> on resume, the ETag is used to make sure the remote file content hasn't changed (this case is not yet implemented and should be added in the current PR)
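
The three cases above could be folded into one decision function, sketched below. This is only an illustration of the idea in this comment: `local_state`, `download_action`, and `decide_download` are hypothetical names, and treating a missing stored ETag as a mismatch is an assumption.

```cpp
#include <string>

// Hypothetical outcome of comparing local state with the remote ETag.
enum class download_action { skip, resume, restart };

// Hypothetical view of what is on disk from a previous run.
struct local_state {
    bool        complete_file_exists;
    bool        partial_file_exists;
    std::string stored_etag; // ETag saved during the previous attempt, may be empty
};

static download_action decide_download(const local_state & st, const std::string & remote_etag) {
    // Assumption: without a stored ETag we cannot prove the content is unchanged.
    const bool etag_matches = !st.stored_etag.empty() && st.stored_etag == remote_etag;

    if (st.complete_file_exists) {
        // case 1: finished earlier, only re-download if the remote file changed
        return etag_matches ? download_action::skip : download_action::restart;
    }
    if (st.partial_file_exists && etag_matches) {
        // case 3: half-completed and the remote content is the same, safe to append
        return download_action::resume;
    }
    // case 2 (failed with nothing usable) or a changed remote file: start over
    return download_action::restart;
}
```

Restarting on any mismatch avoids the mixed-version file described earlier in the thread, at the cost of discarding the bytes already downloaded.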

SGTM. You probably noticed that in this PR we now also write the .json immediately, whereas before this write happened at the end. We need to write it first now so we can identify what was downloaded last time.
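
A minimal sketch of the "write the .json first" ordering follows. The field names `url` and `etag`, the `<model>.json` naming, and the hand-rolled escaping are assumptions made for illustration only; the real metadata format in the PR may differ.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Escape only quotes and backslashes. ETag values are usually quoted strings,
// so this matters; control characters are not handled in this sketch.
static std::string json_escape(const std::string & s) {
    std::string out;
    for (const char c : s) {
        if (c == '"' || c == '\\') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Write "<model>.json" before any body bytes are fetched, so that a later run
// can tell which remote version an interrupted partial file belongs to.
static bool write_metadata_first(const std::filesystem::path & model_path,
                                 const std::string & url, const std::string & etag) {
    std::filesystem::path meta = model_path;
    meta += ".json";
    std::ofstream out(meta, std::ios::trunc);
    if (!out) {
        return false;
    }
    out << "{\"url\":\"" << json_escape(url) << "\",\"etag\":\"" << json_escape(etag) << "\"}\n";
    return static_cast<bool>(out.flush());
}
```

Only after this call succeeds would the download itself begin, so an interrupted transfer always leaves a matching metadata file behind.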
    
68f95b3 to 5bd47a7
becdb99 to 3011a70
  
@rgerganov @slaren PTAL
    
@am17an @JohannesGaessler PTAL
    
3011a70 to 49692ce
  
All done @ngxson, ready for re-review
    
b319db9 to 4e382e8
  
A Windows flake, that's rare:
    
ca7d99d to 14af48d
  
- Implement resumable downloads in common_download_file_single function
- Add detection of partial download files (.downloadInProgress)
- Check server support for HTTP Range requests via Accept-Ranges header
- Implement HTTP Range request with "bytes=<start>-" header
- Open files in append mode when resuming vs create mode for new downloads

Signed-off-by: Eric Curtin <[email protected]>
14af48d to b3c2c83
  
Seems to work on my side. Btw I think it would be nice if we can have a better progress display, as the progress counts back from 0 when I resume the download, even though it only downloads the missing part.
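
One way to address the progress display, sketched under the assumption that the transfer layer reports only the bytes of the resumed request: add the bytes already on disk to both the numerator and the denominator. `resume_progress_percent` is a hypothetical helper, not part of the PR.

```cpp
#include <cstdint>

// already_on_disk: size of the partial file before resuming
// received_now:    bytes received so far in the resumed request
// remaining_total: total bytes the resumed request is expected to deliver
static int resume_progress_percent(std::uint64_t already_on_disk,
                                   std::uint64_t received_now,
                                   std::uint64_t remaining_total) {
    const std::uint64_t total = already_on_disk + remaining_total;
    if (total == 0) {
        return 0; // size unknown, nothing meaningful to show
    }
    const std::uint64_t done = already_on_disk + received_now;
    return static_cast<int>(done >= total ? 100 : done * 100 / total);
}
```

With 500 bytes on disk and 500 still to fetch, the display would start at 50% rather than 0%.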
And btw since we already implemented Range here, we should also be able to implement multi-threaded downloading which should significantly improve the download speed. We should look into this in the future.
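
To make the multi-threaded idea concrete, the sketch below only shows how a file of known size could be split into Range values, one per worker. It does not perform any transfer, and `split_into_ranges` is a hypothetical name; how the follow-up work actually partitions the file is not stated in this thread.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Split [0, total_size) into at most n_parts contiguous Range values.
// HTTP byte ranges are inclusive on both ends, hence the "- 1" on the end offset.
static std::vector<std::string> split_into_ranges(std::uint64_t total_size, unsigned n_parts) {
    std::vector<std::string> ranges;
    if (total_size == 0 || n_parts == 0) {
        return ranges;
    }
    const std::uint64_t chunk = (total_size + n_parts - 1) / n_parts; // ceiling division
    for (std::uint64_t start = 0; start < total_size; start += chunk) {
        const std::uint64_t end = std::min(start + chunk, total_size) - 1;
        ranges.push_back("bytes=" + std::to_string(start) + "-" + std::to_string(end));
    }
    return ranges;
}
```

Each worker would then issue its own request with one of these values and write to the matching offset in the output file.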
Agree on both parts, will do in follow-on PRs.
    
Btw @ngxson, @doringeman and @npopov-vst, if you have a Windows machine around, I'd appreciate a quick test of this progress bar on Windows: I hope to just port a version of that over to llama-server. I only tested Linux/macOS at the time. A recent PR included a Windows fix, thanks to @npopov-vst : Want to be sure it looks fine on Windows terminals.
    
@ericcurtin Hi, sure. It depends on the current codepage. For me the default was 437:
Maybe on Windows it is better to set it to UTF-8 explicitly, like:
Or use an ASCII character (like #) for the progress bar.
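
The snippet from this comment did not survive on the page, so the following is not a reconstruction of it. It is a sketch of both suggestions under stated assumptions: `SetConsoleOutputCP(CP_UTF8)` is one Win32 call that switches the console output codepage to UTF-8, and `enable_utf8_console` and `render_bar` are hypothetical helper names.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Try to switch the console to UTF-8 output. On non-Windows platforms this is
// a no-op that assumes the terminal already handles UTF-8.
static bool enable_utf8_console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// Render a fixed-width bar, using U+2588 FULL BLOCK when UTF-8 output is
// available and '#' as the ASCII fallback.
static std::string render_bar(double fraction, int width, bool utf8) {
    if (fraction < 0.0) { fraction = 0.0; }
    if (fraction > 1.0) { fraction = 1.0; }
    const int filled = static_cast<int>(fraction * width);
    const char * full = utf8 ? "\xE2\x96\x88" : "#";
    std::string bar;
    for (int i = 0; i < width; ++i) {
        bar += (i < filled) ? full : " ";
    }
    return bar;
}
```

A caller could pass the result of `enable_utf8_console()` as the `utf8` argument, so that a console stuck on codepage 437 falls back to `#`.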
    
          
Wanna open a PR, since you are set up to test it on Windows? I think it looks prettier without the hashes.
    
          
Agree.
I can, but I am not sure where I should place these changes, since you mentioned in #15988 (comment) that you are going to refactor some parts. I can confirm that setting  should solve all issues.
    
          
Don't worry about my refactor, I can do it after :) Most of my refactoring will be moving code from A -> B, which will be pretty easy.
    

