Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag #36835
Merged: Cyrilvallez merged 34 commits into huggingface:main from inf3rnus:03-18-25-parallel-model-loading on May 23, 2025 (+234 −76)
                    Changes from 26 commits
      Commits
    
    
- `8fb9b18` Get parallel loader working. Include tests. (inf3rnus)
- `27f36f2` Update the tests for parallel loading (inf3rnus)
- `7e5ecd8` Merge branch 'main' into 03-18-25-parallel-model-loading (inf3rnus)
- `e7c3ea5` Rename env variables. (inf3rnus)
- `7599fe2` Add docs for parallel model weight loading. (inf3rnus)
- `065e102` Touch up parallel model loading docs. (inf3rnus)
- `d31594a` Touch up parallel model loading docs again. (inf3rnus)
- `33b3e0f` Edit comment in test_modeling_utils_parallel_loading.py (inf3rnus)
- `3fb6b65` Merge branch 'main' into 03-18-25-parallel-model-loading (inf3rnus)
- `0e22c04` Make sure HF_PARALLEL_LOADING_WORKERS is spelled correctly in modelin… (inf3rnus)
- `904bdaf` Correct times for parallelized loading, previous times were for a "ho… (inf3rnus)
- `7e37ba4` Update parallel model loading so the spawn method is encapsulated. DR… (inf3rnus)
- `a203f6a` Update docs on model loading parallelism so that details on setting t… (inf3rnus)
- `14e9eef` Fix style on model loading parallelism changes. (inf3rnus)
- `fe1fc0c` Merge remote-tracking branch 'upstream/main' into 03-18-25-parallel-m… (inf3rnus)
- `d5637e8` Merge latest version of master's modeling_utils. (inf3rnus)
- `e0d37bb` Removed unused variable. (inf3rnus)
- `9b4165c` Fix argument packing for the parallel loader. (inf3rnus)
- `1085461` Fix state dict being undefined in the parallel model loader. (inf3rnus)
- `82ab2ec` Merge main. (inf3rnus)
- `7ae3db6` Rename variables used in parallel model loading for clarity. Use get_… (inf3rnus)
- `8d04325` Switch to the use of threads for parallel model loading. (inf3rnus)
- `674ec37` Update docs for parallel loading. (inf3rnus)
- `b8a1470` Remove the use of json.loads when evaluating HF_ENABLE_PARALLEL_LOADI… (inf3rnus)
- `efb6605` Move parallelized shard loading into its own function. (inf3rnus)
- `c66daef` Remove use of is_true(). Favor checking env var true values for HF_EN… (inf3rnus)
- `4566c5c` Update copyright to 2025 in readme for paralell model loading. (inf3rnus)
- `610c5e3` Remove garbage collection line in load_shard_file, implicit garbage c… (inf3rnus)
- `a9cb54b` Run formatter on modeling_utils.py (inf3rnus)
- `fc76fbb` Merge branch 'main' into 03-18-25-parallel-model-loading (inf3rnus)
- `16f3751` Apply style fixes (github-actions[bot])
- `cd0f42e` Merge main. (inf3rnus)
- `3b9f458` Delete tests/utils/test_modeling_utils_parallel_loading.py (inf3rnus)
- `b6bf421` Merge branch 'main' into 03-18-25-parallel-model-loading (Cyrilvallez)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| <!--Copyright 2020 The HuggingFace Team. All rights reserved. | ||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. | ||
| ⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be | ||
| rendered properly in your Markdown viewer. | ||
| --> | ||
|  | ||
| # Environment Variables | ||
|  | ||
| ## HF_ENABLE_PARALLEL_LOADING | ||
|  | ||
| Disabled by default. When enabled, torch and safetensors based weights are loaded in parallel. This can significantly decrease the time to load large models, often producing speedups of around 50%. | ||
|  | ||
| Set it to the string `"true"` or `"false"`, e.g. `os.environ["HF_ENABLE_PARALLEL_LOADING"] = "true"`. | ||
|  | ||
| For example, `facebook/opt-30b` on an AWS EC2 g4dn.metal instance loads in ~30s with this enabled versus ~55s without it. | ||
|  | ||
| Profile before committing to this environment variable; it will not produce speedups for smaller models. | ||
|  | ||
| ```py | ||
| import os | ||
|  | ||
| os.environ["HF_ENABLE_PARALLEL_LOADING"] = "true" | ||
|  | ||
| from transformers import pipeline | ||
|  | ||
| model = pipeline(task="text-generation", model="facebook/opt-30b", device_map="auto") | ||
| ``` | ||
|  | ||
| ## HF_PARALLEL_LOADING_WORKERS | ||
|  | ||
| Determines how many threads should be used when parallel loading is enabled. Default is `8`. | ||
|  | ||
| If fewer files are being loaded than the number of threads specified, only as many threads as there are files are spawned. | ||
|  | ||
| For example, if you specify 8 workers and there are only 2 files, only 2 workers are spawned. | ||
|  | ||
| Tune as you see fit. | ||
|  | ||
| ```py | ||
| import os | ||
|  | ||
| os.environ["HF_ENABLE_PARALLEL_LOADING"] = "true" | ||
| os.environ["HF_PARALLEL_LOADING_WORKERS"] = "4" | ||
|  | ||
| from transformers import pipeline | ||
|  | ||
| model = pipeline(task="text-generation", model="facebook/opt-30b", device_map="auto") | ||
| ``` | ||
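
The worker-count rule described in the docs above (no more threads than files) can be illustrated with a small standalone sketch. This is not the code from `modeling_utils.py`: the helper names `parallel_loading_enabled`, `resolve_num_workers`, and `load_shards` are hypothetical, the case-insensitive comparison against `"true"` is an assumption, and `load_one` stands in for whatever function loads a single shard file.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_loading_enabled(environ=os.environ):
    """Hypothetical helper: True when HF_ENABLE_PARALLEL_LOADING is "true".

    Assumption: the comparison is case-insensitive and the default is "false".
    """
    return environ.get("HF_ENABLE_PARALLEL_LOADING", "false").strip().lower() == "true"


def resolve_num_workers(num_files, environ=os.environ):
    """Hypothetical helper: clamp the requested worker count to the file count."""
    requested = int(environ.get("HF_PARALLEL_LOADING_WORKERS", "8"))  # documented default is 8
    return max(1, min(requested, num_files))


def load_shards(shard_files, load_one, environ=os.environ):
    """Load every shard with `load_one`, using threads only when the flag is set."""
    if not parallel_loading_enabled(environ):
        # Sequential fallback when the flag is unset or "false".
        return [load_one(path) for path in shard_files]
    workers = resolve_num_workers(len(shard_files), environ)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as shard_files.
        return list(pool.map(load_one, shard_files))


if __name__ == "__main__":
    env = {"HF_ENABLE_PARALLEL_LOADING": "true", "HF_PARALLEL_LOADING_WORKERS": "8"}
    # Two files and eight requested workers: only two threads are used.
    print(resolve_num_workers(2, env))
    print(load_shards(["shard-1", "shard-2"], str.upper, env))
```

Passing the environment as a plain mapping keeps the sketch easy to exercise without mutating `os.environ`; a real loader would read the process environment directly.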
  
    