Added support for overriding tensor buffer types #2007
Open
zpin wants to merge 2 commits into abetlen:main from zpin:override_tensor
  
      
      
   
  
    
  
  
  
 
  
      
+128 −2
        
        
          
        
      
    
  
Could you show how this parameter is used? I tried:
python3 -m llama_cpp.server --model /home/LLM/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf --port 8002 --verbose True --n_gpu_layers 99 ---tensor_buft_overrides exp=CPU
# and
python3 -m llama_cpp.server --model /home/arda/LLM/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf --port 8002 --verbose True --n_gpu_layers 99 ---override_tensor exp=CPU
Both of them lead to an error. Here is the log:
usage: __main__.py [-h] [--model MODEL] [--model_alias MODEL_ALIAS] [--n_gpu_layers N_GPU_LAYERS]
                   [--split_mode SPLIT_MODE] [--main_gpu MAIN_GPU] [--tensor_split [TENSOR_SPLIT ...]]
                   [--vocab_only VOCAB_ONLY] [--use_mmap USE_MMAP] [--use_mlock USE_MLOCK]
                   [--kv_overrides [KV_OVERRIDES ...]] [--rpc_servers RPC_SERVERS] [--seed SEED] [--n_ctx N_CTX]
                   [--n_batch N_BATCH] [--n_ubatch N_UBATCH] [--n_threads N_THREADS]
                   [--n_threads_batch N_THREADS_BATCH] [--rope_scaling_type ROPE_SCALING_TYPE]
                   [--rope_freq_base ROPE_FREQ_BASE] [--rope_freq_scale ROPE_FREQ_SCALE]
                   [--yarn_ext_factor YARN_EXT_FACTOR] [--yarn_attn_factor YARN_ATTN_FACTOR]
                   [--yarn_beta_fast YARN_BETA_FAST] [--yarn_beta_slow YARN_BETA_SLOW]
                   [--yarn_orig_ctx YARN_ORIG_CTX] [--mul_mat_q MUL_MAT_Q] [--logits_all LOGITS_ALL]
                   [--embedding EMBEDDING] [--offload_kqv OFFLOAD_KQV] [--flash_attn FLASH_ATTN]
                   [--last_n_tokens_size LAST_N_TOKENS_SIZE] [--lora_base LORA_BASE] [--lora_path LORA_PATH]
                   [--numa NUMA] [--chat_format CHAT_FORMAT] [--clip_model_path CLIP_MODEL_PATH] [--cache CACHE]
                   [--cache_type CACHE_TYPE] [--cache_size CACHE_SIZE]
                   [--hf_tokenizer_config_path HF_TOKENIZER_CONFIG_PATH]
                   [--hf_pretrained_model_name_or_path HF_PRETRAINED_MODEL_NAME_OR_PATH]
                   [--hf_model_repo_id HF_MODEL_REPO_ID] [--draft_model DRAFT_MODEL]
                   [--draft_model_num_pred_tokens DRAFT_MODEL_NUM_PRED_TOKENS] [--type_k TYPE_K] [--type_v TYPE_V]
                   [--verbose VERBOSE] [--host HOST] [--port PORT] [--ssl_keyfile SSL_KEYFILE]
                   [--ssl_certfile SSL_CERTFILE] [--api_key API_KEY] [--interrupt_requests INTERRUPT_REQUESTS]
                   [--disable_ping_events DISABLE_PING_EVENTS] [--root_path ROOT_PATH] [--config_file CONFIG_FILE]
__main__.py: error: unrecognized arguments: ---tensor_buft_overrides exp=CPU
Merged, but unused in the main branch; that's why.
  
  
    
  
    
Equivalent to the `-ot` llama.cpp argument.

Can be passed as an optional string to the `Llama` class using the new `override_tensor` parameter. Same format as the `-ot` argument.

Provides more control over how memory is used, letting you selectively place specific tensors on different devices, which is especially helpful when running large MoE models.
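As a rough illustration of the string format, here is a minimal sketch that assumes the `-ot` convention of comma-separated `<tensor name pattern>=<buffer type>` entries, with each pattern treated as a regular expression searched against tensor names. The helper names `parse_override_tensor` and `buffer_for` and the sample tensor names are hypothetical and are not part of this PR or of llama.cpp; the actual matching happens inside llama.cpp.

```python
import re
from typing import List, Optional, Tuple


def parse_override_tensor(spec: str) -> List[Tuple[str, str]]:
    """Split 'pattern=buffer,pattern=buffer' into (pattern, buffer) pairs.

    Hypothetical helper: it only mirrors the assumed string format.
    """
    pairs = []
    for item in spec.split(","):
        # Split on the first '=' only; everything before it is the pattern.
        pattern, sep, buffer_type = item.partition("=")
        if not sep or not pattern or not buffer_type:
            raise ValueError(f"expected <pattern>=<buffer type>, got {item!r}")
        pairs.append((pattern, buffer_type))
    return pairs


def buffer_for(tensor_name: str, overrides: List[Tuple[str, str]]) -> Optional[str]:
    """Return the buffer type of the first pattern found in tensor_name, else None."""
    for pattern, buffer_type in overrides:
        if re.search(pattern, tensor_name):
            return buffer_type
    return None


overrides = parse_override_tensor("exp=CPU")

# Illustrative tensor names: an MoE expert tensor and an attention tensor.
expert_placement = buffer_for("blk.0.ffn_up_exps.weight", overrides)
attention_placement = buffer_for("blk.0.attn_q.weight", overrides)

# With this PR's branch installed, the same string would be passed to Llama
# (the model path below is a placeholder):
#   from llama_cpp import Llama
#   llm = Llama(model_path="model.gguf", n_gpu_layers=99, override_tensor="exp=CPU")
```

Under these assumptions, `exp=CPU` selects the expert tensors for the CPU buffer, while tensors that match no pattern keep their default placement (here, whatever `n_gpu_layers` decides).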