forked from ggml-org/llama.cpp
    
        
 
Cherry 1118 #5
          
     Merged
      
      
    
                
    
  
  
    
* Add scaffolding for ggml logging macros
* Metal backend now uses GGML logging
* Cuda backend now uses GGML logging
* Cann backend now uses GGML logging
* Add enum tag to parameters
* Use C memory allocation funcs
* Fix compile error
* Use GGML_LOG instead of GGML_PRINT
* Rename llama_state to llama_logger_state
* Prevent null format string
* Fix whitespace
* Remove log callbacks from ggml backends
* Remove cuda log statement

* Update README.md fixed RNG seed info
* changed print format to unsigned

ggml : remove test-backend-buffer
ggml : fix CUDA build warnings

* rerank : use [SEP] token instead of [BOS] ggml-ci
* common : sanity check for non-NULL tokens ggml-ci
* ci : adjust rank score interval ggml-ci
* ci : add shebang to run.sh ggml-ci

Co-authored-by: Samuel Morris <[email protected]>

* Single allocation of encode_async block with non-ARC capture in ggml-metal.m
* Moving Block_release to the deallocation code
* Release encode block when re-setting encoding buffer count if needed
* Update ggml/src/ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <[email protected]>

* ggml : add metal backend registry / device ggml-ci
* metal : fix names [no ci]
* metal : global registry and device instances ggml-ci
* cont : alternative initialization of global objects ggml-ci
* llama : adapt to backend changes ggml-ci
* fixes
* metal : fix indent
* metal : fix build when MTLGPUFamilyApple3 is not available ggml-ci
* fix merge
* metal : avoid unnecessary singleton accesses ggml-ci
* metal : minor fix [no ci]
* metal : g_state -> g_ggml_ctx_dev_main [no ci]
* metal : avoid reference of device context in the backend context ggml-ci
* metal : minor [no ci]
* metal : fix maxTransferRate check
* metal : remove transfer rate stuff

---------

Co-authored-by: slaren <[email protected]>

Flake lock file updates:
• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/bcef6817a8b2aa20a5a6dbb19b43e63c5bf8619a?narHash=sha256-HO4zgY0ekfwO5bX0QH/3kJ/h4KvUDFZg8YpkNwIbg1U%3D' (2024-09-12)
  → 'github:hercules-ci/flake-parts/3d04084d54bedc3d6b8b736c70ef449225c361b1?narHash=sha256-K5ZLCyfO/Zj9mPFldf3iwS6oZStJcU4tSpiXTMYaaL0%3D' (2024-10-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'https://github.com/NixOS/nixpkgs/archive/356624c12086a18f2ea2825fed34523d60ccc4e3.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01)
  → 'https://github.com/NixOS/nixpkgs/archive/fb192fec7cc7a4c26d51779e9bab07ce6fa5597a.tar.gz?narHash=sha256-0xHYkMkeLVQAMa7gvkddbPqpxph%2BhDzdu1XdGPJR%2BOs%3D' (2024-10-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/1925c603f17fc89f4c8f6bf6f631a802ad85d784?narHash=sha256-J%2BPeFKSDV%2BpHL7ukkfpVzCOO7mBSrrpJ3svwBFABbhI%3D' (2024-09-26)
  → 'github:NixOS/nixpkgs/bc947f541ae55e999ffdb4013441347d83b00feb?narHash=sha256-NOiTvBbRLIOe5F6RbHaAh6%2B%2BBNjsb149fGZd1T4%2BKBg%3D' (2024-10-04)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* docs : clarify building Android on Termux
* docs : update building Android on Termux
* docs : add cross-compiling for Android
* cmake : link dl explicitly for Android

…-org#9752)
* ggml : add backend registry / device interfaces to BLAS backend
* fix mmap usage when using host buffers

Signed-off-by: Masanari Iida <[email protected]>

* server : more explicit endpoint access settings
* protect /props endpoint
* fix tests
* update server docs
* fix typo
* fix tests

* ggml : do not use BLAS with types without to_float
* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies
* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits
  it's not really internal if everybody uses it

An updated version will be added in ggml-org#9787

* perplexity : fix integer overflow ggml-ci
* perplexity : keep n_vocab as int and make appropriate casts ggml-ci

ggml-ci

* metal : add kernel arg structs (wip)
* metal : fattn args ggml-ci
* metal : cont + avoid potential int overflow [no ci]
* metal : mul mat struct (wip)
* cont : mul mat vec
* cont : pass by reference
* cont : args is first argument
* cont : use char ptr
* cont : shmem style
* cont : thread counters style
* cont : mul mm id ggml-ci
* cont : int safety + register optimizations ggml-ci
* metal : GGML_OP_CONCAT ggml-ci
* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV
* metal : GGML_OP_REPEAT
* metal : GGML_OP_CPY
* metal : GGML_OP_RMS_NORM
* metal : GGML_OP_NORM
* metal : add TODOs for rest of ops
* ggml : add ggml-metal-impl.h ggml-ci
  
  
      Labels
      
    android
  
    Apple Metal
  
    build
  
    devops
  
    documentation
  Improvements or additions to documentation 
  
    examples
  
    ggml
  
    Kompute
  
    nix
  
    Nvidia GPU
  
    python
  
    script
  
    server
  
    SYCL
  
    testing
  
    
  
    
Cherry-pick to the 11-18 version