ROCm Port #1087
Merged
105 commits

- 0fd8363 use hipblas based on cublas (SlyEcho)
- 54a63c1 Update Makefile for the Cuda kernels (SlyEcho)
- 0e005f7 Build file changes (SlyEcho)
- d3e1984 add rpath (SlyEcho)
- 3677235 More build file changes (SlyEcho)
- db7a012 Merge 'origin/master' into hipblas (SlyEcho)
- 3a004b2 add rpath (SlyEcho)
- 608aa33 change default GPU arch to match CMake (SlyEcho)
- d571d16 Merge 'origin/master' into hipblas (SlyEcho)
- ef51e9e Merge branch 'ggerganov:master' into hipblas (SlyEcho)
- ecc0565 only .cu file needs to be complied as device (SlyEcho)
- a1caa48 add more cuda defines (SlyEcho)
- 3b4a531 Merge 'origin/master' into hipblas (SlyEcho)
- 2ab9d11 Merge 'origin/master' into hipblas (SlyEcho)
- d194586 Merge 'origin/master' into hipblas (SlyEcho)
- d8ea75e Merge 'origin/master' into hipblas (SlyEcho)
- c73def1 Merge 'origin/master' into hipblas (SlyEcho)
- fcbc262 Merge 'origin/master' into hipblas (SlyEcho)
- b67cc50 Merge 'origin/master' into hipblas (SlyEcho)
- d83cfba Merge 'origin/master' into hipblas (SlyEcho)
- 04c0d48 Move all HIP stuff to ggml-cuda.cu (SlyEcho)
- 1107194 Merge 'origin/master' into hipblas (SlyEcho)
- 289073a Merge 'origin/master' into hipblas (SlyEcho)
- baeb482 Revert to default copy (SlyEcho)
- 0aefa6a Merge 'origin/master' into hipblas (SlyEcho)
- a3296d5 Merge 'origin/master' into hipblas (SlyEcho)
- 070cbcc occupanct function (SlyEcho)
- 127f68e Merge 'origin/master' into hipblas (SlyEcho)
- 605560d Merge 'origin/master' into hipblas (SlyEcho)
- 0fe6384 fix makefile (SlyEcho)
- 2956630 Merge 'origin/master' into hipblas (SlyEcho)
- 8bab456 Merge 'origin/master' into hipblas (SlyEcho)
- a0b2d5f Merge 'origin/master' into hipblas (SlyEcho)
- c66115b Merge 'origin/master' into hipblas (SlyEcho)
- b19fefe Forwardcompat (SlyEcho)
- 600ace3 update warp size (SlyEcho)
- f80ce7a Merge branch 'origin/master' into hipblas (SlyEcho)
- 174bf6a Merge 'origin/master' into hipblas (SlyEcho)
- a593a4f Add missing parameters (SlyEcho)
- 30d921a and makefile (SlyEcho)
- 4c8b3fb add configurable vars (SlyEcho)
- a4648c1 Merge 'origin/master' into hipblas (SlyEcho)
- 9fdaa1d Add more defs (SlyEcho)
- 33091a9 Merge 'origin/master' into hipblas (SlyEcho)
- 5d6eb72 warp size fixes (SlyEcho)
- 1ba4ce4 Revert "warp size fixes" (SlyEcho)
- fa5b3d7 fix makefile. (SlyEcho)
- 4362e80 Merge 'origin/master' into hipblas (SlyEcho)
- 85f902d Merge 'origin/master' into hipblas (SlyEcho)
- a836529 Merge 'origin/master' into hipblas (SlyEcho)
- 61df8e9 add cudaMemset (SlyEcho)
- 6f7c156 Merge 'origin/master' into hipblas (SlyEcho)
- 67e229b Merge 'origin/master' into hipblas (SlyEcho)
- 5dd2fbe Merge 'origin/master' into hipblas (SlyEcho)
- df7346c Merge 'origin/master' into hipblas (SlyEcho)
- 35a6031 Merge 'origin/master' into hipblas (SlyEcho)
- c1e5c83 Merge 'origin/master' into hipblas (SlyEcho)
- c8ae945 Merge 'origin/master' into hipblas (SlyEcho)
- bb16eff headers fix; add kquants_iter for hipblas and add gfx803 (#1) (YellowRoseCx)
- 04419f1 Merge 'origin/master' into hipblas (SlyEcho)
- 15db19a Merge 'origin/master' into hipblas (SlyEcho)
- c3e3733 ROCm fixes (SlyEcho)
- 7735c5a Merge 'origin/master' into hipblas (SlyEcho)
- 80e4e54 Merge 'origin/master' into hipblas (SlyEcho)
- e610466 Expand arch list and make it overrideable (SlyEcho)
- 8c2c497 Merge 'origin/master' into hipblas (SlyEcho)
- afcb8fe Add new config option (SlyEcho)
- cd36b18 Merge 'origin/master' into hipblas (SlyEcho)
- 2ec4466 Update build flags. (SlyEcho)
- 3db70b5 Merge 'origin/master' into hipblas (SlyEcho)
- 1f6294d Fix multi GPU on multiple amd architectures with rocblas_initialize()… (YellowRoseCx)
- 8e8054a Add rocblas to build files (SlyEcho)
- cde52d6 Merge 'origin/master' into hipblas (SlyEcho)
- d2ade63 Merge 'origin/master' into hipblas (SlyEcho)
- f8e3fc6 rocblas init stuff (SlyEcho)
- 4336231 add hipBLAS to README (SlyEcho)
- c1664a0 Merge 'origin/master' into hipblas (SlyEcho)
- c1cb70d new build arg LLAMA_CUDA_MMQ_Y (SlyEcho)
- d91456a fix half2 decomposition (ardfork)
- ab62128 Merge 'origin/master' into hipblas (SlyEcho)
- 4024f91 Add intrinsics polyfills for AMD (SlyEcho)
- 610ba4c Merge 'origin/master' into hipblas (SlyEcho)
- 8f8ab6c hipLDFLAG Path change Unix to multisystem in Makefile (YellowRoseCx)
- 29a59b5 Fix merge (SlyEcho)
- f41920e AMD assembly optimized __dp4a (Engininja2)
- 42e055d ws fix (SlyEcho)
- e6b6ae5 Undo mess (SlyEcho)
- c299c4a New __dp4a assembly (Engininja2)
- b815e97 Merge 'origin/master' into hipblas (SlyEcho)
- 4e58a05 Allow overriding CC_TURING (SlyEcho)
- 6415610 gfx1100 support (SlyEcho)
- 70e2f7c Merge 'origin/master' into hipblas (SlyEcho)
- 68e79cc Merge 'origin/master' into hipblas (SlyEcho)
- 3de6a9a reenable LLAMA_CUDA_FORCE_DMMV (SlyEcho)
- bbbc0ce makefile rewrite (SlyEcho)
- c88c2a9 probably lld is not required (SlyEcho)
- 423db74 Merge 'origin/master' into hipblas (SlyEcho)
- 391dd9a Merge 'origin/master' into hipblas (SlyEcho)
- 5d3e7b2 use "ROCm" instead of "CUDA" (SlyEcho)
- 7b84217 Merge 'origin/master' into hipblas (SlyEcho)
- 058f905 ignore all build dirs (SlyEcho)
- a60231f Add Dockerfiles (SlyEcho)
- 81ecaa4 fix llama-bench (SlyEcho)
- 238335f fix -nommq help for non CUDA/HIP (SlyEcho)
- 9035cfc Merge 'origin/master' into hipblas (SlyEcho)
              | Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| ARG UBUNTU_VERSION=22.04 | ||
|  | ||
| # This needs to generally match the container host's environment. | ||
| ARG ROCM_VERSION=5.6 | ||
|  | ||
| # Target the ROCm build image | ||
| ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete | ||
|  | ||
| FROM ${BASE_ROCM_DEV_CONTAINER} as build | ||
|  | ||
| # Unless otherwise specified, we make a fat build. | ||
| # List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878 | ||
| # This is mostly tied to rocBLAS supported archs. | ||
| ARG ROCM_DOCKER_ARCH=\ | ||
| gfx803 \ | ||
| gfx900 \ | ||
| gfx906 \ | ||
| gfx908 \ | ||
| gfx90a \ | ||
| gfx1010 \ | ||
| gfx1030 \ | ||
| gfx1100 \ | ||
| gfx1101 \ | ||
| gfx1102 | ||
|  | ||
| COPY requirements.txt requirements.txt | ||
|  | ||
| RUN pip install --upgrade pip setuptools wheel \ | ||
| && pip install -r requirements.txt | ||
|  | ||
| WORKDIR /app | ||
|  | ||
| COPY . . | ||
|  | ||
| # Set the AMD GPU architectures to build for | ||
| ENV GPU_TARGETS=${ROCM_DOCKER_ARCH} | ||
| # Enable ROCm | ||
| ENV LLAMA_HIPBLAS=1 | ||
| ENV CC=/opt/rocm/llvm/bin/clang | ||
| ENV CXX=/opt/rocm/llvm/bin/clang++ | ||
|  | ||
| RUN make | ||
|  | ||
| ENTRYPOINT ["/app/.devops/tools.sh"] | 
  
    
              | Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| ARG UBUNTU_VERSION=22.04 | ||
|  | ||
| # This needs to generally match the container host's environment. | ||
| ARG ROCM_VERSION=5.6 | ||
|  | ||
| # Target the ROCm build image | ||
| ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete | ||
|  | ||
| FROM ${BASE_ROCM_DEV_CONTAINER} as build | ||
|  | ||
| # Unless otherwise specified, we make a fat build. | ||
| # List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878 | ||
| # This is mostly tied to rocBLAS supported archs. | ||
| ARG ROCM_DOCKER_ARCH=\ | ||
| gfx803 \ | ||
| gfx900 \ | ||
| gfx906 \ | ||
| gfx908 \ | ||
| gfx90a \ | ||
| gfx1010 \ | ||
| gfx1030 \ | ||
| gfx1100 \ | ||
| gfx1101 \ | ||
| gfx1102 | ||
|  | ||
| COPY requirements.txt requirements.txt | ||
|  | ||
| RUN pip install --upgrade pip setuptools wheel \ | ||
| && pip install -r requirements.txt | ||
|  | ||
| WORKDIR /app | ||
|  | ||
| COPY . . | ||
|  | ||
| # Set the AMD GPU architectures to build for | ||
| ENV GPU_TARGETS=${ROCM_DOCKER_ARCH} | ||
| # Enable ROCm | ||
| ENV LLAMA_HIPBLAS=1 | ||
| ENV CC=/opt/rocm/llvm/bin/clang | ||
| ENV CXX=/opt/rocm/llvm/bin/clang++ | ||
|  | ||
| RUN make | ||
|  | ||
| ENTRYPOINT [ "/app/main" ] | 
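
Both Dockerfiles default to a fat build over every listed architecture. As a minimal sketch, the `ROCM_DOCKER_ARCH` build argument could be narrowed with Docker's standard `--build-arg` option. The helper name `rocm_build_cmd` and the Dockerfile path are hypothetical (the file names are not visible in this view), and the helper only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print a `docker build` command that overrides
# ROCM_DOCKER_ARCH. The first argument is the Dockerfile path (a placeholder
# here); the remaining arguments are the gfx targets to build for.
rocm_build_cmd() {
    dockerfile=$1
    shift
    archs=$*
    printf 'docker build -f %s --build-arg ROCM_DOCKER_ARCH="%s" .\n' \
        "$dockerfile" "$archs"
}

# Example: build only for gfx1030 and gfx1100 instead of the full list.
rocm_build_cmd path/to/rocm.Dockerfile gfx1030 gfx1100
```

Passing the list as one space-separated value mirrors the default in the Dockerfile, where the targets are joined with line continuations. Whether the build system accepts other separators is not something this sketch verifies.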
  
    
The ROCm path shouldn't be hardcoded to `/opt/rocm`. It's common to use the env var `ROCM_PATH` (`ROCM_HOME` is also sometimes used). `/opt/rocm` should only be a fallback.

I took this from AMD's docs, but they have since updated them: Using CMake. Probably because it would not work on Windows.

I hadn't looked at AMD's docs, but AMD uses `ROCM_PATH` internally, at least in all the projects I have seen.

Since the CMake config will probably need to change for Windows anyway, and I don't think many people will be affected by their configured ROCm path being ignored, I think it's fine to leave it that way for now. But whenever the CMake config is changed to add Windows support, it would be nice to also support one of `ROCM_PATH`/`HIP_PATH`/`ROCM_HOME` on Linux.

Seems like the latest docs say to always manually use a CMake prefix for configuring. Guess that makes sense because on Windows, people could install it anywhere.

On Windows, you'd instead have `HIP_PATH` set, IIRC. But someone would need to check the HIP Windows SDK installation to be sure.
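
The fallback behaviour requested in this thread could be sketched roughly as follows. This is an illustration, not what the PR implements: the function name `resolve_rocm_path` and the precedence (`ROCM_PATH` before `ROCM_HOME`) are assumptions. `HIP_PATH` is deliberately left out because it may not point at the same directory level as the ROCm prefix.

```shell
#!/bin/sh
# Sketch: choose the ROCm install prefix from the environment and use
# /opt/rocm only as a fallback. The precedence order is an assumption.
resolve_rocm_path() {
    for candidate in "${ROCM_PATH:-}" "${ROCM_HOME:-}"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing set in the environment: use the conventional default.
    printf '%s\n' /opt/rocm
}

ROCM_PREFIX=$(resolve_rocm_path)
# Compiler paths derived from the resolved prefix instead of a fixed /opt/rocm.
echo "CC=$ROCM_PREFIX/llvm/bin/clang"
echo "CXX=$ROCM_PREFIX/llvm/bin/clang++"
```

Resolving the prefix once and deriving `CC` and `CXX` from it keeps the default behaviour unchanged for users who have neither variable set.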