@@ -104,6 +104,7 @@ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
 ### Windows Notes
 
 If you run into issues where it complains it can't find `'nmake'` `'?'` or CMAKE_C_COMPILER, you can extract w64devkit as [mentioned in llama.cpp repo](https://github.com/ggerganov/llama.cpp#openblas) and add those manually to CMAKE_ARGS before running `pip` install:
+
 ```ps
 $env:CMAKE_GENERATOR = "MinGW Makefiles"
 $env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/w64devkit/bin/g++.exe"
@@ -118,17 +119,19 @@ Detailed MacOS Metal GPU install documentation is available at [docs/install/mac
 #### M1 Mac Performance Issue
 
 Note: If you are using Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports arm64 architecture. For example:
-```
+
+```bash
 wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
 bash Miniforge3-MacOSX-arm64.sh
 ```
+
 Otherwise, while installing it will build the llama.cpp x86 version which will be 10x slower on Apple Silicon (M1) Mac.
 
 #### M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`
 
 Try installing with
 
-```
+```bash
 CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
 ```
 
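As a quick check for the M1 note in the hunk above, you can ask Python which architecture it was built for; a minimal sketch, assuming only a working interpreter:

```python
# A native Apple Silicon build prints "arm64" here; "x86_64" means the
# interpreter is running under Rosetta, so llama.cpp would be built for x86
# and hit the slowdown (or the mach-o architecture error) described above.
import platform
print(platform.machine())
```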
@@ -152,7 +155,12 @@ Below is a short example demonstrating how to use the high-level API to for basi
 
 ```python
 >>> from llama_cpp import Llama
->>> llm = Llama(model_path="./models/7B/llama-model.gguf")
+>>> llm = Llama(
+      model_path="./models/7B/llama-model.gguf",
+      # n_gpu_layers=-1, # Uncomment to use GPU acceleration
+      # seed=1337, # Uncomment to set a specific seed
+      # n_ctx=2048, # Uncomment to increase the context window
+)
 >>> output = llm(
       "Q: Name the planets in the solar system? A: ", # Prompt
       max_tokens=32, # Generate up to 32 tokens
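For reference, here is a sketch of the constructor from the hunk above with those optional arguments uncommented; the model path and values are the illustrative ones from the diff, not tuned recommendations:

```python
from llama_cpp import Llama

# Illustrative values taken from the commented-out lines in the diff above.
llm = Llama(
    model_path="./models/7B/llama-model.gguf",  # assumed local GGUF file
    n_gpu_layers=-1,  # offload all layers to the GPU
    seed=1337,        # fix the RNG seed for reproducible sampling
    n_ctx=2048,       # enlarge the context window
)
```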
@@ -191,7 +199,10 @@ Note that `chat_format` option must be set for the particular model you are usin
 
 ```python
 >>> from llama_cpp import Llama
->>> llm = Llama(model_path="path/to/llama-2/llama-model.gguf", chat_format="llama-2")
+>>> llm = Llama(
+      model_path="path/to/llama-2/llama-model.gguf",
+      chat_format="llama-2"
+)
 >>> llm.create_chat_completion(
       messages = [
           {"role": "system", "content": "You are an assistant who perfectly describes images."},
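The hunk above ends mid-call; as a sketch of how the result is usually consumed (assuming the `llm` object from the hunk above and an illustrative user message), `create_chat_completion` returns an OpenAI-style dict whose reply text sits under `choices[0]["message"]["content"]`:

```python
# The returned dict follows the OpenAI chat-completion schema;
# the generated reply lives under choices[0]["message"]["content"].
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who perfectly describes images."},
        {"role": "user", "content": "Describe the colors of a sunset."},  # illustrative prompt
    ]
)
print(response["choices"][0]["message"]["content"])
```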