# Windows Manage Large Language Models

PowerShell automation to download large language models (LLMs) via Git and quantize them with llama.cpp to the `GGUF` format.

Think batch quantization like https://huggingface.co/TheBloke does it, but on your local machine :wink:
## Features

- Easy configuration via a `.env` file
- Automates the synchronization of Git repositories containing large files (LFS)
- Fetches only one LFS object at a time
- Displays a progress indicator while downloading LFS objects
- Automates the quantization of the source models
- Handles the intermediate files during quantization to reduce disk usage
- Improves quantization speed by separating read and write loads

## Installation

### Prerequisites

Use https://github.com/countzero/windows_llama.cpp to compile a specific version of the [llama.cpp](https://github.com/ggerganov/llama.cpp) project on your machine.

### Clone the repository from GitHub

Clone the repository to a nice place on your machine via:

```PowerShell
git clone git@github.com:countzero/windows_manage_large_language_models.git
```

### Create a .env file

Create the following `.env` file in the project directory. Make sure to change the `LLAMA_CPP_DIRECTORY` value.

```Env
# Path to the llama.cpp project that contains the
# convert.py script and the quantize.exe binary.
LLAMA_CPP_DIRECTORY=C:\windows_llama.cpp\vendor\llama.cpp

# Path to the Git repositories containing the models.
SOURCE_DIRECTORY=.\source

# Path to the quantized models in GGUF format.
TARGET_DIRECTORY=.\gguf

# Path to the cache directory for intermediate files.
#
# Hint: Ideally this should be located on a different
# physical drive to improve the quantization speed.
CACHE_DIRECTORY=.\cache

#
# Comma separated list of quantization types.
#
# Possible llama.cpp quantization types:
#
# Q2_K   : 2.63G, +0.6717 ppl @ LLaMA-v1-7B
# Q3_K_S : 2.75G, +0.5551 ppl @ LLaMA-v1-7B
# Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
# Q3_K_L : 3.35G, +0.1764 ppl @ LLaMA-v1-7B
# Q4_0   : 3.56G, +0.2166 ppl @ LLaMA-v1-7B
# Q4_1   : 3.90G, +0.1585 ppl @ LLaMA-v1-7B
# Q4_K_S : 3.59G, +0.0992 ppl @ LLaMA-v1-7B
# Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
# Q5_0   : 4.33G, +0.0683 ppl @ LLaMA-v1-7B
# Q5_1   : 4.70G, +0.0349 ppl @ LLaMA-v1-7B
# Q5_K_S : 4.33G, +0.0400 ppl @ LLaMA-v1-7B
# Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
# Q6_K   : 5.15G, -0.0008 ppl @ LLaMA-v1-7B
# Q8_0   : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
# F16    : 13.00G @ 7B
# F32    : 26.00G @ 7B
# COPY   : only copy tensors, no quantizing
#
# Hint: The sweet spot is Q4_K_M.
#
QUANTIZATION_TYPES=q4_K_M,q2_K
```
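
The scripts read their settings from this file. As a minimal sketch only (not the project's actual parser), such a `KEY=VALUE` file can be loaded into PowerShell variables like this:

```PowerShell
# Minimal sketch, not the project's actual parser: read KEY=VALUE
# pairs from the .env file, skipping comment and blank lines.
Get-Content ".env" | ForEach-Object {

    if ($_ -match '^\s*([^#=][^=]*)=(.*)$') {

        Set-Variable -Name $Matches[1].Trim() -Value $Matches[2].Trim()
    }
}
```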

## Usage

### Clone a model

Clone a Git repository containing an LLM into the `SOURCE_DIRECTORY` without checking out any files or downloading any large files (LFS).

```PowerShell
git -C "./source" clone --no-checkout https://huggingface.co/microsoft/Orca-2-7b
```
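
The scripts operate on every repository below the `SOURCE_DIRECTORY`, so repeat this step for each model you want to process in batch. For example (the second model is purely illustrative):

```PowerShell
git -C "./source" clone --no-checkout https://huggingface.co/microsoft/Orca-2-13b
```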

### Download model sources

Download all files across all Git repositories that are inside the `SOURCE_DIRECTORY`.

```PowerShell
./download_model_sources.ps1
```

**Hint:** This can also be used to update existing sources from the remote repositories.
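
For reference, the following is a rough sketch of the kind of loop this script automates; the actual implementation may differ. It synchronizes each repository and then pulls the LFS objects one at a time, so that Git LFS reports the progress per file:

```PowerShell
# Rough sketch of the synchronization loop; details are assumptions.

# Skip the LFS smudge filter so that a checkout only writes
# small pointer files instead of downloading the large objects.
$env:GIT_LFS_SKIP_SMUDGE = "1"

Get-ChildItem -Path "./source" -Directory | ForEach-Object {

    $repository = $_.FullName

    # Synchronize the refs and check out the regular files.
    git -C $repository fetch
    git -C $repository checkout HEAD -- .

    # Pull each LFS object individually to get per-file progress.
    git -C $repository lfs ls-files --name-only | ForEach-Object {
        git -C $repository lfs pull --include="$_"
    }
}
```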

### Quantize models

Quantize all model weights that are inside the `SOURCE_DIRECTORY` into the `TARGET_DIRECTORY`, creating one `GGUF` file per model for each of the `QUANTIZATION_TYPES`.

```PowerShell
./quantize_weights_for_llama.cpp.ps1
```
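
Under the hood this wraps the two llama.cpp tools referenced in the `.env` file. A simplified sketch of the steps for one model and one quantization type (paths follow the example configuration above; the script's actual logic may differ):

```PowerShell
# Simplified sketch; file names and paths are assumptions.

# 1. Convert the source weights into an intermediate F16 GGUF
#    file inside the cache directory.
python C:\windows_llama.cpp\vendor\llama.cpp\convert.py `
    .\source\Orca-2-7b `
    --outtype f16 `
    --outfile .\cache\Orca-2-7b.F16.gguf

# 2. Quantize the intermediate file into the target directory.
C:\windows_llama.cpp\vendor\llama.cpp\quantize.exe `
    .\cache\Orca-2-7b.F16.gguf `
    .\gguf\Orca-2-7b.Q4_K_M.gguf `
    Q4_K_M

# 3. Remove the intermediate file to reduce disk usage.
Remove-Item .\cache\Orca-2-7b.F16.gguf
```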