Project to help you get started deploying Generative AI models locally using Llamafile and friends. Llamafile aims to make open-source LLMs more accessible to both developers and end users by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.
What llamafile gives you is a fun web GUI chatbot, a turnkey OpenAI API compatible server, and a shell-scriptable CLI interface which together put you in control of artificial intelligence.
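Because the server speaks the OpenAI chat-completions protocol, any OpenAI-compatible client can talk to it. The sketch below builds a request body for that route; the address http://localhost:8080/v1 is llamafile's usual default, but adjust it (and the placeholder model name) to match how you actually start the server.

```python
import json

# llamafile's built-in server listens on http://localhost:8080 by default and
# exposes an OpenAI-compatible route at /v1/chat/completions.
BASE_URL = "http://localhost:8080/v1"

def chat_payload(prompt: str, temperature: float = 0.7) -> dict:
    """Build a request body for the OpenAI-compatible /chat/completions route."""
    return {
        "model": "LLaMA_CPP",  # placeholder name; the local server serves one model
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

# Serialize the body exactly as it would go over the wire.
body = json.dumps(chat_payload("Why is the sky blue?"))

# To actually send the request (requires a running llamafile server and the
# `requests` package):
#
#   import requests
#   resp = requests.post(f"{BASE_URL}/chat/completions",
#                        json=chat_payload("Why is the sky blue?"))
#   print(resp.json()["choices"][0]["message"]["content"])
```

The same request body works with the official `openai` Python client by pointing its `base_url` at the local server.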
In addition to Llamafile, this project will help you get started with two related projects.
- Whisperfile: Combines whisper.cpp, which provides high-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model, with Cosmopolitan Libc into one framework that collapses all the complexity of ASR models down to a single-file executable (called a "whisperfile") that runs locally on most computers, with no installation.
- Sdfile: Combines stable-diffusion.cpp, which provides high-performance inference of Stable Diffusion and Flux in pure C/C++, with Cosmopolitan Libc into one framework that collapses all the complexity of image-generation models down to a single-file executable (called an "sdfile") that runs locally on most computers, with no installation.
If you haven't already done so, install Miniforge. Miniforge provides minimal installers for Conda and Mamba specific to conda-forge, with the following features pre-configured:
- Packages in the base environment are obtained from the `conda-forge` channel.
- The `conda-forge` channel is set as the default (and only) channel.
Conda/Mamba will be the primary package managers used to install the required Python dependencies. For convenience, a script is included that will download and install Miniforge, Conda, and Mamba. You can run the script using the following command.

```shell
./bin/install-miniforge.sh
```

After adding any necessary dependencies that should be downloaded via conda to the environment.yml file, and any dependencies that should be downloaded via pip to the requirements.txt file, you create the Conda environment in a sub-directory ./env of your project directory by running the following shell script.
```shell
./bin/create-conda-env.sh
```

If you have an NVIDIA GPU, then in order to support GPU acceleration you need to install the cuda-toolkit package from the nvidia Conda channel. This change is made in the environment-nvidia-gpu.yml file. Create the Conda environment in a sub-directory ./env of your project directory by running the following shell script.
```shell
./bin/create-conda-env.sh environment-nvidia-gpu.yml
```

After creating the Conda environment you can install Llamafile (and Whisperfile and Sdfile) by running the following command.
```shell
conda run --prefix ./env --live-stream ./bin/install-llamafile.sh
```

This command does the following.

- Properly configures the Conda environment.
- Downloads a recent version of Llamafile.
- Installs the Llamafile binary into the bin/ directory of the Conda environment.
By default, this script downloads a recent version of Llamafile. You can install a specific release by passing the version number as a command line argument to the script as follows.
```shell
conda run --prefix ./env --live-stream ./bin/install-llamafile.sh 0.8.13
```

After creating the Conda environment you can build Llamafile (and Whisperfile, Sdfile, and Llamafiler) from source by running the following command.
```shell
conda run --prefix ./env --live-stream ./bin/build-llamafile.sh
```

Once the new environment has been created you can activate the environment with the following command.
```shell
conda activate ./env
```

Note that the ./env directory is not under version control as it can always be re-created as necessary.
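Since the environment can always be rebuilt from the dependency files, a minimal environment.yml for this setup might look like the sketch below. The exact package list is an illustrative assumption; list whatever conda packages your project actually needs.

```yaml
# Illustrative sketch only -- list the conda packages your project needs.
channels:
  - conda-forge
dependencies:
  - python=3.11            # pin whichever Python version you require
  - pip
  - pip:
      - -r requirements.txt  # pip-managed dependencies stay in requirements.txt
```

Because the environment is created with `--prefix ./env`, no `name:` field is required in the file.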
This project is supported by funding from King Abdullah University of Science and Technology (KAUST) - Center of Excellence for Generative AI, under award number 5940.