3,321 changes: 1,704 additions & 1,617 deletions Makefile

1,617 changes: 1,617 additions & 0 deletions Makefile-llama-cpp-original

547 changes: 547 additions & 0 deletions README-llama.cpp.md

622 changes: 75 additions & 547 deletions README.md

133 changes: 133 additions & 0 deletions docs/Building-ls1.md
## Building llama-server

Brad Hutchings<br/>
[email protected]

This file contains instructions for building `llama.cpp` with `cosmocc` to yield a `llama-server` executable that will run on multiple platforms.

### Environment Variables

Let's define some environment variables:
```
BUILDING_DIR="1-BUILDING-llama.cpp"
printf "\n**********\n*\n* FINISHED: Environment Variables.\n*\n**********\n\n"
```

_Note that if you copy each code block from the guide and paste it into your terminal, each block ends with a message so you won't lose your place in this guide._

---
### Build Dependencies
I build on a freshly installed Ubuntu 24.04 VM. Here are some packages that are helpful in creating a working build system; you may need to install more.
```
sudo apt install -y git python3-pip build-essential zlib1g-dev \
libffi-dev libssl-dev libbz2-dev libreadline-dev libsqlite3-dev \
liblzma-dev tk-dev python3-tk cmake zip
printf "\n**********\n*\n* FINISHED: Build Dependencies.\n*\n**********\n\n"
```

---
### Clone this Repo Locally
Clone this repo into the `~/$BUILDING_DIR` directory.
```
cd ~
git clone https://github.com/BradHutchings/llama-server-one.git $BUILDING_DIR
printf "\n**********\n*\n* FINISHED: Clone this Repo Locally.\n*\n**********\n\n"
```

**Optional:** Use the `work-in-progress` branch where I implement and test my own changes and where I test upstream changes from `llama.cpp`.
```
cd ~/$BUILDING_DIR
git checkout work-in-progress
printf "\n**********\n*\n* FINISHED: Checkout work-in-progress.\n*\n**********\n\n"
```

---
### Make llama.cpp
We use the old `Makefile` rather than CMake. We've updated the `Makefile` in this repo to build llama.cpp correctly.
```
cd ~/$BUILDING_DIR
export LLAMA_MAKEFILE=1
make clean
make
printf "\n**********\n*\n* FINISHED: Make llama.cpp.\n*\n**********\n\n"
```

If the build is successful, it will end with this message:

&nbsp;&nbsp;&nbsp;&nbsp;**NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.**

If the build fails and you've checked out the `work-in-progress` branch, well, it's in progress, so switch back to the `master` branch and build that.

If the build fails on the `master` branch, please post a note in the [Discussions](https://github.com/BradHutchings/llama-server-one/discussions) area.

#### List Directory

At this point, you should see `llama-server` and other built binaries in the directory listing.
```
ls -al
printf "\n**********\n*\n* FINISHED: List Directory.\n*\n**********\n\n"
```

---
### Install Cosmo
```
mkdir -p cosmocc
cd cosmocc
wget https://cosmo.zip/pub/cosmocc/cosmocc.zip
unzip cosmocc.zip
rm cosmocc.zip
cd ..
printf "\n**********\n*\n* FINISHED: Install Cosmo.\n*\n**********\n\n"
```
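
Optionally, confirm the toolchain unpacked the way the next step expects, with `bin`, `include`, and `lib` directories under `cosmocc` (a quick sanity check; nothing here changes the build):
```
ls -d cosmocc/bin cosmocc/include cosmocc/lib
ls cosmocc/bin/cosmocc
```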

---
### Prepare to make llama.cpp with Cosmo
```
export PATH="$(pwd)/cosmocc/bin:$PATH"
export CC="cosmocc -I$(pwd)/cosmocc/include -L$(pwd)/cosmocc/lib"
export CXX="cosmocc -I$(pwd)/cosmocc/include \
-I$(pwd)/cosmocc/include/third_party/libcxx \
-L$(pwd)/cosmocc/lib"
export UNAME_S="cosmocc"
export UNAME_P="cosmocc"
export UNAME_M="cosmocc"
printf "\n**********\n*\n* FINISHED: Prepare to make llama.cpp with Cosmo.\n*\n**********\n\n"
```

---
### Make llama.cpp with Cosmo
```
make clean
make
printf "\n**********\n*\n* FINISHED: Make llama.cpp with Cosmo\n*\n**********\n\n"
```

If the build is successful, it will end with this message:

&nbsp;&nbsp;&nbsp;&nbsp;**NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.**

If the build fails and you've checked out the `work-in-progress` branch, well, it's in progress, so switch back to the `master` branch and build that.

If the build fails on the `master` branch, please post a note in the [Discussions](https://github.com/BradHutchings/llama-server-one/discussions) area.

#### List Directory

At this point, you should see `llama-server` and other built binaries in the directory listing.
```
ls -al
printf "\n**********\n*\n* FINISHED: List Directory.\n*\n**********\n\n"
```

#### Verify Zip Archive

`llama-server` is actually a zip archive with an "Actually Portable Executable" (APE) loader prefix. Let's verify the zip archive part:
```
unzip -l llama-server
printf "\n**********\n*\n* FINISHED: Verify Zip Archive.\n*\n**********\n\n"
```
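
If you're curious about the loader half, you can also peek at the first bytes of the file. This is just a sanity check of my own; APE executables start with an MZ-style stub that doubles as a shell script, and the exact bytes vary by `cosmocc` release:
```
head -c 64 llama-server | od -c | head
```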

---
### Configuring llama-server-one

Now that you've built `llama-server`, you're ready to configure it as `llama-server-one`. Follow instructions in [Configuring-ls1.md](Configuring-ls1.md).

195 changes: 195 additions & 0 deletions docs/Configuring-ls1.md
## Configuring llama-server-one

Brad Hutchings<br/>
[email protected]

This file contains instructions for configuring the `llama-server-one` executable to make it ready to package for multiple platforms.

---
### Environment Variables

Let's define some environment variables:
```
BUILDING_DIR="1-BUILDING-llama.cpp"
CONFIGURING_DIR="2-CONFIGURING-llama-server-one"

LLAMA_SERVER="llama-server"
LLAMA_SERVER_ONE="llama-server-one"
LLAMA_SERVER_ONE_ZIP="llama-server-one.zip"
DEFAULT_ARGS="default-args"
printf "\n**********\n*\n* FINISHED: Environment Variables.\n*\n**********\n\n"
```

---
### Create Configuration Directory

Next, let's create a directory where we'll configure `llama-server-one`:
```
cd ~
rm -r -f ~/$CONFIGURING_DIR
mkdir -p $CONFIGURING_DIR
cp ~/$BUILDING_DIR/$LLAMA_SERVER \
~/$CONFIGURING_DIR/$LLAMA_SERVER_ONE_ZIP

cd ~/$CONFIGURING_DIR
printf "\n**********\n*\n* FINISHED: Create Configuration Directory.\n*\n**********\n\n"
```

---
### Examine Contents of Zip Archive

Look at the contents of the `llama-server-one` zip archive:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Examine Contents of Zip Archive.\n*\n**********\n\n"
```

---
### Delete Extraneous Timezone Files

You should notice a bunch of extraneous timezone related files in `/usr/*`. Let's get rid of those:
```
zip -d $LLAMA_SERVER_ONE_ZIP "/usr/*"
printf "\n**********\n*\n* FINISHED: Delete Extraneous Timezone Files.\n*\n**********\n\n"
```

---
### Verify Contents of Zip Archive

Verify that these files are no longer in the archive:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Verify Contents of Zip Archive.\n*\n**********\n\n"
```

---
### OPTIONAL: Create website Directory in Archive

`llama.cpp` has a built-in chat UI. If you'd like to provide a custom UI, add a `website` directory to the `llama-server-one` archive. `llama.cpp`'s chat UI is optimized for serving from within the project's source tree, but we can copy the unoptimized source:
```
mkdir -p website
cp -r ~/$BUILDING_DIR/examples/server/public_legacy/* website
zip -0 -r $LLAMA_SERVER_ONE_ZIP website/*
printf "\n**********\n*\n* FINISHED: Create website Directory in Archive.\n*\n**********\n\n"
```

#### OPTIONAL: Verify website Directory in Archive

Verify that the archive has your website:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Verify website Directory in Archive.\n*\n**********\n\n"
```
---
### Create default-args File

A `default-args` file in the archive can specify sane default parameters. The format of the file is one parameter name per line, followed by its value on the next line, repeated as needed. End the file with a `...` line to include user-specified parameters.

We don't support including the model inside the zip archive (yet). That approach would run into a 4GB size limitation on Windows anyway, as `.exe` files cannot exceed 4GB. So let's use an adjacent file called `model.gguf`.

We will serve on localhost, port 8080 by default for safety. The `--ctx-size` parameter is the size of the context window. It's kinda screwy that this is a set size rather than a maximum, because `.gguf` files now carry the training context size in their metadata. We set it to 8192 to be sensible.
```
cat << EOF > $DEFAULT_ARGS
-m
model.gguf
--host
127.0.0.1
--port
8080
--ctx-size
8192
...
EOF
printf "\n**********\n*\n* FINISHED: Create Default args File.\n*\n**********\n\n"
```
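
To make the `...` behavior concrete, here's my reading of how the defaults combine with command-line arguments once this file is packed into the archive and the executable is renamed in the steps below (illustrative only, not captured output):
```
# Illustrative only -- my reading of the format above, not verified output.
# With the default-args file packed into the archive, this invocation:
./llama-server-one --temp 0.7
# behaves as if you had passed:
#   -m model.gguf --host 127.0.0.1 --port 8080 --ctx-size 8192 --temp 0.7
```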

#### OPTIONAL: Create default-args File with Website

If you added a website to the archive, use this instead:
```
cat << EOF > $DEFAULT_ARGS
-m
model.gguf
--host
127.0.0.1
--port
8080
--ctx-size
8192
--path
/zip/website
...
EOF
printf "\n**********\n*\n* FINISHED: Create Default args File with Website.\n*\n**********\n\n"
```

---
### Add default-args File to Archive

Add the `default-args` file to the archive:
```
zip -0 -r $LLAMA_SERVER_ONE_ZIP $DEFAULT_ARGS
printf "\n**********\n*\n* FINISHED: Add default-args File to Archive.\n*\n**********\n\n"
```

---
### Verify default-args File in Archive

Verify that the archive contains the `default-args` file:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Verify default-args File in Archive.\n*\n**********\n\n"
```

---
### Remove .zip Extension

Remove the `.zip` from our working file:
```
mv $LLAMA_SERVER_ONE_ZIP $LLAMA_SERVER_ONE
printf "\n**********\n*\n* FINISHED: Remove .zip Extension.\n*\n**********\n\n"
```

---
### Download Model

Let's download a small model. We'll use Google Gemma 1B Instruct v3, a surprisingly capable tiny model.
```
MODEL_FILE="Google-Gemma-1B-Instruct-v3-q8_0.gguf"
wget https://huggingface.co/bradhutchings/Brads-LLMs/resolve/main/models/$MODEL_FILE?download=true \
--show-progress --quiet -O model.gguf
printf "\n**********\n*\n* FINISHED: Download Model.\n*\n**********\n\n"
```
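
Optionally, sanity-check the download before moving on. GGUF files begin with the 4-byte magic `GGUF`, so a quick peek catches the case where the download saved an error page instead of a model:
```
ls -lh model.gguf
head -c 4 model.gguf; echo
```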

---
### Test Run

Now we can test run `llama-server-one`, listening on localhost:8080.
```
./$LLAMA_SERVER_ONE
```

After starting up and loading the model, it should display:

**main: server is listening on http://127.0.0.1:8080 - starting the main loop**<br/>
**srv update_slots: all slots are idle**
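
While it's running, you can exercise the API from a second terminal. A minimal check, assuming the OpenAI-compatible endpoint that stock `llama.cpp` builds expose (the loaded model answers, so no model name should be needed in the request):
```
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in five words."}]}'
```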

Hit `ctrl-C` on your keyboard to stop it.

---
### Test Run on Public Interfaces

If you'd like it to listen on all available interfaces, so you can connect from a browser on another computer:
```
./$LLAMA_SERVER_ONE --host 0.0.0.0
```

After starting up and loading the model, it should display:

**main: server is listening on http://0.0.0.0:8080 - starting the main loop**<br/>
**srv update_slots: all slots are idle**
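
From another computer on your network, a quick reachability check looks like this; replace `192.168.1.50` with your machine's actual address (the address below is just a placeholder, and `/health` is the stock `llama.cpp` server health endpoint):
```
curl http://192.168.1.50:8080/health
```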

Hit `ctrl-C` on your keyboard to stop it.

---
Congratulations! You are ready to package your `llama-server-one` executable for deployment. Follow instructions in [Packaging-ls1.md](Packaging-ls1.md).