
Commit a764505

Add files via upload
Signed-off-by: Brad Hutchings <[email protected]>
1 parent c80a775 commit a764505

File tree: 3 files changed, +459 −0 lines changed

docs/Building-ls1.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
## Building llama-server

Brad Hutchings<br/>

This file contains instructions for building `llama.cpp` with `cosmocc` to yield a `llama-server` executable that will run on multiple platforms.

### Environment Variables

Let's define some environment variables:
```
BUILDING_DIR="1-BUILDING-llama.cpp"
printf "\n**********\n*\n* FINISHED: Environment Variables.\n*\n**********\n\n"
```

_Note: if you copy each code block from this guide and paste it into your terminal, each block ends by printing a FINISHED message, so you won't lose your place in the guide._

---
### Build Dependencies
I build with a freshly installed Ubuntu 24.04 VM. Here are some packages that are helpful in creating a working build system. You may need to install more.
```
sudo apt install -y git python3-pip build-essential zlib1g-dev \
libffi-dev libssl-dev libbz2-dev libreadline-dev libsqlite3-dev \
liblzma-dev tk-dev python3-tk cmake zip
printf "\n**********\n*\n* FINISHED: Build Dependencies.\n*\n**********\n\n"
```

---
### Clone this Repo Locally
Clone this repo into a `~/$BUILDING_DIR` directory.
```
cd ~
git clone https://github.com/BradHutchings/llama-server-one.git $BUILDING_DIR
printf "\n**********\n*\n* FINISHED: Clone this Repo Locally.\n*\n**********\n\n"
```

**Optional:** Use the `work-in-progress` branch, where I implement and test my own changes and where I test upstream changes from `llama.cpp`.
```
cd ~/$BUILDING_DIR
git checkout work-in-progress
printf "\n**********\n*\n* FINISHED: Checkout work-in-progress.\n*\n**********\n\n"
```

---
### Make llama.cpp
We use the old `Makefile` rather than CMake. We've updated the `Makefile` in this repo to build llama.cpp correctly.
```
cd ~/$BUILDING_DIR
export LLAMA_MAKEFILE=1
make clean
make
printf "\n**********\n*\n* FINISHED: Make llama.cpp.\n*\n**********\n\n"
```

If the build is successful, it will end with this message:

&nbsp;&nbsp;&nbsp;&nbsp;**NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.**

If the build fails and you've checked out the `work-in-progress` branch, well, it's in progress, so switch back to the `master` branch and build that.

If the build fails on the `master` branch, please post a note in the [Discussions](https://github.com/BradHutchings/llama-server-one/discussions) area.
#### List Directory

At this point, you should see `llama-server` and other built binaries in the directory listing.
```
ls -al
printf "\n**********\n*\n* FINISHED: List Directory.\n*\n**********\n\n"
```

---
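
#### OPTIONAL: Quick Smoke Test

Before moving on, you can sanity-check the native build. This is a sketch, assuming your build supports the usual `llama.cpp` `--version` flag, which prints version and build info and exits:
```
./llama-server --version
printf "\n**********\n*\n* FINISHED: Quick Smoke Test.\n*\n**********\n\n"
```

---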
### Install Cosmo
```
mkdir -p cosmocc
cd cosmocc
wget https://cosmo.zip/pub/cosmocc/cosmocc.zip
unzip cosmocc.zip
rm cosmocc.zip
cd ..
printf "\n**********\n*\n* FINISHED: Install Cosmo.\n*\n**********\n\n"
```

---
### Prepare to make llama.cpp with Cosmo
```
export PATH="$(pwd)/cosmocc/bin:$PATH"
export CC="cosmocc -I$(pwd)/cosmocc/include -L$(pwd)/cosmocc/lib"
export CXX="cosmocc -I$(pwd)/cosmocc/include \
-I$(pwd)/cosmocc/include/third_party/libcxx \
-L$(pwd)/cosmocc/lib"
export UNAME_S="cosmocc"
export UNAME_P="cosmocc"
export UNAME_M="cosmocc"
printf "\n**********\n*\n* FINISHED: Prepare to make llama.cpp with Cosmo.\n*\n**********\n\n"
```

---
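
#### OPTIONAL: Verify the Cosmo Toolchain

As a quick check that the toolchain is wired up, confirm that `cosmocc` now resolves from your `PATH`. A sketch; I believe `cosmocc` accepts the GCC-style `--version` flag, but treat the exact output as indicative:
```
which cosmocc
cosmocc --version
printf "\n**********\n*\n* FINISHED: Verify the Cosmo Toolchain.\n*\n**********\n\n"
```

---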
### Make llama.cpp with Cosmo
```
make clean
make
printf "\n**********\n*\n* FINISHED: Make llama.cpp with Cosmo.\n*\n**********\n\n"
```

If the build is successful, it will end with this message:

&nbsp;&nbsp;&nbsp;&nbsp;**NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.**

If the build fails and you've checked out the `work-in-progress` branch, well, it's in progress, so switch back to the `master` branch and build that.

If the build fails on the `master` branch, please post a note in the [Discussions](https://github.com/BradHutchings/llama-server-one/discussions) area.
#### List Directory

At this point, you should see `llama-server` and other built binaries in the directory listing.
```
ls -al
printf "\n**********\n*\n* FINISHED: List Directory.\n*\n**********\n\n"
```
#### Verify Zip Archive

`llama-server` is actually a zip archive with an "Actually Portable Executable" (APE) loader prefix. Let's verify the zip archive part:
```
unzip -l llama-server
printf "\n**********\n*\n* FINISHED: Verify Zip Archive.\n*\n**********\n\n"
```

---
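
### OPTIONAL: Inspect the APE Loader Prefix

You can also peek at the other half of the file. As I understand it, APE binaries begin with the magic bytes `MZqFpD='` (a DOS `MZ` header that doubles as a shell script), so printing the first few bytes should show something like that:
```
head -c 8 llama-server; echo
printf "\n**********\n*\n* FINISHED: Inspect the APE Loader Prefix.\n*\n**********\n\n"
```

---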
### Configuring llama-server-one

Now that you've built `llama-server`, you're ready to configure it as `llama-server-one`. Follow instructions in [Configuring-ls1.md](Configuring-ls1.md).

docs/Configuring-ls1.md

Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
## Configuring llama-server-one

Brad Hutchings<br/>

This file contains instructions for configuring the `llama-server-one` executable to make it ready to package for multiple platforms.

---
### Environment Variables

Let's define some environment variables:
```
BUILDING_DIR="1-BUILDING-llama.cpp"
CONFIGURING_DIR="2-CONFIGURING-llama-server-one"

LLAMA_SERVER="llama-server"
LLAMA_SERVER_ONE="llama-server-one"
LLAMA_SERVER_ONE_ZIP="llama-server-one.zip"
DEFAULT_ARGS="default-args"
printf "\n**********\n*\n* FINISHED: Environment Variables.\n*\n**********\n\n"
```

---
### Create Configuration Directory

Next, let's create a directory where we'll configure `llama-server-one`:
```
cd ~
rm -r -f ~/$CONFIGURING_DIR
mkdir -p $CONFIGURING_DIR
cp ~/$BUILDING_DIR/$LLAMA_SERVER \
~/$CONFIGURING_DIR/$LLAMA_SERVER_ONE_ZIP

cd ~/$CONFIGURING_DIR
printf "\n**********\n*\n* FINISHED: Create Configuration Directory.\n*\n**********\n\n"
```

---
### Examine Contents of Zip Archive

Look at the contents of the `llama-server-one` zip archive:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Examine Contents of Zip Archive.\n*\n**********\n\n"
```

---
### Delete Extraneous Timezone Files

You should notice a bunch of extraneous timezone-related files in `/usr/*`. Let's get rid of those:
```
zip -d $LLAMA_SERVER_ONE_ZIP "/usr/*"
printf "\n**********\n*\n* FINISHED: Delete Extraneous Timezone Files.\n*\n**********\n\n"
```

---
### Verify Contents of Zip Archive

Verify that these files are no longer in the archive:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Verify Contents of Zip Archive.\n*\n**********\n\n"
```

---
### OPTIONAL: Create website Directory in Archive

`llama.cpp` has a built-in chat UI. If you'd like to provide a custom UI, you should add a `website` directory to the `llama-server-one` archive. `llama.cpp`'s chat UI is optimized for serving from inside the project's source code, but we can copy the unoptimized source:
```
mkdir -p website
cp -r ~/$BUILDING_DIR/examples/server/public_legacy/* website
zip -0 -r $LLAMA_SERVER_ONE_ZIP website/*
printf "\n**********\n*\n* FINISHED: Create website Directory in Archive.\n*\n**********\n\n"
```
#### OPTIONAL: Verify website Directory in Archive

Verify that the archive has your website:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Verify website Directory in Archive.\n*\n**********\n\n"
```
---
### Create default-args File

A `default-args` file in the archive can specify sane default parameters. The format of the file is: a parameter name on one line, that parameter's value on the next line, rinse, repeat. End the file with a `...` line so that user-specified parameters are appended.

We don't support including the model inside the zip archive yet. That has a 4GB size limitation on Windows anyway, as `.exe` files cannot exceed 4GB. So let's use an adjacent file called `model.gguf`.

We will serve on localhost, port 8080 by default for safety. The `--ctx-size` parameter is the size of the context window. It's kinda screwy that this is a set size rather than a maximum, because `.gguf` files now carry the training context size in metadata. We set it to 8192 to be sensible.
```
cat << EOF > $DEFAULT_ARGS
-m
model.gguf
--host
127.0.0.1
--port
8080
--ctx-size
8192
...
EOF
printf "\n**********\n*\n* FINISHED: Create default-args File.\n*\n**********\n\n"
```
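
For illustration, here's a sketch of what this achieves, assuming `default-args` is read at startup and the trailing `...` means "append whatever the user passed on the command line":
```
# Illustration only (don't run yet; model.gguf is downloaded in a later step):
#   ./llama-server-one
# should behave roughly like:
#   ./llama-server -m model.gguf --host 127.0.0.1 --port 8080 --ctx-size 8192
```
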
#### OPTIONAL: Create default-args File with Website

If you added a website to the archive, use this instead:
```
cat << EOF > $DEFAULT_ARGS
-m
model.gguf
--host
127.0.0.1
--port
8080
--ctx-size
8192
--path
/zip/website
...
EOF
printf "\n**********\n*\n* FINISHED: Create default-args File with Website.\n*\n**********\n\n"
```

---
### Add default-args File to Archive

Add the `default-args` file to the archive:
```
zip -0 -r $LLAMA_SERVER_ONE_ZIP $DEFAULT_ARGS
printf "\n**********\n*\n* FINISHED: Add default-args File to Archive.\n*\n**********\n\n"
```

---
### Verify default-args File in Archive

Verify that the archive contains the `default-args` file:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Verify default-args File in Archive.\n*\n**********\n\n"
```

---
### Remove .zip Extension

Remove the `.zip` extension from our working file:
```
mv $LLAMA_SERVER_ONE_ZIP $LLAMA_SERVER_ONE
printf "\n**********\n*\n* FINISHED: Remove .zip Extension.\n*\n**********\n\n"
```

---
### Download Model

Let's download a small model. We'll use Google Gemma 1B Instruct v3, a surprisingly capable tiny model.
```
MODEL_FILE="Google-Gemma-1B-Instruct-v3-q8_0.gguf"
wget "https://huggingface.co/bradhutchings/Brads-LLMs/resolve/main/models/$MODEL_FILE?download=true" \
  --show-progress --quiet -O model.gguf
printf "\n**********\n*\n* FINISHED: Download Model.\n*\n**********\n\n"
```
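
Optionally, sanity-check the download before moving on; the file should exist and be on the order of a gigabyte for this q8_0 model (a rough expectation, not an exact figure):
```
ls -lh model.gguf
printf "\n**********\n*\n* FINISHED: Check Model File.\n*\n**********\n\n"
```
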
---
### Test Run

Now we can test run `llama-server-one`, listening on localhost:8080.
```
./$LLAMA_SERVER_ONE
```

After starting up and loading the model, it should display:

**main: server is listening on http://127.0.0.1:8080 - starting the main loop**<br/>
**srv update_slots: all slots are idle**

Hit `ctrl-C` on your keyboard to stop it.
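
While the server is running, you can poke it from a second terminal. This is a sketch assuming the stock `llama-server` HTTP interface, which exposes a `/health` endpoint and an OpenAI-compatible `/v1/chat/completions` endpoint:
```
# Liveness check; should return a small JSON status like {"status":"ok"}.
curl http://127.0.0.1:8080/health

# Minimal chat completion request against the default model.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in five words."}]}'
```
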
---
### Test Run on Public Interfaces

If you'd like it to listen on all available interfaces, so you can connect from a browser on another computer:
```
./$LLAMA_SERVER_ONE --host 0.0.0.0
```

After starting up and loading the model, it should display:

**main: server is listening on http://0.0.0.0:8080 - starting the main loop**<br/>
**srv update_slots: all slots are idle**

Hit `ctrl-C` on your keyboard to stop it.
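
From another machine on your network, you should then be able to reach it; `<server-ip>` below is a placeholder for your build machine's address:
```
curl http://<server-ip>:8080/health
```
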
---
Congratulations! You are ready to package your `llama-server-one` executable for deployment. Follow instructions in [Packaging-ls1.md](Packaging-ls1.md).
