## Configuring llama-server-one in my Environment

Brad Hutchings<br/>


This file contains instructions for configuring the `llama-server-one` executable to make it ready to package for multiple platforms.
Instructions have been customized for my environment. You should use these [Configuring Instructions](Configuring-ls1.md).

---
### Environment Variables

Let's define some environment variables:
```
BUILDING_DIR="1-BUILDING-llama.cpp"
CONFIGURING_DIR="2-CONFIGURING-llama-server-one"

LLAMA_SERVER="llama-server"
LLAMA_SERVER_ONE="llama-server-one"
LLAMA_SERVER_ONE_ZIP="llama-server-one.zip"
DEFAULT_ARGS="default-args"
printf "\n**********\n*\n* FINISHED: Environment Variables.\n*\n**********\n\n"
```

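A typo in any of these values silently propagates into every later command, so it can be worth echoing the paths they will expand to before proceeding. A quick sanity check (print-only, assuming the values above):

```shell
BUILDING_DIR="1-BUILDING-llama.cpp"
CONFIGURING_DIR="2-CONFIGURING-llama-server-one"
LLAMA_SERVER="llama-server"
LLAMA_SERVER_ONE_ZIP="llama-server-one.zip"

# Print the source and destination paths the upcoming cp command will use.
echo "src: $HOME/$BUILDING_DIR/$LLAMA_SERVER"
echo "dst: $HOME/$CONFIGURING_DIR/$LLAMA_SERVER_ONE_ZIP"
```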
---
### Create Configuration Directory

Next, let's create a directory where we'll configure `llama-server-one`:
```
cd ~
rm -r -f ~/$CONFIGURING_DIR
mkdir -p $CONFIGURING_DIR
cp ~/$BUILDING_DIR/$LLAMA_SERVER \
   ~/$CONFIGURING_DIR/$LLAMA_SERVER_ONE_ZIP

cd ~/$CONFIGURING_DIR
printf "\n**********\n*\n* FINISHED: Create Configuration Directory.\n*\n**********\n\n"
```

---
### Examine Contents of Zip Archive

Look at the contents of the `llama-server-one` zip archive:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Examine Contents of Zip Archive.\n*\n**********\n\n"
```

---
### Delete Extraneous Timezone Files

You should notice a bunch of extraneous timezone-related files under `/usr/*`. Let's get rid of those:
```
zip -d $LLAMA_SERVER_ONE_ZIP "/usr/*"
printf "\n**********\n*\n* FINISHED: Delete Extraneous Timezone Files.\n*\n**********\n\n"
```

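If you'd like to see how `zip -d` behaves with a wildcard pattern before pointing it at the real archive, here is a throwaway illustration using scratch files (the names are made up for the demo):

```shell
# Build a scratch archive containing a fake usr/ tree plus one entry to keep.
mkdir -p demo-tz/usr/share/zoneinfo
echo "tz data" > demo-tz/usr/share/zoneinfo/UTC
echo "keep me" > demo-tz/keep.txt
(cd demo-tz && zip -q -r ../demo-tz.zip .)

# Delete everything under usr/ by pattern, then list what survives.
zip -q -d demo-tz.zip "usr/*"
unzip -l demo-tz.zip
```

Only `keep.txt` should remain in the listing; the `usr/` tree is gone.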
---
### Verify Contents of Zip Archive

Verify that these files are no longer in the archive:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Verify Contents of Zip Archive.\n*\n**********\n\n"
```

---
### OPTIONAL: Create website Directory in Archive

`llama.cpp` has a built-in chat UI. If you'd like to provide a custom UI, add a `website` directory to the `llama-server-one` archive. `llama.cpp`'s own chat UI is optimized for serving from within the project's source code, but we can copy our unoptimized source instead:
```
mkdir website
cp -r /mnt/hyperv/web-apps/completion-tool/* website
rm website/*.txt
rm website/images/*.svg
rm website/images/*.psd
zip -0 -r $LLAMA_SERVER_ONE_ZIP website/*
printf "\n**********\n*\n* FINISHED: Create website Directory in Archive.\n*\n**********\n\n"
```

#### OPTIONAL: Verify website Directory in Archive

Verify that the archive has your website:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Verify website Directory in Archive.\n*\n**********\n\n"
```
---
### Create default-args File

A `default-args` file in the archive can specify sane default parameters. The format of the file is: parameter name on one line, parameter value on the next line, repeated as needed. End the file with a `...` line to pass through user-specified parameters.

We don't yet support including the model inside the zip archive. That would run into a 4GB size limitation on Windows anyway, as `.exe` files cannot exceed 4GB. So let's use an adjacent file called `model.gguf`.

We will serve on localhost, port 8080 by default for safety. The `--ctx-size` parameter is the size of the context window. It is awkward that this is a fixed size rather than a maximum, given that `.gguf` files now carry the training context size in their metadata. We set it to 8192 to be sensible.
```
cat << EOF > $DEFAULT_ARGS
-m
model.gguf
--host
127.0.0.1
--port
8080
--ctx-size
8192
...
EOF
printf "\n**********\n*\n* FINISHED: Create Default args File.\n*\n**********\n\n"
```

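A quick way to sanity-check a file in this format: names and values must alternate one per line, and the final line must be the `...` pass-through sentinel. For example (recreating the same content under a throwaway name so this can run anywhere):

```shell
# Write the same default-args content to a scratch file,
# then confirm the pass-through sentinel is the final line.
cat << EOF > default-args-check
-m
model.gguf
--host
127.0.0.1
--port
8080
--ctx-size
8192
...
EOF
tail -n 1 default-args-check
```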
#### OPTIONAL: Create default-args File with Website

If you added a website to the archive, use this instead:
```
cat << EOF > $DEFAULT_ARGS
-m
model.gguf
--host
127.0.0.1
--port
8080
--ctx-size
8192
--path
/zip/website
...
EOF
printf "\n**********\n*\n* FINISHED: Create Default args File with Website.\n*\n**********\n\n"
```

---
### Add default-args File to Archive

Add the `default-args` file to the archive:
```
zip -0 -r $LLAMA_SERVER_ONE_ZIP $DEFAULT_ARGS
printf "\n**********\n*\n* FINISHED: Add default-args File to Archive.\n*\n**********\n\n"
```

---
### Verify default-args File in Archive

Verify that the archive contains the `default-args` file:
```
unzip -l $LLAMA_SERVER_ONE_ZIP
printf "\n**********\n*\n* FINISHED: Verify default-args File in Archive.\n*\n**********\n\n"
```

---
### Remove .zip Extension

Remove the `.zip` extension from our working file:
```
mv $LLAMA_SERVER_ONE_ZIP $LLAMA_SERVER_ONE
printf "\n**********\n*\n* FINISHED: Remove .zip Extension.\n*\n**********\n\n"
```

---
### Download Model

Let's get a small model. We'll use Google Gemma 1B Instruct v3, a surprisingly capable tiny model. In my environment, it's already available on the share:
```
MODEL_FILE="Google-Gemma-1B-Instruct-v3-q8_0.gguf"
cp /mnt/hyperv/$MODEL_FILE model.gguf
printf "\n**********\n*\n* FINISHED: Download Model.\n*\n**********\n\n"
```

---
### Test Run

Now we can test run `llama-server-one`, listening on localhost:8080.
```
./$LLAMA_SERVER_ONE
```

After starting up and loading the model, it should display:

**main: server is listening on http://127.0.0.1:8080 - starting the main loop**<br/>
**srv update_slots: all slots are idle**

Hit `ctrl-C` on your keyboard to stop it.

---
### Test Run on Public Interfaces

If you'd like it to listen on all available interfaces, so you can connect from a browser on another computer:
```
./$LLAMA_SERVER_ONE --host 0.0.0.0
```

After starting up and loading the model, it should display:

**main: server is listening on http://0.0.0.0:8080 - starting the main loop**<br/>
**srv update_slots: all slots are idle**

Hit `ctrl-C` on your keyboard to stop it.

---
### Copy llama-server-one for Deployment

Congratulations! You are ready to copy the `llama-server-one` executable to the share for deployment.

```
sudo cp llama-server-one /mnt/hyperv/Mmojo-Raspberry-Pi/Mmojo-LLMs
printf "\n**********\n*\n* FINISHED: Copy llama-server-one for Deployment.\n*\n**********\n\n"
```
