Closed
Changes from all commits · 91 commits
a1ebbe2
Add files via upload
BradHutchings Mar 31, 2025
d195986
Update server.cpp
BradHutchings Mar 31, 2025
71c6a03
Update llama-context.cpp
BradHutchings Mar 31, 2025
a4f2d58
Rename Makefile to Makefile-llama-cpp-original
BradHutchings Mar 31, 2025
5e9c36f
Rename README.md to README-llama.cpp.md
BradHutchings Mar 31, 2025
5661500
Add files via upload
BradHutchings Mar 31, 2025
30d11b3
Create Configuring-ls1-Brads-Env.md
BradHutchings Mar 31, 2025
1ac6ea9
Delete docs/Configuring-ls1-Brads-Env.md
BradHutchings Mar 31, 2025
6b2862b
Update server.cpp
BradHutchings Mar 31, 2025
7939706
Create Configuring-ls1-Brads-Env.md
BradHutchings Mar 31, 2025
04b2310
Update Configuring-ls1-Brads-Env.md
BradHutchings Mar 31, 2025
33953bb
Update server.cpp
BradHutchings Mar 31, 2025
9a6cb63
Merge pull request #17 from BradHutchings/work-in-progress
BradHutchings Mar 31, 2025
2cab571
Update Configuring-ls1-Brads-Env.md
BradHutchings Mar 31, 2025
83c8bdd
Update server.cpp
BradHutchings Mar 31, 2025
d33545c
Merge pull request #18 from BradHutchings/work-in-progress
BradHutchings Mar 31, 2025
9713c93
Merge pull request #19 from ggml-org/master
BradHutchings Apr 3, 2025
bc20658
Merge pull request #20 from BradHutchings/work-in-progress
BradHutchings Apr 3, 2025
35f9c7d
Update and rename README.md to README-LS1.md
BradHutchings Apr 4, 2025
5f96cec
Update and rename README-LS1.md to README.md
BradHutchings Apr 4, 2025
aa907ed
Update and rename README.md to README-LS1.md
BradHutchings Apr 4, 2025
ab621d2
Rename README-llama.cpp.md to README.md
BradHutchings Apr 4, 2025
da3a933
Merge pull request #21 from ggml-org/master
BradHutchings Apr 4, 2025
db1564e
Rename README.md to README-llama.cpp.md
BradHutchings Apr 4, 2025
f2a4d28
Update and rename README-LS1.md to README.md
BradHutchings Apr 4, 2025
74293b6
Merge pull request #22 from BradHutchings/work-in-progress
BradHutchings Apr 4, 2025
582dcfc
Update README.md
BradHutchings Apr 4, 2025
2cb14df
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 4, 2025
b36532c
Update README.md
BradHutchings Apr 4, 2025
277d978
Merge pull request #23 from BradHutchings/work-in-progress
BradHutchings Apr 4, 2025
c0dfba3
Update and rename Makefile to Makefile-LS1
BradHutchings Apr 16, 2025
871a988
Rename Makefile-llama-cpp-original to Makefile
BradHutchings Apr 16, 2025
0899a81
Update and rename README.md to README-LS1.md
BradHutchings Apr 16, 2025
7e2be1b
Rename README-llama.cpp.md to README.md
BradHutchings Apr 16, 2025
0668059
Merge pull request #24 from ggml-org/master
BradHutchings Apr 16, 2025
a217dab
Rename README.md to README-llama-cpp.md
BradHutchings Apr 16, 2025
94c5915
Update and rename README-LS1.md to README.md
BradHutchings Apr 16, 2025
e9d662a
Rename Makefile to Makefile-llama-cpp
BradHutchings Apr 16, 2025
6a44b2a
Update and rename Makefile-LS1 to Makefile
BradHutchings Apr 16, 2025
6fe0e59
Merge pull request #25 from BradHutchings/work-in-progress
BradHutchings Apr 16, 2025
59707a8
Update Makefile
BradHutchings Apr 16, 2025
9b230e3
Update Makefile
BradHutchings Apr 16, 2025
c543fd0
Update common.cpp
BradHutchings Apr 16, 2025
a99d999
Merge pull request #26 from BradHutchings/work-in-progress
BradHutchings Apr 16, 2025
168534b
Merge pull request #27 from ggml-org/master
BradHutchings Apr 16, 2025
65dbe5d
Merge pull request #28 from BradHutchings/work-in-progress
BradHutchings Apr 16, 2025
68355a5
Create index.md
BradHutchings Apr 16, 2025
5a54935
Update and rename index.md to .gitattributes
BradHutchings Apr 16, 2025
92aa15e
Delete .gitattributes
BradHutchings Apr 16, 2025
d5b9fb0
Merge pull request #29 from ggml-org/master
BradHutchings Apr 17, 2025
30ff512
Merge pull request #30 from BradHutchings/work-in-progress
BradHutchings Apr 17, 2025
6afeac8
Update Building-ls1.md
BradHutchings Apr 18, 2025
0442ee4
Create Buidling-ls1-Brads-Env.md
BradHutchings Apr 18, 2025
c13ceb3
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 18, 2025
4f23a56
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 18, 2025
f557dc5
Rename server.cpp to server-ls1.cpp
BradHutchings Apr 19, 2025
a7cea41
Create server.cpp
BradHutchings Apr 19, 2025
3b8f05a
Update server.cpp
BradHutchings Apr 19, 2025
dbb8991
Merge pull request #31 from ggml-org/master
BradHutchings Apr 19, 2025
5b92c34
Update server.cpp
BradHutchings Apr 19, 2025
9cfcb36
Update server-ls1.cpp
BradHutchings Apr 19, 2025
ae5e913
Merge pull request #32 from BradHutchings/work-in-progress
BradHutchings Apr 19, 2025
b963568
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 19, 2025
77bb344
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
0210119
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
e9043db
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
36c3fb1
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
a1f1d3b
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
09f5c03
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
4fbe49d
Update README.md
BradHutchings Apr 20, 2025
c44f98f
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
998d354
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
c79ab10
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
9e0a1c3
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
0723a9e
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
8d50e9c
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
4367c5b
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
604e07e
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
8c3ffae
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
444c4ad
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
ddeaede
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
93c9dc4
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
783d0ec
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
b1f1b14
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
51e9d7c
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 21, 2025
f97b8cc
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 21, 2025
321574b
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 21, 2025
c2deea2
Merge pull request #33 from ggml-org/master
BradHutchings Apr 21, 2025
43bc5f1
Merge pull request #34 from BradHutchings/work-in-progress
BradHutchings Apr 21, 2025
c253e9f
Update and rename README.md to README-LS1.md
BradHutchings Apr 21, 2025
dfe634d
Rename README-llama-cpp.md to README.md
BradHutchings Apr 21, 2025
3,320 changes: 1,707 additions & 1,613 deletions Makefile

Large diffs are not rendered by default.

1,613 changes: 1,613 additions & 0 deletions Makefile-llama-cpp

Large diffs are not rendered by default.

78 changes: 78 additions & 0 deletions README-LS1.md
@@ -0,0 +1,78 @@
## llama-server-one
Based on [llama.cpp](https://github.com/ggml-org/llama.cpp).

Brad Hutchings<br/>
[email protected]

<!--
**THIS REPO IS NOT QUITE READY FOR PUBLIC USE. I WILL REMOVE THIS NOTICE WHEN IT IS READY.**
-->

---
### Project Goals

The goal of this project is to build a single `llama-server-one` executable file that can run "anywhere":
- x86_64 Windows
- x86_64 Linux
- ARM Windows
- ARM Linux
- ARM macOS

I am inspired by the [llamafile project](https://github.com/Mozilla-Ocho/llamafile). The main drawback of that project is that it has not kept up-to-date with llama.cpp and therefore, does not always support the latest models when llama.cpp supports them. Support for new models in llamafile takes work and time.

I want to use the MIT license as used by llama.cpp.

GPU support is not important to me and can be handled by platform specific builds of llama.cpp. CPU inference is quite adequate for many private end-user applications.

The ability to package support files, such as a custom web UI, into the executable file is important to me. This is implemented.

The ability to package default arguments, in an "args" file, into the executable file is important to me. This is implemented.

The ability to read arguments from a file adjacent to the executable file is important to me. This is implemented.
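As a hedged illustration of the adjacent-arguments idea, here is one way such a file might look. The file name `llama-server-one.args` and the one-argument-per-line format are assumptions for illustration, not confirmed specifics of this project; see the Configuring guide for the actual convention.

```shell
# Hypothetical "args" file placed next to the executable.
# Name and format are assumptions for illustration only.
cat > llama-server-one.args <<'EOF'
--host 127.0.0.1
--port 8080
EOF
cat llama-server-one.args
```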

The ability to package a gguf model into the executable file is important to me. This is not implemented yet.

I welcome any of my changes being implemented in the official llama.cpp.

---
### Documentation
Follow these guides in order to build, package, and deploy `llama-server-one`:
- My start-to-finish guide for building `llama-server` with Cosmo is in the [Building-ls1.md](docs/Building-ls1.md) file.
- My guide for configuring a `llama-server-one` executable is in the [Configuring-ls1.md](docs/Configuring-ls1.md) file.
- My guide for packaging a `llama-server-one` executable for deployment is in the [Packaging-ls1.md](docs/Packaging-ls1.md) file.

---
### Modifications to llama.cpp

To get this from the llama.cpp source base, there are a few files that need to be modified:

1. [Makefile](Makefile) -- Extensive modifications to bring it up to date, as it is deprecated in favor of the CMake build system, and to support COSMOCC.

2. [src/llama-context.cpp](src/llama-context.cpp) -- COSMOCC doesn't have std::fill in its Standard Template Library.

3. [examples/server/server.cpp](examples/server/server.cpp) -- Support an embedded or adjacent "args" file, fix a Cosmo name conflict with the "defer" task member, and add additional metadata to `model_meta`.

---
### Reference

Here are some projects and pages you should be familiar with if you want to get the most out of `llama-server-one`:
- [llama.cpp](https://github.com/ggml-org/llama.cpp) - Georgi Gerganov and his team are the rock stars who are making the plumbing so LLMs can be available for developers of all kinds. The `llama.cpp` project is the industry standard for inference. I only fork it here because I want to make it a little better for my applications while preserving all its goodness.
- [llamafile](https://github.com/Mozilla-Ocho/llamafile) - `Llamafile` lets you distribute and run LLMs with a single file. It is a Mozilla Foundation project that brought the Cosmopolitan C Library and llama.cpp together. It has some popular GPU support. It is based on an older version of llama.cpp and does not support all of the latest models supported by llama.cpp. Llamafile is an inspiration for this project.
- [Cosmopolitan Libc](https://github.com/jart/cosmopolitan) - `Cosmopolitan` is a project for building cross-platform binaries that run on x86_64 and ARM architectures, supporting Linux, Windows, macOS, and other operating systems. Like `llamafile`, I use Cosmo to compile cross-platform executables of `llama.cpp` targets, including `llama-server`.
- [Actually Portable Executable (APE) Specification](https://github.com/jart/cosmopolitan/blob/master/ape/specification.md) - Within the Cosmopolitan Libc repo is documentation about how the cross-CPU, cross-platform executable works.
- [Brad's LLMs](https://huggingface.co/bradhutchings/Brads-LLMs) - I share private local LLMs built with `llamafile` in a Hugging Face repo.

---
### To Do List

In no particular order of importance, these are the things that bother me:
- Package the gguf file into the executable file. The zip item needs to be aligned for mmap. There is a zipalign.c tool source in llamafile that seems loosely inspired by the Android zipalign tool. I feel like there should be a more generic solution for this problem.
- GPU support without a complicated kludge, covering all supported platform / CPU / GPU triads. Perhaps a plugin system with shared-library dispatch? Invoking dev tools on Apple Metal like llamafile does is "complicated".
- Code signing instructions. Might have to sign executables within the zip package, plus the package itself.
- Clean up remaining build warnings, either by fixing source (i.e. Cosmo) or finding the magical compiler flags.
- Copy the `cosmo_args` function into `server.cpp` so it could potentially be incorporated upstream in non-Cosmo builds. `common/arg2.cpp` might be a good landing spot. License in [Cosmo source code](https://github.com/jart/cosmopolitan/blob/master/tool/args/args2.c) appears to be MIT compatible with attribution.
- The args thing is cute, but it might be easier as a yaml file. Key value pairs. Flags can be keys with null values.
- The `--ctx-size` parameter doesn't seem quite right given that new models have the training (or max) context size in their metadata. That size should be used, subject to a maximum given by a passed parameter, e.g. so a 128K model can run comfortably on a smaller device.
- Write docs for a Deploying step. It should address the args file, removing the extra executable depending on platform, models, host, port, and context size.
- ~~Make a `.gitattributes` file so we can set the default file to be displayed and keep the README.md from llama.cpp. This will help in syncing changes continually from upstream. Reference: https://git-scm.com/docs/gitattributes~~ -- This doesn't actually work.
- Cosmo needs libssl and libcrypto. Building these from scratch gets an error about Cosmo not liking assembly files. Sort this out.
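The YAML idea from the to-do list above could be sketched like this: keys with values for normal options, and flags as keys with null values. The file name and schema are assumptions for illustration; nothing here is implemented yet.

```shell
# Hypothetical YAML replacement for the "args" file.
# Name and schema are assumptions, not an implemented format.
cat > llama-server-one.yaml <<'EOF'
host: 127.0.0.1
port: 8080
no-webui:
EOF
cat llama-server-one.yaml
```

In this sketch, `no-webui:` shows a flag expressed as a key with a null value.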
15 changes: 15 additions & 0 deletions common/common.cpp
@@ -840,6 +840,21 @@ std::string fs_get_cache_directory() {
cache_directory = std::getenv("HOME") + std::string("/Library/Caches/");
#elif defined(_WIN32)
cache_directory = std::getenv("LOCALAPPDATA");

// llama-server-one START
#elif defined(COSMOCC)
// We don't know what OS we are running on at compile time, only the CPU architecture.
// Try various environment variables, falling back to ~/.cache.
// FUTURE: Check whether the directories actually exist.
cache_directory = std::getenv("LOCALAPPDATA");
if (cache_directory == "") {
cache_directory = std::getenv("XDG_CACHE_HOME");
}
if (cache_directory == "") {
cache_directory = std::getenv("HOME") + std::string("/.cache/");
}

// llama-server-one END
#else
# error Unknown architecture
#endif
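The fallback order in the COSMOCC branch above can be illustrated in shell, purely as a sketch of the intended behavior. Note that in C++, `std::getenv` returns a null pointer (not an empty string) when a variable is unset, so a production implementation should guard each call before assigning to a `std::string`.

```shell
# Illustration of the cache-directory fallback order: LOCALAPPDATA,
# then XDG_CACHE_HOME, then ~/.cache/. Shell-only sketch, not the real code path.
unset LOCALAPPDATA XDG_CACHE_HOME   # simulate the "nothing set" case
cache_directory="${LOCALAPPDATA:-${XDG_CACHE_HOME:-$HOME/.cache/}}"
echo "$cache_directory"
```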
168 changes: 168 additions & 0 deletions docs/Buidling-ls1-Brads-Env.md
@@ -0,0 +1,168 @@
## Building llama-server

Brad Hutchings<br/>
[email protected]

This file contains instructions for building `llama.cpp` with `cosmocc` to yield a `llama-server` executable that will run on multiple platforms.

### Environment Variables

Let's define some environment variables, resetting those that affect the Makefile:
```
BUILDING_DIR="1-BUILDING-llama.cpp"
unset CC
unset CXX
unset AR
unset UNAME_S
unset UNAME_P
unset UNAME_M
printf "\n**********\n*\n* FINISHED: Environment Variables.\n*\n**********\n\n"
```

_Note that if you copy each code block from the guide and paste it into your terminal, each block ends with a message so you won't lose your place in this guide._

---
### Build Dependencies
I build with a freshly installed Ubuntu 24.04 VM. Here are some packages that are helpful in creating a working build system. You may need to install more.
```
sudo apt install -y git python3-pip build-essential zlib1g-dev \
libffi-dev libssl-dev libbz2-dev libreadline-dev libsqlite3-dev \
liblzma-dev tk-dev python3-tk cmake zip npm
printf "\n**********\n*\n* FINISHED: Build Dependencies.\n*\n**********\n\n"
```

---
### Clone this Repo Locally
Clone this repo into a `~/$BUILDING_DIR` directory.
```
cd ~
git clone https://github.com/BradHutchings/llama-server-one.git $BUILDING_DIR
printf "\n**********\n*\n* FINISHED: Clone this Repo Locally.\n*\n**********\n\n"
```

**Optional:** Use the `work-in-progress` branch where I implement and test my own changes and where I test upstream changes from `llama.cpp`.
```
cd ~/$BUILDING_DIR
git checkout work-in-progress
printf "\n**********\n*\n* FINISHED: Checkout work-in-progress.\n*\n**********\n\n"
```

---
### Customize WebUI
```
APP_NAME='Mmojo Chat'
sed -i -e "s/<title>.*<\/title>/<title>$APP_NAME<\/title>/g" examples/server/webui/index.html
sed -i -e "s/>llama.cpp<\/div>/>$APP_NAME<\/div>/g" examples/server/webui/src/components/Header.tsx
cd examples/server/webui
npm i
npm run build
cd ~/$BUILDING_DIR
printf "\n**********\n*\n* FINISHED: Customize WebUI.\n*\n**********\n\n"
```

---
### Make llama.cpp
We use the old `Makefile` rather than CMake. We've updated the `Makefile` in this repo to build llama.cpp correctly.
```
cd ~/$BUILDING_DIR
export LLAMA_MAKEFILE=1
export LLAMA_SERVER_SSL=ON
make clean
make
printf "\n**********\n*\n* FINISHED: Make llama.cpp.\n*\n**********\n\n"
```

If the build is successful, it will end with this message:

&nbsp;&nbsp;&nbsp;&nbsp;**NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.**

If the build fails and you've checked out the `work-in-progress` branch, well, it's a work in progress, so switch back to the `master` branch and build that.

If the build fails on the `master` branch, please post a note in the [Discussions](https://github.com/BradHutchings/llama-server-one/discussions) area.

#### List Directory

At this point, you should see `llama-server` and other built binaries in the directory listing.
```
ls -al
printf "\n**********\n*\n* FINISHED: List Directory.\n*\n**********\n\n"
```

---
### Install Cosmo
```
mkdir -p cosmocc
cd cosmocc
wget https://cosmo.zip/pub/cosmocc/cosmocc.zip
unzip cosmocc.zip
rm cosmocc.zip
cd ..
printf "\n**********\n*\n* FINISHED: Install Cosmo.\n*\n**********\n\n"
```

---
### Prepare to make llama.cpp with Cosmo
```
export PATH="$(pwd)/cosmocc/bin:$PATH"
export CC="cosmocc -I$(pwd)/cosmocc/include -L$(pwd)/cosmocc/lib"
export CXX="cosmocc -I$(pwd)/cosmocc/include \
-I$(pwd)/cosmocc/include/third_party/libcxx \
-L$(pwd)/cosmocc/lib -L$(pwd)/openssl"
export AR="cosmoar"
export UNAME_S="cosmocc"
export UNAME_P="cosmocc"
export UNAME_M="cosmocc"
printf "\n**********\n*\n* FINISHED: Prepare to make llama.cpp with Cosmo.\n*\n**********\n\n"
```

---
### Make openssl with Cosmo
We need cross-architecture `libssl` and `libcrypto` static libraries to support SSL in `llama-server-one`.
```
cp -r /usr/include/openssl/ ./cosmocc/include/
cp -r /usr/include/x86_64-linux-gnu/openssl/* ./cosmocc/include/openssl
git clone https://github.com/openssl/openssl.git
cd openssl
./Configure no-asm no-dso no-afalgeng no-shared no-pinshared no-apps
make
cd ..
printf "\n**********\n*\n* FINISHED: Make openssl with Cosmo.\n*\n**********\n\n"

```

---
### Make llama.cpp with Cosmo
```
make clean
make
printf "\n**********\n*\n* FINISHED: Make llama.cpp with Cosmo\n*\n**********\n\n"
```

If the build is successful, it will end with this message:

&nbsp;&nbsp;&nbsp;&nbsp;**NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.**

If the build fails and you've checked out the `work-in-progress` branch, well, it's a work in progress, so switch back to the `master` branch and build that.

If the build fails on the `master` branch, please post a note in the [Discussions](https://github.com/BradHutchings/llama-server-one/discussions) area.

#### List Directory

At this point, you should see `llama-server` and other built binaries in the directory listing.
```
ls -al
printf "\n**********\n*\n* FINISHED: List Directory.\n*\n**********\n\n"
```

#### Verify Zip Archive

`llama-server` is actually a zip archive with an "Actually Portable Executable" (APE) loader prefix. Let's verify the zip archive part:
```
unzip -l llama-server
printf "\n**********\n*\n* FINISHED: Verify Zip Archive.\n*\n**********\n\n"
```

---
### Configuring llama-server-one

Now that you've built `llama-server`, you're ready to configure it as `llama-server-one`. Follow instructions in [Configuring-ls1-Brads-Env.md](Configuring-ls1-Brads-Env.md).