Closed
Changes from all commits · 91 commits
a1ebbe2
Add files via upload
BradHutchings Mar 31, 2025
d195986
Update server.cpp
BradHutchings Mar 31, 2025
71c6a03
Update llama-context.cpp
BradHutchings Mar 31, 2025
a4f2d58
Rename Makefile to Makefile-llama-cpp-original
BradHutchings Mar 31, 2025
5e9c36f
Rename README.md to README-llama.cpp.md
BradHutchings Mar 31, 2025
5661500
Add files via upload
BradHutchings Mar 31, 2025
30d11b3
Create Configuring-ls1-Brads-Env.md
BradHutchings Mar 31, 2025
1ac6ea9
Delete docs/Configuring-ls1-Brads-Env.md
BradHutchings Mar 31, 2025
6b2862b
Update server.cpp
BradHutchings Mar 31, 2025
7939706
Create Configuring-ls1-Brads-Env.md
BradHutchings Mar 31, 2025
04b2310
Update Configuring-ls1-Brads-Env.md
BradHutchings Mar 31, 2025
33953bb
Update server.cpp
BradHutchings Mar 31, 2025
9a6cb63
Merge pull request #17 from BradHutchings/work-in-progress
BradHutchings Mar 31, 2025
2cab571
Update Configuring-ls1-Brads-Env.md
BradHutchings Mar 31, 2025
83c8bdd
Update server.cpp
BradHutchings Mar 31, 2025
d33545c
Merge pull request #18 from BradHutchings/work-in-progress
BradHutchings Mar 31, 2025
9713c93
Merge pull request #19 from ggml-org/master
BradHutchings Apr 3, 2025
bc20658
Merge pull request #20 from BradHutchings/work-in-progress
BradHutchings Apr 3, 2025
35f9c7d
Update and rename README.md to README-LS1.md
BradHutchings Apr 4, 2025
5f96cec
Update and rename README-LS1.md to README.md
BradHutchings Apr 4, 2025
aa907ed
Update and rename README.md to README-LS1.md
BradHutchings Apr 4, 2025
ab621d2
Rename README-llama.cpp.md to README.md
BradHutchings Apr 4, 2025
da3a933
Merge pull request #21 from ggml-org/master
BradHutchings Apr 4, 2025
db1564e
Rename README.md to README-llama.cpp.md
BradHutchings Apr 4, 2025
f2a4d28
Update and rename README-LS1.md to README.md
BradHutchings Apr 4, 2025
74293b6
Merge pull request #22 from BradHutchings/work-in-progress
BradHutchings Apr 4, 2025
582dcfc
Update README.md
BradHutchings Apr 4, 2025
2cb14df
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 4, 2025
b36532c
Update README.md
BradHutchings Apr 4, 2025
277d978
Merge pull request #23 from BradHutchings/work-in-progress
BradHutchings Apr 4, 2025
c0dfba3
Update and rename Makefile to Makefile-LS1
BradHutchings Apr 16, 2025
871a988
Rename Makefile-llama-cpp-original to Makefile
BradHutchings Apr 16, 2025
0899a81
Update and rename README.md to README-LS1.md
BradHutchings Apr 16, 2025
7e2be1b
Rename README-llama.cpp.md to README.md
BradHutchings Apr 16, 2025
0668059
Merge pull request #24 from ggml-org/master
BradHutchings Apr 16, 2025
a217dab
Rename README.md to README-llama-cpp.md
BradHutchings Apr 16, 2025
94c5915
Update and rename README-LS1.md to README.md
BradHutchings Apr 16, 2025
e9d662a
Rename Makefile to Makefile-llama-cpp
BradHutchings Apr 16, 2025
6a44b2a
Update and rename Makefile-LS1 to Makefile
BradHutchings Apr 16, 2025
6fe0e59
Merge pull request #25 from BradHutchings/work-in-progress
BradHutchings Apr 16, 2025
59707a8
Update Makefile
BradHutchings Apr 16, 2025
9b230e3
Update Makefile
BradHutchings Apr 16, 2025
c543fd0
Update common.cpp
BradHutchings Apr 16, 2025
a99d999
Merge pull request #26 from BradHutchings/work-in-progress
BradHutchings Apr 16, 2025
168534b
Merge pull request #27 from ggml-org/master
BradHutchings Apr 16, 2025
65dbe5d
Merge pull request #28 from BradHutchings/work-in-progress
BradHutchings Apr 16, 2025
68355a5
Create index.md
BradHutchings Apr 16, 2025
5a54935
Update and rename index.md to .gitattributes
BradHutchings Apr 16, 2025
92aa15e
Delete .gitattributes
BradHutchings Apr 16, 2025
d5b9fb0
Merge pull request #29 from ggml-org/master
BradHutchings Apr 17, 2025
30ff512
Merge pull request #30 from BradHutchings/work-in-progress
BradHutchings Apr 17, 2025
6afeac8
Update Building-ls1.md
BradHutchings Apr 18, 2025
0442ee4
Create Buidling-ls1-Brads-Env.md
BradHutchings Apr 18, 2025
c13ceb3
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 18, 2025
4f23a56
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 18, 2025
f557dc5
Rename server.cpp to server-ls1.cpp
BradHutchings Apr 19, 2025
a7cea41
Create server.cpp
BradHutchings Apr 19, 2025
3b8f05a
Update server.cpp
BradHutchings Apr 19, 2025
dbb8991
Merge pull request #31 from ggml-org/master
BradHutchings Apr 19, 2025
5b92c34
Update server.cpp
BradHutchings Apr 19, 2025
9cfcb36
Update server-ls1.cpp
BradHutchings Apr 19, 2025
ae5e913
Merge pull request #32 from BradHutchings/work-in-progress
BradHutchings Apr 19, 2025
b963568
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 19, 2025
77bb344
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
0210119
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
e9043db
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
36c3fb1
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
a1f1d3b
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
09f5c03
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
4fbe49d
Update README.md
BradHutchings Apr 20, 2025
c44f98f
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
998d354
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
c79ab10
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
9e0a1c3
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
0723a9e
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
8d50e9c
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
4367c5b
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
604e07e
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
8c3ffae
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
444c4ad
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
ddeaede
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
93c9dc4
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
783d0ec
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
b1f1b14
Update Buidling-ls1-Brads-Env.md
BradHutchings Apr 20, 2025
51e9d7c
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 21, 2025
f97b8cc
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 21, 2025
321574b
Update Configuring-ls1-Brads-Env.md
BradHutchings Apr 21, 2025
c2deea2
Merge pull request #33 from ggml-org/master
BradHutchings Apr 21, 2025
43bc5f1
Merge pull request #34 from BradHutchings/work-in-progress
BradHutchings Apr 21, 2025
c253e9f
Update and rename README.md to README-LS1.md
BradHutchings Apr 21, 2025
dfe634d
Rename README-llama-cpp.md to README.md
BradHutchings Apr 21, 2025
3,320 changes: 1,707 additions & 1,613 deletions Makefile

Large diffs are not rendered by default.

1,613 changes: 1,613 additions & 0 deletions Makefile-llama-cpp

Large diffs are not rendered by default.

78 changes: 78 additions & 0 deletions README-LS1.md
@@ -0,0 +1,78 @@
## llama-server-one
Based on [llama.cpp](https://github.com/ggml-org/llama.cpp).

Brad Hutchings<br/>
[email protected]

<!--
**THIS REPO IS NOT QUITE READY FOR PUBLIC USE. I WILL REMOVE THIS NOTICE WHEN IT IS READY.**
-->

---
### Project Goals

The goal of this project is to build a single `llama-server-one` executable file that can run "anywhere":
- x86_64 Windows
- x86_64 Linux
- ARM Windows
- ARM Linux
- ARM macOS

I am inspired by the [llamafile project](https://github.com/Mozilla-Ocho/llamafile). The main drawback of that project is that it has not kept up-to-date with llama.cpp and therefore, does not always support the latest models when llama.cpp supports them. Support for new models in llamafile takes work and time.

I want to use the MIT license as used by llama.cpp.

GPU support is not important to me and can be handled by platform specific builds of llama.cpp. CPU inference is quite adequate for many private end-user applications.

The ability to package support files, such as a custom web UI, into the executable file is important to me. This is implemented.

The ability to package default arguments, in an "args" file, into the executable file is important to me. This is implemented.

The ability to read arguments from a file adjacent to the executable file is important to me. This is implemented.
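As a hedged illustration of the adjacent-arguments idea, here is one way such a file might look. The file name `llama-server-one.args` and the one-argument-per-line format are assumptions for illustration, not confirmed specifics of this project; see the Configuring guide for the actual convention.

```shell
# Hypothetical "args" file placed next to the executable.
# Name and format are assumptions for illustration only.
cat > llama-server-one.args <<'EOF'
--host 127.0.0.1
--port 8080
EOF
cat llama-server-one.args
```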

The ability to package a gguf model into the executable file is important to me. This is not implemented yet.

I welcome any of my changes being implemented in the official llama.cpp.

---
### Documentation
Follow these guides in order to build, package, and deploy `llama-server-one`:
- My start-to-finish guide for building `llama-server` with Cosmo is in the [Building-ls1.md](docs/Building-ls1.md) file.
- My guide for configuring a `llama-server-one` executable is in the [Configuring-ls1.md](docs/Configuring-ls1.md) file.
- My guide for packaging a `llama-server-one` executable for deployment is in the [Packaging-ls1.md](docs/Packaging-ls1.md) file.

---
### Modifications to llama.cpp

To get this from the llama.cpp source base, there are a few files that need to be modified:

1. [Makefile](Makefile) -- Extensive modifications to bring it up to date, as it is deprecated in favor of the CMake build system, and to support COSMOCC.

2. [src/llama-context.cpp](src/llama-context.cpp) -- COSMOCC doesn't have std::fill in its Standard Template Library.

3. [examples/server/server.cpp](examples/server/server.cpp) -- Support an embedded or adjacent "args" file, fix a Cosmo name conflict with the "defer" task member, and add additional metadata to `model_meta`.

---
### Reference

Here are some projects and pages you should be familiar with if you want to get the most out of `llama-server-one`:
- [llama.cpp](https://github.com/ggml-org/llama.cpp) - Georgi Gerganov and his team are the rock stars who are making the plumbing so LLMs can be available for developers of all kinds. The `llama.cpp` project is the industry standard for inference. I only fork it here because I want to make it a little better for my applications while preserving all its goodness.
- [llamafile](https://github.com/Mozilla-Ocho/llamafile) - `Llamafile` lets you distribute and run LLMs with a single file. It is a Mozilla Foundation project that brought the Cosmopolitan C Library and llama.cpp together. It has some popular GPU support. It is based on an older version of llama.cpp and does not support all of the latest models supported by llama.cpp. Llamafile is an inspiration for this project.
- [Cosmopolitan Libc](https://github.com/jart/cosmopolitan) - `Cosmopolitan` is a project for building cross-platform binaries that run on x86_64 and ARM architectures, supporting Linux, Windows, macOS, and other operating systems. Like `llamafile`, I use Cosmo to compile cross-platform executables of `llama.cpp` targets, including `llama-server`.
- [Actually Portable Executable (APE) Specification](https://github.com/jart/cosmopolitan/blob/master/ape/specification.md) - Within the Cosmopolitan Libc repo is documentation about how the cross-CPU, cross-platform executable works.
- [Brad's LLMs](https://huggingface.co/bradhutchings/Brads-LLMs) - I share private local LLMs built with `llamafile` in a Hugging Face repo.

---
### To Do List

In no particular order of importance, these are the things that bother me:
- Package the gguf file into the executable file. The zip item needs to be aligned for mmap. There is a zipalign.c tool source in llamafile that seems loosely inspired by the Android zipalign tool. I feel like there should be a more generic solution for this problem.
- GPU support without a complicated kludge, covering all supported platform / CPU / GPU triads. Perhaps a plugin system with shared-library dispatch? Invoking dev tools on Apple Metal like llamafile does is "complicated".
- Code signing instructions. Might have to sign executables within the zip package, plus the package itself.
- Clean up remaining build warnings, either by fixing source (i.e. Cosmo) or finding the magical compiler flags.
- Copy the `cosmo_args` function into `server.cpp` so it could potentially be incorporated upstream in non-Cosmo builds. `common/arg2.cpp` might be a good landing spot. License in [Cosmo source code](https://github.com/jart/cosmopolitan/blob/master/tool/args/args2.c) appears to be MIT compatible with attribution.
- The args thing is cute, but it might be easier as a yaml file. Key value pairs. Flags can be keys with null values.
- The `--ctx-size` parameter doesn't seem quite right given that new models have the training (or max) context size in their metadata. That size should be used, subject to a maximum given by a passed parameter, e.g. so a 128K model can run comfortably on a smaller device.
- Write docs for a Deploying step. It should address the args file, removing the extra executable depending on platform, models, host, port, and context size.
- ~~Make a `.gitattributes` file so we can set the default file to be displayed and keep the README.md from llama.cpp. This will help in syncing changes continually from upstream. Reference: https://git-scm.com/docs/gitattributes~~ -- This doesn't actually work.
- Cosmo needs libssl and libcrypto. Building these from scratch gets an error about Cosmo not liking assembly files. Sort this out.
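The YAML idea from the to-do list above could be sketched like this: keys with values for normal options, and flags as keys with null values. The file name and schema are assumptions for illustration; nothing here is implemented yet.

```shell
# Hypothetical YAML replacement for the "args" file.
# Name and schema are assumptions, not an implemented format.
cat > llama-server-one.yaml <<'EOF'
host: 127.0.0.1
port: 8080
no-webui:
EOF
cat llama-server-one.yaml
```

In this sketch, `no-webui:` shows a flag expressed as a key with a null value.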
15 changes: 15 additions & 0 deletions common/common.cpp
@@ -840,6 +840,21 @@ std::string fs_get_cache_directory() {
cache_directory = std::getenv("HOME") + std::string("/Library/Caches/");
#elif defined(_WIN32)
cache_directory = std::getenv("LOCALAPPDATA");

// llama-server-one START
#elif defined(COSMOCC)
// We don't know what OS we are running on at compile time, only the CPU architecture.
// Try various environment variables, falling back to ~/.cache.
// FUTURE: Check whether the directories actually exist.
cache_directory = std::getenv("LOCALAPPDATA");
if (cache_directory == "") {
cache_directory = std::getenv("XDG_CACHE_HOME");
}
if (cache_directory == "") {
cache_directory = std::getenv("HOME") + std::string("/.cache/");
}

// llama-server-one END
#else
# error Unknown architecture
#endif
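The fallback order in the COSMOCC branch above can be illustrated in shell, purely as a sketch of the intended behavior. Note that in C++, `std::getenv` returns a null pointer (not an empty string) when a variable is unset, so a production implementation should guard each call before assigning to a `std::string`.

```shell
# Illustration of the cache-directory fallback order: LOCALAPPDATA,
# then XDG_CACHE_HOME, then ~/.cache/. Shell-only sketch, not the real code path.
unset LOCALAPPDATA XDG_CACHE_HOME   # simulate the "nothing set" case
cache_directory="${LOCALAPPDATA:-${XDG_CACHE_HOME:-$HOME/.cache/}}"
echo "$cache_directory"
```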
168 changes: 168 additions & 0 deletions docs/Buidling-ls1-Brads-Env.md
@@ -0,0 +1,168 @@
## Building llama-server

Brad Hutchings<br/>
[email protected]

This file contains instructions for building `llama.cpp` with `cosmocc` to yield a `llama-server` executable that will run on multiple platforms.

### Environment Variables

Let's define some environment variables, resetting those that affect the Makefile:
```
BUILDING_DIR="1-BUILDING-llama.cpp"
unset CC
unset CXX
unset AR
unset UNAME_S
unset UNAME_P
unset UNAME_M
printf "\n**********\n*\n* FINISHED: Environment Variables.\n*\n**********\n\n"
```

_Note that if you copy each code block from the guide and paste it into your terminal, each block ends with a message so you won't lose your place in this guide._

---
### Build Dependencies
I build with a freshly installed Ubuntu 24.04 VM. Here are some packages that are helpful in creating a working build system. You may need to install more.
```
sudo apt install -y git python3-pip build-essential zlib1g-dev \
libffi-dev libssl-dev libbz2-dev libreadline-dev libsqlite3-dev \
liblzma-dev tk-dev python3-tk cmake zip npm
printf "\n**********\n*\n* FINISHED: Build Dependencies.\n*\n**********\n\n"
```

---
### Clone this Repo Locally
Clone this repo into a `~/$BUILDING_DIR` directory.
```
cd ~
git clone https://github.com/BradHutchings/llama-server-one.git $BUILDING_DIR
printf "\n**********\n*\n* FINISHED: Clone this Repo Locally.\n*\n**********\n\n"
```

**Optional:** Use the `work-in-progress` branch where I implement and test my own changes and where I test upstream changes from `llama.cpp`.
```
cd ~/$BUILDING_DIR
git checkout work-in-progress
printf "\n**********\n*\n* FINISHED: Checkout work-in-progress.\n*\n**********\n\n"
```

---
### Customize WebUI
```
APP_NAME='Mmojo Chat'
sed -i -e "s/<title>.*<\/title>/<title>$APP_NAME<\/title>/g" examples/server/webui/index.html
sed -i -e "s/>llama.cpp<\/div>/>$APP_NAME<\/div>/g" examples/server/webui/src/components/Header.tsx
cd examples/server/webui
npm i
npm run build
cd ~/$BUILDING_DIR
printf "\n**********\n*\n* FINISHED: Customize WebUI.\n*\n**********\n\n"
```

---
### Make llama.cpp
We use the old `Makefile` rather than CMake. We've updated the `Makefile` in this repo to build llama.cpp correctly.
```
cd ~/$BUILDING_DIR
export LLAMA_MAKEFILE=1
export LLAMA_SERVER_SSL=ON
make clean
make
printf "\n**********\n*\n* FINISHED: Make llama.cpp.\n*\n**********\n\n"
```

If the build is successful, it will end with this message:

&nbsp;&nbsp;&nbsp;&nbsp;**NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.**

If the build fails and you've checked out the `work-in-progress` branch, well, it's a work in progress, so switch back to the `master` branch and build that.

If the build fails on the `master` branch, please post a note in the [Discussions](https://github.com/BradHutchings/llama-server-one/discussions) area.

#### List Directory

At this point, you should see `llama-server` and other built binaries in the directory listing.
```
ls -al
printf "\n**********\n*\n* FINISHED: List Directory.\n*\n**********\n\n"
```

---
### Install Cosmo
```
mkdir -p cosmocc
cd cosmocc
wget https://cosmo.zip/pub/cosmocc/cosmocc.zip
unzip cosmocc.zip
rm cosmocc.zip
cd ..
printf "\n**********\n*\n* FINISHED: Install Cosmo.\n*\n**********\n\n"
```

---
### Prepare to make llama.cpp with Cosmo
```
export PATH="$(pwd)/cosmocc/bin:$PATH"
export CC="cosmocc -I$(pwd)/cosmocc/include -L$(pwd)/cosmocc/lib"
export CXX="cosmocc -I$(pwd)/cosmocc/include \
-I$(pwd)/cosmocc/include/third_party/libcxx \
-L$(pwd)/cosmocc/lib -L$(pwd)/openssl"
export AR="cosmoar"
export UNAME_S="cosmocc"
export UNAME_P="cosmocc"
export UNAME_M="cosmocc"
printf "\n**********\n*\n* FINISHED: Prepare to make llama.cpp with Cosmo.\n*\n**********\n\n"
```

---
### Make openssl with Cosmo
We need cross-architecture `libssl` and `libcrypto` static libraries to support SSL in `llama-server-one`.
```
cp -r /usr/include/openssl/ ./cosmocc/include/
cp -r /usr/include/x86_64-linux-gnu/openssl/* ./cosmocc/include/openssl
git clone https://github.com/openssl/openssl.git
cd openssl
./Configure no-asm no-dso no-afalgeng no-shared no-pinshared no-apps
make
cd ..
printf "\n**********\n*\n* FINISHED: Make openssl with Cosmo.\n*\n**********\n\n"

```

---
### Make llama.cpp with Cosmo
```
make clean
make
printf "\n**********\n*\n* FINISHED: Make llama.cpp with Cosmo\n*\n**********\n\n"
```

If the build is successful, it will end with this message:

&nbsp;&nbsp;&nbsp;&nbsp;**NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.**

If the build fails and you've checked out the `work-in-progress` branch, well, it's a work in progress, so switch back to the `master` branch and build that.

If the build fails on the `master` branch, please post a note in the [Discussions](https://github.com/BradHutchings/llama-server-one/discussions) area.

#### List Directory

At this point, you should see `llama-server` and other built binaries in the directory listing.
```
ls -al
printf "\n**********\n*\n* FINISHED: List Directory.\n*\n**********\n\n"
```

#### Verify Zip Archive

`llama-server` is actually a zip archive with an "Actually Portable Executable" (APE) loader prefix. Let's verify the zip archive part:
```
unzip -l llama-server
printf "\n**********\n*\n* FINISHED: Verify Zip Archive.\n*\n**********\n\n"
```

---
### Configuring llama-server-one

Now that you've built `llama-server`, you're ready to configure it as `llama-server-one`. Follow instructions in [Configuring-ls1-Brads-Env.md](Configuring-ls1-Brads-Env.md).