Replies: 10 comments 6 replies
It's a standard.
We were interested in Llamafile because of the improvements it offered for CPU-only inference. It's still not that easy to find GPUs, and you'd have to deal with various licensing issues from a well-known GPU vendor. Once Llamafile upstreamed its improvements to Llama.cpp, we switched to Llama.cpp, since activity here had died down. Llamafile is still much easier to deploy and use, and we're happy to keep using it and to contribute what we can here.
I like it for use cases like games where I want to use LLMs. It allows me to distribute without needing to know much of anything about the environment it's being deployed into.
Please make a program, or a hack of an Echo-style assistant: all offline, all on my local computer/device (like Mycroft), all in my language.
To me, the Llamafile project has always been hugely interesting and entertaining. There are not many projects that are so original and innovative; Llamafile is one of a kind. Even though I've been using it less often lately, I still think it has great potential.

I used it to test all sorts of open-source models and configurations. I especially like its ease of use on any platform and the fact that it can be run as a server. It has provided fast local inference on my CPU-only machine. Whisperfiles are a great example of what Llamafile can achieve: even today, year-old Whisperfiles remain far more efficient than newer models in the same category (such as quantized versions of Voxtral, for example). Llamafile also has significant didactic value when you're learning about AI. It helped me understand how LLMs behave and encouraged me to experiment.

With the rapid progress of coding power tools, I'm confident that further improvements and new features could be added to Llamafile. For example, what about giving Llamafile agentic-loop abilities, like a kind of 100% local, self-contained Claude Code?