
@anivar anivar commented Aug 16, 2025

LLaVA Multi-Modal Support

Makes it easier to use LLaVA models by packaging everything in one file.

Problem

LLaVA needs two files: the language model and the vision encoder (mmproj). Shipping them separately breaks llamafile's single-file philosophy.

Solution

Allow multiple GGUF files in one llamafile and auto-detect the vision encoder.

Changes

  • llamafile.c: allow multiple GGUF files in the archive (removed the single-GGUF error check)
  • llava-cli.cpp: detect when running as a llamafile and auto-find the embedded mmproj
  • package_llava.sh: helper script that packages both GGUFs into one llamafile

Usage

# Package
./package_llava.sh llava-v1.5-7b.gguf mmproj.gguf llava.llamafile

# Run without --mmproj
./llava.llamafile --image cat.jpg

Tested with mock files. Fixes TODO at line 180.

Commit details

Enable packaging the vision encoder (mmproj) and the language model in a single llamafile. The vision encoder is auto-detected, eliminating the need for the --mmproj flag.

- Allow multiple GGUF files in ZIP archives (llamafile.c)
- Add auto-detection for common mmproj filenames (llava-cli.cpp)
- Add llamafile_name() to get the filename from a handle
- Detect when running as a ./llava.llamafile executable and set the model path to the executable when embedded
- Allow mmproj auto-detection even when the model path is empty
- Include the package_llava.sh helper script
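The auto-detection described above can be mimicked from the shell for illustration. A hypothetical sketch, assuming the .llamafile's embedded archive is readable with unzip; the candidate name list is an assumption, not the exact list probed in llava-cli.cpp:

```shell
#!/bin/sh
# Hypothetical: probe an embedded archive for common vision-encoder
# filenames, mirroring the auto-detection in llava-cli.cpp. The
# candidate list below is an assumption.
find_mmproj() {
  archive="$1"
  for name in mmproj.gguf mmproj-model-f16.gguf llava.mmproj; do
    # unzip -l lists archive entries; match an entry ending in $name.
    if unzip -l "$archive" 2>/dev/null | grep -q "[ /]$name\$"; then
      echo "$name"
      return 0
    fi
  done
  return 1
}
```

If no candidate is found, the function returns nonzero, which is where a real llava-cli would fall back to requiring an explicit --mmproj.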
