
Logan Powell added 2 commits August 8, 2025 10:23
Implements full LoRA (Low-Rank Adaptation) adapter support compatible with
llama.cpp, enabling fine-tuned adapters to be applied in llamafile server mode.

Features:
- Multiple LoRA adapter support with individual scaling factors
- New command-line flags: --lora, --lora-scaled, --lora-base
- Automatic disabling of memory mapping (mmap) for LoRA compatibility
- Per-slot adapter application during initialization
- Clean resource management and cleanup on shutdown
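Given the flags listed above, server invocations might look like the following (a hypothetical sketch; the model and adapter file names are placeholders, and the exact argument shape of `--lora-scaled` is assumed here rather than confirmed by this PR):

```shell
# Hypothetical invocations; paths and scale values are placeholders.

# Single adapter at default scale:
llamafile --server -m llama-3-8b.gguf --lora my-adapter.gguf

# Multiple adapters with individual scaling factors:
llamafile --server -m llama-3-8b.gguf \
  --lora-scaled style-adapter.gguf 0.8 \
  --lora-scaled domain-adapter.gguf 0.5
```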

Changes:
- flags.cpp: Add LoRA flag parsing and global adapter management
- prog.cpp: Implement adapter loading, validation, and cleanup
- slot.cpp/slot.h: Add slot-level adapter application logic
- llamafile.h: Define LoRA adapter data structures and constants
- README.md: Add comprehensive LoRA usage documentation
- RELEASE.md: Document new LoRA features for release notes

The implementation follows llama.cpp patterns for maximum compatibility
and provides a solid foundation for advanced fine-tuning workflows.

Tested with Llama 3 8B + LoRA adapters, supporting both single and
multiple adapter configurations with custom scaling factors.

Resolves mozilla-ai#697
Logan Powell and others added 7 commits August 8, 2025 19:29
- Remove redundant code by deferring to llama.cpp for LoRA structures
- Add Slot::mark_for_refresh() to flag slots for context refresh after LoRA changes
- Integrate needs_refresh_ flag and logic into Slot class and prefill() method
- Update LoRA adapter API handlers to call mark_for_refresh() after applying or updating adapters
- Ensure system prompts and context are preserved using slot’s intelligent prefill mechanism
- Remove naive KV cache clearing logic in favor of slot-managed refresh
- Improve runtime LoRA scale update reliability
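The refresh mechanism in the bullets above might be sketched as follows. This is a minimal, hypothetical illustration, not the actual `slot.h` code: the real `Slot` also manages the llama.cpp context, KV cache, and token state, and `prefill()` is far more involved.

```cpp
// Hypothetical sketch of the slot refresh flag described above.
class Slot {
    bool needs_refresh_ = false;

public:
    // Called by the LoRA API handlers after adapters are applied or
    // their scales change, so the next prefill rebuilds the context.
    void mark_for_refresh() { needs_refresh_ = true; }

    bool needs_refresh() const { return needs_refresh_; }

    // Simplified prefill(): when a refresh is pending, re-evaluate the
    // system prompt and cached tokens under the new adapter weights
    // instead of naively clearing the KV cache.
    void prefill() {
        if (needs_refresh_) {
            // ... re-run prompt evaluation with updated adapters ...
            needs_refresh_ = false;
        }
        // ... normal prefix-reuse logic ...
    }
};
```

The point of routing refreshes through `prefill()` is that the slot's existing prefix-reuse logic preserves the system prompt and conversation context, which a blanket KV-cache clear would discard.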
Successfully merging this pull request may close these issues.

Bug: error: unknown argument: --lora