Conversation


@amakropoulos amakropoulos commented Aug 13, 2025

Rewrite the LLM backend, LlamaLib, as a standalone C++/C# library and adapt LLMUnity to use it.

Features:

  • implement LlamaLib as an object-oriented C++/C# library
  • update llama.cpp to b7664
  • fix the Vulkan GPU backend
  • support Android 16KB page sizes
  • fix iOS / Xcode builds
  • fix RAG functionality on iOS
  • polish the samples
  • optimise the streaming functionality and implement callbacks on the C++ end
  • remove chat templates from LLMUnity in favour of llama.cpp's templating
  • implement property checks
  • common handling for both JSON and GBNF grammars
  • simplify integration of tinyBLAS (a lightweight GPU backend for Nvidia GPUs)
  • move the client / server functionality into LlamaLib

Issues:

@Draco18s Draco18s left a comment


Downloaded this to see if it solves any of the problems I've been having.

Some issues:

  • The sample chat bot scene has a missing script on the LLMCharacter object, and the ChatBot MonoBehaviour has no reference to its LLMAgent, so it throws an NRE.
  • LLMAgent.Warmup calls ChatAsync before the Lama llmAgent.llmAgent field is set (by SetupLLMClient), throwing an NRE even when not using a Lama-based model.
  • Using a coroutine to wait for the field to be non-null before calling Warmup outright crashes the Unity editor.
  • I ended up just waiting for the field to be non-null and calling WarmupCallback directly instead.
  • The option to disable streaming has gone missing. Responses always seem to be in stream mode, and the chatbot demo doesn't populate the chat bubble; it just constantly replaces the text with the next token.

@amakropoulos amakropoulos merged commit 3d478a6 into main Jan 12, 2026
