A robust, CLI-based LLM (Large Language Model) chat application built with Spring Boot 3 and Java 17, utilizing LlamaCpp-Java bindings for high-performance inference.
This project demonstrates how to integrate local LLM inference within a Spring Boot application, supporting GGUF model formats.
- Interactive CLI Chat: Real-time chat interface via the command line.
- Local Inference: Runs GGUF models locally (no API keys required).
- Customizable Prompts: Support for external prompt templates.
- Configurable Generation: Fine-tune temperature, top-p, context size, and CPU threads.
- Performance Statistics: Detailed metrics for every response (tokens/sec, time to first token, total tokens).
- Modular Architecture: Decoupled I/O and business logic for better testability.
- Comprehensive Tests: Includes unit tests for services and components.
- Docker Support: Ready-to-use Dockerfile for containerized deployment.
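The per-response statistics mentioned above boil down to simple timing arithmetic. As an illustration (the class and method names here are hypothetical, not the project's actual code), time to first token and throughput could be computed like this:

```java
// Illustrative sketch of the per-response metrics listed above:
// time to first token (TTFT) and tokens per second.
// All names here are hypothetical, not the project's actual API.
public class GenerationStatsSketch {

    /** Tokens per second over the whole generation. */
    static double tokensPerSecond(int totalTokens, long elapsedNanos) {
        return totalTokens / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long firstTokenAt = start + 150_000_000L; // pretend: first token after 150 ms
        long end = start + 2_000_000_000L;        // pretend: 2 s total generation time
        int totalTokens = 64;

        double ttftMs = (firstTokenAt - start) / 1_000_000.0;
        System.out.printf("time to first token: %.0f ms%n", ttftMs);
        System.out.printf("tokens/sec: %.1f%n",
                tokensPerSecond(totalTokens, end - start));
    }
}
```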
- Java: JDK 17 or higher (the application targets Java 17).
- Maven: 3.8+ (Wrapper included).
- RAM: Sufficient RAM to load your chosen GGUF model (e.g., ~1GB for TinyLlama 1.1B Q4).
Clone the repository and build the application using Maven:
git clone <repository-url>
cd llm-chatbot-springboot
```bash
./mvnw clean package
```

The executable JAR will be located in the `target` directory (e.g., `target/LLMCpp-Chat-SpringBoot.jar`).
Download a GGUF model file (e.g., from Hugging Face).
- Recommended for testing: TinyLlama-1.1B-Chat-v1.0-GGUF
Run the JAR, pointing it to your model file:
```bash
java -jar target/LLMCpp-Chat-SpringBoot.jar --llamacpp.model=/path/to/your/model.gguf
```

Or use the default configuration, which looks for `tinyllama-1.1b-chat-v1.0.Q6_K.gguf` in the working directory:
```bash
java -jar target/LLMCpp-Chat-SpringBoot.jar
```

You can run the unit tests using the Maven wrapper:
```bash
./mvnw test
```

You can configure the application via `application.properties`, system properties, or command-line arguments.
| Property | Description | Default Value |
|---|---|---|
| `llamacpp.model` | Absolute or relative path to the GGUF model file. | `tinyllama-1.1b-chat-v1.0.Q6_K.gguf` |
| `llamacpp.prompt.path` | Path to a text file containing the system prompt template. | `llamacpp_prompt.txt` |
| `llamacpp.temperature` | Controls randomness (0.0 to 1.0). Higher is more creative. | `0.2` |
| `llamacpp.topp` | Nucleus sampling (top-p) probability threshold. | `10` |
| `llamacpp.thread.cpu` | Number of CPU threads to use for inference. | `1` |
| `llamacpp.number.context` | Context window size (0 uses the model default). | `0` |
| `llamacpp.frequency-penalty` | Penalty for token repetition. | `0.2` |
| `llamacpp.miro-stat` | MiroStat sampling version (`V0`, `V1`, `V2`). | `V2` |
| `llamacpp.stop-strings` | List of strings that stop generation. | `` `, < `` |
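For reference, the same settings can also be grouped in `application.properties`. The values below are illustrative examples, not recommendations:

```properties
# Illustrative application.properties (example values only)
llamacpp.model=models/tinyllama-1.1b-chat-v1.0.Q6_K.gguf
llamacpp.prompt.path=llamacpp_prompt.txt
llamacpp.temperature=0.2
llamacpp.thread.cpu=4
llamacpp.number.context=2048
```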
By default, the application uses a built-in prompt template suitable for chat-tuned models. To customize it, create a file (e.g., my_prompt.txt) and pass it:
```bash
java -jar target/LLMCpp-Chat-SpringBoot.jar --llamacpp.prompt.path=my_prompt.txt
```

Template Variables:
- `{question}`: Will be replaced by the user's input.
Example Prompt File:
```
<|system|>
You are a helpful coding assistant.
<|user|>
{question}
<|assistant|>
```
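A template like the one above only requires simple placeholder substitution. As a sketch (the class and method names are illustrative, not the project's actual `PromptComponent` API), filling in `{question}` could look like this:

```java
// Illustrative sketch: fills the {question} placeholder in a chat
// prompt template. Names here are hypothetical, not the project's API.
public class PromptTemplateSketch {

    /** Replaces every {question} token with the user's input. */
    public static String format(String template, String question) {
        return template.replace("{question}", question);
    }

    public static void main(String[] args) {
        String template = "<|system|>\nYou are a helpful coding assistant.\n"
                + "<|user|>\n{question}\n<|assistant|>\n";
        System.out.print(format(template, "How do I reverse a list in Java?"));
    }
}
```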
Build the Docker image:
```bash
docker build -t chat-cli .
```

Run the container, mounting the model file:
```bash
docker run -it -v /local/path/to/model.gguf:/app/model.gguf chat-cli --llamacpp.model=/app/model.gguf
```

The application follows a clean Spring Boot architecture with decoupled concerns:
- `ChatRunner`: Implements `CommandLineRunner` to start the chat service without blocking application context initialization.
- `ChatServicesImpl`: Manages the high-level chat loop, using an `IOService` for interaction.
- `ChatbotServicesImpl`: Handles the business logic for generating responses using the LLM.
- `IOService` / `ConsoleIOService`: Abstracts I/O operations (CLI), enabling easy unit testing and potential future UI swaps.
- `LlamaCppProperties`: Centralized, type-safe configuration bean for all `llamacpp.*` properties.
- `LlamaModelComponent`: Manages the lifecycle of the native `LlamaModel` instance.
- `PromptComponent`: Loads and formats the prompt template.
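The `IOService` abstraction is what makes the chat loop testable: the loop can be exercised against a scripted test double instead of a real console. A minimal sketch of the idea (the method names and the `ScriptedIOService` class are assumptions, not the project's actual code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the I/O abstraction described above.
// The project's real IOService interface may declare different methods.
interface IOService {
    String readLine();          // read one line of user input
    void write(String text);    // emit a response
}

/** Test double that replays scripted input and records all output. */
class ScriptedIOService implements IOService {
    private final Deque<String> input = new ArrayDeque<>();
    final StringBuilder output = new StringBuilder();

    ScriptedIOService(String... lines) {
        for (String line : lines) input.add(line);
    }

    @Override public String readLine() { return input.poll(); }
    @Override public void write(String text) { output.append(text); }
}

public class IOServiceSketch {
    public static void main(String[] args) {
        // A unit test can drive the chat loop with canned input
        // and assert on the recorded output.
        ScriptedIOService io = new ScriptedIOService("hello");
        io.write("echo: " + io.readLine());
        System.out.println(io.output);
    }
}
```

Swapping `ConsoleIOService` for a double like this keeps the chat-loop tests free of any real console interaction.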
See docs/ARCHITECTURE.md for more details.
Please raise issues in the repository for bugs or feature requests.
