LLMCpp Spring Boot Chat

A robust, CLI-based LLM (Large Language Model) chat application built with Spring Boot 3 and Java 17, utilizing LlamaCpp-Java bindings for high-performance inference.

This project demonstrates how to integrate local LLM inference within a Spring Boot application, supporting GGUF model formats.


Features

  • Interactive CLI Chat: Real-time chat interface via the command line.
  • Local Inference: Runs GGUF models locally (no API keys required).
  • Customizable Prompts: Support for external prompt templates.
  • Configurable Generation: Fine-tune temperature, top-p, context size, and CPU threads.
  • Performance Statistics: Detailed metrics for every response (tokens/sec, time to first token, total tokens).
  • Modular Architecture: Decoupled I/O and business logic for better testability.
  • Comprehensive Tests: Includes unit tests for services and components.
  • Docker Support: Ready-to-use Dockerfile for containerized deployment.
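
The per-response statistics above (tokens/sec, time to first token, total tokens) can be derived from two timestamps and a token counter. The sketch below is illustrative only; the class and method names are assumptions, not the project's actual API.

```java
// Hypothetical sketch of per-response statistics collection.
// Names (GenerationStats, onToken, ...) are illustrative assumptions.
public class GenerationStats {
    private final long startNanos = System.nanoTime();
    private long firstTokenNanos;
    private int tokenCount;

    // Called once per generated token by the streaming callback.
    public void onToken() {
        if (tokenCount == 0) firstTokenNanos = System.nanoTime();
        tokenCount++;
    }

    public double timeToFirstTokenMillis() {
        return (firstTokenNanos - startNanos) / 1_000_000.0;
    }

    public double tokensPerSecond() {
        double elapsedSec = (System.nanoTime() - startNanos) / 1_000_000_000.0;
        return elapsedSec > 0 ? tokenCount / elapsedSec : 0.0;
    }

    public int totalTokens() { return tokenCount; }
}
```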

Prerequisites

  • Java: JDK 17 or higher.
  • Maven: 3.8+ (Wrapper included).
  • RAM: Sufficient RAM to load your chosen GGUF model (e.g., ~1GB for TinyLlama 1.1B Q4).

Getting Started

1. Build from Source

Clone the repository and build the application using Maven:

git clone <repository-url>
cd llm-chatbot-springboot
./mvnw clean package

The executable JAR will be located in the target directory (e.g., target/LLMCpp-Chat-SpringBoot.jar).

2. Prepare the Model

Download a GGUF model file (e.g., from Hugging Face).

3. Run the Application

Run the JAR, pointing it to your model file:

java -jar target/LLMCpp-Chat-SpringBoot.jar --llamacpp.model=/path/to/your/model.gguf

Or run with the default configuration, which looks for tinyllama-1.1b-chat-v1.0.Q6_K.gguf in the working directory:

java -jar target/LLMCpp-Chat-SpringBoot.jar

4. Run Tests

You can run the unit tests using the Maven wrapper:

./mvnw test

Configuration

You can configure the application via application.properties, system properties, or command-line arguments.

| Property                    | Description                                               | Default Value                        |
|-----------------------------|-----------------------------------------------------------|--------------------------------------|
| llamacpp.model              | Absolute or relative path to the GGUF model file.         | tinyllama-1.1b-chat-v1.0.Q6_K.gguf   |
| llamacpp.prompt.path        | Path to a text file containing the system prompt template.| llamacpp_prompt.txt                  |
| llamacpp.temperature        | Controls randomness (0.0 to 1.0). Higher is more creative.| 0.2                                  |
| llamacpp.topp               | Nucleus sampling probability threshold.                   | 10                                   |
| llamacpp.thread.cpu         | Number of CPU threads to use for inference.               | 1                                    |
| llamacpp.number.context     | Context window size (0 uses the model default).           | 0                                    |
| llamacpp.frequency-penalty  | Penalty for token repetition.                             | 0.2                                  |
| llamacpp.miro-stat          | MiroStat sampling version (V0, V1, V2).                   | V2                                   |
| llamacpp.stop-strings       | List of strings that stop generation.                     | `, <                                 |
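
For example, an application.properties overriding a few of these defaults might look like this (the specific values are illustrative, not recommendations):

```properties
# Hypothetical overrides for a quad-core machine; tune for your hardware.
llamacpp.model=/models/tinyllama-1.1b-chat-v1.0.Q6_K.gguf
llamacpp.temperature=0.2
llamacpp.thread.cpu=4
llamacpp.number.context=2048
```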

Customizing the Prompt

By default, the application uses a built-in prompt template suitable for chat-tuned models. To customize it, create a file (e.g., my_prompt.txt) and pass it:

java -jar target/LLMCpp-Chat-SpringBoot.jar --llamacpp.prompt.path=my_prompt.txt

Template Variables:

  • {question}: Will be replaced by the user's input.

Example Prompt File:

<|system|>
You are a helpful coding assistant.
<|user|>
{question}
<|assistant|>
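
Internally, rendering a template like the one above amounts to a placeholder substitution. The following is a minimal sketch of that step; the class name and method are assumptions, not the project's actual PromptComponent code.

```java
// Illustrative sketch of {question} substitution in a prompt template.
// Names are hypothetical; only the {question} placeholder is from the README.
public class PromptTemplate {
    private final String template;

    public PromptTemplate(String template) {
        this.template = template;
    }

    // Replace the {question} placeholder with the user's input.
    public String render(String question) {
        return template.replace("{question}", question);
    }
}
```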

Docker Usage

Build the Docker image:

docker build -t chat-cli .

Run the container, mounting the model file:

docker run -it -v /local/path/to/model.gguf:/app/model.gguf chat-cli --llamacpp.model=/app/model.gguf
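
The repository ships its own Dockerfile; a minimal equivalent for packaging the built JAR might look like this (base image and paths are assumptions, not the actual file):

```dockerfile
# Hypothetical minimal image for the built JAR; not the repository's Dockerfile.
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY target/LLMCpp-Chat-SpringBoot.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```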

Architecture

The application follows a clean Spring Boot architecture with decoupled concerns:

  • ChatRunner: Implements CommandLineRunner to start the chat service without blocking the application context initialization.
  • ChatServicesImpl: Manages the high-level chat loop, using an IOService for interaction.
  • ChatbotServicesImpl: Handles the business logic for generating responses using the LLM.
  • IOService / ConsoleIOService: Abstracts I/O operations (CLI), enabling easy unit testing and potential future UI swaps.
  • LlamaCppProperties: Centralized, type-safe configuration bean for all llamacpp.* properties.
  • LlamaModelComponent: Manages the lifecycle of the native LlamaModel instance.
  • PromptComponent: Loads and formats the prompt template.
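
The decoupling between IOService and the chat loop described above can be sketched as follows. The interface and class names mirror the README's component list, but the bodies are illustrative assumptions, not the project's actual implementation.

```java
import java.util.Scanner;
import java.util.function.UnaryOperator;

// Abstraction over I/O, enabling a scripted fake in unit tests.
interface IOService {
    String readLine();
    void writeLine(String text);
}

// Console-backed implementation used when running on the CLI.
class ConsoleIOService implements IOService {
    private final Scanner scanner = new Scanner(System.in);
    public String readLine() { return scanner.nextLine(); }
    public void writeLine(String text) { System.out.println(text); }
}

// The chat loop depends only on IOService, so tests never touch the console.
// (Hypothetical stand-in for the README's ChatServicesImpl.)
class ChatLoop {
    private final IOService io;
    ChatLoop(IOService io) { this.io = io; }

    // One read-generate-write turn; 'generate' stands in for the LLM call.
    void runOnce(UnaryOperator<String> generate) {
        String question = io.readLine();
        io.writeLine(generate.apply(question));
    }
}
```

Because ChatLoop only sees the interface, a unit test can inject a fake IOService that replays canned input and records output.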

See docs/ARCHITECTURE.md for more details.

Feedback

Please raise issues in the repository for bugs or feature requests.
