This document describes the high-level architecture of the LLMCpp Spring Boot Chat application.
The application is designed as a standalone Spring Boot application that provides a Command Line Interface (CLI) for interacting with a Large Language Model (LLM). It leverages the java-llama.cpp library to run GGUF models locally.
The following diagram illustrates the key components and their relationships:
```mermaid
classDiagram
    class LlmcppChatDemoApplication {
        +main(args)
    }
    class ChatRunner {
        +run(args)
    }
    class ChatServicesImpl {
        +startChatService()
    }
    class ChatbotServicesImpl {
        +generateResponse(question)
    }
    class ConsoleIOService {
        +readInput()
        +writeOutput()
    }
    class LlamaCppProperties {
        +String model
        +Double temperature
        +...
    }
    class LlamaModelComponent {
        -LlamaModel modelLlm
        +init()
        +getModelLlm()
        +generate(InferenceParameters)
    }
    class PromptComponent {
        -String promptContent
        +init()
    }
    LlmcppChatDemoApplication --> ChatRunner : triggers
    ChatRunner --> ChatServicesImpl : starts
    ChatServicesImpl --> ChatbotServicesImpl : uses
    ChatServicesImpl --> ConsoleIOService : uses I/O
    ChatbotServicesImpl --> LlamaModelComponent : uses
    ChatbotServicesImpl --> PromptComponent : uses
    ChatbotServicesImpl --> LlamaCppProperties : config
    LlamaModelComponent --> LlamaCppProperties : config
```
The following sequence diagram shows the flow of control when the application starts and processes a user request:
```mermaid
sequenceDiagram
    participant User
    participant App as LlmcppChatDemoApplication
    participant Runner as ChatRunner
    participant Chat as ChatServicesImpl
    participant IO as ConsoleIOService
    participant Bot as ChatbotServicesImpl
    participant Model as LlamaModelComponent
    participant Prompt as PromptComponent
    App->>Runner: run()
    Runner->>Chat: startChatService()
    loop Chat Loop
        Chat->>IO: writeOutput("user >")
        IO-->>User: Display Prompt
        User->>IO: Input Question
        IO-->>Chat: question
        alt Input is "exit"
            Chat->>Runner: return
        else Valid Input
            Chat->>Bot: generateResponse(question)
            Bot->>Prompt: getPromptContent()
            Bot->>Model: generate(InferenceParameters)
            loop Stream Tokens
                Model-->>Bot: LlamaOutput token
                Bot->>IO: writeOutput(token)
                IO-->>User: Print token
            end
        end
    end
```
**LlmcppChatDemoApplication**: Bootstraps the Spring application context.

**ChatRunner**: Implements `CommandLineRunner`. This is the preferred way to start CLI applications in Spring Boot, as it runs after the context is fully refreshed and doesn't block the main thread during initialization.
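A minimal sketch of this entry-point pattern, based on the names in the diagrams above. A local stand-in interface mirrors Spring Boot's `CommandLineRunner` so the snippet compiles without a Spring context; in the real application the class would be a `@Component` implementing `org.springframework.boot.CommandLineRunner`, and `ChatServices` would arrive via constructor injection:

```java
// Stand-in for org.springframework.boot.CommandLineRunner (assumption:
// same shape, so the sketch runs without Spring on the classpath).
interface CommandLineRunner {
    void run(String... args) throws Exception;
}

// Assumed service contract, derived from the class diagram.
interface ChatServices {
    void startChatService();
}

class ChatRunner implements CommandLineRunner {
    private final ChatServices chatServices;

    ChatRunner(ChatServices chatServices) {  // constructor injection
        this.chatServices = chatServices;
    }

    @Override
    public void run(String... args) {
        // Invoked only after the application context is fully refreshed.
        chatServices.startChatService();
    }
}
```

Constructor injection keeps `ChatRunner` trivially testable: a fake `ChatServices` can be passed in without any Spring machinery.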
**ChatServicesImpl**
- Role: Handles the high-level chat logic.
- Implementation:
  - Uses `IOService` to interact with the user, decoupling the logic from `System.in`/`out`.
  - Implements an infinite loop that breaks on the "exit" command.
  - Uses `ChatbotServices` to generate responses.
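The loop described above can be sketched as follows, assuming the `IOService` and `ChatbotServices` contracts implied by the class diagram (the exact method signatures are assumptions):

```java
// Assumed contracts, derived from the class diagram.
interface IOService {
    String readInput();
    void writeOutput(String text);
}

interface ChatbotServices {
    void generateResponse(String question);  // streams tokens via IOService
}

class ChatServicesImpl {
    private final IOService io;
    private final ChatbotServices chatbot;

    ChatServicesImpl(IOService io, ChatbotServices chatbot) {
        this.io = io;
        this.chatbot = chatbot;
    }

    void startChatService() {
        while (true) {
            io.writeOutput("user > ");
            String question = io.readInput();
            // Leave the loop on end-of-input or the "exit" command.
            if (question == null || "exit".equalsIgnoreCase(question.trim())) {
                return;
            }
            chatbot.generateResponse(question);
        }
    }
}
```

Because the loop only talks to interfaces, it can be driven by a scripted `IOService` in tests or by a different front end entirely.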
**IOService / ConsoleIOService**
- Role: Provides a clean interface for reading/writing to the user interface.
- Implementation: `ConsoleIOService` uses `Scanner` and `System.out`. This abstraction makes the application testable and adaptable to other interfaces (e.g., a web socket or a GUI).
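A possible `ConsoleIOService` along these lines; the method signatures are assumptions taken from the class diagram:

```java
import java.util.Scanner;

// Assumed contract from the class diagram.
interface IOService {
    String readInput();
    void writeOutput(String text);
}

class ConsoleIOService implements IOService {
    private final Scanner scanner = new Scanner(System.in);

    @Override
    public String readInput() {
        // Return null on end-of-input so callers can terminate cleanly.
        return scanner.hasNextLine() ? scanner.nextLine() : null;
    }

    @Override
    public void writeOutput(String text) {
        // print (not println) so streamed tokens appear on one line.
        System.out.print(text);
        System.out.flush();
    }
}
```

Swapping in another `IOService` implementation is all it takes to back the chat with a socket or GUI instead of the console.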
**ChatbotServicesImpl**
- Role: Orchestrates the response generation.
- Responsibilities:
  - Constructs the full prompt by injecting the user's question into the template.
  - Configures `InferenceParameters` (temperature, top-p, mirostat, etc.).
  - Calls the `LlamaModel` to generate text.
  - Streams the output directly to the console.
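The orchestration might be sketched as below. The `{question}` placeholder and the `TokenStreamer` stand-in are assumptions: `TokenStreamer` plays the role of `LlamaModelComponent` so the example runs without the native java-llama.cpp bindings, whereas the real class would build `de.kherud.llama.InferenceParameters` and iterate the `LlamaOutput` stream returned by the model:

```java
// Assumed output-side contract.
interface IOService {
    void writeOutput(String text);
}

// Stand-in for LlamaModelComponent (assumption), yielding tokens lazily.
interface TokenStreamer {
    Iterable<String> generate(String fullPrompt);
}

class ChatbotServicesImpl {
    private final TokenStreamer model;
    private final IOService io;
    private final String promptTemplate;  // loaded and cached by PromptComponent

    ChatbotServicesImpl(TokenStreamer model, IOService io, String promptTemplate) {
        this.model = model;
        this.io = io;
        this.promptTemplate = promptTemplate;
    }

    void generateResponse(String question) {
        // 1. Inject the user's question into the template
        //    ("{question}" is a hypothetical placeholder).
        String prompt = promptTemplate.replace("{question}", question);
        // 2. Hand the prompt to the model; 3. stream tokens as they arrive
        //    instead of buffering the whole completion.
        for (String token : model.generate(prompt)) {
            io.writeOutput(token);
        }
    }
}
```

Streaming token-by-token is what makes the console feel responsive: the user sees output as soon as the first token is sampled.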
**LlamaModelComponent**
- Role: Wrapper for the native `LlamaModel`.
- Implementation: Uses `LlamaCppProperties` for configuration.
- Lifecycle:
  - Init: Loads the GGUF model from the configured path.
  - Destroy: Ensures the model is closed properly to free native memory.
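The lifecycle can be sketched as follows. The `NativeModel` stand-in and the loader function are assumptions that replace the native `LlamaModel`, so the example runs without native bindings; in the real Spring component, `init()` and `destroy()` would carry `@PostConstruct` and `@PreDestroy`:

```java
import java.util.function.Function;

// Stand-in for the native LlamaModel handle (assumption).
interface NativeModel extends AutoCloseable {
    @Override
    void close();  // frees native memory
}

class LlamaModelComponent {
    private final Function<String, NativeModel> loader;
    private final String modelPath;  // would come from LlamaCppProperties
    private NativeModel modelLlm;

    LlamaModelComponent(Function<String, NativeModel> loader, String modelPath) {
        this.loader = loader;
        this.modelPath = modelPath;
    }

    // @PostConstruct in the real component.
    void init() {
        modelLlm = loader.apply(modelPath);  // load the GGUF model once
    }

    // @PreDestroy in the real component.
    void destroy() {
        if (modelLlm != null) {
            modelLlm.close();  // release native memory deterministically
            modelLlm = null;
        }
    }

    NativeModel getModelLlm() {
        return modelLlm;
    }
}
```

Tying the native handle to the bean lifecycle matters here because the model lives outside the JVM heap: the garbage collector alone won't reclaim it.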
**PromptComponent**
- Role: Loads and caches the prompt template.
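One plausible implementation, assuming the template is a classpath resource (the file name `prompt-template.txt` and the `{question}` fallback are hypothetical); in Spring, `init()` would run once under `@PostConstruct`:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PromptComponent {
    private String promptContent;

    // @PostConstruct in the real component: read the template once,
    // then serve it from memory for every request.
    void init() {
        try (InputStream in =
                PromptComponent.class.getResourceAsStream("/prompt-template.txt")) {
            promptContent = (in == null)
                    ? "{question}"  // fallback so the sketch works without the resource
                    : new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String getPromptContent() {
        return promptContent;
    }
}
```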
**LlamaCppProperties**
- Role: Type-safe configuration bean mapping all `llamacpp.*` properties.
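A possible shape for this bean; only `model` and `temperature` appear in the class diagram, so any further fields are assumptions. Spring Boot binds each `llamacpp.*` entry from `application.properties` to the matching field by relaxed name matching:

```java
import org.springframework.boot.context.properties.ConfigurationProperties;

// Sketch of a type-safe configuration bean; requires Spring Boot on the
// classpath and registration via @EnableConfigurationProperties or
// @ConfigurationPropertiesScan.
@ConfigurationProperties(prefix = "llamacpp")
public class LlamaCppProperties {
    private String model;        // llamacpp.model -> path to the GGUF file
    private Double temperature;  // llamacpp.temperature

    public String getModel() { return model; }
    public void setModel(String model) { this.model = model; }

    public Double getTemperature() { return temperature; }
    public void setTemperature(Double temperature) { this.temperature = temperature; }
}
```

With this bean registered, entries such as `llamacpp.model=/models/llama.gguf` and `llamacpp.temperature=0.7` bind automatically and are validated at startup rather than scattered through `@Value` lookups.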