Building blocks for **local** agents in C++.

> [!NOTE]
> This library is designed for running small language models locally using [llama.cpp](https://github.com/ggml-org/llama.cpp). If you want to call external LLM APIs, this is not the right fit.

# Examples

- **[Context Engineering](./examples/context-engineering/README.md)** - Use callbacks to manipulate the context between iterations of the agent loop.

```cpp
int main() {
    // ... (model, tools, and agent setup elided)
    std::vector<common_chat_msg> messages = {{"user", "What is 42 * 17?"}};
    std::string response = agent.run_loop(messages);
    std::cout << response << std::endl;
}
```

## Agent Loop

In the current LLM (Large Language Model) world, an `agent` is usually a simple loop that intersperses `Model Calls` and `Tool Executions` until a stop condition is met:

```mermaid
graph TD
    User([User Input]) --> Model[Model Call]
    Model --> Decision{Stop Condition Met?}
    Decision -->|Yes| End([End])
    Decision -->|No| Tool[Tool Execution]
    Tool --> Model
```
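
The sketch below makes the loop concrete. It is a minimal, self-contained illustration of the pattern only, not this library's implementation: the `Message` and `ToolCall` types and the stub functions are hypothetical stand-ins for real inference and tool dispatch.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-ins; a real agent wraps LLM inference and tools.
struct Message { std::string role; std::string content; };
struct ToolCall { std::string name; std::string args; };

// Stub model call: a real implementation runs inference over the history.
Message call_model(const std::vector<Message>& /*history*/) {
    return {"assistant", "714"};  // pretend the model answers directly
}

// Stub parser: a real implementation extracts tool calls from the reply.
std::vector<ToolCall> parse_tool_calls(const Message& /*reply*/) { return {}; }

// Stub executor: a real implementation dispatches to the named tool.
Message execute_tool(const ToolCall& /*call*/) { return {"tool", "{}"}; }

// The loop itself: alternate Model Calls and Tool Executions until the
// model replies without requesting any tool (the default stop condition).
std::string run_loop(std::vector<Message> messages) {
    while (true) {
        Message reply = call_model(messages);        // Model Call
        messages.push_back(reply);
        auto calls = parse_tool_calls(reply);
        if (calls.empty()) return reply.content;     // stop condition met
        for (const auto& call : calls)
            messages.push_back(execute_tool(call));  // Tool Execution
    }
}

int main() {
    std::vector<Message> messages = {{"user", "What is 42 * 17?"}};
    std::cout << run_loop(messages) << std::endl;
}
```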
> [!IMPORTANT]
> There are different ways to implement stop conditions.
> By default, we let the agent decide when to end the loop by generating an output *without* tool executions.
> You can implement additional stop conditions via callbacks.

## Callbacks

Callbacks allow you to hook into the agent lifecycle at specific points:
Use callbacks for logging, context manipulation, human-in-the-loop approval, or error recovery.
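
To make the pattern concrete, here is a small, self-contained sketch of lifecycle hooks built with `std::function`. The hook names and signatures are illustrative assumptions, not this library's actual callback API:

```cpp
#include <functional>
#include <iostream>
#include <string>

// Illustrative lifecycle hooks (hypothetical; not this library's API).
struct Hooks {
    std::function<void(const std::string&)> on_model_call;  // e.g. logging
    std::function<bool(const std::string&)> on_tool_call;   // e.g. approval
};

int main() {
    Hooks hooks;
    hooks.on_model_call = [](const std::string& prompt) {
        std::cerr << "[log] calling model with: " << prompt << "\n";
    };
    hooks.on_tool_call = [](const std::string& tool) {
        std::cerr << "[log] tool requested: " << tool << "\n";
        return true;  // a human-in-the-loop hook could prompt the user and veto here
    };

    // Points where an agent loop would fire the hooks:
    if (hooks.on_model_call) hooks.on_model_call("What is 42 * 17?");
    if (hooks.on_tool_call && hooks.on_tool_call("calculator"))
        std::cout << "tool approved and executed\n";
}
```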

## Instructions

A system prompt that defines the agent's behavior and capabilities. Passed to the `Agent` constructor and automatically prepended to conversations.
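
For example (a sketch only: `Model::create` appears later in this README, but the exact `Agent` constructor signature shown here is an assumption):

```cpp
// Sketch: the instructions string is prepended to every conversation.
// The constructor arguments are assumed for illustration.
auto model = agent_cpp::Model::create("model.gguf");
agent_cpp::Agent agent(model,
    "You are a concise assistant. Use tools for any arithmetic.");
```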

## Model

Encapsulates **local** LLM initialization and inference using [llama.cpp](https://github.com/ggml-org/llama.cpp). This is tightly coupled to llama.cpp and requires models in GGUF format.
> **Architectural note:** The `Model` class is not backend-agnostic. It is built specifically for local inference with llama.cpp. There is no abstraction layer for swapping in cloud-based providers like OpenAI or Anthropic.
137
-
138
66
Handles:
- Loading GGUF model files (quantized models recommended for efficiency)
- Chat template application and tokenization
- Text generation with configurable sampling (temperature, top_p, top_k, etc.)
- KV cache management for efficient prompt caching

## Tools

Tools extend the agent's capabilities beyond text generation. Each tool defines:
When the model decides to use a tool, the agent parses the tool call, executes it, and feeds the result back into the conversation.
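
As a self-contained illustration of that flow (the `Tool` struct and string-based dispatch below are assumptions for the sketch, not this library's tool API):

```cpp
#include <cstdio>
#include <functional>
#include <iostream>
#include <map>
#include <string>

// Illustrative tool shape: a name, a description the model can read,
// and an execute function. Not the library's confirmed interface.
struct Tool {
    std::string name;
    std::string description;
    std::function<std::string(const std::string&)> execute;
};

int main() {
    std::map<std::string, Tool> tools;
    tools["multiply"] = {"multiply", "Multiply two integers given as 'a b'.",
        [](const std::string& args) {
            int a = 0, b = 0;
            std::sscanf(args.c_str(), "%d %d", &a, &b);
            return std::to_string(a * b);
        }};

    // When the model emits a tool call, the agent dispatches by name and
    // feeds the result back into the conversation as a tool message.
    std::cout << tools["multiply"].execute("42 17") << std::endl;  // prints 714
}
```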

# Usage

**C++ Standard:** Requires **C++17** or higher.

## Option 1: FetchContent (Recommended)

The easiest way to integrate agent.cpp into your CMake project:
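
A minimal sketch of what that typically looks like is below; the repository URL, tag, and target names are placeholders to adapt, since they are not spelled out in this excerpt:

```cmake
# FetchContent sketch -- substitute the real repository URL and a pinned tag.
include(FetchContent)

FetchContent_Declare(
  agent_cpp
  GIT_REPOSITORY https://github.com/<org>/agent.cpp  # placeholder URL
  GIT_TAG        <release-tag>                       # pin a real release
)
FetchContent_MakeAvailable(agent_cpp)

# Link the library into your executable (target name is an assumption).
target_link_libraries(my_app PRIVATE agent_cpp)
```
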
For a complete list of build options and backend-specific instructions, see the [llama.cpp build documentation](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md).

## Technical Details

**Thread Safety:** The `Agent` and `Model` classes are **not thread-safe**. A single `Agent` or `Model` instance should not be accessed concurrently from multiple threads.
**Resource Sharing:** The library separates model weights from inference context to enable efficient VRAM usage:
- **`ModelWeights`**: Holds the immutable model weights (the heavy VRAM consumer). Create once with `ModelWeights::create(path)` and share via `std::shared_ptr` across multiple `Model` instances.
- **`Model`**: Holds a context (KV cache, sampler state) and a reference to shared weights. Each `Model` has its own conversation state.
This architecture enables multiple concurrent agents to share the same weights without loading them multiple times:
```cpp
// Load weights once (heavy VRAM usage)
auto weights = agent_cpp::ModelWeights::create("model.gguf");
// Create multiple models sharing the same weights (lightweight)
auto validator_model = agent_cpp::Model::create_with_weights(weights);
auto generator_model = agent_cpp::Model::create_with_weights(weights);
```

For simple single-agent use cases, `Model::create(path)` still works and handles everything internally.
**Exceptions:** All exceptions derive from `agent_cpp::Error` (which extends `std::runtime_error`), allowing you to catch all library errors with a single `catch (const agent_cpp::Error&)` block.
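
For example (reusing `Model::create` from earlier in this README; the error message text is illustrative):

```cpp
try {
    auto model = agent_cpp::Model::create("missing-model.gguf");
    // ... construct and run an agent ...
} catch (const agent_cpp::Error& e) {
    // Catches every library error via the common base class;
    // what() comes from std::runtime_error.
    std::cerr << "agent.cpp error: " << e.what() << std::endl;
}
```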
To handle tool errors gracefully without exceptions propagating, check the [`error_recovery_callback`](./examples/shared) example, which converts tool errors into JSON results that the model can see and retry.