Commit 8b49e9c

docs: rewrote documentation
1 parent 642c47a commit 8b49e9c

6 files changed: +396 −349 lines changed
README.md

Lines changed: 26 additions & 20 deletions

@@ -19,16 +19,33 @@
 </div>
 
-Nerve is an ADK ( _Agent Development Kit_ ) designed to be a simple yet powerful platform for creating and executing LLM-based agents.
+Nerve is a simple yet powerful Agent Development Kit (ADK) to build, run, evaluate, and orchestrate LLM-based agents using just YAML and a CLI. It is designed for technical users who want programmable, auditable, and reproducible automation with large language models.
 
-## Main Features
+## ✨ Key Features
 
-- Define agents as simple YAML files.
-- Simple CLI for creating, installing, and running agents with step-by-step guidance.
-- Comes with a library of predefined, built-in tools for common tasks.
-- Seamlessly [integrated with MCP](https://github.com/evilsocket/nerve/blob/main/docs/mcp.md).
-- Support for [any model provider](https://docs.litellm.ai/docs/providers).
-- Benchmark and [evaluate different models](https://github.com/evilsocket/nerve/blob/main/docs/evaluation.md).
+**📝 Declarative Agents**
+
+Define agents using a clean YAML format: system prompt, task, tools, and variables, all in one file.
+
+**🔧 Built-in Tools & Extensibility**
+
+Use shell commands, Python functions, or remote tools to power your agents. Tools are fully typed and annotated.
+
+**🌐 Native MCP Support (Client & Server)**
+
+Nerve lets you define **MCP servers in YAML** and act as both **client and server**, enabling agent teams and [deep orchestration](https://github.com/evilsocket/nerve/blob/main/docs/mcp.md).
+
+**📊 Evaluation Mode**
+
+[Benchmark your agents](https://github.com/evilsocket/nerve/blob/main/docs/evaluation.md) with YAML, Parquet, or folder-based test cases. Run reproducible tests, log structured outputs, and track regressions or progress.
+
+**🔁 Workflows**
+
+Compose agents into simple, linear pipelines to create multi-step automations with shared context.
+
+**🧪 LLM-Agnostic**
+
+Built on [LiteLLM](https://docs.litellm.ai/), Nerve supports OpenAI, Anthropic, Ollama, [and dozens more](https://docs.litellm.ai/docs/providers); switch models in one line.
 
 ## Quick Start
 
@@ -46,18 +63,7 @@ nerve create new-agent
 nerve run new-agent
 ```
 
-Agents are simple YAML files that can use a set of built-in tools such as a bash shell, file system primitives [and others](https://github.com/evilsocket/nerve/blob/main/docs/namespaces.md):
-
-```yaml
-# who
-agent: You are a helpful assistant using pragmatism and shell commands to perform tasks.
-# what
-task: Find which running process is using the most RAM.
-# how
-using: [shell]
-```
-
-Read [this introductory blog post](https://www.evilsocket.net/2025/03/13/How-To-Write-An-Agent/), see the [documentation](https://github.com/evilsocket/nerve/blob/main/docs/index.md) and the [examples](https://github.com/evilsocket/nerve/tree/main/examples) for more.
+Read the [documentation](https://github.com/evilsocket/nerve/blob/main/docs/index.md) and the [examples](https://github.com/evilsocket/nerve/tree/main/examples) for more.
 
 ## Contributing

docs/concepts.md

Lines changed: 106 additions & 0 deletions

@@ -0,0 +1,106 @@
## Concepts

### What is Nerve?

Nerve is an **Agent Development Kit (ADK)** designed to let you build intelligent agents using Large Language Models (LLMs) with minimal effort. It provides a declarative YAML-based syntax, powerful CLI tools, and optional integration with the Model Context Protocol (MCP).

Nerve is designed for developers and cybersecurity professionals who:

- Are comfortable with the terminal, Python, and Git.
- Are curious about LLMs but don’t want to build everything from scratch.
- Want to create programmable agents rather than chatbots.
### Agent

An **agent** is a YAML file that defines:

- The agent's "role" (system prompt)
- A task (objective or behavior)
- The tools it can use (e.g., shell commands, HTTP requests, Python functions)

Agents run in a loop, invoking tools and modifying state until they complete or fail their task.

```yaml
agent: You are a cybersecurity assistant.
task: Scan the system for open ports.
using:
  - shell
```
### Prompt Interpolation and Variables

Nerve supports [Jinja2](https://jinja.palletsprojects.com/) templating for dynamic prompt construction. You can:

- Inject command line arguments (`{{ target }}`)
- Use tool outputs (`{{ get_logs_tool() }}`)
- Include external files (`{% include 'prompt.md' %}`)
- Reference built-in variables like `{{ CURRENT_DATE }}` or `{{ LOCAL_IP }}`
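Put together, an agent using these interpolation features might look like the following sketch (here `target` is a hypothetical command line argument, not a built-in variable):

```yaml
# sketch: combining command line arguments and built-in variables;
# `target` is a hypothetical argument supplied at run time
agent: You are a network diagnostics assistant.
task: >
  Today is {{ CURRENT_DATE }}. From host {{ LOCAL_IP }},
  check connectivity to {{ target }} and report your findings.
using: [shell]
```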
### Tools

Tools extend the agent’s capabilities. They can be:

- **Shell commands** (interpolated into a shell script)
- **Python functions** (via annotated `tools.py` files)
- **Remote tools via MCP** (from another Nerve instance or a compatible server)

Tools must be documented with a description and arguments:

```yaml
tools:
  - name: get_weather
    description: Get weather info for a city.
    arguments:
      - name: city
        description: Name of the city.
        example: Rome
    tool: curl wttr.in/{{ city }}
```
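For the Python route, a comparable tool would be an annotated function in a `tools.py` file. The sketch below only illustrates the typed-and-documented shape; how Nerve discovers and registers the function is covered in the framework's own docs, and the body here is a stub rather than a real weather API call:

```python
# sketch: a typed, documented tool function as it might live in a
# tools.py file; the stub body stands in for a real weather lookup.

def get_weather(city: str) -> str:
    """Get weather info for a city.

    :param city: Name of the city, e.g. Rome.
    """
    # a real implementation would query a weather service here
    return f"weather report for {city}"
```

The type hint and docstring matter: they are what lets a framework describe the tool's arguments to the model.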
### Workflows

A **workflow** is a YAML file that chains multiple agents sequentially. Each agent in the pipeline can:

- Use a different model
- Receive input from the previous agent
- Define its own tools and prompt

This is useful for simple **linear pipelines**, for example:

```yaml
actors:
  step1: { generator: openai://gpt-4o }
  step2: { generator: anthropic://claude }
```

Each agent is executed in order, with shared state passed between them. Read more on the [dedicated workflows page](workflows.md).

> For more complex orchestrations (e.g., branching logic, sub-agents, delegation), it's better to use **Nerve as an MCP server**. This way, agents can expose themselves as tools to a primary orchestrator agent. Refer to the [MCP documentation](mcp.md).
### Evaluation

Nerve supports **agent evaluation** with test cases to validate correctness, track regressions, or benchmark models. You define input cases (via YAML, Parquet, or folders), and Nerve runs the agent against them.

```bash
nerve eval path/to/evaluation --output results.json
```

Read more on the [dedicated evaluation page](evaluation.md).
### MCP (Model Context Protocol)

Nerve can:

- **Use MCP tools**: connect to external memory, filesystem, or custom tool servers
- **Expose agents as MCP servers**: let other agents call them as tools

```yaml
mcp:
  memory:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-memory"]
```

Read more on the [dedicated MCP page](mcp.md).
### Diagram: Nerve Agent Execution (simplified)

```mermaid
graph TD
    A[Start Agent] --> B[Inject Prompt]
    B --> C{Tools Required?}
    C -- Yes --> D[Call Tool]
    D --> E[Update State]
    E --> C
    C -- No --> F{Task Complete?}
    F -- Yes --> G[Exit]
    F -- No --> B
```

This loop continues until the task is complete, failed, or times out.
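In code, the loop in the diagram reads roughly as the following sketch; every name here is an illustrative stand-in (a toy model and a toy tool), not Nerve's actual API:

```python
# illustrative sketch of the agent loop in the diagram above; all
# names are hypothetical stand-ins, not Nerve's actual API.

def run_agent(llm, call_tool, state, max_steps=10):
    """Inject the prompt, call tools while the model requests them,
    and exit once the model reports the task complete. The step
    budget stands in for a timeout."""
    for _ in range(max_steps):
        reply = llm(state)                           # inject prompt -> model reply
        while reply.get("tool"):                     # tools required?
            state = call_tool(reply["tool"], state)  # call tool, update state
            reply = llm(state)                       # ask again with new state
        if reply.get("done"):                        # task complete?
            return state
    raise TimeoutError("agent exceeded step budget")

# toy model: requests the shell tool once, then declares completion
def toy_llm(state):
    return {"done": True} if "scanned" in state else {"tool": "shell"}

def toy_tool(name, state):
    return state + ["scanned"]

print(run_agent(toy_llm, toy_tool, []))  # → ['scanned']
```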

docs/evaluation.md

Lines changed: 56 additions & 24 deletions

@@ -1,62 +1,94 @@
 # Evaluation Mode
 
-Nerve provides an evaluation mode that allows you to test your agent's performance against a set of predefined test cases. This is useful for:
+Nerve's **evaluation mode** makes benchmarking and validating agents easy, reproducible, and formalized.
 
-- Validating agent behavior during development
-- Regression testing after making changes
-- Benchmarking different models
-- Collecting metrics on agent performance
+> ⚡ Nerve ships a built-in framework to **test agents against structured cases**, log results, and compare performance across models, with a consistent formalism for agent evaluation.
 
-An evaluation consists of an agent and a corresponding set of test cases. These cases can be defined in a `cases.yml` file, stored in a `cases.parquet` file, or organized as individual entries within separate folders.
 
-Regardless of how you organize the evaluation cases, the agent will be executed for each one, with a specified number of runs per case. Task completion data and runtime statistics will be collected and saved to an output file.
+## 🎯 Why Use It?
+
+Evaluation mode is useful for:
+
+- Verifying agent correctness during development
+- Regression testing when updating prompts, tools, or models
+- Comparing different model backends
+- Collecting structured performance metrics
+
+## 🧪 Running an Evaluation
+
+You run evaluations with:
 
 ```bash
 nerve eval path/to/evaluation --output results.json
 ```
+
+Each case is passed to the agent, and results (e.g., completion, duration, output) are saved.

-## YAML
-
-You can place a `cases.yml` file in the agent folder with the different test cases. For instance, this is used in the [ab evaluation](https://github.com/evilsocket/eval-ab), where the evaluation cases look like:
+## 🗂 Case Formats
+
+Nerve supports three evaluation case formats:
+
+### 1. `cases.yml`
+
+For small test suites. Example:
 
 ```yaml
 - level1:
     program: "A# #A"
 - level2:
     program: "A# #B B# #A"
-# ... and so on
 ```
 
-These cases are interpolated in the agent prompt:
+Used like this in the agent:
 
 ```yaml
 task: >
-  ## Problem
-
-  Now, consider the following program:
+  Consider this program:
 
   {{ program }}
 
-  Fully compute it, step by step, and then submit the final result.
+  Compute it step-by-step and submit the result.
 ```
 
-## Parquet
-
-For more complex test suites you can use a `cases.parquet` file. An example is [this MMLU evaluation](https://github.com/evilsocket/eval-mmlu), which loads data from the [MMLU (dev) dataset](https://huggingface.co/datasets/cais/mmlu) and uses it in the agent prompt:
+Used in [eval-ab](https://github.com/evilsocket/eval-ab).
+### 2. `cases.parquet`
+
+For large, structured datasets. Example from [eval-mmlu](https://github.com/evilsocket/eval-mmlu):
 
 ```yaml
 task: >
   ## Question
 
   {{ question }}
 
-  Use the select_choice tool to select the correct answer from this list of possible answers:
-
+  Use the `select_choice` tool to pick the right answer:
   {% for choice in choices %}
   - [{{ loop.index0 }}] {{ choice }}
   {% endfor %}
 ```
 
-## Folders
+Parquet cases can load HuggingFace datasets (e.g., MMLU) directly.
63+
### 3. Folder-Based `cases/`
64+
Organize each case in its own folder:
65+
```
66+
cases/
67+
level0/
68+
input.txt
69+
level1/
70+
input.txt
71+
```
72+
Useful when tools/scripts dynamically load inputs.
73+
See [eval-regex](https://github.com/evilsocket/eval-regex).
74+
75+
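A loader for the layout above takes only a few lines of standard library Python. This sketch assumes the `cases/<name>/input.txt` layout shown; it is an illustration, not eval-regex's actual `tools.py`:

```python
# sketch: collect folder-based cases laid out as cases/<name>/input.txt;
# illustrative only, not the loader used by eval-regex.
from pathlib import Path

def load_cases(root: str) -> dict:
    """Map each case folder name to the contents of its input.txt."""
    cases = {}
    for case_dir in sorted(Path(root).iterdir()):
        input_file = case_dir / "input.txt"
        if input_file.is_file():
            cases[case_dir.name] = input_file.read_text()
    return cases
```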
+## 🧪 Output
+
+Results are written to a `.json` file with details like:
+
+- Case identifier
+- Task outcome (success/failure)
+- Runtime duration
+- Agent/tool outputs
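Once a run finishes, the output file can be post-processed with ordinary JSON tooling. The sketch below assumes a hypothetical layout with a top-level `cases` list whose entries carry a boolean `success` field; check your actual `results.json` for the real schema:

```python
# sketch: compute a pass rate from an evaluation output file;
# the "cases"/"success" field names are assumptions, not Nerve's
# documented schema -- adapt to the real results.json layout.
import json

def success_rate(path: str) -> float:
    """Return the fraction of cases marked successful."""
    with open(path) as f:
        results = json.load(f)
    cases = results.get("cases", [])
    if not cases:
        return 0.0
    return sum(1 for c in cases if c.get("success")) / len(cases)
```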
+## 📎 Notes
+
+- You can define multiple runs per case for robustness
+- Compatible with any agent setup (tools, MCP, workflows, etc.)
+- All variables from each case are injected via `{{ ... }}`
+
+## 🧭 Related Docs
+
+- [concepts.md](concepts.md#evaluation)
+- [index.md](index.md): CLI usage
+- [mcp.md](mcp.md): when using remote agents or tools in evaluation
 
-You can also divide your cases into a `cases` folder, as in [the regex evaluation](https://github.com/evilsocket/eval-regex), where each input file is organized in `cases/level0`, `cases/level1`, etc. and [read at runtime](https://github.com/evilsocket/eval-regex/blob/main/tools.py#L11) by the tools.
