# Evaluation Mode

Nerve's **evaluation mode** makes benchmarking and validating agents easy, reproducible, and formalized.

> ⚡ Unlike most tools in the LLM ecosystem, Nerve offers a built-in framework to **test agents against structured cases**, log results, and compare performance across models. It introduces a standard formalism for agent evaluation that does not exist elsewhere.


## 🎯 Why Use It?
Evaluation mode is useful for:
- Verifying agent correctness during development
- Regression testing when updating prompts, tools, or models
- Comparing different model backends
- Collecting structured performance metrics


## 🧪 Running an Evaluation
An evaluation consists of an agent and a corresponding set of test cases. Run it with:
```bash
nerve eval path/to/evaluation --output results.json
```
The agent is executed for each case, optionally multiple times per case, and the results (task completion, runtime statistics, outputs) are saved to the output file.


## 🗂 Case Formats
Nerve supports three evaluation case formats:

### 1. `cases.yml`
For small test suites, place a `cases.yml` file in the agent folder. Example:
```yaml
- level1:
    program: "A# #A"
- level2:
    program: "A# #B B# #A"
```
Each case's variables are interpolated into the agent prompt:
```yaml
task: >
  Consider this program:

  {{ program }}

  Compute it step by step and submit the result.
```

This format is used in [eval-ab](https://github.com/evilsocket/eval-ab).

### 2. `cases.parquet`
For larger, more structured test suites you can use a `cases.parquet` file. For example, [eval-mmlu](https://github.com/evilsocket/eval-mmlu) loads the [MMLU (dev) dataset](https://huggingface.co/datasets/cais/mmlu) and uses it in the agent prompt:
```yaml
task: >
  ## Question

  {{ question }}

  Use the `select_choice` tool to pick the right answer:
  {% for choice in choices %}
  - [{{ loop.index0 }}] {{ choice }}
  {% endfor %}
```

This lets you use HuggingFace datasets (e.g. MMLU) directly.
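
If you build your own parquet-based evaluation, a small script can generate the cases file from a public dataset. The sketch below assumes that each row becomes one case and each column becomes a template variable (such as `question` and `choices`), and that the `datasets`, `pandas`, and `pyarrow` packages are installed; it is illustrative, not the exact script used by eval-mmlu.

```python
# Sketch: build a cases.parquet for an MMLU-style evaluation.
# Assumption: each parquet row is one case and each column is a template
# variable ({{ question }}, {{ choices }}, ...); check the eval-mmlu
# repository for the exact schema it expects.
from datasets import load_dataset  # pip install datasets pandas pyarrow

# Load the MMLU dev split from HuggingFace.
dataset = load_dataset("cais/mmlu", "all", split="dev")

# Keep only the columns referenced by the agent prompt template.
df = dataset.to_pandas()[["question", "choices", "answer"]]

# Write the cases file next to the agent definition.
df.to_parquet("cases.parquet", index=False)
```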

### 3. Folder-Based `cases/`
Alternatively, organize each case in its own folder inside a `cases/` directory:
```
cases/
  level0/
    input.txt
  level1/
    input.txt
```
This is useful when the inputs are [read at runtime](https://github.com/evilsocket/eval-regex/blob/main/tools.py#L11) by the agent's own tools, as in [eval-regex](https://github.com/evilsocket/eval-regex).
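
For folder-based evaluations, the inputs are typically loaded by a small helper in the evaluation's own `tools.py`. The sketch below is a generic illustration rather than the actual eval-regex implementation: the `case_dir` parameter and function name are hypothetical, and how the function is exposed as a tool depends on your agent configuration.

```python
from pathlib import Path


def read_case_input(case_dir: str) -> str:
    """Return the contents of a case's input.txt.

    `case_dir` is hypothetical: it stands for whatever mechanism your
    evaluation uses to point the tool at the current case folder
    (eval-regex resolves this in its own tools.py).
    """
    input_file = Path(case_dir) / "input.txt"
    return input_file.read_text().strip()
```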


## 🧪 Output
Results are written to a `.json` file with details such as the following (see the sketch after this list for one way to consume them):
- Case identifier
- Task outcome (success/failure)
- Runtime duration
- Agent/tool outputs
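
The exact JSON schema depends on your Nerve version, so inspect a generated `results.json` before relying on specific field names. As an illustration only (all field names below are hypothetical), a short script could summarize pass rates across cases:

```python
import json

# Illustrative only: "cases", "passed" and "duration" are hypothetical
# field names; check your own results.json for the actual schema.
with open("results.json") as f:
    results = json.load(f)

cases = results.get("cases", [])
passed = sum(1 for case in cases if case.get("passed"))
total_time = sum(case.get("duration", 0) for case in cases)

print(f"{passed}/{len(cases)} cases passed in {total_time:.1f}s")
```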


## 📎 Notes
- You can define multiple runs per case to collect more robust statistics
- Compatible with any agent setup (tools, MCP, workflows, etc.)
- All variables from each case are injected into the prompt via `{{ ... }}`


## 🧭 Related Docs
- [concepts.md](concepts.md#evaluation)
- [index.md](index.md): CLI usage
- [mcp.md](mcp.md): using remote agents or tools in an evaluation