dreadnode
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/.secrets.baseline b/.secrets.baseline
Lines changed: 10 additions & 1 deletion
diff --git a/docs/docs.json b/docs/docs.json
Lines changed: 11 additions & 14 deletions
diff --git a/docs/examples/dotnet-reversing.mdx b/docs/examples/dotnet-reversing.mdx
Lines changed: 10 additions & 10 deletions
diff --git a/docs/examples/agent-examples.mdx b/docs/examples/overview.mdx
docs/examples/agent-examples.mdx renamed to docs/examples/overview.mdx
Lines changed: 8 additions & 8 deletions
diff --git a/docs/examples/saas-scanning.mdx b/docs/examples/sast-scanning.mdx
docs/examples/saas-scanning.mdx renamed to docs/examples/sast-scanning.mdx
Lines changed: 8 additions & 8 deletions
diff --git a/docs/examples/sensitive-data.mdx b/docs/examples/sensitive-data.mdx
Lines changed: 8 additions & 8 deletions
diff --git a/docs/how-to/airtbench-agent.mdx b/docs/how-to/airtbench-agent.mdx
Lines changed: 3 additions & 3 deletions
@@ -32,7 +32,7 @@ repos:
     rev: v2.4.1
     hooks:
       - id: codespell
-        entry: codespell -q 3 -f --skip=".git,.github,README.md" -L astroid,braket,te,ROUGE
+        entry: codespell -q 3 -f --skip=".git,.github,README.md" -L astroid,braket,te,ROUGE,lief
 
   # Python code security
   - repo: https://github.com/PyCQA/bandit
 
@@ -127,6 +127,15 @@
     }
   ],
   "results": {
+    "docs/examples/overview.mdx": [
+      {
+        "type": "Basic Auth Credentials",
+        "filename": "docs/examples/overview.mdx",
+        "hashed_secret": "32a6fcbaa4543f0718079837a574f5835f3143fe",
+        "is_verified": false,
+        "line_number": 190
+      }
+    ],
     "docs/how-to/write-a-ctf-agent.mdx": [
       {
         "type": "Secret Keyword",
@@ -162,5 +171,5 @@
       }
     ]
   },
-  "generated_at": "2025-07-24T10:02:58Z"
+  "generated_at": "2025-07-24T10:42:54Z"
 }
@@ -17,23 +17,19 @@
     "groups": [
       {
         "group": "Getting Started",
+        "pages": ["intro", "install"]
+      },
+      {
+        "group": "Example Agents",
         "pages": [
-          "intro",
-          "install",
-          {
-            "group": "Examples",
-            "pages": [
-              "examples/agent-examples",
-              "examples/dangerous-capabilities",
-              "examples/dotnet-reversing",
-              "examples/python-agent",
-              "examples/saas-scanning",
-              "examples/sensitive-data"
-            ]
-          }
+          "examples/overview",
+          "examples/dangerous-capabilities",
+          "examples/dotnet-reversing",
+          "examples/python-agent",
+          "examples/sast-scanning",
+          "examples/sensitive-data"
         ]
       },
-
       {
         "group": "Usage",
         "pages": [
@@ -42,6 +38,7 @@
           "usage/runs",
           "usage/tasks",
           "usage/metrics",
+          "usage/scorers",
           "usage/data-tracking",
           "usage/rich-objects",
           "usage/model-training",
 
@@ -6,24 +6,24 @@ public: true
 See the full example in the [GitHub repository](https://github.com/dreadnode/example-agents/tree/main/dotnet_reversing).
 ---
 
-This agent is designed to perform reverse engineering and analysis of .NET binaries. 
-It can decompile .NET assemblies and leverage a large language model (LLM) to analyze the source code based on a user-defined task, such as identifying security vulnerabilities. 
-The agent can process binaries from a local file path or directly fetch them from the [NuGet package repository](https://www.nuget.org/packages). 
+This agent is designed to perform reverse engineering and analysis of .NET binaries.
+It can decompile .NET assemblies and leverage a large language model (LLM) to analyze the source code based on a user-defined task, such as identifying security vulnerabilities.
+The agent can process binaries from a local file path or directly fetch them from the [NuGet package repository](https://www.nuget.org/packages).
 It operates asynchronously and can run multiple analysis instances in parallel.
 
 ## Intended Use
 
-The primary purpose of this agent is to assist security researchers and developers in automating the process of scanning .NET applications for potential security flaws. 
-A user can provide a high-level task, like "Find only critical vulnerabilities," and the agent will use its tools to decompile the code and use an LLM to analyze it, reporting any findings. 
+The primary purpose of this agent is to assist security researchers and developers in automating the process of scanning .NET applications for potential security flaws.
+A user can provide a high-level task, like "Find only critical vulnerabilities," and the agent will use its tools to decompile the code and use an LLM to analyze it, reporting any findings.
 It can also be used as a simple utility to decompile and view the source code of .NET assemblies.
 
 ## Environment
 
-The agent is a command-line application built with Python. 
-It requires a Python environment with the necessary libraries installed, as specified in the script. 
-It interacts with the public [NuGet API](https://learn.microsoft.com/en-us/nuget/api/overview) (api.nuget.org) to fetch packages. 
-For its analysis capabilities, it relies on a configured language model, which can be a remote API (like GPT-4o-mini) or a locally hosted model (e.g., via Ollama). 
-For observability and task tracking, it can be optionally [connected to a Dreadnode server](https://docs.dreadnode.io/strikes/usage/config).
+The agent is a command-line application built with Python.
+It requires a Python environment with the necessary libraries installed, as specified in the script.
+It interacts with the public [NuGet API](https://learn.microsoft.com/en-us/nuget/api/overview) (api.nuget.org) to fetch packages.
+For its analysis capabilities, it relies on a configured language model, which can be a remote API (like GPT-4o-mini) or a locally hosted model (e.g., via Ollama).
+For observability and task tracking, it can be optionally [connected to a Dreadnode server](/usage/config).
 
 ## Tools
 
 
@@ -4,8 +4,8 @@ description: 'Explore a collection of specialized AI agents'
 public: true
 ---
 
-We've created a collection of specialized, autonomous AI agents designed for various complex tasks. 
-Each agent leverages Large Language Models (LLMs) combined with a specific set of tools to achieve its goals in a structured and observable manner. 
+We've created a collection of specialized, autonomous AI agents designed for various complex tasks.
+Each agent leverages Large Language Models (LLMs) combined with a specific set of tools to achieve its goals in a structured and observable manner.
 The agents are built using the [Rigging](https://github.com/dreadnode/rigging) and [Dreadnode](https://github.com/dreadnode/dreadnode-python) libraries for robust interaction and observability.
 
 View the [GitHub repository](https://github.com/dreadnode/example-agents) for more details.
@@ -56,15 +56,15 @@ This agent is a specialized framework for evaluating the security analysis capab
 
 An autonomous agent that explores and analyzes file systems to find and report sensitive data like credentials, API keys, and personal information. Leveraging `fsspec`, it can operate on local files, cloud storage (AWS S3, GCS), and remote repositories (GitHub).
 
-> **[More Details](/examples/sensitive-data-extraction)**
+> **[More Details](/examples/sensitive-data)**
 
 ## General Usage
 
 While each agent has its own specific command-line arguments, they share a common setup:
 
 1.  **Installation**: Each agent is a Python application. Dependencies can be installed via `pip`.
 2.  **LLM Configuration**: The agents use `litellm` to connect to various LLMs. You must configure the appropriate environment variables for the model you intend to use (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).
-3.  **Observability**: To enable detailed logging, tracing, and metrics, you can configure the agents to connect to a [Dreadnode](https://docs.dreadnode.io/strikes/usage/config) server by providing a server URL and token.
+3.  **Observability**: To enable detailed logging, tracing, and metrics, you can configure the agents to connect to a [Dreadnode](/usage/config) server by providing a server URL and token.
 
 ### Setup
 
@@ -77,7 +77,7 @@ uv sync
 ### Passing Models
 
 For all agents, LLMs are usually specified with a `--model` argument, which is passed directly to our [Rigging](https://github.com/dreadnode/rigging) library.
-You can read details about different ways to connect to providers, self-hosted servers, or even in-process local models [in the docs](https://docs.dreadnode.io/open-source/rigging/topics/generators)
+You can read details about different ways to connect to providers, self-hosted servers, or even in-process local models [in the docs](/open-source/rigging/topics/generators)
 
 Usually, the obvious identifier works out of the box:
 
@@ -103,7 +103,7 @@ uv run -m python_agent --help
 
 - Provided a task (`--task`), begin a generation loop with access to the Jupyter kernel
 - The work directory (`--work-dir`) is mounted into the container, along with any other docker-style volumes (`--volumes`)
-- When finished, the agent markes the task as complete with a status and summary
+- When finished, the agent marks the task as complete with a status and summary
 - The work directory is logged as an artifact for the run
 
 ## Dangerous Capabilities
@@ -126,7 +126,7 @@ as needed to ensure they are network-isolated from each other. The process is ge
 5. If the flag is ever observed in the output, exit
 6. Otherwise run until an error, give up, or max-steps is reached
 
-Check out [./dangerous_capabilities/challenges/challenges.json](./dangerous_capabilities/challenges/challenges.json)
+Check out [challenges.json](https://github.com/dreadnode/example-agents/blob/main/dangerous_capabilities/challenges/challenges.json)
 to see all the environments and prompts.
 
 ## Dotnet Reversing
@@ -162,7 +162,7 @@ uv run -m dotnet_reversing --model <model> --path <nuget-package-id> --nuget
 
 ## Sensitive Data Extraction
 
-This agent is provided access to a filsystem tool based on [fsspec](https://filesystem-spec.readthedocs.io/en/latest/)
+This agent is provided access to a filesystem tool based on [fsspec](https://filesystem-spec.readthedocs.io/en/latest/)
 for use in extracting sensitive data stored in files.
 
 ```bash
 
@@ -1,23 +1,23 @@
 ---
-title: 'SaaS Scanning Agent'
-description: 'An agent for scanning SaaS applications for security vulnerabilities'
+title: 'SAST Scanning Agent'
+description: 'An agent for scanning application source for security vulnerabilities'
 public: true
 ---
 
-This agent is a specialized Static Application Security Testing (SAST) framework designed to evaluate the capabilities of Large Language Models (LLMs) in identifying security vulnerabilities in source code. 
-It operates by presenting the LLM with a "challenge," a codebase containing known, predefined vulnerabilities. 
-The agent then prompts the model to act as a security expert, analyze the files, and report any security issues it discovers. 
+This agent is a specialized Static Application Security Testing (SAST) framework designed to evaluate the capabilities of Large Language Models (LLMs) in identifying security vulnerabilities in source code.
+It operates by presenting the LLM with a "challenge," a codebase containing known, predefined vulnerabilities.
+The agent then prompts the model to act as a security expert, analyze the files, and report any security issues it discovers.
 The agent tracks the findings and scores the model's performance by comparing its results against a manifest of the known vulnerabilities, providing metrics like coverage and accuracy.
 
 ## Intended Use
 
-The primary purpose of this agent is to benchmark and compare the effectiveness of different LLMs for security code review tasks. 
+The primary purpose of this agent is to benchmark and compare the effectiveness of different LLMs for security code review tasks.
 It is intended for researchers and security professionals who want to quantitatively measure a model's ability to detect various types of vulnerabilities (e.g., SQL Injection, XSS, Command Injection) in a controlled and reproducible environment.
 
 ## Environment
 
-The agent is a Python command-line application. 
-The agent operates on a local collection of code "challenges" located in the challenges directory. 
+The agent is a Python command-line application.
+The agent operates on a local collection of code "challenges" located in the challenges directory.
 For its container mode, a running Docker daemon is required on the host machine.
 
 ## Tools
 
@@ -4,8 +4,8 @@ description: 'An agent for identifying sensitive data in filesystems'
 public: true
 ---
 
-This agent leverages a Large Language Model (LLM) to autonomously explore and analyze file systems for sensitive data. 
-It is designed to navigate through a given path, read the contents of various files, and identify information such as passwords, API keys, personal identifiable information (PII), and other confidential data. 
+This agent leverages a Large Language Model (LLM) to autonomously explore and analyze file systems for sensitive data.
+It is designed to navigate through a given path, read the contents of various files, and identify information such as passwords, API keys, personal identifiable information (PII), and other confidential data.
 A key feature of this agent is ability to operate on a wide variety of storage systems, including local directories, cloud storage like AWS S3 and Google Cloud Storage, and even remote sources like GitHub repositories.
 
 ## Intended Use
@@ -14,23 +14,23 @@ The Agent is used to perform a thorough search through fileshares and files, the
 
 ## Environment
 
-The environment is simply a filesystem. 
-The Agent must have the necessary credentials to access the target path specified by the user (e.g., AWS credentials configured for S3 access, or a GitHub token for private repositories). 
-For observability, the agent can be [connected to a Dreadnode server](https://docs.dreadnode.io/strikes/usage/config) to log detailed run information, metrics, and findings.
+The environment is simply a filesystem.
+The Agent must have the necessary credentials to access the target path specified by the user (e.g., AWS credentials configured for S3 access, or a GitHub token for private repositories).
+For observability, the agent can be [connected to a Dreadnode server](/usage/config) to log detailed run information, metrics, and findings.
 
 ## Tools
 
-- `fsspec`: The underlying library that provides a unified Pythonic interface to various local and remote file systems. 
+- `fsspec`: The underlying library that provides a unified Pythonic interface to various local and remote file systems.
 This is what enables the agent's versatility in accessing different storage backends like `s3://`, `gs://`, and `github://`.
 
 ## Features
 
 - **Multi-Filesystem Support**: Can analyze files on local disks, AWS S3, Google Cloud Storage, GitHub repositories, and any other backend supported by fsspec.
 - **LLM-Powered Data Identification**: Employs a language model to intelligently parse file contents and identify a broad range of sensitive data types based on context.
 - **Structured Data Reporting**: Uses a dedicated report_sensitive_data tool that forces the LLM to report findings in a structured format, including the file path, location within the file, data type, the sensitive value itself, and a comment.
-- **Location-Aware Reportin**g: Can specify the location of findings differently based on the file type (line number for text, seconds for audio/video, or byte offset for binary files).
+- **Location-Aware Reporting**: Can specify the location of findings differently based on the file type (line number for text, seconds for audio/video, or byte offset for binary files).
 - **Autonomous Exploration**: The agent can independently navigate the directory structure of the target path to ensure comprehensive coverage.
-- **Task Contro**l: Includes tools for the agent to explicitly complete_task with a summary or give_up if it gets stuck, providing better insight into its reasoning process.
+- **Task Control**: Includes tools for the agent to explicitly complete_task with a summary or give_up if it gets stuck, providing better insight into its reasoning process.
 
 ## References
 
 
@@ -7,7 +7,7 @@ public: true
 <Note>
 This documentation complements the [`dreadnode/AIRTBench-Code`](https://github.com/dreadnode/AIRTBench-Code) AI Red-Teaming Agent. We'll reference specific components throughout this topic, but you can also explore the full implementation to understand how everything fits together.
 
-For this guide, we'll assume you have the `dreadnode` package installed and are familiar with the basics of Strikes. If you haven't already, check out the [installation](../install) and [introduction](../intro) guides. Additionally, as mentioned in the [Agent Implementation](#agent-implementation) section, we will be using a [Rigging](https://github.com/dreadnode/rigging) agent, documented [here](https://docs.dreadnode.io/open-source/rigging/intro).
+For this guide, we'll assume you have the `dreadnode` package installed and are familiar with the basics of Strikes. If you haven't already, check out the [installation](../install) and [introduction](../intro) guides. Additionally, as mentioned in the [Agent Implementation](#agent-implementation) section, we will be using a [Rigging](https://github.com/dreadnode/rigging) agent, documented [here](/open-source/rigging/intro).
 </Note>
 
 <Info>
@@ -16,7 +16,7 @@ This agent also serves as a major functional component to complement our practic
 The paper discusses the design and implementation of the agent, as well as its performance on various challenges. You can find the paper [here](https://arxiv.org/abs/2506.14682) on arXiv, or learn more on our accompanying blog post, "[Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out](https://dreadnode.io/blog/ai-red-team-benchmark)".
 </Info>
 
-In this guide, we'll cover building an agent capable of solving AI/ML capture-the-flag (CTF) challenges hosted on [Crucible](../../crucible/overview.mdx). While we won't delve deeply into the theory behind large language models (LLMs) or the Crucible CTF format, we'll provide enough context to understand how to design an agent that can effectively tackle these challenges.
+In this guide, we'll cover building an agent capable of solving AI/ML capture-the-flag (CTF) challenges hosted on [Crucible](/crucible/overview). While we won't delve deeply into the theory behind large language models (LLMs) or the Crucible CTF format, we'll provide enough context to understand how to design an agent that can effectively tackle these challenges.
 
 We'll use Strikes to gather insightful data on agent behavior and evaluate performance based on the agent's ability to dynamically capture flags generated by Crucible. To achieve this, we'll equip the agent with interactive environments that closely resemble those used by human operators. These environments will allow for multi-step reasoning, command execution, result inspection, and iterative problem solving.
 
@@ -106,7 +106,7 @@ sequenceDiagram
 
 ## Crucible Challenge Notebooks
 
-The Crucible challenge notebooks are designed to run in a Jupyter environment, providing a standardized interface to interact with challenges through API calls. Each notebook is organized into sections that focus on different aspects of the challenge. You can find a detailed breakdown of the notebook structure [here](../../crucible/how-to/use-challenge-notebooks.mdx).
+The Crucible challenge notebooks are designed to run in a Jupyter environment, providing a standardized interface to interact with challenges through API calls. Each notebook is organized into sections that focus on different aspects of the challenge. You can find a detailed breakdown of the notebook structure [here](/crucible/how-to/use-challenge-notebooks).
 
 The agent harness converts these notebooks into Markdown by loading the notebook file using `Notebook.load()` and transforming its cells into a human-readable format with the `to_markdown()` method.
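
Reviewer note on the `.secrets.baseline` hunk: the new entry for `docs/examples/overview.mdx` is recorded with `"is_verified": false`. A minimal sketch of how such entries could be listed for a manual check is shown here. It assumes only the `results` layout visible in the hunk; the helper name `unverified_findings` and the inlined `BASELINE_TEXT` sample are hypothetical and not part of this PR.

```python
import json

# Trimmed stand-in for a baseline file. Only the "results" shape is taken
# from the hunk in this diff; a real file contains additional keys.
BASELINE_TEXT = """
{
  "results": {
    "docs/examples/overview.mdx": [
      {
        "type": "Basic Auth Credentials",
        "filename": "docs/examples/overview.mdx",
        "hashed_secret": "32a6fcbaa4543f0718079837a574f5835f3143fe",
        "is_verified": false,
        "line_number": 190
      }
    ]
  },
  "generated_at": "2025-07-24T10:42:54Z"
}
"""


def unverified_findings(baseline: dict) -> list[tuple[str, int, str]]:
    """Return (filename, line_number, type) for each entry not marked verified."""
    findings = []
    for filename, entries in baseline.get("results", {}).items():
        for entry in entries:
            # Treat a missing "is_verified" key the same as false.
            if not entry.get("is_verified", False):
                findings.append((filename, entry["line_number"], entry["type"]))
    return sorted(findings)


findings = unverified_findings(json.loads(BASELINE_TEXT))
for filename, line_number, kind in findings:
    print(f"{filename}:{line_number}: {kind}")
```

In practice the JSON would be read from the repository's baseline file rather than an inline string; the inline sample just keeps the sketch self-contained.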