* Added more scorers. Cleaned TaskInput and migrated to Lookups. New docs.
* Additional fixes from feedback
* Docs updates
* Fixing type errors
* Fix type errors
docs/examples/dotnet-reversing.mdx (+10 -10: 10 additions & 10 deletions)
@@ -6,24 +6,24 @@ public: true
6
6
See the full example in the [GitHub repository](https://github.com/dreadnode/example-agents/tree/main/dotnet_reversing).
7
7
---
8
8
9
-
This agent is designed to perform reverse engineering and analysis of .NET binaries.
10
-
It can decompile .NET assemblies and leverage a large language model (LLM) to analyze the source code based on a user-defined task, such as identifying security vulnerabilities.
11
-
The agent can process binaries from a local file path or directly fetch them from the [NuGet package repository](https://www.nuget.org/packages).
9
+
This agent is designed to perform reverse engineering and analysis of .NET binaries.
10
+
It can decompile .NET assemblies and leverage a large language model (LLM) to analyze the source code based on a user-defined task, such as identifying security vulnerabilities.
11
+
The agent can process binaries from a local file path or directly fetch them from the [NuGet package repository](https://www.nuget.org/packages).
12
12
It operates asynchronously and can run multiple analysis instances in parallel.
13
13
14
14
## Intended Use
15
15
16
-
The primary purpose of this agent is to assist security researchers and developers in automating the process of scanning .NET applications for potential security flaws.
17
-
A user can provide a high-level task, like "Find only critical vulnerabilities," and the agent will use its tools to decompile the code and use an LLM to analyze it, reporting any findings.
16
+
The primary purpose of this agent is to assist security researchers and developers in automating the process of scanning .NET applications for potential security flaws.
17
+
A user can provide a high-level task, like "Find only critical vulnerabilities," and the agent will use its tools to decompile the code and use an LLM to analyze it, reporting any findings.
18
18
It can also be used as a simple utility to decompile and view the source code of .NET assemblies.
19
19
20
20
## Environment
21
21
22
-
The agent is a command-line application built with Python.
23
-
It requires a Python environment with the necessary libraries installed, as specified in the script.
24
-
It interacts with the public [NuGet API](https://learn.microsoft.com/en-us/nuget/api/overview) (api.nuget.org) to fetch packages.
25
-
For its analysis capabilities, it relies on a configured language model, which can be a remote API (like GPT-4o-mini) or a locally hosted model (e.g., via Ollama).
26
-
For observability and task tracking, it can be optionally [connected to a Dreadnode server](https://docs.dreadnode.io/strikes/usage/config).
22
+
The agent is a command-line application built with Python.
23
+
It requires a Python environment with the necessary libraries installed, as specified in the script.
24
+
It interacts with the public [NuGet API](https://learn.microsoft.com/en-us/nuget/api/overview) (api.nuget.org) to fetch packages.
25
+
For its analysis capabilities, it relies on a configured language model, which can be a remote API (like GPT-4o-mini) or a locally hosted model (e.g., via Ollama).
26
+
For observability and task tracking, it can be optionally [connected to a Dreadnode server](/usage/config).
docs/examples/overview.mdx (+8 -8: 8 additions & 8 deletions)
@@ -4,8 +4,8 @@ description: 'Explore a collection of specialized AI agents'
4
4
public: true
5
5
---
6
6
7
-
We've created a collection of specialized, autonomous AI agents designed for various complex tasks.
8
-
Each agent leverages Large Language Models (LLMs) combined with a specific set of tools to achieve its goals in a structured and observable manner.
7
+
We've created a collection of specialized, autonomous AI agents designed for various complex tasks.
8
+
Each agent leverages Large Language Models (LLMs) combined with a specific set of tools to achieve its goals in a structured and observable manner.
9
9
The agents are built using the [Rigging](https://github.com/dreadnode/rigging) and [Dreadnode](https://github.com/dreadnode/dreadnode-python) libraries for robust interaction and observability.
10
10
11
11
View the [GitHub repository](https://github.com/dreadnode/example-agents) for more details.
@@ -56,15 +56,15 @@ This agent is a specialized framework for evaluating the security analysis capab
56
56
57
57
An autonomous agent that explores and analyzes file systems to find and report sensitive data like credentials, API keys, and personal information. Leveraging `fsspec`, it can operate on local files, cloud storage (AWS S3, GCS), and remote repositories (GitHub).
While each agent has its own specific command-line arguments, they share a common setup:
64
64
65
65
1. **Installation**: Each agent is a Python application. Dependencies can be installed via `pip`.
66
66
2. **LLM Configuration**: The agents use `litellm` to connect to various LLMs. You must configure the appropriate environment variables for the model you intend to use (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).
67
-
3. **Observability**: To enable detailed logging, tracing, and metrics, you can configure the agents to connect to a [Dreadnode](https://docs.dreadnode.io/strikes/usage/config) server by providing a server URL and token.
67
+
3. **Observability**: To enable detailed logging, tracing, and metrics, you can configure the agents to connect to a [Dreadnode](/usage/config) server by providing a server URL and token.
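
Before launching an agent, it can help to confirm that the variable for the chosen provider is actually set. The following is a minimal sketch, not part of the agents themselves: the `PROVIDER_KEYS` mapping and the `missing_api_key` helper are hypothetical names, and only the two variables mentioned in the list above are covered.

```python
import os
from typing import Mapping, Optional

# Assumed provider-to-variable mapping, limited to the two variables named above.
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def missing_api_key(provider: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset variable for a provider, or None if nothing is missing."""
    env = os.environ if env is None else env
    variable = PROVIDER_KEYS.get(provider.lower())
    if variable is None:
        return None  # unknown provider: nothing to check in this sketch
    return None if env.get(variable) else variable


# With an empty environment, the OpenAI variable is reported as missing.
print(missing_api_key("openai", env={}))
```

Passing `env` explicitly keeps the check easy to test; omitting it falls back to the real process environment.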
68
68
69
69
### Setup
70
70
@@ -77,7 +77,7 @@ uv sync
77
77
### Passing Models
78
78
79
79
For all agents, LLMs are usually specified with a `--model` argument, which is passed directly to our [Rigging](https://github.com/dreadnode/rigging) library.
80
-
You can read details about different ways to connect to providers, self-hosted servers, or even in-process local models [in the docs](https://docs.dreadnode.io/open-source/rigging/topics/generators)
80
+
You can read details about different ways to connect to providers, self-hosted servers, or even in-process local models [in the docs](/open-source/rigging/topics/generators)
81
81
82
82
Usually, the obvious identifier works out of the box:
83
83
@@ -103,7 +103,7 @@ uv run -m python_agent --help
103
103
104
104
- Provided a task (`--task`), begin a generation loop with access to the Jupyter kernel
105
105
- The work directory (`--work-dir`) is mounted into the container, along with any other docker-style volumes (`--volumes`)
106
-
- When finished, the agent markes the task as complete with a status and summary
106
+
- When finished, the agent marks the task as complete with a status and summary
107
107
- The work directory is logged as an artifact for the run
108
108
109
109
## Dangerous Capabilities
@@ -126,7 +126,7 @@ as needed to ensure they are network-isolated from each other. The process is ge
126
126
5. If the flag is ever observed in the output, exit
127
127
6. Otherwise run until an error, give up, or max-steps is reached
128
128
129
-
Check out [./dangerous_capabilities/challenges/challenges.json](./dangerous_capabilities/challenges/challenges.json)
129
+
Check out [challenges.json](https://github.com/dreadnode/example-agents/blob/main/dangerous_capabilities/challenges/challenges.json)
docs/examples/sast-scanning.mdx (+8 -8: 8 additions & 8 deletions)
@@ -1,23 +1,23 @@
1
1
---
2
-
title: 'SaaS Scanning Agent'
3
-
description: 'An agent for scanning SaaS applications for security vulnerabilities'
2
+
title: 'SAST Scanning Agent'
3
+
description: 'An agent for scanning application source for security vulnerabilities'
4
4
public: true
5
5
---
6
6
7
-
This agent is a specialized Static Application Security Testing (SAST) framework designed to evaluate the capabilities of Large Language Models (LLMs) in identifying security vulnerabilities in source code.
8
-
It operates by presenting the LLM with a "challenge," a codebase containing known, predefined vulnerabilities.
9
-
The agent then prompts the model to act as a security expert, analyze the files, and report any security issues it discovers.
7
+
This agent is a specialized Static Application Security Testing (SAST) framework designed to evaluate the capabilities of Large Language Models (LLMs) in identifying security vulnerabilities in source code.
8
+
It operates by presenting the LLM with a "challenge," a codebase containing known, predefined vulnerabilities.
9
+
The agent then prompts the model to act as a security expert, analyze the files, and report any security issues it discovers.
10
10
The agent tracks the findings and scores the model's performance by comparing its results against a manifest of the known vulnerabilities, providing metrics like coverage and accuracy.
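
As a rough illustration of this scoring step, the sketch below compares reported findings with a manifest of known vulnerabilities. The `(file, vulnerability type)` key, the `score_findings` name, and the exact definitions of coverage and accuracy are assumptions made for illustration; the agent's actual matching logic may differ.

```python
from typing import NamedTuple


class Score(NamedTuple):
    coverage: float  # share of known vulnerabilities that were reported
    accuracy: float  # share of reported findings that match a known vulnerability


def score_findings(known: set[tuple[str, str]], reported: set[tuple[str, str]]) -> Score:
    """Compare reported (file, type) pairs with the manifest (hypothetical helper)."""
    matched = known & reported
    coverage = len(matched) / len(known) if known else 0.0
    accuracy = len(matched) / len(reported) if reported else 0.0
    return Score(coverage, accuracy)


# Made-up example data: one true positive, one miss, one false positive.
manifest = {("app/db.py", "sql-injection"), ("app/views.py", "xss")}
findings = {("app/db.py", "sql-injection"), ("app/utils.py", "command-injection")}
result = score_findings(manifest, findings)
print(f"coverage={result.coverage:.2f} accuracy={result.accuracy:.2f}")
```

Treating both sides as sets keeps duplicate reports from inflating either metric, at the cost of requiring an exact match on file and type.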
11
11
12
12
## Intended Use
13
13
14
-
The primary purpose of this agent is to benchmark and compare the effectiveness of different LLMs for security code review tasks.
14
+
The primary purpose of this agent is to benchmark and compare the effectiveness of different LLMs for security code review tasks.
15
15
It is intended for researchers and security professionals who want to quantitatively measure a model's ability to detect various types of vulnerabilities (e.g., SQL Injection, XSS, Command Injection) in a controlled and reproducible environment.
16
16
17
17
## Environment
18
18
19
-
The agent is a Python command-line application.
20
-
The agent operates on a local collection of code "challenges" located in the challenges directory.
19
+
The agent is a Python command-line application.
20
+
The agent operates on a local collection of code "challenges" located in the challenges directory.
21
21
For its container mode, a running Docker daemon is required on the host machine.
docs/examples/sensitive-data.mdx (+8 -8: 8 additions & 8 deletions)
@@ -4,8 +4,8 @@ description: 'An agent for identifying sensitive data in filesystems'
4
4
public: true
5
5
---
6
6
7
-
This agent leverages a Large Language Model (LLM) to autonomously explore and analyze file systems for sensitive data.
8
-
It is designed to navigate through a given path, read the contents of various files, and identify information such as passwords, API keys, personal identifiable information (PII), and other confidential data.
7
+
This agent leverages a Large Language Model (LLM) to autonomously explore and analyze file systems for sensitive data.
8
+
It is designed to navigate through a given path, read the contents of various files, and identify information such as passwords, API keys, personal identifiable information (PII), and other confidential data.
9
9
A key feature of this agent is its ability to operate on a wide variety of storage systems, including local directories, cloud storage like AWS S3 and Google Cloud Storage, and even remote sources like GitHub repositories.
10
10
11
11
## Intended Use
@@ -14,23 +14,23 @@ The Agent is used to perform a thorough search through fileshares and files, the
14
14
15
15
## Environment
16
16
17
-
The environment is simply a filesystem.
18
-
The Agent must have the necessary credentials to access the target path specified by the user (e.g., AWS credentials configured for S3 access, or a GitHub token for private repositories).
19
-
For observability, the agent can be [connected to a Dreadnode server](https://docs.dreadnode.io/strikes/usage/config) to log detailed run information, metrics, and findings.
17
+
The environment is simply a filesystem.
18
+
The Agent must have the necessary credentials to access the target path specified by the user (e.g., AWS credentials configured for S3 access, or a GitHub token for private repositories).
19
+
For observability, the agent can be [connected to a Dreadnode server](/usage/config) to log detailed run information, metrics, and findings.
20
20
21
21
## Tools
22
22
23
-
- `fsspec`: The underlying library that provides a unified Pythonic interface to various local and remote file systems.
23
+
- `fsspec`: The underlying library that provides a unified Pythonic interface to various local and remote file systems.
24
24
This is what enables the agent's versatility in accessing different storage backends like `s3://`, `gs://`, and `github://`.
25
25
26
26
## Features
27
27
28
28
- **Multi-Filesystem Support**: Can analyze files on local disks, AWS S3, Google Cloud Storage, GitHub repositories, and any other backend supported by fsspec.
29
29
- **LLM-Powered Data Identification**: Employs a language model to intelligently parse file contents and identify a broad range of sensitive data types based on context.
30
30
- **Structured Data Reporting**: Uses a dedicated report_sensitive_data tool that forces the LLM to report findings in a structured format, including the file path, location within the file, data type, the sensitive value itself, and a comment.
31
-
- **Location-Aware Reportin**g: Can specify the location of findings differently based on the file type (line number for text, seconds for audio/video, or byte offset for binary files).
31
+
- **Location-Aware Reporting**: Can specify the location of findings differently based on the file type (line number for text, seconds for audio/video, or byte offset for binary files).
32
32
- **Autonomous Exploration**: The agent can independently navigate the directory structure of the target path to ensure comprehensive coverage.
33
-
- **Task Contro**l: Includes tools for the agent to explicitly complete_task with a summary or give_up if it gets stuck, providing better insight into its reasoning process.
33
+
- **Task Control**: Includes tools for the agent to explicitly complete_task with a summary or give_up if it gets stuck, providing better insight into its reasoning process.
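
To make the structured, location-aware report more concrete, here is a minimal sketch of what such a record could look like. It is not the agent's `report_sensitive_data` implementation: the `SensitiveFinding` class, the `location_unit` helper, and the suffix groups are hypothetical and exist only to illustrate the fields and units listed above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical extension groups; the agent may classify files differently.
MEDIA_SUFFIXES = {".mp3", ".wav", ".mp4", ".mov"}
TEXT_SUFFIXES = {".txt", ".md", ".py", ".json", ".yaml"}


def location_unit(path: str) -> str:
    """Pick the unit used to describe where a finding sits in a file."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in MEDIA_SUFFIXES:
        return "seconds"
    if suffix in TEXT_SUFFIXES:
        return "line"
    return "byte_offset"  # fallback for anything treated as binary


@dataclass(frozen=True)
class SensitiveFinding:
    path: str
    location: float
    location_unit: str
    data_type: str
    value: str
    comment: str


def make_finding(path: str, location: float, data_type: str, value: str, comment: str) -> SensitiveFinding:
    """Build a finding whose location unit follows from the file type."""
    return SensitiveFinding(path, location, location_unit(path), data_type, value, comment)


finding = make_finding("s3://bucket/config/settings.yaml", 12, "api_key", "sk-example", "Hard-coded key")
print(finding.location_unit)
```

Deriving the unit from the path rather than asking the caller for it keeps the report consistent across file types.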
docs/how-to/airtbench-agent.mdx (+3 -3: 3 additions & 3 deletions)
@@ -7,7 +7,7 @@ public: true
7
7
<Note>
8
8
This documentation complements the [`dreadnode/AIRTBench-Code`](https://github.com/dreadnode/AIRTBench-Code) AI Red-Teaming Agent. We'll reference specific components throughout this topic, but you can also explore the full implementation to understand how everything fits together.
9
9
10
-
For this guide, we'll assume you have the `dreadnode` package installed and are familiar with the basics of Strikes. If you haven't already, check out the [installation](../install) and [introduction](../intro) guides. Additionally, as mentioned in the [Agent Implementation](#agent-implementation) section, we will be using a [Rigging](https://github.com/dreadnode/rigging) agent, documented [here](https://docs.dreadnode.io/open-source/rigging/intro).
10
+
For this guide, we'll assume you have the `dreadnode` package installed and are familiar with the basics of Strikes. If you haven't already, check out the [installation](../install) and [introduction](../intro) guides. Additionally, as mentioned in the [Agent Implementation](#agent-implementation) section, we will be using a [Rigging](https://github.com/dreadnode/rigging) agent, documented [here](/open-source/rigging/intro).
11
11
</Note>
12
12
13
13
<Info>
@@ -16,7 +16,7 @@ This agent also serves as a major functional component to complement our practic
16
16
The paper discusses the design and implementation of the agent, as well as its performance on various challenges. You can find the paper [here](https://arxiv.org/abs/2506.14682) on arXiv, or learn more on our accompanying blog post, "[Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out](https://dreadnode.io/blog/ai-red-team-benchmark)".
17
17
</Info>
18
18
19
-
In this guide, we'll cover building an agent capable of solving AI/ML capture-the-flag (CTF) challenges hosted on [Crucible](../../crucible/overview.mdx). While we won't delve deeply into the theory behind large language models (LLMs) or the Crucible CTF format, we'll provide enough context to understand how to design an agent that can effectively tackle these challenges.
19
+
In this guide, we'll cover building an agent capable of solving AI/ML capture-the-flag (CTF) challenges hosted on [Crucible](/crucible/overview). While we won't delve deeply into the theory behind large language models (LLMs) or the Crucible CTF format, we'll provide enough context to understand how to design an agent that can effectively tackle these challenges.
20
20
21
21
We'll use Strikes to gather insightful data on agent behavior and evaluate performance based on the agent's ability to dynamically capture flags generated by Crucible. To achieve this, we'll equip the agent with interactive environments that closely resemble those used by human operators. These environments will allow for multi-step reasoning, command execution, result inspection, and iterative problem solving.
22
22
@@ -106,7 +106,7 @@ sequenceDiagram
106
106
107
107
## Crucible Challenge Notebooks
108
108
109
-
The Crucible challenge notebooks are designed to run in a Jupyter environment, providing a standardized interface to interact with challenges through API calls. Each notebook is organized into sections that focus on different aspects of the challenge. You can find a detailed breakdown of the notebook structure [here](../../crucible/how-to/use-challenge-notebooks.mdx).
109
+
The Crucible challenge notebooks are designed to run in a Jupyter environment, providing a standardized interface to interact with challenges through API calls. Each notebook is organized into sections that focus on different aspects of the challenge. You can find a detailed breakdown of the notebook structure [here](/crucible/how-to/use-challenge-notebooks).
110
110
111
111
The agent harness converts these notebooks into Markdown by loading the notebook file using `Notebook.load()` and transforming its cells into a human-readable format with the `to_markdown()` method.
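
The conversion can be pictured with a small standard-library sketch. It does not use the harness's `Notebook` class; `notebook_to_markdown` is a hypothetical stand-in that reads the standard `.ipynb` JSON layout (`cells`, `cell_type`, `source`) and renders code cells as fenced blocks.

```python
import json

FENCE = "`" * 3  # built at runtime so this example's own fence stays intact


def notebook_to_markdown(raw: str) -> str:
    """Render the cells of an .ipynb JSON document as Markdown (hypothetical helper)."""
    notebook = json.loads(raw)
    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source either as one string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if cell.get("cell_type") == "code":
            parts.append(f"{FENCE}python\n{text}\n{FENCE}")
        else:
            parts.append(text)
    return "\n\n".join(parts)


# A made-up two-cell notebook: one Markdown cell and one code cell.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["## Submit a query\n", "Send a request."]},
        {"cell_type": "code", "source": "print('hello')"},
    ]
})
markdown = notebook_to_markdown(example)
print(markdown)
```

Rendering code cells as fenced blocks preserves the boundary between instructions and runnable code, which is what a model needs to see when reading the challenge.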