Skip to content

Commit dc53446

Browse files
more content
1 parent 3bbbe5d commit dc53446

File tree

8 files changed

+372
-40
lines changed

8 files changed

+372
-40
lines changed

.DS_Store

0 Bytes
Binary file not shown.

docs/assets/guardrails/email-agent-example.svg

Lines changed: 142 additions & 0 deletions
Loading
231 KB
Loading

docs/assets/guardrails/space-overlap.svg

Lines changed: 75 additions & 0 deletions
Loading

docs/guardrails/gateway.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
# Guardrails in Gateway
2+
3+
<div class="subtitle">
4+
Learn how Guardrail rules are enforced and deployed.
5+
</div>

docs/guardrails/introduction.md

Lines changed: 112 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,112 @@
1+
# Securing Agents with Rules
2+
3+
<div class="subtitle">
4+
Learn the fundamentals about guardrailing with Invariant.
5+
</div>
6+
7+
Guardrailing agents can be a complex undertaking, as it involves understanding the entirety of your agent's potential behaviors and misbehaviors.
8+
9+
In this chapter, we will cover the fundamentals of guardrailing with Invariant, with a primary focus on how Invariant allows you to write strict and fuzzy rules that precisely constrain your agent's behavior.
10+
11+
## Understanding Your Agent's Capabilities
12+
13+
Before securing an agent, it is important to understand its capabilities. This includes understanding the tools and functions that the agent can call, as well as the parameters that can be passed to these functions, e.g. can it access private information, sensitive data, can it send emails, can it take destructive actions like deleting files or making payments, etc.
14+
15+
This is important to understand, as it forms the basis for threat modeling and risk assessment. In contrast to traditional software, agentic systems are highly dynamic, meaning tools and APIs can be called in arbitrary ways, and the agent's behavior can change based on the context and the task at hand.
16+
17+
<br/><br/>
18+
<img src="../../assets/guardrails/tool-calls.svg" alt="Invariant Architecture" class="invariant-architecture" style="display: block; margin: 0 auto; width: 100%; max-width: 400pt;"/>
19+
<br/><br/>
20+
21+
## Constraining Your Agent's Capability space with Rules
22+
23+
Once you have a good understanding of your agent's capabilities, you can start writing rules to constrain its behavior. By defining guardrails, you limit the agent’s behavior to a safe and intended subset of its full capabilities. These rules can specify allowed tool calls, restrict parameter values, enforce order of operations and prevent destructive looping behaviors.
24+
25+
Invariant’s guardrailing runtime allows you to express these constraints declaratively, ensuring the agent only operates within predefined security boundaries—even in dynamic and open-ended environments. This makes it easier to detect policy violations, reduce risk exposure, and maintain trust in agentic systems.
26+
27+
<br/><br/>
28+
<img src="../../assets/guardrails/space-overlap.svg" alt="Invariant Architecture" class="invariant-architecture" style="display: block; margin: 0 auto; width: 100%; max-width: 290pt;"/>
29+
<br/><br/>
30+
31+
## Writing Your First Rule
32+
33+
Let's assume a simple example agent that is capable of managing a user's email inbox. Such an agent may be configured with two tools:
34+
35+
* `get_inbox()` to check a user's inbox and read the emails
36+
* `send_email(to: str, subject: str, body: str)` to send an email to a user.
37+
38+
<br/><br/>
39+
<img src="../../assets/guardrails/email-agent-example.svg" alt="Invariant Architecture" class="invariant-architecture" style="display: block; margin: 0 auto; width: 100%; max-width: 290pt;"/>
40+
<br/><br/>
41+
42+
Unconstrained, this agent can easily fail, allowing a bad actor or sheer malfunction to induce failure states such as data leaks, spamming, or even phishing attacks.
43+
44+
To prevent this, we can write a set of simple guardrailing rules, to harden our agent's security posture and limit its capabilities.
45+
46+
### Example 1: Constraining an email agent with guardrails
47+
Let's begin by writing a simple rule that prevents the agent from sending emails to untrusted recipients.
48+
49+
```python
50+
51+
# ensure we know all recipients
52+
raise "Untrusted email recipient" if:
53+
(call: ToolCall)
54+
call is tool:send_email
55+
not match(".*@company.com", call.function.arguments.recipient)
56+
```
57+
58+
This simple rule demonstrates Invariant's guardrailinng rules: To prevent certain agent behavior, we write detection rules that match instances of undesired behavior.
59+
60+
In this case, we want to prevent the agent from sending emails to untrusted recipients. We do so, by describing a tool call that would violate our policy, and then raising an error if such a call is detected.
61+
62+
This rule can now simply be deployed using [Invariant Gateway](../gateway/), such that any agent making use of a `send_email` tool will be prevented from sending emails to untrusted recipients, without changing the agent system's code.
63+
64+
This way of writing guardrails *decouples guardrailing and security rules from core agent logic*. This is a key concept with Invariant and it allows you to write and maintain them independently. It also means, security and agent logic can be maintained by different teams, and that security rules can be deployed and updated independently of the agent system.
65+
66+
67+
### Example 2: Constraining agent flow
68+
69+
Next, let's also consider different workflows our agent may carry out. For example, our agent may first check the user's inbox and then decide to send an email.
70+
71+
This behavior has the risk that the agent may be prompt injected by an untrusted email, leading to malicious behavior. A possible such scenario is illustratd in the figure below, where an attacker sends a malicious email to the agent, which then leaks sensitive information to the attacker.
72+
73+
<figure>
74+
<img src="../../assets/guardrails/email-agent-flow.png" alt="Malicious email agent flow" class="invariant-architecture" style="display: block; margin: 0 auto; width: 100%; max-width: 450pt;"/>
75+
<figcaption>An email agent is prompt injected in a malicious email and then leaks information to an attacker.</figcaption>
76+
</figure>
77+
78+
To prevent this, we can write a simple flow rule, that not only checks specific tool calls, but also considers the [data flow of the agent](./dataflow-rules.md), i.e. what the agent has previously done and ingested, before it decided to take action:
79+
80+
```python
81+
from invariant.detectors import prompt_injection, moderated
82+
83+
raise "Must not send email when agent has looked at suspicious email" if:
84+
(inbox: ToolOutput) ~> (call: ToolCall)
85+
inbox is tool:get_inbox
86+
call is tool:send_email
87+
prompt_injection(inbox.content)
88+
```
89+
90+
This rule checks if the agent has looked at a suspicious email, and if so, it raises an error when the agent tries to send an email. It does so by defining a two-part pattern, consisting of a tool call and a tool output.
91+
92+
Our rule triggers when, first, we ingest the output of the `get_inbox` tool, and then we call the `send_email` tool. This is expressed by the `(inbox: ToolOutput) ~> (call: ToolCall)` pattern, which matches the data flow of the agent.
93+
94+
## Deploying and Maintaining Rules
95+
96+
Once you have written a set of guardrailing rules, you can deploy them using Invariant Gateway, to ensure your system is constrained in a secure way.
97+
98+
Deployment can either be done by [sending your guardrailing rules with your LLM requests](./gateway.md), such that Invariant can enforce them on the fly. Alternatively, you can deploy them as a set of rules that are enforced on all agents using a specific Gateway instance. For the latter approach, see also our chapter on [Guardrails in Explorer](./explorer.md), which offers a UI-driven approach to deploying and maintaining guardrails.
99+
100+
### Maintaining Rules
101+
102+
Over time, you will find novel agent behaviors and usage patterns that you may want to guardrail and protect against. Invariant allows you to easily update your rules, and deploy them to your live agents. Since guardrailing and agent code are decoupled, you can easily add new rules and deploy within seconds, without having to change the agent code.
103+
104+
To help with finding and updating rules, Invariant also offers the [Explorer](../explorer/index.md) tool, a trace viewing and debugging application that allows you to visualize the flows of your agents as they are deployed, and to find and inspect novel behaviors. Since manual review of agent behavior is often tedious and error-prone, Explorer also offers access to Invariant's _custom analysis and feedback models_, that can assess your agent's performance and security posture in near real-time, and suggest guardrailing rules to improve it.
105+
106+
Invariant's Analysis models are still in early preview, but if you are interested in working with them, you can [sign up for early access](mailto:[email protected]) by sending us an email.
107+
108+
## Conclusion
109+
110+
This chapter has introduced you to the fundamentals of guardrailing with Invariant. We have covered the basics of writing rules, and how to deploy and maintain them.
111+
112+
To learn more about the different types of rules and how to write them, please refer to the [Rule Language](./rules.md) chapter, which covers the different types of rules you can write with Invariant, and how to use them to secure your agentic systems.

docs/guardrails/rules.md

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,5 @@
1-
---
2-
title: Overview
3-
---
4-
5-
# Rule Language
1+
# Reference Document for Rule Writing
62

73
<div class="subtitle">
8-
Learn how to write Guardrail rules.
4+
A concise reference for writing guardrailing rules with Invariant.
95
</div>

mkdocs.yml

Lines changed: 36 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,7 @@ nav:
7070
- Getting Started: index.md
7171
- Guardrails:
7272
- Overview: guardrails/index.md
73-
- Guardrailing Rules: guardrails/rules.md
73+
- Introduction: guardrails/introduction.md
7474
- Agent Guardrails:
7575
- Tool Calls: guardrails/tool-calls.md
7676
- Loop Detection: guardrails/loops.md
@@ -86,7 +86,9 @@ nav:
8686
- Regex Filters: guardrails/regex-filters.md
8787
- Copyrighted Content: guardrails/copyright.md
8888
- Secret Tokens and Credentials: guardrails/secrets.md
89+
- Guardrails in Gateway: guardrails/gateway.md
8990
- Guardrails in Explorer: guardrails/explorer.md
91+
- Rule Writing Reference: guardrails/rules.md
9092
- Examples:
9193
- RAG: guardrails/examples/rag.md
9294
- Code Execution after URL: guardrails/examples/url.md
@@ -100,39 +102,39 @@ nav:
100102
- Syntax: guardrails/examples/syntax.md
101103
- Copyright: guardrails/examples/copyright.md
102104
- Loop Detection: guardrails/examples/loop.md
103-
- Explorer:
104-
- Overview: explorer/index.md
105-
- Benchmarks: explorer/benchmarks.md
106-
- Search: explorer/search.md
107-
- Self-Host Explorer: explorer/self-hosted.md
108-
- Traces and Datasets: explorer/traces-datasets.md
109-
- Explorer API:
110-
- Client Setup: explorer/api/client-setup.md
111-
- Trace Format: explorer/api/trace-format.md
112-
- Annotations: explorer/api/annotations.md
113-
- SDK Installation: explorer/api/sdk-installation.md
114-
- Dataset Metadata:
115-
- GetDatasetMetadata API : explorer/api/dataset-metadata/get.md
116-
- UpdateDatasetMetadata API : explorer/api/dataset-metadata/update.md
117-
- Uploading Traces:
118-
- File Uploads: explorer/api/uploading-traces/file-uploads.md
119-
- Push API: explorer/api/uploading-traces/push-api.md
120-
- Append Messages API: explorer/api/uploading-traces/append-messages.md
121-
- Gateway:
122-
- Overview: gateway/index.md
123-
- Self-Hosted Gateway: gateway/self-hosted.md
124-
- LLM Integrations:
125-
- OpenAI: gateway/llm-provider-integrations/openai.md
126-
- Anthropic: gateway/llm-provider-integrations/anthropic.md
127-
- Gemini: gateway/llm-provider-integrations/gemini.md
128-
- Agent Integrations:
129-
- Microsoft AutoGen: gateway/agent-integrations/microsoft-autogen.md
130-
- OpenAI Agents SDK: gateway/agent-integrations/openai-agents-sdk.md
131-
- OpenAI Swarm: gateway/agent-integrations/openai-swarm.md
132-
- OpenHands: gateway/agent-integrations/openhands.md
133-
- SWE Agent: gateway/agent-integrations/swe-agent.md
134-
- Browser Use: gateway/agent-integrations/browser-use.md
135-
- Testing:
105+
- Integrations:
106+
- Explorer:
107+
- Overview: explorer/index.md
108+
- Benchmarks: explorer/benchmarks.md
109+
- Search: explorer/search.md
110+
- Self-Host Explorer: explorer/self-hosted.md
111+
- Traces and Datasets: explorer/traces-datasets.md
112+
- Explorer API:
113+
- Client Setup: explorer/api/client-setup.md
114+
- Trace Format: explorer/api/trace-format.md
115+
- Annotations: explorer/api/annotations.md
116+
- SDK Installation: explorer/api/sdk-installation.md
117+
- Dataset Metadata:
118+
- GetDatasetMetadata API : explorer/api/dataset-metadata/get.md
119+
- UpdateDatasetMetadata API : explorer/api/dataset-metadata/update.md
120+
- Uploading Traces:
121+
- File Uploads: explorer/api/uploading-traces/file-uploads.md
122+
- Push API: explorer/api/uploading-traces/push-api.md
123+
- Append Messages API: explorer/api/uploading-traces/append-messages.md
124+
- Gateway:
125+
- Overview: gateway/index.md
126+
- Self-Hosted Gateway: gateway/self-hosted.md
127+
- LLM Integrations:
128+
- OpenAI: gateway/llm-provider-integrations/openai.md
129+
- Anthropic: gateway/llm-provider-integrations/anthropic.md
130+
- Gemini: gateway/llm-provider-integrations/gemini.md
131+
- Agent Integrations:
132+
- Microsoft AutoGen: gateway/agent-integrations/microsoft-autogen.md
133+
- OpenAI Agents SDK: gateway/agent-integrations/openai-agents-sdk.md
134+
- OpenAI Swarm: gateway/agent-integrations/openai-swarm.md
135+
- OpenHands: gateway/agent-integrations/openhands.md
136+
- SWE Agent: gateway/agent-integrations/swe-agent.md
137+
- Browser Use: gateway/agent-integrations/browser-use.md
136138
- Testing:
137139
- Overview: testing/index.md
138140
- Quickstart: testing/quickstart.md

0 commit comments

Comments
 (0)