docs/guardrails/introduction.md
Lines changed: 5 additions & 0 deletions
@@ -8,6 +8,11 @@ Guardrailing agents can be a complex undertaking, as it involves understanding t

 In this chapter, we will cover the fundamentals of guardrailing with Invariant, with a primary focus on how Invariant allows you to write strict and fuzzy rules that precisely constrain your agent's behavior.

+<div class="info"/>
+> **Get Started Directly**<br/>
+> Just looking to get started quickly? Take a look at our concise [rule writing reference](./rules.md) to jump right into code. This document serves as a more general introduction to the concepts of how to write rules with Invariant.
+
+
 ## Understanding Your Agent's Capabilities

 Before securing an agent, it is important to understand its capabilities. This includes understanding the tools and functions that the agent can call, as well as the parameters that can be passed to these functions, e.g. can it access private information, sensitive data, can it send emails, can it take destructive actions like deleting files or making payments, etc.
+    messages=[{"role": "user", "content": "What do you know about Fight Club?"}],
+    model="gpt-4o",
+)
+```
+
+Before you run, make sure you export the relevant environment variables including an `INVARIANT_API_KEY` [(get one here)](https://explorer.invariantlabs.ai/settings), which you'll need to access Gateway and our low-latency Guardrailing API.
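A quick pre-flight check can catch a missing key before the first request is sent. The sketch below is illustrative only: `INVARIANT_API_KEY` is the one variable named in the text above, while the helper name `missing_env_vars` and the `REQUIRED_VARS` tuple are assumptions made for this example and are not part of Invariant.

```python
import os
from typing import Mapping

# Only INVARIANT_API_KEY is named in the documentation above;
# extend this tuple with whatever else your own setup needs.
REQUIRED_VARS = ("INVARIANT_API_KEY",)


def missing_env_vars(env: Mapping[str, str], required=REQUIRED_VARS) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


# Example: check the real process environment and report what is missing.
missing = missing_env_vars(os.environ)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Environment looks ready.")
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dictionary instead of the real process environment.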
docs/guardrails/tool-calls.md
Lines changed: 8 additions & 8 deletions
@@ -33,7 +33,7 @@ To prevent tool calling related risks, Invariant offers a wide range of options
 To match a specific tool call in a guardrailing rule, you can use `call is tool:<tool_name>` expressions. This allows you to only match a specific tool call, and apply guardrailing rules to it.

 **Example**: Matching all `send_email` tool call
-```python
+```guardrail
 raise "Must not send any emails" if:
     (call: ToolCall)
     call is tool:send_email
@@ -46,7 +46,7 @@ This rule will trigger for all tool calls to function `send_email`, disregarding
 Tool calls can also be matched by their parameters. This allows you to match only tool calls with specific parameters, e.g. to block them or to restrict the tool interface exposed to the agent.

 **Example**: Matching a `send_email` tool call with a specific recipient
-```python
+```guardrail
 raise "Must not send any emails to Alice" if:
     (call: ToolCall)
     call is tool:send_email({
@@ -59,7 +59,7 @@ raise "Must not send any emails to Alice" if:
 Similarly, you can use regex matching to match tool calls with specific parameters. This allows you to match specific tool calls with specific parameters, and apply guardrailing rules to them.

 **Example**: Matching a `send_email` calls with a specific recipient domain
-```python
+```guardrail
 raise "Must not send any emails to <anyone>@disallowed.com" if:
     (call: ToolCall)
     call is tool:send_email({
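The hunk above is cut off before the rule's argument pattern, so the exact Invariant syntax is not visible here. As a rough, framework-independent illustration of what matching a recipient domain by regex involves, the following plain-Python sketch may help. The argument name `to`, the pattern, and the helper name are all assumptions made for this example; this is not the Invariant rule language.

```python
import re

# Hypothetical pattern: any single address whose domain is exactly disallowed.com.
DISALLOWED_RECIPIENT = re.compile(r"[^@\s]+@disallowed\.com", re.IGNORECASE)


def is_disallowed_recipient(arguments: dict) -> bool:
    """Return True if the (assumed) `to` argument is an address at disallowed.com."""
    recipient = arguments.get("to", "")
    if not isinstance(recipient, str):
        return False
    # fullmatch anchors the pattern at both ends, so look-alike domains such as
    # disallowed.com.evil.org are not treated as a match.
    return DISALLOWED_RECIPIENT.fullmatch(recipient) is not None


print(is_disallowed_recipient({"to": "bob@disallowed.com"}))
print(is_disallowed_recipient({"to": "bob@example.org"}))
```

Anchoring the pattern is the main design choice here: an unanchored search would also flag addresses that merely contain the domain as a substring.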
@@ -72,7 +72,7 @@ raise "Must not send any emails to <anyone>@disallowed.com" if:
 You can also use content matching to match tool arguments with certain properties, like whether they contain personally identifiable information (PII), or whether they are flagged as toxic or inappropriate. This allows you to match specific tool calls with specific parameters, and apply guardrailing rules to them.

 **Example**: Prevent `send_email` calls with phone numbers in the message body.
-```python
+```guardrail
 raise "Must not send any emails to <anyone>@disallowed.com" if:
     (call: ToolCall)
     call is tool:send_email({
@@ -86,7 +86,7 @@ This type of content matching also works for other types of content, including `

 Alternatively, you can also directly use `invariant.detectors.pii` on the tool call arguments like so:

-```python
+```guardrail
 from invariant.detectors import pii

 raise "Must not send any emails to <anyone>@disallowed.com" if:
@@ -102,7 +102,7 @@ raise "Must not send any emails to <anyone>@disallowed.com" if:
 Similar to tool calls, you can check and validate tool outputs.

 **Example**: Raise an error if PII is detected in the tool output
-```python
+```guardrail
 raise "PII in tool output" if:
     (out: ToolOutput)
     len(pii(out.content)) > 0
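To make the shape of the `len(pii(out.content)) > 0` condition concrete, here is a small plain-Python sketch. It assumes a detector is simply a function that returns a list of findings for a piece of text. The regex-based `find_emails` below is a deliberately simplistic stand-in written for this example; it is not how `invariant.detectors.pii` works.

```python
import re

# Simplistic stand-in detector: finds email-like strings only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> list:
    """Return every email-like substring found in the text."""
    return EMAIL_PATTERN.findall(text)


def output_violates(content: str) -> bool:
    """Mirror the rule condition: violate if the detector returns any finding."""
    return len(find_emails(content)) > 0


print(output_violates("Contact alice@example.com for details"))
print(output_violates("No contact details here"))
```

Returning the list of findings, rather than a bare boolean, leaves room to report what was detected when a rule fires.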
@@ -113,7 +113,7 @@ raise "PII in tool output" if:
 You can also check only certain tool outputs, e.g. to only check the output of a specific tool call.

 **Example**: Raise an error if PII is detected in the tool output
-```python
+```guardrail
 from invariant.detectors import moderated

 raise "Moderated content in tool output" if:
@@ -130,7 +130,7 @@ Here, only if the `read_website` tool call returns moderated content, the rule w
 To limit your guardrailing rule to a list of different tools, you can also access a tool's name directly:

 **Example**: Raise an error if any of the banned tools is used.
-```python
+```guardrail
 raise "Banned tool used" if:
     (call: ToolCall)
     call.function.name in ["send_email", "delete_file"]
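For readers who want to see the same name-based check outside the rule language, a plain-Python equivalent might look like the sketch below. The dictionary layout (a `function` entry holding a `name`) is an assumption chosen to echo the `call.function.name` accessor in the rule above; the helper name is hypothetical.

```python
# The two tool names are taken from the rule above.
BANNED_TOOLS = {"send_email", "delete_file"}


def uses_banned_tool(tool_call: dict) -> bool:
    """Return True if the tool call's function name is in the banned set."""
    name = tool_call.get("function", {}).get("name")
    return name in BANNED_TOOLS


print(uses_banned_tool({"function": {"name": "delete_file", "arguments": {}}}))
print(uses_banned_tool({"function": {"name": "read_website", "arguments": {}}}))
```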
 Invariant is a **security layer to protect agentic AI systems**. It helps you prevent prompt injections, data leaks, steer your agent's behavior, and ensure compliance with your organization's policies.

-Using a **highly-expressive and self-learning guardrailing system**, Invariant offers precise dataflow and steering capabilities, ensuring that your agents are secure and reliable.
-
-You can **deploy Invariant within minutes**, using our hosted gateway, to ensure quick response to agent security incidents and to prevent prompt injections and data leaks.
+You can **deploy Invariant within minutes using our hosted gateway**, to ensure quick response to agent security incidents and to get your agent ready for production.

 ### How Invariant Works

-Invariant acts as a transparent layer between your agent system and the LLM and tool providers. It intercepts all LLM calls and tool actions, and applies guardrailing rules according to a user-specified security policy, i.e. your guardrailing rules.
+Invariant acts as a transparent layer between your agent system and the LLM and tool providers. It intercepts all LLM calls and tool actions, and applies steering rules according to a provided guardrailing policies.
+
+Policies are defined in terms of both [deterministic and fuzzy rules](./guardrails/). During operation, your agent is continuously evaluated against them, to restrict its behavior to prevent malfunction and abuse.

-It does not require any invasive code changes, and can be used with any agent system, framework and LLM.
+Invariant does not require invasive code changes, and can be used with any agent, framework and LLM.
+In this setup, a simple Invariant rule for safeguarding against leakage flows in an agent looks like this:
+
+```python
+raise "agent leaks internal data" if:
+    # check all flows between tool calls
+    (output: ToolOutput) -> (call: ToolCall)
+    # detects sensitive data in the first output
+    is_sensitive(output.content)
+    # detects a potentially sensitive action like sending an email
+    call is tool:send_email
+```
+
+Many security rules like these ship out-of-the-box with Invariant, and you can easily define your own rules to suit your needs and policies.
+
 This documentation describes how to set up Invariant and the relevant guardrailing rules for your agent systems such that you can secure your agents and prevent them from engaging in malicious behavior.
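The flow rule in this hunk can be read as "a sensitive tool output followed by a `send_email` call". A minimal plain-Python sketch of that reading is shown below. It assumes the trace is a list of simple event dictionaries and that `->` refers to ordering within the trace; both the event layout and the keyword-based `is_sensitive` stand-in are assumptions made for this example, not Invariant's implementation.

```python
def is_sensitive(text: str) -> bool:
    """Hypothetical stand-in for a sensitivity detector: a simple keyword check."""
    return "internal" in text.lower()


def leaks_internal_data(events: list, sensitive=is_sensitive) -> bool:
    """Return True if a sensitive tool output is later followed by a send_email call."""
    seen_sensitive_output = False
    for event in events:
        if event["type"] == "tool_output" and sensitive(event["content"]):
            seen_sensitive_output = True
        elif (
            event["type"] == "tool_call"
            and event["name"] == "send_email"
            and seen_sensitive_output
        ):
            return True
    return False


trace = [
    {"type": "tool_call", "name": "read_file"},
    {"type": "tool_output", "content": "INTERNAL roadmap for next quarter"},
    {"type": "tool_call", "name": "send_email"},
]
print(leaks_internal_data(trace))
```

Order matters in this sketch: an email sent before the sensitive output appears in the trace does not count as a leak.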