|
| 1 | +# Securing Agents with Rules |
| 2 | + |
| 3 | +<div class="subtitle"> |
| 4 | +Learn the fundamentals about guardrailing with Invariant. |
| 5 | +</div> |
| 6 | + |
| 7 | +Guardrailing agents can be a complex undertaking, as it involves understanding the entirety of your agent's potential behaviors and misbehaviors. |
| 8 | + |
| 9 | +In this chapter, we will cover the fundamentals of guardrailing with Invariant, with a primary focus on how Invariant allows you to write strict and fuzzy rules that precisely constrain your agent's behavior. |
| 10 | + |
| 11 | +## Understanding Your Agent's Capabilities |
| 12 | + |
| 13 | +Before securing an agent, it is important to understand its capabilities. This includes understanding the tools and functions that the agent can call, as well as the parameters that can be passed to these functions, e.g. can it access private information, sensitive data, can it send emails, can it take destructive actions like deleting files or making payments, etc. |
| 14 | + |
| 15 | +This is important to understand, as it forms the basis for threat modeling and risk assessment. In contrast to traditional software, agentic systems are highly dynamic, meaning tools and APIs can be called in arbitrary ways, and the agent's behavior can change based on the context and the task at hand. |
| 16 | + |
| 17 | +<br/><br/> |
| 18 | +<img src="../../assets/guardrails/tool-calls.svg" alt="Invariant Architecture" class="invariant-architecture" style="display: block; margin: 0 auto; width: 100%; max-width: 400pt;"/> |
| 19 | +<br/><br/> |
| 20 | + |
| 21 | +## Constraining Your Agent's Capability space with Rules |
| 22 | + |
| 23 | +Once you have a good understanding of your agent's capabilities, you can start writing rules to constrain its behavior. By defining guardrails, you limit the agent’s behavior to a safe and intended subset of its full capabilities. These rules can specify allowed tool calls, restrict parameter values, enforce order of operations and prevent destructive looping behaviors. |
| 24 | + |
| 25 | +Invariant’s guardrailing runtime allows you to express these constraints declaratively, ensuring the agent only operates within predefined security boundaries—even in dynamic and open-ended environments. This makes it easier to detect policy violations, reduce risk exposure, and maintain trust in agentic systems. |
| 26 | + |
| 27 | +<br/><br/> |
| 28 | +<img src="../../assets/guardrails/space-overlap.svg" alt="Invariant Architecture" class="invariant-architecture" style="display: block; margin: 0 auto; width: 100%; max-width: 290pt;"/> |
| 29 | +<br/><br/> |
| 30 | + |
| 31 | +## Writing Your First Rule |
| 32 | + |
| 33 | +Let's assume a simple example agent that is capable of managing a user's email inbox. Such an agent may be configured with two tools: |
| 34 | + |
| 35 | +* `get_inbox()` to check a user's inbox and read the emails |
| 36 | +* `send_email(to: str, subject: str, body: str)` to send an email to a user. |
| 37 | + |
| 38 | +<br/><br/> |
| 39 | +<img src="../../assets/guardrails/email-agent-example.svg" alt="Invariant Architecture" class="invariant-architecture" style="display: block; margin: 0 auto; width: 100%; max-width: 290pt;"/> |
| 40 | +<br/><br/> |
| 41 | + |
| 42 | +Unconstrained, this agent can easily fail, allowing a bad actor or sheer malfunction to induce failure states such as data leaks, spamming, or even phishing attacks. |
| 43 | + |
| 44 | +To prevent this, we can write a set of simple guardrailing rules, to harden our agent's security posture and limit its capabilities. |
| 45 | + |
| 46 | +### Example 1: Constraining an email agent with guardrails |
| 47 | +Let's begin by writing a simple rule that prevents the agent from sending emails to untrusted recipients. |
| 48 | + |
| 49 | +```python |
| 50 | + |
| 51 | +# ensure we know all recipients |
| 52 | +raise "Untrusted email recipient" if: |
| 53 | + (call: ToolCall) |
| 54 | + call is tool:send_email |
| 55 | + not match(".*@company.com", call.function.arguments.recipient) |
| 56 | +``` |
| 57 | + |
| 58 | +This simple rule demonstrates Invariant's guardrailinng rules: To prevent certain agent behavior, we write detection rules that match instances of undesired behavior. |
| 59 | + |
| 60 | +In this case, we want to prevent the agent from sending emails to untrusted recipients. We do so, by describing a tool call that would violate our policy, and then raising an error if such a call is detected. |
| 61 | + |
| 62 | +This rule can now simply be deployed using [Invariant Gateway](../gateway/), such that any agent making use of a `send_email` tool will be prevented from sending emails to untrusted recipients, without changing the agent system's code. |
| 63 | + |
| 64 | +This way of writing guardrails *decouples guardrailing and security rules from core agent logic*. This is a key concept with Invariant and it allows you to write and maintain them independently. It also means, security and agent logic can be maintained by different teams, and that security rules can be deployed and updated independently of the agent system. |
| 65 | + |
| 66 | + |
| 67 | +### Example 2: Constraining agent flow |
| 68 | + |
| 69 | +Next, let's also consider different workflows our agent may carry out. For example, our agent may first check the user's inbox and then decide to send an email. |
| 70 | + |
| 71 | +This behavior has the risk that the agent may be prompt injected by an untrusted email, leading to malicious behavior. A possible such scenario is illustratd in the figure below, where an attacker sends a malicious email to the agent, which then leaks sensitive information to the attacker. |
| 72 | + |
| 73 | +<figure> |
| 74 | +<img src="../../assets/guardrails/email-agent-flow.png" alt="Malicious email agent flow" class="invariant-architecture" style="display: block; margin: 0 auto; width: 100%; max-width: 450pt;"/> |
| 75 | +<figcaption>An email agent is prompt injected in a malicious email and then leaks information to an attacker.</figcaption> |
| 76 | +</figure> |
| 77 | + |
| 78 | +To prevent this, we can write a simple flow rule, that not only checks specific tool calls, but also considers the [data flow of the agent](./dataflow-rules.md), i.e. what the agent has previously done and ingested, before it decided to take action: |
| 79 | + |
| 80 | +```python |
| 81 | +from invariant.detectors import prompt_injection, moderated |
| 82 | + |
| 83 | +raise "Must not send email when agent has looked at suspicious email" if: |
| 84 | + (inbox: ToolOutput) ~> (call: ToolCall) |
| 85 | + inbox is tool:get_inbox |
| 86 | + call is tool:send_email |
| 87 | + prompt_injection(inbox.content) |
| 88 | +``` |
| 89 | + |
| 90 | +This rule checks if the agent has looked at a suspicious email, and if so, it raises an error when the agent tries to send an email. It does so by defining a two-part pattern, consisting of a tool call and a tool output. |
| 91 | + |
| 92 | +Our rule triggers when, first, we ingest the output of the `get_inbox` tool, and then we call the `send_email` tool. This is expressed by the `(inbox: ToolOutput) ~> (call: ToolCall)` pattern, which matches the data flow of the agent. |
| 93 | + |
| 94 | +## Deploying and Maintaining Rules |
| 95 | + |
| 96 | +Once you have written a set of guardrailing rules, you can deploy them using Invariant Gateway, to ensure your system is constrained in a secure way. |
| 97 | + |
| 98 | +Deployment can either be done by [sending your guardrailing rules with your LLM requests](./gateway.md), such that Invariant can enforce them on the fly. Alternatively, you can deploy them as a set of rules that are enforced on all agents using a specific Gateway instance. For the latter approach, see also our chapter on [Guardrails in Explorer](./explorer.md), which offers a UI-driven approach to deploying and maintaining guardrails. |
| 99 | + |
| 100 | +### Maintaining Rules |
| 101 | + |
| 102 | +Over time, you will find novel agent behaviors and usage patterns that you may want to guardrail and protect against. Invariant allows you to easily update your rules, and deploy them to your live agents. Since guardrailing and agent code are decoupled, you can easily add new rules and deploy within seconds, without having to change the agent code. |
| 103 | + |
| 104 | +To help with finding and updating rules, Invariant also offers the [Explorer](../explorer/index.md) tool, a trace viewing and debugging application that allows you to visualize the flows of your agents as they are deployed, and to find and inspect novel behaviors. Since manual review of agent behavior is often tedious and error-prone, Explorer also offers access to Invariant's _custom analysis and feedback models_, that can assess your agent's performance and security posture in near real-time, and suggest guardrailing rules to improve it. |
| 105 | + |
| 106 | +Invariant's Analysis models are still in early preview, but if you are interested in working with them, you can [sign up for early access ](mailto:[email protected]) by sending us an email. |
| 107 | + |
| 108 | +## Conclusion |
| 109 | + |
| 110 | +This chapter has introduced you to the fundamentals of guardrailing with Invariant. We have covered the basics of writing rules, and how to deploy and maintain them. |
| 111 | + |
| 112 | +To learn more about the different types of rules and how to write them, please refer to the [Rule Language](./rules.md) chapter, which covers the different types of rules you can write with Invariant, and how to use them to secure your agentic systems. |
0 commit comments