Commit 111a25f

code validation refactoring
1 parent 0c6a795 commit 111a25f

5 files changed

+143
-111
lines changed

docs/assets/invariant.css

Lines changed: 62 additions & 12 deletions
@@ -13,6 +13,7 @@
 /* define primary blue */
 :root {
     --primary-blue: #3d3affac;
+    --primary-red: #ff6678;
 }


@@ -407,6 +408,21 @@ span.detector-badge::before {
     border-radius: 4pt;
 }

+span.high-latency::before {
+    content: "High-Latency";
+    color: #eef2ff;
+    font-size: 10pt;
+    position: relative;
+    top: -3pt;
+    margin-left: 3pt;
+    background-color: var(--primary-red);
+    display: inline-block;
+    height: 18pt;
+
+    padding: 2pt 4pt;
+    border-radius: 4pt;
+}
+
 span.parser-badge::before {
     content: "Parser";
     color: #eef2ff;
@@ -460,7 +476,15 @@ span.parser-badge::before {
 }

 .detector-badge:hover::after {
-    content: 'DETECTOR DESCRIPTION';
+    content: 'Detectors allow you to detect the presence of certain patterns and types of data in an input.';
+}
+
+.high-latency {
+    position: relative;
+}
+
+.high-latency:hover::after {
+    content: 'High-Latency checks may significantly increase the time it takes to process a request. Non-blocking checks are recommended.';
 }

 .parser-badge {
@@ -479,7 +503,7 @@ span.parser-badge::before {
     content: 'BUILTIN DESCRIPTION';
 }

-.parser-badge:hover::after, .detector-badge:hover::after, .llm-badge:hover::after, .builtin-badge:hover::after {
+.parser-badge:hover::after, .detector-badge:hover::after, .llm-badge:hover::after, .builtin-badge:hover::after, .high-latency:hover::after {
     position: absolute;
     left: 50%;
     transform: translateX(-50%);
@@ -798,7 +822,7 @@ ul.md-nav__list {

 .risks blockquote {
     background-color: rgb(254, 243, 243);
-    border: 2pt solid #ff6678 !important;
+    border: 2pt solid var(--primary-red) !important;
 }

 .risks blockquote>p>strong:first-child {
@@ -812,20 +836,46 @@ ul.md-nav__list {
     margin-top: -5pt;
 }

-.info blockquote {
-    background-color: rgb(243, 245, 254);
-    border: 2pt solid #8766ff !important;
+.admonition {
+    background-color: rgb(254, 243, 243) !important;
+    border: 2pt solid var(--primary-red) !important;
+    font-size: 12pt;
 }

-.info blockquote>p>strong:first-child {
-    margin-bottom: 10pt;
-    display: inline-block;
-    padding-left: 25pt;
+.admonition p {
+    font-size: 12pt !important;
+}
+
+.admonition .admonition-title {
+    background-color: transparent !important;
+    margin: 0pt;
+    margin-top: 2pt;
+    padding: 0pt;
+    padding-top: 10pt;
+    padding-left: 27.5pt !important;
+    background: url("../assets/warning.svg") no-repeat 3pt 1pt;
+    background-position: 4pt 12pt;
+    background-size: 1.2em;
+    font-size: 12pt !important;
+    font-weight: 500 !important;
+}
+
+.admonition .admonition-title:before {
+    mask: none;
+    -webkit-mask: none;
+    display: none;
+}

+.admonition.info {
+    background-color: rgb(243, 245, 254) !important;
+    border: 2pt solid #8766ff !important;
+}
+
+.admonition.info .admonition-title {
+    background-color: transparent !important;
     background: url("../assets/info.svg") no-repeat 3pt 1pt;
+    background-position: 4pt 12pt;
     background-size: 1.2em;
-    padding-top: -1pt;
-    margin-top: -5pt;
 }

 .md-typeset__table {

docs/explorer/api/uploading-traces/push-api.md

Lines changed: 2 additions & 2 deletions
@@ -104,7 +104,7 @@ Additional keyword arguments to pass to the requests method. Default is `None`.

 The response object from the Invariant API.

-> Client Example
+> **Client Example**
 ```python
 from invariant_sdk.client import Client
 from invariant_sdk.types.push_traces import PushTracesRequest
@@ -164,7 +164,7 @@ Additional keyword arguments to pass to the requests method. Default is `None`.

 The response object from the Invariant API.

-> Client Example
+> **Client Example**
 ```python
 from invariant_sdk.client import Client

docs/guardrails/code-validation.md

Lines changed: 72 additions & 89 deletions
@@ -5,18 +5,14 @@ Secure the code that your agent generates and executes.
 </div>

 Code validation is a critical component of any code-generating LLM system, as it helps to ensure that the code generated by the LLM is safe and secure. Guardrails provides a simple way to validate the code generated by your LLM, using a set of integration and code parsing capabilities.
+
+!!! danger "Code Validation Risks"
+    Code validation is a critical component of any code-generating LLM system. An insecure agent could:

-<div class='risks'/>
-> **Code Validation Risks**<br/>
-> Code validation is a critical component of any code-generating LLM system. An insecure agent could:
-
-> * Generate code that contains **security vulnerabilities**, such as SQL injection or cross-site scripting
-
-> * Generate code that **contains bugs or errors**, causing the system to crash or behave unexpectedly
-
-> * Produce code that escapes a **sandboxed execution environment**
-
-> * Generate code that is **not well-formed or does not follow best practices**, causing the system to be difficult to maintain or understand
+    - Generate code that contains **security vulnerabilities**, such as SQL injection or cross-site scripting
+    - Generate code that **contains bugs or errors**, causing the system to crash or behave unexpectedly
+    - Produce code that escapes a **sandboxed execution environment**
+    - Generate code that is **not well-formed or does not follow best practices**, causing the system to be difficult to maintain or understand

 To validate code as part of Guardrails, Invariant allows you to invoke external code checking tools as part of the guardrailing process. That means with Invariant you can build code validation right into your LLM layer, without worrying about it on the agent side.

@@ -87,42 +83,87 @@ raise "syntax error" if:
 ]
 ```

+<!-- template -->
+<!-- **Parameters**
+
+| Name | Type | Description |
+|-------------|--------|----------------------------------------|
+| `data` | `Union[str, List[str]]` | A single message or a list of messages to detect PII in. |
+| `entities` | `Optional[List[str]]` | A list of [PII entity types](https://microsoft.github.io/presidio/supported_entities/) to detect. Defaults to detecting all types. |
+
+**Returns**

-### `def python_code(data: str | list | dict, ipython_mode=False)`
+| Type | Description |
+|--------|----------------------------------------|
+| `List[str]` | A list of all the detected PII in `data` | -->
+
+## python_code <span class="detector-badge"/>
+```python
+def python_code(
+    data: Union[str, List[str]],
+    ipython_mode: bool = False
+) -> List[str]
+```

 Parses provided Python code and returns a `PythonDetectorResult` object containing the following fields:
+## Static Code Analysis

-**Parameters:**
+Static code analysis allows for powerful pattern-based detection of vulnerabilities and insecure coding practices. Invariant integrates [Semgrep](https://semgrep.dev) directly into your guardrails, enabling deep analysis of assistant-generated code before it's executed.

-- `data` (str | list | dict): The Python code to be parsed. This can be a string or list of strings, or a dictionary.
+!!! danger "Static Analysis Risks"
+    Without static analysis, an insecure agent may:

-- `ipython_mode` (bool): If set to `True`, the code will be parsed in IPython mode. This is useful for parsing code that uses IPython-specific features or syntax.
+    * Use **insecure code constructs** like `os.system(input())`
+    * Execute **command injection attacks** via unsafe shell commands
+    * Introduce **hardcoded secrets** or credentials
+    * Violate internal **security or style policies**

+You can use `semgrep` within a guardrail to scan code in Python, Bash, and other supported languages.

-**Returns:**
+## semgrep <span class="detector-badge"></span> <span class="high-latency"></span>
+```python
+def semgrep(
+    data: str | list | dict,
+    lang: str
+) -> List[CodeIssue]
+```

-* `PythonDetectorResult.imports`: This field contains a list of imported modules in the provided code. It is useful for identifying which libraries or modules are being used in the code.
+Scans the given code using [Semgrep](http://semgrep.dev) and returns a list of `CodeIssue` objects.

-* `PythonDetectorResult.builtins`: A list of built-in functions used in the provided code.
+**Parameters**

-* `PythonDetectorResult.syntax_error`: A boolean flag indicating whether the provided code has syntax errors.
+| Name | Type | Description |
+|---------|-----------------------|-------------------------------------------------------|
+| `data` | `str | list | dict` | The code to scan. Can be a single string or list. |
+| `lang` | `str` | Programming language (`"python"`, `"bash"`, etc). |

-* `PythonDetectorResult.syntax_error_exception`: A string containing the exception message if a syntax error occurred while parsing the provided code.
+**Returns**

-* `PythonDetectorResult.function_calls`: A set of function call identifier names in the provided code.
+| Type | Description |
+|-----------------|--------------------------------------------------|
+| `List[CodeIssue]` | List of issues, each with a description and severity |

-### `def ipython_code(data: str | list | dict)`

-Same as `python_code`, but for [IPython](https://ipython.org/) code. This function is useful for parsing code that uses IPython-specific features or syntax, i.e. code that runs in Jupyter notebook.
+### `CodeIssue` objects

+A code issue is represented as a `CodeIssue` object with the following fields:

-## Static Code Analysis
+```python
+class CodeSeverity(str, Enum)
+```
+
+| Name | Type | Description |
+|-------------|---------------|--------------------------------------------------|
+| `.description` | `str` | Description of the issue. |
+| `.severity` | `CodeSeverity` | Severity of the issue (e.g., "HIGH", "MEDIUM"). |

-Use [`semgrep`](https://semgrep.dev) to perform deep static analysis and identify potential vulnerabilities, bad practices, or policy violations in code. It complements `python_code` by enabling more powerful pattern-based detection.
+---

+### Example Usage

-**Example:** Preventing Dangerous Patterns in Python Code
+Use semgrep to perform deep static analysis and identify potential vulnerabilities, bad practices, or policy violations in code. It complements python_code by enabling more powerful pattern-based detection.

+**Example:** Detecting Dangerous Patterns in Python Code
 ```guardrail
 from invariant.detectors import semgrep

@@ -151,16 +192,9 @@ raise "Dangerous pattern detected in about-to-be-executed code" if:
 ]
 ```

-<!-- raise "Vulnerability in bash command [risk=medium]" if:
-    (call: ToolCall)
-    call is tool:cmd_run
-    semgrep_res := semgrep(call.function.arguments.command, lang="bash")
-    any(semgrep_res) -->
-
 Semgrep also supports other languages than Python, for instance Bash for command line security.

 **Example:** Preventing Unsafe Bash Commands
-
 ```guardrail
 from invariant.detectors import semgrep

@@ -191,63 +225,12 @@ raise "Dangerous pattern detected in about-to-be-executed bash command" if:

 ---

-### `def semgrep(data: str | list | dict, lang: str)`
-
-<!--
-
-#### 🔧 **Parameters**
-- `data`: Code to scan. Can be a `str`, `list`, or `dict`.
-- `lang`: Programming language (e.g., `'python'`, `'javascript'`).
-- `config`: Additional Semgrep config (e.g., rules, rule paths).
-
-#### 🧾 **Returns**
-A list of `CodeIssue` objects:
-```python
-class CodeIssue(BaseModel):
-    description: str
-    severity: CodeSeverity # "HIGH", "MEDIUM", or "LOW"
-```
-
-Use `.description` and `.severity` in guardrails logic:
-```guardrail
-raise issue.description if:
-    (msg: Message)
-    issues := semgrep(msg.content, lang="python")
-    issue in issues
-    issue.severity == "HIGH"
-```
+### What You Can Detect

-#### ⚠️ **What You Can Detect**
-- Tainted input flows (e.g. `input()` → `os.system()`)
+- Tainted input flows (e.g., `input()``os.system()`)
 - Hardcoded secrets
-- Insecure patterns (e.g. `subprocess` without `shell=False`)
+- Insecure patterns (e.g., unsafe subprocess usage)
 - Deprecated APIs
-- Style or compliance violations
-
-#### 📦 **Best Use**
-Use Semgrep to enforce secure coding practices on any assistant-generated code _before_ execution. -->
+- Custom security policies

-**Parameters:**
-
-- `data`: Code to scan. Can be a `str`, `list`, or `dict`.
-- `lang`: Programming language (e.g., `'python'`, `'javascript'`).
-
-**Returns:**
-
-A list of `CodeIssue` objects:
-```python
-class CodeIssue(BaseModel):
-    description: str
-    severity: CodeSeverity # "HIGH", "MEDIUM", or "LOW"
-```
-
-Here, `description` is a string describing the issue, and `severity` is an enum indicating the severity level of the issue (e.g., "HIGH", "MEDIUM", or "LOW"). You can use these fields in your guardrails logic to raise exceptions or take other actions based on the detected issues.
-
-**What You Can Detect**
-
-- Tainted input flows (e.g. `input()``os.system()`)
-- Hardcoded secrets
-- Insecure patterns (e.g. `subprocess` without `shell=False`)
-- Deprecated APIs
-- Style or compliance violations
-- Other custom patterns defined in Semgrep rules
+Semgrep makes it easy to enforce secure coding patterns in your LLM stack without relying on the agent to be secure by default.
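
Note for readers of this diff: the `CodeIssue` and `CodeSeverity` types that the revised page documents can be pictured with a small, self-contained Python sketch. This is a stand-in under stated assumptions, not the Invariant implementation. The dataclass replaces the `BaseModel` shown in the removed lines, the three severity members are taken from the removed comment (`"HIGH"`, `"MEDIUM"`, or `"LOW"`), and `high_severity` is a hypothetical helper introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class CodeSeverity(str, Enum):
    # Member set assumed from the comment in the removed docs lines.
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass
class CodeIssue:
    # Stand-in for the documented type: a description plus a severity.
    description: str
    severity: CodeSeverity


def high_severity(issues: List[CodeIssue]) -> List[CodeIssue]:
    """Hypothetical helper: keep only the HIGH-severity issues."""
    return [issue for issue in issues if issue.severity == CodeSeverity.HIGH]


# Hand-written sample data; a real list would come from a semgrep scan.
issues = [
    CodeIssue("os.system called with user input", CodeSeverity.HIGH),
    CodeIssue("deprecated API used", CodeSeverity.LOW),
]
blocking = high_severity(issues)
```

Because `CodeSeverity` subclasses `str`, a member compares equal to its plain string value, which is presumably why the removed guardrail snippet could write `issue.severity == "HIGH"`.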

docs/guardrails/pii.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ def pii(
     entities: Optional[List[str]]
 ) -> List[str]
 ```
-Detector to find personally indentifaible information in text.
+Detector to find personally-identifiable information in text.

 **Parameters**

docs/guardrails/tool-calls.md

Lines changed: 6 additions & 7 deletions
@@ -14,17 +14,16 @@ Guardrails provide you a powerful way to enforce such security policies, and to
 <img src="site:assets/guardrails/tool-calls.svg" alt="Invariant Architecture" class="invariant-architecture" style="display: block; margin: 0 auto; width: 100%; max-width: 400pt;"/>
 <br/><br/>

-<div class='risks'/>
-> **Tool Calling Risks**<br/>
-> Since tools are an agent's interface to interact with the world, they can also be used to perform actions that are harmful or undesired. For example, an insecure agent could:
+!!! danger "Tool Calling Risk"
+    Since tools are an agent's interface to interact with the world, they can also be used to perform actions that are harmful or undesired. For example, an insecure agent could:

-> * Leak sensitive information, e.g. via a `send_email` function
+    * Leak sensitive information, e.g. via a `send_email` function

-> * Delete an important file, via a `delete_file` or a `bash` command
+    * Delete an important file, via a `delete_file` or a `bash` command

-> * Make a payment to an attacker
+    * Make a payment to an attacker

-> * Send a message to a user with sensitive information
+    * Send a message to a user with sensitive information

 To prevent tool calling related risks, Invariant offers a wide range of options to limit, constrain, validate and block tool calls. This chapter describes the different options available to you, and how to use them.
