code validation refactoring

lbeurerkellner · lbeurerkellner · commit 111a25f5c5bd · 2025-04-14T09:52:40.000+02:00
diff --git a/docs/assets/invariant.css b/docs/assets/invariant.css
@@ -13,6 +13,7 @@
 /* define primary blue */
 :root {
     --primary-blue: #3d3affac;
+    --primary-red: #ff6678;
 }
 
 
@@ -407,6 +408,21 @@ span.detector-badge::before {
     border-radius: 4pt;
 }
 
+span.high-latency::before {
+    content: "High-Latency";
+    color: #eef2ff;
+    font-size: 10pt;
+    position: relative;
+    top: -3pt;
+    margin-left: 3pt;
+    background-color: var(--primary-red);
+    display: inline-block;
+    height: 18pt;
+    
+    padding: 2pt 4pt;
+    border-radius: 4pt;
+}
+
 span.parser-badge::before {
     content: "Parser";
     color: #eef2ff;
@@ -460,7 +476,15 @@ span.parser-badge::before {
  }
 
 .detector-badge:hover::after {
-    content: 'DETECTOR DESCRIPTION';
+    content: 'Detectors allow you to detect the presence of certain patterns and types of data in an input.';
+}
+
+.high-latency {
+    position: relative;
+}
+
+.high-latency:hover::after {
+    content: 'High-Latency checks may significantly increase the time it takes to process a request. Non-blocking checks are recommended.';
 }
 
 .parser-badge {
@@ -479,7 +503,7 @@ span.parser-badge::before {
     content: 'BUILTIN DESCRIPTION';
 }
 
-.parser-badge:hover::after, .detector-badge:hover::after, .llm-badge:hover::after, .builtin-badge:hover::after {
+.parser-badge:hover::after, .detector-badge:hover::after, .llm-badge:hover::after, .builtin-badge:hover::after, .high-latency:hover::after {
     position: absolute;
     left: 50%;
     transform: translateX(-50%);
@@ -798,7 +822,7 @@ ul.md-nav__list {
 
 .risks blockquote {
     background-color: rgb(254, 243, 243);
-    border: 2pt solid #ff6678 !important;
+    border: 2pt solid var(--primary-red) !important;
 }
 
 .risks blockquote>p>strong:first-child {
@@ -812,20 +836,46 @@ ul.md-nav__list {
     margin-top: -5pt;
 }
 
-.info blockquote {
-    background-color: rgb(243, 245, 254);
-    border: 2pt solid #8766ff !important;
+.admonition {
+    background-color: rgb(254, 243, 243) !important;
+    border: 2pt solid var(--primary-red) !important;
+    font-size: 12pt;
 }
 
-.info blockquote>p>strong:first-child {
-    margin-bottom: 10pt;
-    display: inline-block;
-    padding-left: 25pt;
+.admonition p {
+    font-size: 12pt !important;
+}
+
+.admonition .admonition-title {
+    background-color: transparent !important;
+    margin: 0pt;
+    margin-top: 2pt;
+    padding: 0pt;
+    padding-top: 10pt;
+    padding-left: 27.5pt !important;
+    background: url("../assets/warning.svg") no-repeat 3pt 1pt;
+    background-position: 4pt 12pt;
+    background-size: 1.2em;
+    font-size: 12pt !important;
+    font-weight: 500 !important;
+}
+
+.admonition .admonition-title:before {
+    mask: none;
+    -webkit-mask: none;
+    display: none;
+}
 
+.admonition.info {
+    background-color: rgb(243, 245, 254) !important;
+    border: 2pt solid #8766ff !important;
+}
+
+.admonition.info .admonition-title {
+    background-color: transparent !important;
     background: url("../assets/info.svg") no-repeat 3pt 1pt;
+    background-position: 4pt 12pt;
     background-size: 1.2em;
-    padding-top: -1pt;
-    margin-top: -5pt;
 }
 
 .md-typeset__table {
diff --git a/docs/explorer/api/uploading-traces/push-api.md b/docs/explorer/api/uploading-traces/push-api.md
@@ -104,7 +104,7 @@ Additional keyword arguments to pass to the requests method. Default is `None`.
 
 The response object from the Invariant API.
 
-> Client Example
+> **Client Example**
     ```python
     from invariant_sdk.client import Client
     from invariant_sdk.types.push_traces import PushTracesRequest
@@ -164,7 +164,7 @@ Additional keyword arguments to pass to the requests method. Default is `None`.
 
 The response object from the Invariant API.
 
-> Client Example
+> **Client Example**
     ```python
     from invariant_sdk.client import Client
 
diff --git a/docs/guardrails/code-validation.md b/docs/guardrails/code-validation.md
@@ -5,18 +5,14 @@ Secure the code that your agent generates and executes.
 </div>
 
 Code validation is a critical component of any code-generating LLM system, as it helps to ensure that the code generated by the LLM is safe and secure. Guardrails provides a simple way to validate the code generated by your LLM, using a set of integration and code parsing capabilities.
+ 
+!!! danger "Code Validation Risks"
+    Code validation is a critical component of any code-generating LLM system. An insecure agent could:
 
-<div class='risks'/>
-> **Code Validation Risks**<br/>
-> Code validation is a critical component of any code-generating LLM system. An insecure agent could:
-
-> * Generate code that contains **security vulnerabilities**, such as SQL injection or cross-site scripting
-
-> * Generate code that **contains bugs or errors**, causing the system to crash or behave unexpectedly
-
-> * Produce code that escapes a **sandboxed execution environment**
-
-> * Generate code that is **not well-formed or does not follow best practices**, causing the system to be difficult to maintain or understand
+    - Generate code that contains **security vulnerabilities**, such as SQL injection or cross-site scripting  
+    - Generate code that **contains bugs or errors**, causing the system to crash or behave unexpectedly  
+    - Produce code that escapes a **sandboxed execution environment**  
+    - Generate code that is **not well-formed or does not follow best practices**, causing the system to be difficult to maintain or understand
 
 To validate code as part of Guardrails, Invariant allows you to invoke external code checking tools as part of the guardrailing process. That means with Invariant you can build code validation right into your LLM layer, without worrying about it on the agent side.
 
@@ -87,42 +83,87 @@ raise "syntax error" if:
 ]
 ```
 
+<!-- template  -->
+<!-- **Parameters**
+
+| Name        | Type   | Description                            |
+|-------------|--------|----------------------------------------|
+| `data`      | `Union[str, List[str]]` | A single message or a list of messages to detect PII in. |
+| `entities`  | `Optional[List[str]]`   | A list of [PII entity types](https://microsoft.github.io/presidio/supported_entities/) to detect. Defaults to detecting all types. |
+
+**Returns**
 
-### `def python_code(data: str | list | dict,  ipython_mode=False)`
+| Type   | Description                            |
+|--------|----------------------------------------|
+| `List[str]` | A list of all the detected PII in `data` | -->
+
+## python_code <span class="detector-badge"/>
+```python
+def python_code(
+    data: Union[str, List[str]],
+    ipython_mode: bool = False
+) -> List[str]
+```
 
 Parses provided Python code and returns a `PythonDetectorResult` object containing the following fields:
+## Static Code Analysis
 
-**Parameters:**
+Static code analysis allows for powerful pattern-based detection of vulnerabilities and insecure coding practices. Invariant integrates [Semgrep](https://semgrep.dev) directly into your guardrails, enabling deep analysis of assistant-generated code before it's executed.
 
-- `data` (str | list | dict): The Python code to be parsed. This can be a string or list of strings, or a dictionary.
+!!! danger "Static Analysis Risks"
+    Without static analysis, an insecure agent may:
 
-- `ipython_mode` (bool): If set to `True`, the code will be parsed in IPython mode. This is useful for parsing code that uses IPython-specific features or syntax.
+    * Use **insecure code constructs** like `os.system(input())`
+    * Execute **command injection attacks** via unsafe shell commands
+    * Introduce **hardcoded secrets** or credentials
+    * Violate internal **security or style policies**
 
+You can use `semgrep` within a guardrail to scan code in Python, Bash, and other supported languages.
 
-**Returns:**
+## semgrep <span class="detector-badge"></span> <span class="high-latency"></span>
+```python
+def semgrep(
+    data: str | list | dict,
+    lang: str
+) -> List[CodeIssue]
+```
 
-* `PythonDetectorResult.imports`: This field contains a list of imported modules in the provided code. It is useful for identifying which libraries or modules are being used in the code.
+Scans the given code using [Semgrep](https://semgrep.dev) and returns a list of `CodeIssue` objects.
 
-* `PythonDetectorResult.builtins`: A list of built-in functions used in the provided code.
+**Parameters**
 
-* `PythonDetectorResult.syntax_error`: A boolean flag indicating whether the provided code has syntax errors.
+| Name    | Type                  | Description                                           |
+|---------|-----------------------|-------------------------------------------------------|
+| `data`  | `str | list | dict` | The code to scan. Can be a string, a list, or a dict. |
+| `lang`  | `str`                 | Programming language (`"python"`, `"bash"`, etc).     |
 
-* `PythonDetectorResult.syntax_error_exception`: A string containing the exception message if a syntax error occurred while parsing the provided code.
+**Returns**
 
-* `PythonDetectorResult.function_calls`: A set of function call identifier names in the provided code.
+| Type            | Description                                      |
+|-----------------|--------------------------------------------------|
+| `List[CodeIssue]` | List of issues, each with a description and severity |
 
-### `def ipython_code(data: str | list | dict)`
 
-Same as `python_code`, but for [IPython](https://ipython.org/) code. This function is useful for parsing code that uses IPython-specific features or syntax, i.e. code that runs in Jupyter notebook.
+### `CodeIssue` objects
 
+A code issue is represented as a `CodeIssue` object with the following fields:
 
-## Static Code Analysis
+```python
+class CodeIssue(BaseModel)
+```
+
+| Name        | Type          | Description                                      |
+|-------------|---------------|--------------------------------------------------|
+| `.description` | `str`         | Description of the issue.                        |
+| `.severity`    | `CodeSeverity` | Severity of the issue (e.g., "HIGH", "MEDIUM").  |
 
-Use [`semgrep`](https://semgrep.dev) to perform deep static analysis and identify potential vulnerabilities, bad practices, or policy violations in code. It complements `python_code` by enabling more powerful pattern-based detection.
+---
 
+### Example Usage
 
-**Example:** Preventing Dangerous Patterns in Python Code
+Use `semgrep` to perform deep static analysis and identify potential vulnerabilities, bad practices, or policy violations in code. It complements `python_code` by enabling more powerful pattern-based detection.
 
+**Example:** Detecting Dangerous Patterns in Python Code
 ```guardrail
 from invariant.detectors import semgrep
 
@@ -151,16 +192,9 @@ raise "Dangerous pattern detected in about-to-be-executed code" if:
 ]
 ```
 
-<!-- raise "Vulnerability in bash command [risk=medium]" if:
-    (call: ToolCall)
-    call is tool:cmd_run
-    semgrep_res := semgrep(call.function.arguments.command, lang="bash")
-    any(semgrep_res) -->
-
 Semgrep also supports other languages than Python, for instance Bash for command line security.
 
 **Example:** Preventing Unsafe Bash Commands
-
 ```guardrail
 from invariant.detectors import semgrep
 
@@ -191,63 +225,12 @@ raise "Dangerous pattern detected in about-to-be-executed bash command" if:
 
 ---
 
-### `def semgrep(data: str | list | dict, lang: str)`
-
-<!-- 
-
-#### 🔧 **Parameters**
-- `data`: Code to scan. Can be a `str`, `list`, or `dict`.
-- `lang`: Programming language (e.g., `'python'`, `'javascript'`).
-- `config`: Additional Semgrep config (e.g., rules, rule paths).
-
-#### 🧾 **Returns**
-A list of `CodeIssue` objects:
-```python
-class CodeIssue(BaseModel):
-    description: str
-    severity: CodeSeverity  # "HIGH", "MEDIUM", or "LOW"
-```
-
-Use `.description` and `.severity` in guardrails logic:
-```guardrail
-raise issue.description if:
-  (msg: Message)
-  issues := semgrep(msg.content, lang="python")
-  issue in issues
-  issue.severity == "HIGH"
-```
+### What You Can Detect
 
-#### ⚠️ **What You Can Detect**
-- Tainted input flows (e.g. `input()` → `os.system()`)
+- Tainted input flows (e.g., `input()` → `os.system()`)
 - Hardcoded secrets
-- Insecure patterns (e.g. `subprocess` without `shell=False`)
+- Insecure patterns (e.g., unsafe subprocess usage)
 - Deprecated APIs
-- Style or compliance violations
-
-#### 📦 **Best Use**
-Use Semgrep to enforce secure coding practices on any assistant-generated code _before_ execution. -->
+- Custom security policies
 
-**Parameters:**
-
-- `data`: Code to scan. Can be a `str`, `list`, or `dict`.
-- `lang`: Programming language (e.g., `'python'`, `'javascript'`).
-
-**Returns:**
-
-A list of `CodeIssue` objects:
-```python
-class CodeIssue(BaseModel):
-    description: str
-    severity: CodeSeverity  # "HIGH", "MEDIUM", or "LOW"
-```
-
-Here, `description` is a string describing the issue, and `severity` is an enum indicating the severity level of the issue (e.g., "HIGH", "MEDIUM", or "LOW"). You can use these fields in your guardrails logic to raise exceptions or take other actions based on the detected issues.
-
-**What You Can Detect**
-
-- Tainted input flows (e.g. `input()` → `os.system()`)
-- Hardcoded secrets
-- Insecure patterns (e.g. `subprocess` without `shell=False`)
-- Deprecated APIs
-- Style or compliance violations
-- Other custom patterns defined in Semgrep rules
+Semgrep makes it easy to enforce secure coding patterns in your LLM stack without relying on the agent to be secure by default.
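
Reviewer note on the `CodeIssue` documentation above: the `.severity` field is described as a `CodeSeverity`, declared as `class CodeSeverity(str, Enum)`. The sketch below illustrates why guardrail rules can compare it against plain strings such as `"HIGH"`, and how issues might be filtered by a minimum severity. It is a minimal stand-in, not Invariant's implementation: the dataclass replaces the pydantic `BaseModel` to stay stdlib-only, and the `_RANK` table, the `at_least` helper, and the sample issues are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class CodeSeverity(str, Enum):
    # Levels taken from the docs ("HIGH", "MEDIUM", "LOW"); the exact
    # member set in the real library is an assumption.
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass
class CodeIssue:
    # Stand-in for the documented CodeIssue object (assumed shape).
    description: str
    severity: CodeSeverity


# Hypothetical ordering used to compare severities.
_RANK = {CodeSeverity.LOW: 0, CodeSeverity.MEDIUM: 1, CodeSeverity.HIGH: 2}


def at_least(issues: List[CodeIssue], minimum: str) -> List[CodeIssue]:
    """Return the issues whose severity is `minimum` or higher."""
    floor = _RANK[CodeSeverity(minimum)]
    return [issue for issue in issues if _RANK[issue.severity] >= floor]


# Made-up sample data, not real Semgrep output.
issues = [
    CodeIssue("os.system called with user input", CodeSeverity.HIGH),
    CodeIssue("hardcoded credential", CodeSeverity.MEDIUM),
    CodeIssue("deprecated API call", CodeSeverity.LOW),
]

# Because CodeSeverity subclasses str, comparing to a plain string works.
high_only = [i.description for i in issues if i.severity == "HIGH"]
medium_up = [i.description for i in at_least(issues, "MEDIUM")]
```

A string-backed enum keeps rule conditions like `issue.severity == "HIGH"` readable while still restricting the values to a fixed set.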
diff --git a/docs/guardrails/pii.md b/docs/guardrails/pii.md
@@ -21,7 +21,7 @@ def pii(
     entities: Optional[List[str]]
 ) -> List[str]
 ```
-Detector to find personally indentifaible information in text.
+Detector to find personally identifiable information (PII) in text.
 
 **Parameters**
 
diff --git a/docs/guardrails/tool-calls.md b/docs/guardrails/tool-calls.md
@@ -14,17 +14,16 @@ Guardrails provide you a powerful way to enforce such security policies, and to
 <img src="site:assets/guardrails/tool-calls.svg" alt="Invariant Architecture" class="invariant-architecture" style="display: block; margin: 0 auto; width: 100%; max-width: 400pt;"/>
 <br/><br/>
 
-<div class='risks'/>
-> **Tool Calling Risks**<br/>
-> Since tools are an agent's interface to interact with the world, they can also be used to perform actions that are harmful or undesired. For example, an insecure agent could:
+!!! danger "Tool Calling Risks"
+    Since tools are an agent's interface to interact with the world, they can also be used to perform actions that are harmful or undesired. For example, an insecure agent could:
 
-> * Leak sensitive information, e.g. via a `send_email` function
+    * Leak sensitive information, e.g. via a `send_email` function
 
-> * Delete an important file, via a `delete_file` or a `bash` command
+    * Delete an important file, via a `delete_file` or a `bash` command
 
-> * Make a payment to an attacker
+    * Make a payment to an attacker
 
-> * Send a message to a user with sensitive information
+    * Send a message to a user with sensitive information
 
 To prevent tool calling related risks, Invariant offers a wide range of options to limit, constrain, validate and block tool calls. This chapter describes the different options available to you, and how to use them.