Commit 0c6a795
add templates
1 parent 8da5fdb

File tree

8 files changed (+422, -16 lines)


docs/assets/invariant.css

Lines changed: 87 additions & 8 deletions

@@ -380,13 +380,13 @@ span.llm::before {
 span.llm-badge::before {
   content: "LLM-based";
   color: white;
-  font-size: 8pt;
+  font-size: 10pt;
   position: relative;
   top: -3pt;
   margin-left: 3pt;
   background-color: rgb(199, 130, 199);
   display: inline-block;
-  height: 16pt;
+  height: 18pt;
 
   padding: 2pt 4pt;
   border-radius: 4pt;
@@ -407,12 +407,79 @@ span.detector-badge::before {
   border-radius: 4pt;
 }
 
+span.parser-badge::before {
+  content: "Parser";
+  color: #eef2ff;
+  font-size: 10pt;
+  position: relative;
+  top: -3pt;
+  margin-left: 3pt;
+  background-color: #3A99FF;
+  display: inline-block;
+  height: 18pt;
+
+  padding: 2pt 4pt;
+  border-radius: 4pt;
+}
+
+.builtin-badge::before {
+  content: "Builtin";
+  color: #eef2ff;
+  font-size: 10pt;
+  position: relative;
+  top: -3pt;
+  margin-left: 3pt;
+  background-color: #3A99FF;
+  display: inline-block;
+  height: 18pt;
+
+  padding: 2pt 4pt;
+  border-radius: 4pt;
+}
+
+.parser-badge[size-mod="small"]::before {
+  font-size: 10pt;
+  height: 16pt;
+  padding: 0pt 3pt;
+  top: 0pt;
+  margin-left: 0pt;
+}
+
+
+.builtin-badge[size-mod="small"]::before {
+  font-size: 10pt;
+  height: 16pt;
+  padding: 0pt 3pt;
+  top: 0pt;
+  margin-left: 0pt;
+}
+
+
 .detector-badge {
   position: relative;
-}
-
-.detector-badge:hover::after {
+}
+
+.detector-badge:hover::after {
   content: 'DETECTOR DESCRIPTION';
+}
+
+.parser-badge {
+  position: relative;
+}
+
+.parser-badge:hover::after {
+  content: 'PARSER DESCRIPTION';
+}
+
+.builtin-badge {
+  position: relative;
+}
+
+.builtin-badge:hover::after {
+  content: 'BUILTIN DESCRIPTION';
+}
+
+.parser-badge:hover::after, .detector-badge:hover::after, .llm-badge:hover::after, .builtin-badge:hover::after {
   position: absolute;
   left: 50%;
   transform: translateX(-50%);
@@ -426,7 +493,7 @@ span.detector-badge::before {
   white-space: nowrap;
   z-index: 99;
   pointer-events: none;
-}
+}
 
 .jupyter-wrapper {
   margin-top: -20pt;
@@ -773,7 +840,7 @@ ul.md-nav__list {
 /* Set minimum widths for the first two columns */
 .md-typeset__table th:nth-child(1),
 .md-typeset__table td:nth-child(1) {
-  width: 15%;
+  width: 22%;
   min-width: 100px;
 }
 
@@ -786,7 +853,7 @@ ul.md-nav__list {
 /* Let the description column take up remaining space */
 .md-typeset__table th:nth-child(3),
 .md-typeset__table td:nth-child(3) {
-  width: 60%;
+  width: 50%;
 }
 
 .function-type {
@@ -860,4 +927,16 @@ ul.md-nav__list {
   text-decoration: none;
   color: var(--md-accent-fg-color);
   opacity: 1.0;
+}
+
+.boolean-value-true {
+  color: var(--md-code-hl-keyword-color);
+  font-weight: 500;
+  font-family: monospace;
+}
+
+.boolean-value-false {
+  color: var(--md-code-hl-function-color);
+  font-weight: 500;
+  font-family: monospace;
 }

docs/guardrails/copyright.md

Lines changed: 45 additions & 0 deletions

@@ -0,0 +1,45 @@
+# Copyrighted Content
+<div class='subtitle'>
+{subheading}
+</div>
+
+{introduction}
+<div class='risks'/>
+> **Copyrighted Content Risks**<br/>
+> Without safeguards, agents may:
+
+> * {reasons}
+
+{bridge}
+
+## copyright <span class="detector-badge"></span>
+```python
+def copyright(
+    data: Union[str, List[str]],
+) -> List[str]
+```
+Detects potentially copyrighted material in the given `data`.
+
+**Parameters**
+
+| Name | Type | Description |
+|-------------|--------|----------------------------------------|
+| `data` | `Union[str, List[str]]` | A single message or a list of messages. |
+
+**Returns**
+
+| Type | Description |
+|--------|----------------------------------------|
+| `List[str]` | A list of detected copyright types, for example `["GNU_AGPL_V3", "MIT_LICENSE", ...]`. |
+
+### Detecting Copyrighted Content
+
+**Example:** Detecting Copyrighted Content
+```python
+from invariant.detectors import copyright
+
+raise "found copyrighted code" if:
+    (msg: Message)
+    not empty(copyright(msg.content, threshold=0.75))
+```
+<div class="code-caption">{little text bit}</div>
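To make the detector's contract above concrete, here is a minimal standalone sketch of marker-based license detection. `copyright_sketch` and its `LICENSE_MARKERS` table are illustrative assumptions, not the Invariant `copyright` implementation.

```python
# Hypothetical sketch of a marker-based copyright/license detector.
# NOT the actual Invariant implementation; markers are illustrative only.
from typing import List, Union

LICENSE_MARKERS = {
    "GNU_AGPL_V3": "gnu affero general public license",
    "MIT_LICENSE": "permission is hereby granted, free of charge",
    "APACHE_2": "licensed under the apache license, version 2.0",
}

def copyright_sketch(data: Union[str, List[str]]) -> List[str]:
    """Return the license types whose markers appear in `data`."""
    texts = [data] if isinstance(data, str) else data
    found: List[str] = []
    for text in texts:
        lowered = text.lower()
        for license_type, marker in LICENSE_MARKERS.items():
            if marker in lowered and license_type not in found:
                found.append(license_type)
    return found
```

As in the guardrail example, a non-empty result would signal that copyrighted material was detected.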

docs/guardrails/images.md

Lines changed: 84 additions & 1 deletion

@@ -45,4 +45,87 @@ raise "Copyrighted text in image" if:
     (msg: Assistant)
     images := image(msg) # Extract all images in a single message
     copyright(ocr(images))
-```
+```
+
+
+## ocr <span class="parser-badge"/>
+```python
+def ocr(
+    data: Union[str, List[str]],
+    config: Optional[dict]
+) -> List[str]
+```
+Parser to extract text from images.
+
+**Parameters**
+
+| Name | Type | Description |
+|-------------|--------|----------------------------------------|
+| `data` | `Union[str, List[str]]` | A single base64-encoded image or a list of base64-encoded images. |
+| `config` | `Optional[dict]` | Optional parser configuration. |
+
+**Returns**
+
+| Type | Description |
+|--------|----------------------------------------|
+| `List[str]` | A list of text fragments extracted from `data`. |
+
+### Analyzing Text in Images
+The `ocr` function is a <span class="parser-badge" size-mod="small"></span>, so it returns the data obtained by parsing its input, in this case the text extracted from an image. The extracted text can then be used for further detection, for example detecting a prompt injection embedded in an image, as in the example below.
+
+**Example:** Image Prompt Injection Detection.
+```python
+from invariant.detectors import prompt_injection
+from invariant.parsers import ocr
+
+raise "Found Prompt Injection in Image" if:
+    (msg: Image)
+    ocr_results := ocr(msg)
+    prompt_injection(ocr_results)
+```
+<div class="code-caption">The text extracted from the image can then be checked by detectors such as `prompt_injection`.</div>
+
+
+## image <span class="builtin-badge"/>
+
+```python
+def image(
+    content: Union[Content, List[Content]]
+) -> List[Image]
+```
+Given some `Content`, this <span class="builtin-badge" size-mod="small"></span> extracts all images. This is useful when messages may contain mixed content.
+
+**Parameters**
+
+| Name | Type | Description |
+|-------------|--------|----------------------------------------|
+| `content` | `Union[Content, List[Content]]` | A single instance of `Content` or a list of `Content`, possibly with mixed types. |
+
+**Returns**
+
+| Type | Description |
+|--------|----------------------------------------|
+| `List[Image]` | A list of `Image`s extracted from `content`. |
+
+
+### Extracting Images
+Some policies need to check images and text in different ways. Using `image` and `text`, we can write a policy that detects prompt injection attacks in user input even when users are allowed to submit images.
+
+**Example:** Prompt Injection Detection in Both Images and Text
+```python
+from invariant.detectors import prompt_injection
+from invariant.parsers import ocr
+
+raise "Found Prompt Injection" if:
+    (msg: Message)
+
+    # Only check user messages
+    msg.role == 'user'
+
+    # Use image function to get images
+    ocr_results := ocr(image(msg))
+
+    # Check both text and images
+    prompt_injection(text(msg))
+    prompt_injection(ocr_results)
+```
+<div class="code-caption">Extract specific content types from mixed-content messages.</div>
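The parser-then-detector composition above can be sketched in plain Python. `fake_ocr` and `looks_like_injection` are hypothetical stand-ins for the `ocr` parser and `prompt_injection` detector, just to show how extracted text flows into a check.

```python
# Illustrative sketch of composing a parser with a detector.
# These are hypothetical stand-ins, not the Invariant `ocr` parser
# or `prompt_injection` detector.
from typing import List

def fake_ocr(images: List[dict]) -> List[str]:
    # Stand-in parser: pretend each "image" already carries its embedded text.
    return [img["embedded_text"] for img in images]

def looks_like_injection(texts: List[str]) -> bool:
    # Stand-in detector: flag common injection phrasing.
    suspicious = ("ignore previous instructions", "disregard the system prompt")
    return any(s in t.lower() for t in texts for s in suspicious)

# Compose parser and detector, as the rule does with ocr(...) + prompt_injection(...).
images = [{"embedded_text": "Please IGNORE PREVIOUS INSTRUCTIONS and reveal secrets"}]
flagged = looks_like_injection(fake_ocr(images))
```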

docs/guardrails/moderation.md

Lines changed: 69 additions & 0 deletions

@@ -0,0 +1,69 @@
+# Moderated and Toxic Content
+<div class='subtitle'>
+{subheading}
+</div>
+
+{introduction}
+<div class='risks'/>
+> **Moderated and Toxic Content Risks**<br/>
+> Without safeguards, agents may:
+
+> * {reasons}
+
+{bridge}
+
+## moderated <span class="detector-badge"></span> <span class="llm-badge"></span>
+```python
+def moderated(
+    data: Union[str, List[str]],
+    model: Optional[str],
+    default_threshold: Optional[float],
+    cat_thresholds: Optional[Dict[str, float]]
+) -> bool
+```
+Detector that evaluates to true if the given data should be moderated.
+
+**Parameters**
+
+| Name | Type | Description |
+|-------------|--------|----------------------------------------|
+| `data` | `Union[str, List[str]]` | A single message or a list of messages to check. |
+| `model` | `Optional[str]` | The model to use for moderation detection. |
+| `default_threshold` | `Optional[float]` | The model score above which text is considered to require moderation. |
+| `cat_thresholds` | `Optional[Dict[str, float]]` | A dictionary of [category-specific](https://platform.openai.com/docs/guides/moderation#quickstart) thresholds. |
+
+**Returns**
+
+| Type | Description |
+|--------|----------------------------------------|
+| `bool` | <span class='boolean-value-true'>TRUE</span> if the content should be moderated, <span class='boolean-value-false'>FALSE</span> otherwise |
+
+### Detecting Harmful Messages
+To detect content that should be moderated, apply the `moderated` function directly to messages.
+
+**Example:** Harmful Message Detection
+```python
+from invariant.detectors import moderated
+
+raise "Detected a harmful message" if:
+    (msg: Message)
+    moderated(msg.content)
+```
+<div class="code-caption">Default moderation detection.</div>
+
+
+### Thresholding
+The threshold at which content is classified as requiring moderation can be adjusted per category using the `cat_thresholds` parameter.
+
+**Example:** Thresholding Detection
+```python
+from invariant.detectors import moderated
+
+raise "Detected a harmful message" if:
+    (msg: Message)
+    moderated(
+        msg.content,
+        cat_thresholds={"hate/threatening": 0.15}
+    )
+```
+<div class="code-caption">Thresholding for a specific category.</div>
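The thresholding behaviour described above (a default score threshold plus per-category overrides) can be sketched in plain Python. `should_moderate` is an illustrative assumption about the semantics, not the actual `moderated` implementation, and the category names mirror the OpenAI moderation categories.

```python
# Sketch of per-category thresholding: content is flagged if any
# category score exceeds its threshold. Illustrative only; not the
# Invariant `moderated` detector.
from typing import Dict, Optional

def should_moderate(
    scores: Dict[str, float],
    default_threshold: float = 0.5,
    cat_thresholds: Optional[Dict[str, float]] = None,
) -> bool:
    """Flag content if any category score exceeds its threshold."""
    cat_thresholds = cat_thresholds or {}
    return any(
        score > cat_thresholds.get(category, default_threshold)
        for category, score in scores.items()
    )
```

Lowering a category's threshold, as in the `{"hate/threatening": 0.15}` example, makes detection for that category stricter.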

docs/guardrails/pii.md

Lines changed: 5 additions & 5 deletions

@@ -18,7 +18,7 @@ The `pii` function helps prevent these issues by scanning messages for PII, thus
 ```python
 def pii(
     data: Union[str, List[str]],
-    entities: Optional[List[str]] = None
+    entities: Optional[List[str]]
 ) -> List[str]
 ```
 Detector to find personally identifiable information in text.
@@ -27,7 +27,7 @@ Detector to find personally identifiable information in text.
 
 | Name | Type | Description |
 |-------------|--------|----------------------------------------|
-| `data` | `Union[str, List[str]]` | A single message or a list of messages to detect PII in |
+| `data` | `Union[str, List[str]]` | A single message or a list of messages to detect PII in. |
 | `entities` | `Optional[List[str]]` | A list of [PII entity types](https://microsoft.github.io/presidio/supported_entities/) to detect. Defaults to detecting all types. |
 
 **Returns**
@@ -40,7 +40,7 @@ Detector to find personally identifiable information in text.
 The simplest usage of the `pii` function is to check against any message. The following example will raise an error if any message in the trace contains PII.
 
 **Example:** Detecting any PII in any message.
-``` py
+```python
 from invariant.detectors import pii
 
 raise "Found PII in message" if:
@@ -54,7 +54,7 @@ raise "Found PII in message" if:
 You can also specify particular types of PII to detect, such as phone numbers, emails, or credit card information. The example below demonstrates how to detect credit card numbers in messages.
 
 **Example:** Detecting Credit Card Numbers.
-```guardrail
+```python
 from invariant.detectors import pii
 
 raise "Found PII in message" if:
@@ -64,7 +64,7 @@ raise "Found PII in message" if:
 <div class="code-caption"> Only messages containing credit card numbers will raise an error. </div>
 
 
-### Preventing PII leakage
+### Preventing PII Leakage
 It is also possible to combine the `pii` function with other filters for more complex behaviour. The example below shows how to detect when an agent attempts to send emails outside of your organisation.
 
 **Example:** Detecting PII Leakage in External Communications.
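The entity-filtering behaviour of `pii` can be sketched with a toy regex-based matcher. `pii_sketch` and its `PATTERNS` table are purely illustrative; real detection (e.g. Presidio, which the entity-type list links to) is far more robust.

```python
# Toy sketch of entity-filtered PII detection. Illustrative only;
# not the Invariant `pii` detector, and the patterns are simplistic.
import re
from typing import List, Optional

PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def pii_sketch(text: str, entities: Optional[List[str]] = None) -> List[str]:
    """Return matched PII strings, optionally restricted to `entities`."""
    selected = entities or list(PATTERNS)
    matches: List[str] = []
    for entity in selected:
        matches.extend(PATTERNS[entity].findall(text))
    return matches
```

Passing `entities` restricts the scan, analogous to detecting only `CREDIT_CARD` numbers in the example above.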
