Commit fbde169

fix(langchain): improvements to PII middleware docs (#1413)
cc @sydney-runkle

Some additions to the PII middleware docs:

- fixed JS code examples
- added a section on how to create custom strategies
- removed the "Full example" accordion, as it contained duplicate content
1 parent ff2a9bb commit fbde169

1 file changed: +192 -65 lines changed

src/oss/langchain/middleware/built-in.mdx

Lines changed: 192 additions & 65 deletions
@@ -792,27 +792,202 @@ agent = create_agent(

 :::js
 ```typescript
-import { createAgent, piiRedactionMiddleware } from "langchain";
+import { createAgent, piiMiddleware } from "langchain";

 const agent = createAgent({
   model: "gpt-4o",
   tools: [...],
   middleware: [
-    piiRedactionMiddleware({
-      piiType: "email",
-      strategy: "redact",
-      applyToInput: true,
+    piiMiddleware("email", { strategy: "redact", applyToInput: true }),
+    piiMiddleware("credit_card", { strategy: "mask", applyToInput: true }),
+  ],
+});
+```
+:::
+
+#### Custom PII types
+
+You can create custom PII types by providing a `detector` parameter. This allows you to detect patterns specific to your use case beyond the built-in types.
+
+**Three ways to create custom detectors:**
+
+1. **Regex pattern string** - Simple pattern matching
+:::js
+1. **RegExp object** - More control over regex flags
+:::
+1. **Custom function** - Complex detection logic with validation
+
+:::python
+```python
+from langchain.agents import create_agent
+from langchain.agents.middleware import PIIMiddleware
+import re
+
+
+# Method 1: Regex pattern string
+agent1 = create_agent(
+    model="gpt-4o",
+    tools=[...],
+    middleware=[
+        PIIMiddleware(
+            "api_key",
+            detector=r"sk-[a-zA-Z0-9]{32}",
+            strategy="block",
+        ),
+    ],
+)
+
+# Method 2: Compiled regex pattern
+agent2 = create_agent(
+    model="gpt-4o",
+    tools=[...],
+    middleware=[
+        PIIMiddleware(
+            "phone_number",
+            detector=re.compile(r"\+?\d{1,3}[\s.-]?\d{3,4}[\s.-]?\d{4}"),
+            strategy="mask",
+        ),
+    ],
+)
+
+# Method 3: Custom detector function
+def detect_ssn(content: str) -> list[dict[str, str | int]]:
+    """Detect SSN with validation.
+
+    Returns a list of dictionaries with 'text', 'start', and 'end' keys.
+    """
+    import re
+    matches = []
+    pattern = r"\d{3}-\d{2}-\d{4}"
+    for match in re.finditer(pattern, content):
+        ssn = match.group(0)
+        # Validate: first 3 digits shouldn't be 000, 666, or 900-999
+        first_three = int(ssn[:3])
+        if first_three not in [0, 666] and not (900 <= first_three <= 999):
+            matches.append({
+                "text": ssn,
+                "start": match.start(),
+                "end": match.end(),
+            })
+    return matches
+
+agent3 = create_agent(
+    model="gpt-4o",
+    tools=[...],
+    middleware=[
+        PIIMiddleware(
+            "ssn",
+            detector=detect_ssn,
+            strategy="hash",
+        ),
+    ],
+)
+```
+:::
+
+:::js
+```typescript
+import { createAgent, piiMiddleware, type PIIMatch } from "langchain";
+
+// Method 1: Regex pattern string
+const agent1 = createAgent({
+  model: "gpt-4o",
+  tools: [...],
+  middleware: [
+    piiMiddleware("api_key", {
+      detector: "sk-[a-zA-Z0-9]{32}",
+      strategy: "block",
     }),
-    piiRedactionMiddleware({
-      piiType: "credit_card",
+  ],
+});
+
+// Method 2: RegExp object
+const agent2 = createAgent({
+  model: "gpt-4o",
+  tools: [...],
+  middleware: [
+    piiMiddleware("phone_number", {
+      detector: /\+?\d{1,3}[\s.-]?\d{3,4}[\s.-]?\d{4}/,
       strategy: "mask",
-      applyToInput: true,
     }),
   ],
 });
+
+// Method 3: Custom detector function
+function detectSSN(content: string): PIIMatch[] {
+  const matches: PIIMatch[] = [];
+  const pattern = /\d{3}-\d{2}-\d{4}/g;
+  let match: RegExpExecArray | null;
+
+  while ((match = pattern.exec(content)) !== null) {
+    const ssn = match[0];
+    // Validate: first 3 digits shouldn't be 000, 666, or 900-999
+    const firstThree = parseInt(ssn.substring(0, 3), 10);
+    if (firstThree !== 0 && firstThree !== 666 && !(firstThree >= 900 && firstThree <= 999)) {
+      matches.push({
+        text: ssn,
+        start: match.index ?? 0,
+        end: (match.index ?? 0) + ssn.length,
+      });
+    }
+  }
+  return matches;
+}
+
+const agent3 = createAgent({
+  model: "gpt-4o",
+  tools: [...],
+  middleware: [
+    piiMiddleware("ssn", {
+      detector: detectSSN,
+      strategy: "hash",
+    }),
+  ],
+});
+```
+:::
+
+**Custom detector function signature:**
+
+The detector function must accept a string (content) and return matches:
+
+:::python
+Returns a list of dictionaries with `text`, `start`, and `end` keys:
+```python
+def detector(content: str) -> list[dict[str, str | int]]:
+    return [
+        {"text": "matched_text", "start": 0, "end": 12},
+        # ... more matches
+    ]
+```
+:::
+:::js
+Returns an array of `PIIMatch` objects:
+```typescript
+interface PIIMatch {
+  text: string; // The matched text
+  start: number; // Start index in content
+  end: number; // End index in content
+}
+
+function detector(content: string): PIIMatch[] {
+  return [
+    { text: "matched_text", start: 0, end: 12 },
+    // ... more matches
+  ];
+}
 ```
 :::

+<Tip>
+For custom detectors:
+
+- Use regex strings for simple patterns
+- Use RegExp objects when you need flags (e.g., case-insensitive matching)
+- Use custom functions when you need validation logic beyond pattern matching
+- Custom functions give you full control over detection logic and can implement complex validation rules
+</Tip>
+
 <Accordion title="Configuration options">

 :::python
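
The hunk above documents three detector forms (pattern string, compiled regex, custom function) that all produce matches with `text`, `start`, and `end` keys. As a rough illustration of how those forms relate, the sketch below normalizes any of them into a single callable. `to_detector`, `Match`, and `Detector` are hypothetical names made up for this example; they are not part of the langchain API, and the middleware's real internals may differ.

```python
import re
from typing import Callable, Union

# Hypothetical aliases for illustration only. The match shape
# (text/start/end) follows the detector contract described in the diff.
Match = dict[str, Union[str, int]]
Detector = Callable[[str], list[Match]]


def to_detector(detector: Union[str, re.Pattern, Detector]) -> Detector:
    """Hypothetical helper: turn any of the three detector forms into a function."""
    if callable(detector):
        # Form 3: already a custom function, use it unchanged.
        return detector

    # Form 1 (pattern string) is compiled; form 2 (compiled regex) is used as-is.
    pattern = re.compile(detector) if isinstance(detector, str) else detector

    def regex_detector(content: str) -> list[Match]:
        # Report every non-overlapping match with its offsets in content.
        return [
            {"text": m.group(0), "start": m.start(), "end": m.end()}
            for m in pattern.finditer(content)
        ]

    return regex_detector


# Usage: a pattern string becomes a function with the documented return shape.
detect_key = to_detector(r"sk-[a-zA-Z0-9]{32}")
content = "key: sk-" + "a" * 32
for found in detect_key(content):
    # For regex-based detectors, the offsets slice back to the matched text.
    assert content[found["start"]:found["end"]] == found["text"]
```

Under this reading, a custom function is only needed when a regex alone cannot decide whether a candidate is valid, as in the SSN example in the diff.
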
@@ -857,11 +1032,17 @@ const agent = createAgent({
 - `'block'` - Throw error when detected
 - `'redact'` - Replace with `[REDACTED_TYPE]`
 - `'mask'` - Partially mask (e.g., `****-****-****-1234`)
-- `'hash'` - Replace with deterministic hash
+- `'hash'` - Replace with deterministic hash (e.g., `<email_hash:a1b2c3d4>`)
 </ParamField>

-<ParamField body="detector" type="RegExp">
-Custom detector regex pattern. If not provided, uses built-in detector for the PII type.
+<ParamField body="detector" type="RegExp | string | function">
+Custom detector. Can be:
+
+- `RegExp` - Regex pattern for matching
+- `string` - Regex pattern string (e.g., `"sk-[a-zA-Z0-9]{32}"`)
+- `function` - Custom detector function `(content: string) => PIIMatch[]`
+
+If not provided, uses built-in detector for the PII type.
 </ParamField>

 <ParamField body="applyToInput" type="boolean" default="true">
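
To make the four strategies listed in this hunk concrete, here is a minimal sketch of how each one could rewrite content given detector matches. It is not the library's implementation. `apply_strategy` and `PIIDetectedError` are hypothetical names, and several details are assumptions: that `TYPE` in `[REDACTED_TYPE]` is the uppercased PII type, that masking keeps the last four characters, and that the hash is the first 8 hex characters of SHA-256.

```python
import hashlib


class PIIDetectedError(Exception):
    """Hypothetical error type used only in this sketch."""


def apply_strategy(content: str, matches: list[dict], pii_type: str, strategy: str) -> str:
    """Rewrite content according to the chosen strategy (illustrative only)."""
    if strategy == "block" and matches:
        # 'block': refuse to continue as soon as anything is detected.
        raise PIIDetectedError(f"Detected {pii_type} in content")

    def replacement(text: str) -> str:
        if strategy == "redact":
            # Assumption: TYPE is the uppercased PII type name.
            return f"[REDACTED_{pii_type.upper()}]"
        if strategy == "mask":
            # Assumption: keep the last four characters visible.
            return "*" * max(len(text) - 4, 0) + text[-4:]
        if strategy == "hash":
            # Assumption: first 8 hex chars of SHA-256; same input, same output.
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]
            return f"<{pii_type}_hash:{digest}>"
        raise ValueError(f"Unknown strategy: {strategy}")

    # Replace from the end so earlier start/end offsets stay valid.
    for m in sorted(matches, key=lambda m: m["start"], reverse=True):
        content = content[: m["start"]] + replacement(m["text"]) + content[m["end"] :]
    return content


# Usage with a single match in the documented text/start/end shape.
text = "mail bob@example.com now"
found = [{"text": "bob@example.com", "start": 5, "end": 20}]
print(apply_strategy(text, found, "email", "redact"))  # mail [REDACTED_EMAIL] now
```

Because the hash is deterministic, the same value maps to the same placeholder wherever it appears, unlike a plain redaction, which discards that link.
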
@@ -879,60 +1060,6 @@ const agent = createAgent({

 </Accordion>

-<Accordion title="Full example">
-
-The middleware supports detecting built-in PII types (`email`, `credit_card`, `ip`, `mac_address`, `url`) or custom types with regex patterns.
-
-**Detection strategies:**
-- `'block'` - Raise exception when detected
-- `'redact'` - Replace with `[REDACTED_TYPE]`
-- `'mask'` - Partially mask (e.g., `****-****-****-1234`)
-- `'hash'` - Replace with deterministic hash
-
-**Application scope:**
-- `apply_to_input` - Check user messages before model call
-- `apply_to_output` - Check AI messages after model call
-- `apply_to_tool_results` - Check tool result messages after execution
-
-:::python
-```python
-from langchain.agents import create_agent
-from langchain.agents.middleware import PIIMiddleware
-
-
-agent = create_agent(
-    model="gpt-4o",
-    tools=[database_tool, email_tool],
-    middleware=[
-        PIIMiddleware("email", strategy="redact", apply_to_input=True),
-        PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
-        PIIMiddleware("api_key", detector=r"sk-[a-zA-Z0-9]{32}", strategy="block"),
-        PIIMiddleware("ssn", detector=r"\d{3}-\d{2}-\d{4}", strategy="hash", apply_to_tool_results=True),
-    ],
-)
-```
-:::
-
-:::js
-```typescript
-import { createAgent, piiRedactionMiddleware } from "langchain";
-
-const agent = createAgent({
-  model: "gpt-4o",
-  tools: [databaseTool, emailTool],
-  middleware: [
-    piiRedactionMiddleware({ piiType: "email", strategy: "redact", applyToInput: true }),
-    piiRedactionMiddleware({ piiType: "credit_card", strategy: "mask", applyToInput: true }),
-    piiRedactionMiddleware({ piiType: "api_key", detector: /sk-[a-zA-Z0-9]{32}/, strategy: "block" }),
-    piiRedactionMiddleware({ piiType: "ssn", detector: /\d{3}-\d{2}-\d{4}/, strategy: "hash", applyToToolResults: true }),
-  ],
-});
-```
-:::
-
-</Accordion>
-
-
 ### To-do list

 Equip agents with task planning and tracking capabilities for complex multi-step tasks. To-do lists are useful for the following:
