-# OpenAI Guardrails
-
-## Overview
-
-OpenAI Guardrails is a Python package for adding robust, configurable safety and compliance guardrails to LLM applications. It provides a drop-in wrapper for OpenAI's Python client, enabling automatic input/output validation and moderation using a wide range of guardrails.
-
-## Documentation
-
-For full details, advanced usage, and API reference, see the [OpenAI Guardrails Documentation](https://openai.github.io/openai-guardrails-python/).
-
-## Quick Start: Using OpenAI Guardrails (Python)
-
-1. **Generate your guardrail spec JSON**
-   - Use the [Guardrails web UI](https://guardrails.openai.com/) to create a JSON configuration file describing which guardrails to apply and how to configure them.
-   - The wizard outputs a file like `guardrail_specs.json`.
-
-2. **Install**
-   ```bash
-   pip install openai-guardrails
-   ```
-
-3. **Wrap your OpenAI client with Guardrails**
-   ```python
-   from guardrails import GuardrailsOpenAI, GuardrailTripwireTriggered
-   from pathlib import Path
-
-   # guardrail_config.json is generated by the configuration wizard
-   client = GuardrailsOpenAI(config=Path("guardrail_config.json"))
-
-   # Use as you would the OpenAI client, but handle guardrail exceptions
-   try:
-       response = client.chat.completions.create(
-           model="gpt-5",
-           messages=[{"role": "user", "content": "..."}],
-       )
-       print(response.llm_response.choices[0].message.content)
-   except GuardrailTripwireTriggered as e:
-       # Handle blocked or flagged content
-       print(f"Guardrail triggered: {e}")
-
-   # Example: Using the OpenAI Responses API with Guardrails
-   try:
-       resp = client.responses.create(
-           model="gpt-5",
-           input="What are the main features of your premium plan?",
-           # Optionally, add file_search or other tool arguments as needed
-       )
-       print(resp.llm_response.output_text)
-   except GuardrailTripwireTriggered as e:
-       print(f"Guardrail triggered (responses API): {e}")
-   ```
-   - The client will automatically apply all configured guardrails to inputs and outputs.
-   - If a guardrail is triggered, a `GuardrailTripwireTriggered` exception is raised; handle it to gracefully manage blocked or flagged content.
-
-> **Note:** The Guardrails web UI is hosted [here](https://guardrails.openai.com/). You do not need to run the web UI yourself to use the Python package.
-
----
-
-## What Does the Python Package Provide?
-
-- **GuardrailsOpenAI** and **GuardrailsAsyncOpenAI**: Drop-in replacements for OpenAI's `OpenAI` and `AsyncOpenAI` clients, with automatic guardrail enforcement.
-- **GuardrailsAzureOpenAI** and **GuardrailsAsyncAzureOpenAI**: Drop-in replacements for Azure OpenAI clients, with the same guardrail support. (See the documentation for details.)
-- **Automatic input/output validation**: Guardrails are applied to all relevant API calls (e.g., `chat.completions.create`, `responses.create`, etc.).
-- **Configurable guardrails**: Choose which checks to enable, and customize their parameters via the JSON spec.
-- **Tripwire support**: Optionally block or mask unsafe content, or just log/flag it for review.
-
----
+# OpenAI Guardrails: Python
+
+This is the Python version of OpenAI Guardrails, a package for adding configurable safety and compliance guardrails to LLM applications. It provides a drop-in wrapper for OpenAI's Python client, enabling automatic input/output validation and moderation using a wide range of guardrails.
+
+Most users can simply follow the guided configuration and installation instructions at [guardrails.openai.com](https://guardrails.openai.com/).
+
+## Installation
+
+### Usage
+
+Follow the configuration and installation instructions at [guardrails.openai.com](https://guardrails.openai.com/).
+
+### Local Development
+
+Clone the repository and install locally:
+
+```bash
+# Clone the repository
+git clone https://github.com/openai/openai-guardrails-python.git
+cd openai-guardrails-python
+
+# Install the package (editable), plus example extras if desired
+pip install -e .
+pip install -e ".[examples]"
+```
+
+## Integration Details
+
+### Drop-in OpenAI Replacement
+
+The easiest way to use Guardrails Python is as a drop-in replacement for the OpenAI client:
+
+```python
+from pathlib import Path
+from guardrails import GuardrailsOpenAI, GuardrailTripwireTriggered
+
+# Use GuardrailsOpenAI instead of OpenAI
+client = GuardrailsOpenAI(config=Path("guardrail_config.json"))
+
+try:
+    # Works with standard Chat Completions
+    chat = client.chat.completions.create(
+        model="gpt-5",
+        messages=[{"role": "user", "content": "Hello world"}],
+    )
+    print(chat.llm_response.choices[0].message.content)
+
+    # Or with the Responses API
+    resp = client.responses.create(
+        model="gpt-5",
+        input="What are the main features of your premium plan?",
+    )
+    print(resp.llm_response.output_text)
+except GuardrailTripwireTriggered as e:
+    print(f"Guardrail triggered: {e}")
+```
+
+### Agents SDK Integration
+
+You can integrate guardrails with the OpenAI Agents SDK via `GuardrailAgent`:
+
+```python
+import asyncio
+from pathlib import Path
+
+from agents import InputGuardrailTripwireTriggered, OutputGuardrailTripwireTriggered, Runner
+from agents.run import RunConfig
+from guardrails import GuardrailAgent
+
+# Create an agent with guardrails automatically configured
+agent = GuardrailAgent(
+    config=Path("guardrails_config.json"),
+    name="Customer support agent",
+    instructions="You are a customer support agent. You help customers with their questions.",
+)
+
+async def main():
+    try:
+        result = await Runner.run(
+            agent,
+            "Hello, can you help me?",
+            run_config=RunConfig(tracing_disabled=True),
+        )
+        print(result.final_output)
+    except (InputGuardrailTripwireTriggered, OutputGuardrailTripwireTriggered):
+        print("🛑 Guardrail triggered!")
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+> For more details, see [`docs/agents_sdk_integration.md`](./docs/agents_sdk_integration.md).
+
+## Evaluation Framework
+
+Evaluate guardrail performance on labeled datasets and run benchmarks.
+
+### Running Evaluations
+
+```bash
+# Basic evaluation
+python -m guardrails.evals.guardrail_evals \
+  --config-path guardrails_config.json \
+  --dataset-path data.jsonl
+
+# Benchmark mode (compare models, generate ROC curves, measure latency)
+python -m guardrails.evals.guardrail_evals \
+  --config-path guardrails_config.json \
+  --dataset-path data.jsonl \
+  --mode benchmark \
+  --models gpt-5 gpt-5-mini gpt-4.1-mini
+```
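To make the benchmark metrics concrete, here is a minimal, hypothetical sketch (not the package's actual implementation) of how per-guardrail precision and recall could be computed by comparing observed trigger decisions against the `expected_triggers` labels from the dataset format described below:

```python
from collections import defaultdict

def precision_recall(samples):
    """Per-guardrail precision/recall from (expected, observed) trigger dicts.

    Illustrative only: the real evaluation framework reports richer metrics
    (e.g. ROC curves and latency) than this sketch.
    """
    tp = defaultdict(int)  # expected True, observed True
    fp = defaultdict(int)  # expected False, observed True
    fn = defaultdict(int)  # expected True, observed False
    for expected, observed in samples:
        for name, label in expected.items():
            hit = observed.get(name, False)
            if label and hit:
                tp[name] += 1
            elif not label and hit:
                fp[name] += 1
            elif label and not hit:
                fn[name] += 1
    metrics = {}
    for name in set(tp) | set(fp) | set(fn):
        p = tp[name] / (tp[name] + fp[name]) if tp[name] + fp[name] else 0.0
        r = tp[name] / (tp[name] + fn[name]) if tp[name] + fn[name] else 0.0
        metrics[name] = {"precision": p, "recall": r}
    return metrics

samples = [
    ({"Moderation": True}, {"Moderation": True}),   # true positive
    ({"Moderation": True}, {"Moderation": False}),  # false negative
    ({"Moderation": False}, {"Moderation": True}),  # false positive
]
print(precision_recall(samples))
```

On the toy samples above, `Moderation` scores 0.5 precision and 0.5 recall.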
+
+### Dataset Format
+
+Datasets must be in JSONL format, with each line containing a single JSON object (pretty-printed below for readability):
+
+```json
+{
+  "id": "sample_1",
+  "data": "Text or conversation to evaluate",
+  "expected_triggers": {
+    "Moderation": true,
+    "NSFW Text": false
+  }
+}
+```
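As a quick sanity check before running an evaluation, a dataset in this format can be loaded and validated with the standard library alone. This is an illustrative sketch, not part of the package; the field names follow the example above:

```python
import json

REQUIRED_FIELDS = {"id", "data", "expected_triggers"}

def load_dataset(lines):
    """Parse JSONL lines, raising ValueError on incomplete samples."""
    samples = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate trailing blank lines
        sample = json.loads(line)
        missing = REQUIRED_FIELDS - sample.keys()
        if missing:
            raise ValueError(f"line {i}: missing fields {sorted(missing)}")
        if not all(isinstance(v, bool) for v in sample["expected_triggers"].values()):
            raise ValueError(f"line {i}: expected_triggers values must be booleans")
        samples.append(sample)
    return samples

raw = '{"id": "sample_1", "data": "Hi", "expected_triggers": {"Moderation": true, "NSFW Text": false}}'
dataset = load_dataset(raw.splitlines())
print(dataset[0]["expected_triggers"])  # {'Moderation': True, 'NSFW Text': False}
```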
+
+### Programmatic Usage
+
+```python
+import asyncio
+from pathlib import Path
+
+from guardrails.evals.guardrail_evals import GuardrailEval
+
+evaluation = GuardrailEval(  # avoid shadowing the built-in `eval`
+    config_path=Path("guardrails_config.json"),
+    dataset_path=Path("data.jsonl"),
+    batch_size=32,
+    output_dir=Path("results"),
+)
+
+asyncio.run(evaluation.run())
+```
+
+## Project Structure
+
+- `src/guardrails/` - Python source code
+- `src/guardrails/checks/` - Built-in guardrail checks
+- `src/guardrails/evals/` - Evaluation framework
+- `examples/` - Example usage and sample configs
+
+## Examples
+
+The package includes examples in the [`examples/` directory](./examples):
+
+- `examples/basic/hello_world.py` — Basic chatbot with guardrails using `GuardrailsOpenAI`
+- `examples/basic/agents_sdk.py` — Agents SDK integration with `GuardrailAgent`
+- `examples/basic/local_model.py` — Using local models with guardrails
+- `examples/basic/structured_outputs_example.py` — Structured outputs
+- `examples/basic/pii_mask_example.py` — PII masking
+- `examples/basic/suppress_tripwire.py` — Handling violations gracefully
+
+### Running Examples
+
+#### Prerequisites
+
+```bash
+pip install -e .
+pip install -e ".[examples]"
+```
+
+#### Run
+
+```bash
+python examples/basic/hello_world.py
+python examples/basic/agents_sdk.py
+```
 
 ## Available Guardrails
 
-Below is a list of all built-in guardrails you can configure. Each can be enabled/disabled and customized in your JSON spec.
+The Python implementation includes the following built-in guardrails:
 
-| Guardrail Name | Description |
-|-------------------------|-------------|
-| **Keyword Filter** | Triggers when any keyword appears in text. |
-| **Competitors** | Checks if the model output mentions any competitors from the provided list. |
-| **Jailbreak** | Detects attempts to jailbreak or bypass AI safety measures using techniques such as prompt injection, role-playing requests, system prompt overrides, or social engineering. |
-| **Moderation** | Flags text containing disallowed content categories (e.g., hate, violence, sexual, etc.) using OpenAI's moderation API. |
-| **NSFW Text** | Detects NSFW (Not Safe For Work) content in text, including sexual content, hate speech, violence, profanity, illegal activities, and other inappropriate material. |
-| **Contains PII** | Checks that the text does not contain personally identifiable information (PII) such as SSNs, phone numbers, credit card numbers, etc., based on configured entity types. |
-| **Secret Keys** | Checks that the text does not contain potential API keys, secrets, or other credentials. |
-| **Off Topic Prompts** | Checks that the content stays within the defined business scope. |
-| **URL Filter** | Flags URLs in the text unless they match entries in the allow list. |
-| **Custom Prompt Check** | Runs a user-defined guardrail based on a custom system prompt. Allows for flexible content moderation based on specific requirements. |
-| **Anti-Hallucination** | Detects potential hallucinations in AI-generated text using OpenAI Responses API with file search. Validates claims against actual documents and flags factually incorrect, unsupported, or potentially fabricated information. |
+- **Moderation**: Content moderation using OpenAI's moderation API
+- **URL Filter**: URL filtering and domain allowlist/blocklist
+- **Contains PII**: Personally Identifiable Information detection
+- **Hallucination Detection**: Detects hallucinated content using vector stores
+- **Jailbreak**: Detects jailbreak attempts
+- **NSFW Text**: Detects workplace-inappropriate content in model outputs
+- **Off Topic Prompts**: Ensures responses stay within business scope
+- **Custom Prompt Check**: Custom LLM-based guardrails
 
----
+For full details, advanced usage, and API reference, see the [OpenAI Guardrails Documentation](https://openai.github.io/openai-guardrails-python/).
 
 ## License
 
-For the duration of this early access alpha, `guardrails` is distributed under the Alpha Evaluation Agreement that your organization signed with OpenAI.
-
-The Python package is intended to be MIT-licensed in the future, subject to change.
+MIT License - see LICENSE file for details.
 
 ## Disclaimers
 