
Commit 629f7f3

docs: enhance task guardrail documentation with LLM-based validation support (#3879)
- Added section on LLM-based guardrails, explaining their usage and requirements.
- Updated examples to demonstrate the implementation of multiple guardrails, including both function-based and LLM-based approaches.
- Clarified the distinction between single and multiple guardrails in task configurations.
- Improved explanations of guardrail functionality to ensure better understanding of validation processes.
1 parent 0f1c173 commit 629f7f3

File tree: 1 file changed (+149, -3 lines)

docs/en/concepts/tasks.mdx

Lines changed: 149 additions & 3 deletions
@@ -60,6 +60,7 @@ crew = Crew(
| **Output Pydantic** _(optional)_ | `output_pydantic` | `Optional[Type[BaseModel]]` | A Pydantic model for task output. |
| **Callback** _(optional)_ | `callback` | `Optional[Any]` | Function/object to be executed after task completion. |
| **Guardrail** _(optional)_ | `guardrail` | `Optional[Callable]` | Function to validate task output before proceeding to next task. |
| **Guardrails** _(optional)_ | `guardrails` | `Optional[List[Callable] | List[str]]` | List of guardrails to validate task output before proceeding to next task. |
| **Guardrail Max Retries** _(optional)_ | `guardrail_max_retries` | `Optional[int]` | Maximum number of retries when guardrail validation fails. Defaults to 3. |

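For a quick sense of how the new `guardrails` and `guardrail_max_retries` fields fit together, here is a minimal sketch; the agent and the `check_word_count` function are hypothetical placeholders, not part of this change:

```python Code
from crewai import Task

# Minimal sketch of the new fields; `my_agent` and `check_word_count`
# are hypothetical placeholders assumed to be defined elsewhere.
report_task = Task(
    description="Write a short status report",
    expected_output="A report under 300 words",
    agent=my_agent,
    guardrails=[
        check_word_count,  # function-based check
        "The report must use a neutral, professional tone"  # LLM-based check
    ],
    guardrail_max_retries=3  # retry up to 3 times when validation fails
)
```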
<Note type="warning" title="Deprecated: max_retries">
@@ -341,7 +342,11 @@ Task guardrails provide a way to validate and transform task outputs before they
are passed to the next task. This feature helps ensure data quality and provides
feedback to agents when their output doesn't meet specific criteria.

CrewAI supports two types of guardrails:

1. **Function-based guardrails**: Python functions with custom validation logic, giving you complete control over the validation process and ensuring reliable, deterministic results.

2. **LLM-based guardrails**: String descriptions that use the agent's LLM to validate outputs based on natural language criteria. These are ideal for complex or subjective validation requirements.

### Function-Based Guardrails

@@ -355,12 +360,12 @@ def validate_blog_content(result: TaskOutput) -> Tuple[bool, Any]:
    """Validate blog content meets requirements."""
    try:
        # Check word count
        word_count = len(result.raw.split())
        if word_count > 200:
            return (False, "Blog content exceeds 200 words")

        # Additional validation logic here
        return (True, result.raw.strip())
    except Exception as e:
        return (False, "Unexpected error during validation")
@@ -372,6 +377,147 @@ blog_task = Task(
)
```

### LLM-Based Guardrails (String Descriptions)

Instead of writing custom validation functions, you can use string descriptions that leverage LLM-based validation. When you provide a string to the `guardrail` or `guardrails` parameter, CrewAI automatically creates an `LLMGuardrail` that uses the agent's LLM to validate the output based on your description.

**Requirements**:
- The task must have an `agent` assigned (the guardrail uses the agent's LLM)
- Provide a clear, descriptive string explaining the validation criteria

```python Code
from crewai import Task

# Single LLM-based guardrail
blog_task = Task(
    description="Write a blog post about AI",
    expected_output="A blog post under 200 words",
    agent=blog_agent,
    guardrail="The blog post must be under 200 words and contain no technical jargon"
)
```

LLM-based guardrails are particularly useful for:
- **Complex validation logic** that's difficult to express programmatically
- **Subjective criteria** like tone, style, or quality assessments
- **Natural language requirements** that are easier to describe than code

The LLM guardrail will:
1. Analyze the task output against your description
2. Return `(True, output)` if the output complies with the criteria
3. Return `(False, feedback)` with specific feedback if validation fails

**Example with detailed validation criteria**:

```python Code
research_task = Task(
    description="Research the latest developments in quantum computing",
    expected_output="A comprehensive research report",
    agent=researcher_agent,
    guardrail="""
    The research report must:
    - Be at least 1000 words long
    - Include at least 5 credible sources
    - Cover both technical and practical applications
    - Be written in a professional, academic tone
    - Avoid speculation or unverified claims
    """
)
```
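Because a failed validation sends feedback back to the agent for another attempt, it can help to pair an LLM-based guardrail with an explicit retry cap. A minimal sketch, assuming `blog_agent` is defined as in the earlier example:

```python Code
from crewai import Task

# Sketch: LLM-based guardrail combined with an explicit retry limit.
# `blog_agent` is assumed to be defined elsewhere.
summary_task = Task(
    description="Summarize the blog post for a general audience",
    expected_output="A summary of no more than 150 words",
    agent=blog_agent,
    guardrail="The summary must be under 150 words and avoid technical jargon",
    guardrail_max_retries=2  # stop after two failed validation attempts
)
```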
### Multiple Guardrails
You can apply multiple guardrails to a task using the `guardrails` parameter. Multiple guardrails are executed sequentially, with each guardrail receiving the output from the previous one. This allows you to chain validation and transformation steps.

The `guardrails` parameter accepts:
- A list of guardrail functions or string descriptions
- A single guardrail function or string (same as `guardrail`)

**Note**: If `guardrails` is provided, it takes precedence over `guardrail`. The `guardrail` parameter will be ignored when `guardrails` is set.

```python Code
from typing import Tuple, Any
from crewai import TaskOutput, Task

def validate_word_count(result: TaskOutput) -> Tuple[bool, Any]:
    """Validate word count is within limits."""
    word_count = len(result.raw.split())
    if word_count < 100:
        return (False, f"Content too short: {word_count} words. Need at least 100 words.")
    if word_count > 500:
        return (False, f"Content too long: {word_count} words. Maximum is 500 words.")
    return (True, result.raw)

def validate_no_profanity(result: TaskOutput) -> Tuple[bool, Any]:
    """Check for inappropriate language."""
    profanity_words = ["badword1", "badword2"]  # Example list
    content_lower = result.raw.lower()
    for word in profanity_words:
        if word in content_lower:
            return (False, f"Inappropriate language detected: {word}")
    return (True, result.raw)

def format_output(result: TaskOutput) -> Tuple[bool, Any]:
    """Format and clean the output."""
    formatted = result.raw.strip()
    # Capitalize first letter
    formatted = formatted[0].upper() + formatted[1:] if formatted else formatted
    return (True, formatted)

# Apply multiple guardrails sequentially
blog_task = Task(
    description="Write a blog post about AI",
    expected_output="A well-formatted blog post between 100-500 words",
    agent=blog_agent,
    guardrails=[
        validate_word_count,    # First: validate length
        validate_no_profanity,  # Second: check content
        format_output           # Third: format the result
    ],
    guardrail_max_retries=3
)
```
In this example, the guardrails execute in order:
1. `validate_word_count` checks the word count
2. `validate_no_profanity` checks for inappropriate language (using the output from step 1)
3. `format_output` formats the final result (using the output from step 2)

If any guardrail fails, the error is sent back to the agent, and the task is retried up to `guardrail_max_retries` times.

**Mixing function-based and LLM-based guardrails**:

You can combine both function-based and string-based guardrails in the same list:

```python Code
from typing import Tuple, Any
from crewai import TaskOutput, Task

def validate_word_count(result: TaskOutput) -> Tuple[bool, Any]:
    """Validate word count is within limits."""
    word_count = len(result.raw.split())
    if word_count < 100:
        return (False, f"Content too short: {word_count} words. Need at least 100 words.")
    if word_count > 500:
        return (False, f"Content too long: {word_count} words. Maximum is 500 words.")
    return (True, result.raw)

# Mix function-based and LLM-based guardrails
blog_task = Task(
    description="Write a blog post about AI",
    expected_output="A well-formatted blog post between 100-500 words",
    agent=blog_agent,
    guardrails=[
        validate_word_count,  # Function-based: precise word count check
        "The content must be engaging and suitable for a general audience",  # LLM-based: subjective quality check
        "The writing style should be clear, concise, and free of technical jargon"  # LLM-based: style validation
    ],
    guardrail_max_retries=3
)
```
This approach combines the precision of programmatic validation with the flexibility of LLM-based assessment for subjective criteria.

### Guardrail Function Requirements

1. **Function Signature**:
