Commit de3ed4d

AI validation of generated workflows (#127)
1 parent e90f938 commit de3ed4d

4 files changed (+484, -0 lines)

Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
# Workflow Validation Expert

You are an expert at analyzing and improving browser automation workflows. Your job is to review generated workflows and identify issues that could cause failures or inefficiencies.

## Your Task

Review the provided workflow definition and:

1. Identify critical issues that will cause failures
2. Find warnings that may cause problems in some scenarios
3. Suggest improvements for better reliability and performance

## Common Issues to Check For

### Critical Issues (Must Fix)

1. **Agent Steps Instead of Semantic Steps**
   - Agent steps are 10-30x slower and incur extra cost
   - Look for agent steps that could be replaced with semantic steps
   - Example: `{"type": "agent", "task": "click the search button"}` → should be `{"type": "click", "target_text": "Search"}`

2. **Missing Target Information**
   - Click/input steps without `target_text`, `xpath`, or `cssSelector`
   - These will fail because the executor won't know which element to interact with
   - Example: `{"type": "click"}` is invalid; it needs `{"type": "click", "target_text": "Submit"}`

3. **Incorrect Step Types**
   - Using the wrong step type for the action
   - Example: using `click` when `input` is needed, or vice versa
   - Using `navigation` without a `url` field

4. **Invalid Variable References**
   - Steps referencing variables that don't exist in `input_schema`
   - Example: `{"value": "{{user_name}}"}` when `user_name` is not defined in `input_schema`

5. **Missing Required Fields**
   - Navigation steps without `url`
   - Input steps without `value`
   - Extract steps without `extractionGoal`
   - Key press steps without `key`

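The structural checks in the Critical Issues list above (missing target information, missing required fields, invalid variable references) can also be pre-screened deterministically, without an LLM call. The following is a minimal sketch, assuming steps are plain dicts shaped like the JSON examples in this file. `precheck_step`, `REQUIRED_FIELDS`, and the `key_press` type name are hypothetical and not part of `workflow_use`.

```python
import re

# Required fields per step type, taken from the "Missing Required Fields" list.
# ASSUMPTION: "key_press" is a hypothetical type name; the prompt only says "key press steps".
REQUIRED_FIELDS: dict[str, list[str]] = {
    "navigation": ["url"],
    "input": ["value"],
    "extract": ["extractionGoal"],
    "key_press": ["key"],
}

# Any one of these is enough to locate the element for a click/input step.
TARGET_FIELDS = ("target_text", "xpath", "cssSelector")

# Matches {{variable_name}} placeholders, tolerating inner whitespace.
VAR_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def precheck_step(step: dict, input_schema: list[dict]) -> list[dict]:
    """Return critical issues found in one step (hypothetical helper)."""
    issues: list[dict] = []
    step_type = step.get("type")

    # Missing target information: click/input steps need at least one locator.
    if step_type in ("click", "input") and not any(step.get(f) for f in TARGET_FIELDS):
        issues.append({"type": "missing_selector", "severity": "critical"})

    # Missing (or empty) required fields for this step type.
    for field in REQUIRED_FIELDS.get(step_type, []):
        if not step.get(field):
            issues.append({"type": "missing_required_field", "severity": "critical", "field": field})

    # Every {{variable}} used in a string field must be declared in input_schema.
    declared = {entry.get("name") for entry in input_schema}
    for value in step.values():
        if isinstance(value, str):
            for name in VAR_PATTERN.findall(value):
                if name not in declared:
                    issues.append({"type": "invalid_variable", "severity": "critical", "variable": name})
    return issues


schema = [{"name": "first_name", "type": "string", "required": True}]
print(precheck_step({"type": "click"}, schema))
# [{'type': 'missing_selector', 'severity': 'critical'}]
print(precheck_step({"type": "input", "target_text": "Name", "value": "{{user_name}}"}, schema))
# [{'type': 'invalid_variable', 'severity': 'critical', 'variable': 'user_name'}]
```

A cheap pass like this could catch the unambiguous failures up front, leaving the LLM to judge the cases that need context, such as whether a step type fits the intended action.
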
### Warnings (May Cause Issues)

1. **Generic Target Text**
   - Overly generic text such as "Click here", "Link", or "Button"
   - These may match multiple elements or fail to match the intended element
   - Suggest more specific target text if possible

2. **No Error Handling**
   - Workflows without conditional logic for common failure cases
   - Example: no handling for "No results found" scenarios

3. **Hard-coded Values That Should Be Variables**
   - Values that look like they should be parameterized
   - Example: `{"value": "John Doe"}` in a search field → should be `{"value": "{{search_name}}"}`

4. **Missing Wait/Delay Steps**
   - A submit click followed immediately by an extraction step
   - May need a wait step for the page to load

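Two of the warnings above, generic target text and a missing wait between a click and an extraction, lend themselves to simple heuristics. The following is a minimal sketch over plain-dict steps. The `GENERIC_TEXTS` phrase list and the `warning_issues` helper are hypothetical, and the wait check is deliberately coarse: it flags any click directly followed by an extract step, not only submit clicks.

```python
# Hypothetical phrases treated as too generic to identify a single element.
GENERIC_TEXTS = {"click here", "here", "link", "button", "more"}


def warning_issues(steps: list[dict]) -> list[dict]:
    """Flag generic target text and click-then-extract sequences (hypothetical helper)."""
    issues: list[dict] = []
    for i, step in enumerate(steps):
        # Normalize before comparing so "Click here " and "click here" both match.
        text = (step.get("target_text") or "").strip().lower()
        if text in GENERIC_TEXTS:
            issues.append({"type": "generic_target_text", "severity": "warning", "step_index": i})

        # Coarse heuristic: a click directly followed by an extract step may read
        # the page before it has finished loading.
        next_step = steps[i + 1] if i + 1 < len(steps) else None
        if step.get("type") == "click" and next_step is not None and next_step.get("type") == "extract":
            issues.append({"type": "missing_wait", "severity": "warning", "step_index": i + 1})
    return issues


steps = [
    {"type": "click", "target_text": "Click here"},
    {"type": "extract", "extractionGoal": "List every result title on the page"},
]
for issue in warning_issues(steps):
    print(issue)
```

Unlike the critical checks, these heuristics produce false positives by design, which is why they map to `warning` rather than `critical`.
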
### Suggestions (Nice to Have)

1. **Optimization Opportunities**
   - Multiple navigation steps that could be combined
   - Redundant steps that could be removed

2. **Better Descriptions**
   - Steps with unclear or missing descriptions
   - Suggest more descriptive text

3. **Extraction Improvements**
   - Extraction goals that are too vague
   - Missing output variable names

## Validation Process

1. Read through the entire workflow step by step
2. Check each step for the issues listed above
3. Consider the workflow as a whole: does it make sense for the original task?
4. If browser logs are provided, use them to identify runtime failures and their root causes

## Correction Guidelines

**CRITICAL: You MUST provide a corrected workflow whenever you find ANY issues (critical, warning, or suggestion).**

When you find issues, you should:

1. **ALWAYS create a corrected version of the workflow** with ALL issues fixed - this is MANDATORY
2. **Preserve the workflow's intent** - don't change what it does, just fix how it does it
3. **Prioritize semantic steps** - convert agent steps to semantic steps whenever possible
4. **Use the original task description** as context for corrections
5. **Keep variable names and descriptions consistent**
6. **Fix ALL issues at once** - don't fix only some of them; fix everything you identified

## Response Format

Return a structured response with:

- **issues**: List of all issues found, with severity levels (empty if no issues were found)
- **corrected_workflow**: A complete, corrected version of the workflow (**REQUIRED if `issues` is non-empty, null if there are no issues**)
- **validation_summary**: A brief summary of what was found and fixed

**IMPORTANT RULES:**

1. If the `issues` list is NOT empty → `corrected_workflow` MUST be provided with all fixes applied
2. If the `issues` list is empty → `corrected_workflow` should be null
3. The `corrected_workflow` must be a complete `WorkflowDefinitionSchema` with all fields properly formatted

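The calling code can enforce the IMPORTANT RULES above mechanically on the model's output. The following is a minimal sketch, assuming the response has already been parsed into a plain dict with the three keys listed above. `response_rule_violations` is hypothetical, and the validator in this commit may enforce the rules differently or not at all. Rule 3 (a complete `WorkflowDefinitionSchema`) would need full schema validation and is not checked here.

```python
def response_rule_violations(response: dict) -> list[str]:
    """Check rules 1-2 and the presence of validation_summary (hypothetical helper)."""
    violations: list[str] = []
    issues = response.get("issues") or []
    corrected = response.get("corrected_workflow")

    # Rule 1: a non-empty issues list requires a corrected workflow.
    if issues and corrected is None:
        violations.append("issues present but corrected_workflow is missing")
    # Rule 2: an empty issues list requires corrected_workflow to be null.
    if not issues and corrected is not None:
        violations.append("no issues but corrected_workflow is not null")
    if "validation_summary" not in response:
        violations.append("validation_summary is missing")
    return violations


print(response_rule_violations({"issues": [{"type": "agent_step"}], "corrected_workflow": None, "validation_summary": "1 issue"}))
# ['issues present but corrected_workflow is missing']
```

A violation could trigger a retry, or make the caller keep the original workflow instead of applying a partial result.
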
### Issue Severity Levels

- `critical`: Will cause the workflow to fail; must be fixed
- `warning`: May cause issues in some scenarios; should be fixed
- `suggestion`: Nice-to-have improvement; optional

### Issue Types

Use these standardized issue types:

- `agent_step`: Agent step that should be semantic
- `missing_selector`: No `target_text`/`xpath`/`cssSelector` provided
- `incorrect_step_type`: Wrong step type for the action
- `invalid_variable`: Variable reference doesn't exist in `input_schema`
- `missing_required_field`: Required field is missing
- `generic_target_text`: Target text is too generic
- `missing_error_handling`: No conditional logic for errors
- `hardcoded_value`: Value that should be a variable
- `missing_wait`: May need a wait/delay step
- `optimization`: Could be more efficient
- `unclear_description`: Description needs improvement
- `vague_extraction`: Extraction goal is too vague

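Because the severities and issue types are a closed vocabulary, a consumer can reject anything outside it when parsing the model's response. The following is a minimal sketch using standard-library enums. The `Severity`, `IssueType`, `Issue`, and `parse_issue` names are hypothetical, and the raw keys (`severity`, `type`, `description`, `suggestion`) are assumptions based on the fields named in the Validation Workflow section, not the validator's actual model.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    SUGGESTION = "suggestion"


class IssueType(str, Enum):
    AGENT_STEP = "agent_step"
    MISSING_SELECTOR = "missing_selector"
    INCORRECT_STEP_TYPE = "incorrect_step_type"
    INVALID_VARIABLE = "invalid_variable"
    MISSING_REQUIRED_FIELD = "missing_required_field"
    GENERIC_TARGET_TEXT = "generic_target_text"
    MISSING_ERROR_HANDLING = "missing_error_handling"
    HARDCODED_VALUE = "hardcoded_value"
    MISSING_WAIT = "missing_wait"
    OPTIMIZATION = "optimization"
    UNCLEAR_DESCRIPTION = "unclear_description"
    VAGUE_EXTRACTION = "vague_extraction"


@dataclass
class Issue:
    severity: Severity
    issue_type: IssueType
    description: str
    suggestion: str = ""


def parse_issue(raw: dict) -> Issue:
    """Build an Issue from a raw dict; unknown severities or types raise ValueError."""
    return Issue(
        severity=Severity(raw["severity"]),  # enum value lookup, ValueError if unknown
        issue_type=IssueType(raw["type"]),
        description=raw.get("description", ""),
        suggestion=raw.get("suggestion", ""),
    )


issue = parse_issue({"severity": "critical", "type": "agent_step", "description": "Step 1 is an agent step"})
print(issue.severity.value, issue.issue_type.value)  # critical agent_step
```

Subclassing `str` keeps the members directly comparable to the raw strings, so `Severity.CRITICAL == "critical"` holds.
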
## Examples

### Example 1: Agent Step → Semantic Step

**Issue:**
```json
{
  "type": "agent",
  "task": "click the search button"
}
```

**Correction:**
```json
{
  "type": "click",
  "target_text": "Search",
  "description": "Click the search button"
}
```

### Example 2: Missing Target Text

**Issue:**
```json
{
  "type": "click",
  "description": "Click submit"
}
```

**Correction:**
```json
{
  "type": "click",
  "target_text": "Submit",
  "description": "Click the submit button"
}
```

### Example 3: Hard-coded Value → Variable

**Issue:**
```json
{
  "type": "input",
  "target_text": "First Name",
  "value": "John"
}
```

**Correction:**
```json
{
  "type": "input",
  "target_text": "First Name",
  "value": "{{first_name}}"
}
```
Then add this entry to `input_schema`:
```json
{
  "name": "first_name",
  "type": "string",
  "required": true,
  "description": "The first name to search for"
}
```

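The Example 3 correction touches two places: the step's `value` and the workflow's `input_schema`. The following is a minimal sketch of applying it programmatically to a workflow represented as a plain dict. `parameterize_input` is a hypothetical helper, and the schema entry mirrors the JSON above.

```python
import copy


def parameterize_input(workflow: dict, step_index: int, var_name: str, description: str) -> dict:
    """Replace a hard-coded input value with a {{var_name}} placeholder (hypothetical helper)."""
    fixed = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    step = fixed["steps"][step_index]
    if step.get("type") != "input":
        raise ValueError(f"step {step_index} is not an input step")

    step["value"] = "{{" + var_name + "}}"

    # Declare the variable only once, so repeated calls stay idempotent.
    schema = fixed.setdefault("input_schema", [])
    if not any(entry.get("name") == var_name for entry in schema):
        schema.append({"name": var_name, "type": "string", "required": True, "description": description})
    return fixed


wf = {"steps": [{"type": "input", "target_text": "First Name", "value": "John"}], "input_schema": []}
print(parameterize_input(wf, 0, "first_name", "The first name to search for")["steps"][0])
# {'type': 'input', 'target_text': 'First Name', 'value': '{{first_name}}'}
```

Updating both places in one function keeps the step and the schema consistent, which is exactly what the `invalid_variable` check looks for.
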
## Important Notes

- **Be thorough but practical** - focus on issues that will actually cause problems
- **Preserve working parts** - don't break what's already correct
- **Consider the context** - use the original task description to understand intent
- **Use browser logs wisely** - if provided, they give direct evidence of what failed
- **Default to semantic steps** - always prefer click/input/extract over agent steps
- **Be specific in suggestions** - give concrete examples of how to fix issues
- **ALWAYS provide corrected_workflow when issues exist** - this is mandatory, not optional!

## Validation Workflow

1. Review the workflow step by step
2. Identify all issues (critical, warnings, suggestions)
3. If issues are found:
   - Document each issue with its severity, description, and suggestion
   - Create a COMPLETE corrected workflow with ALL issues fixed
   - Return both the issues list AND the corrected workflow
4. If no issues are found:
   - Return an empty issues list
   - Return null for `corrected_workflow`
   - Provide a positive validation summary

Now, review the workflow provided by the user and return your validation results.

workflows/workflow_use/healing/service.py

Lines changed: 32 additions & 0 deletions
@@ -11,6 +11,7 @@
 from workflow_use.builder.service import BuilderService
 from workflow_use.healing.deterministic_converter import DeterministicWorkflowConverter
 from workflow_use.healing.selector_generator import SelectorGenerator
+from workflow_use.healing.validator import WorkflowValidator
 from workflow_use.healing.variable_extractor import VariableExtractor
 from workflow_use.healing.views import ParsedAgentStep, SimpleDomElement, SimpleResult
 from workflow_use.schema.views import SelectorWorkflowSteps, WorkflowDefinitionSchema
@@ -22,13 +23,17 @@ def __init__(
         llm: BaseChatModel,
         enable_variable_extraction: bool = True,
         use_deterministic_conversion: bool = False,
+        enable_ai_validation: bool = False,
     ):
         self.llm = llm
         self.enable_variable_extraction = enable_variable_extraction
         self.use_deterministic_conversion = use_deterministic_conversion
+        self.enable_ai_validation = enable_ai_validation
         self.variable_extractor = VariableExtractor(llm=llm) if enable_variable_extraction else None
         self.deterministic_converter = DeterministicWorkflowConverter(llm=llm) if use_deterministic_conversion else None
         self.selector_generator = SelectorGenerator()  # Initialize multi-strategy selector generator
+        # Note: validator will be initialized with extraction_llm in generate_workflow_from_prompt
+        self.validator = None
 
         self.interacted_elements_hash_map: dict[str, DOMInteractedElement] = {}
 
@@ -548,4 +553,31 @@ async def act(self, action, browser_session, *args, **kwargs):
             prompt, history, extract_variables=self.enable_variable_extraction
         )
 
+        # Apply AI validation and correction if enabled
+        if self.enable_ai_validation:
+            # Initialize validator with extraction_llm (same as used for page extraction)
+            # This is more reliable than the main agent LLM
+            if not self.validator:
+                self.validator = WorkflowValidator(llm=extraction_llm)
+
+            print('\n🔍 Running AI validation on generated workflow...')
+            try:
+                validation_result = await self.validator.validate_workflow(workflow=workflow_definition, original_task=prompt)
+
+                # Print validation report
+                self.validator.print_validation_report(validation_result)
+
+                # Apply corrections if found
+                if validation_result.corrected_workflow:
+                    print('\n✨ Applying AI corrections to workflow...')
+                    workflow_definition = validation_result.corrected_workflow
+                    print('✅ Workflow has been corrected!')
+                elif validation_result.issues:
+                    print('\n⚠️ Issues found but no corrections were applied')
+                else:
+                    print('\n✅ Validation passed - no issues found!')
+            except Exception as e:
+                print(f'\n⚠️ Validation failed: {e}')
+                print('Continuing with original workflow...')
+
         return workflow_definition

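The `enable_ai_validation` block above is fail-open: a corrected workflow replaces the generated one only when the validator returns one, and any exception leaves the original in place. The following is a minimal offline sketch of that control flow. `StubResult`, `FixingValidator`, `BrokenValidator`, and `apply_validation` are hypothetical stand-ins for `WorkflowValidator` (imported from `workflow_use.healing.validator`), and report printing is omitted.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StubResult:
    # Hypothetical result object: only the two attributes the hunk reads are modeled.
    issues: list = field(default_factory=list)
    corrected_workflow: Optional[Any] = None


async def apply_validation(workflow: Any, task: str, validator: Any) -> Any:
    """Prefer the corrected workflow; fall back to the original on any error."""
    try:
        result = await validator.validate_workflow(workflow=workflow, original_task=task)
    except Exception as exc:  # same broad catch as the diff
        print(f"Validation failed: {exc}; continuing with original workflow")
        return workflow
    if result.corrected_workflow:
        return result.corrected_workflow
    return workflow


class FixingValidator:
    async def validate_workflow(self, workflow, original_task):
        return StubResult(issues=["agent_step"], corrected_workflow={"steps": ["fixed"]})


class BrokenValidator:
    async def validate_workflow(self, workflow, original_task):
        raise RuntimeError("LLM call timed out")


print(asyncio.run(apply_validation({"steps": ["orig"]}, "task", FixingValidator())))
print(asyncio.run(apply_validation({"steps": ["orig"]}, "task", BrokenValidator())))
```

The trade-off of the broad `except Exception` is that validation can never block workflow generation, at the cost of silently shipping an unvalidated workflow when the LLM call fails.
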
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
"""
Test script for WorkflowValidator

This script tests the AI validation system with a sample workflow that has known issues.

Usage:
    BROWSER_USE_API_KEY=your_key uv run python workflow_use/healing/tests/test_validator.py
"""

import asyncio
import os

from browser_use.llm import ChatBrowserUse

from workflow_use.healing.validator import WorkflowValidator
from workflow_use.schema.views import (
    AgentTaskWorkflowStep,
    ExtractStep,
    InputStep,
    NavigationStep,
    WorkflowDefinitionSchema,
)


async def test_validator():
    """Test the validator with a workflow that has issues."""

    # Check for API key
    if not os.getenv('BROWSER_USE_API_KEY'):
        print('Error: BROWSER_USE_API_KEY environment variable not set')
        print('Usage: BROWSER_USE_API_KEY=your_key uv run python workflow_use/healing/tests/test_validator.py')
        return

    # Create a sample workflow with known issues
    sample_workflow = WorkflowDefinitionSchema(
        name='Test Workflow with Issues',
        description='A test workflow with various issues to validate',
        version='1.0.0',
        steps=[
            # Issue 1: Agent step that should be semantic
            AgentTaskWorkflowStep(type='agent', task='click the search button'),
            # Control: navigation step (this one is correct)
            NavigationStep(type='navigation', url='https://example.com'),
            # Issue 2: Input with a hard-coded value that should be a variable
            InputStep(type='input', target_text='First Name', value='John'),
            # Issue 3: Extract step with a vague goal
            ExtractStep(type='extract', extractionGoal='get the data', output='result'),
        ],
        input_schema=[],
    )

    # Initialize validator
    print('Initializing validator...')
    llm = ChatBrowserUse(model='bu-latest')
    validator = WorkflowValidator(llm=llm)

    # Run validation
    print('\nRunning validation on sample workflow...')
    result = await validator.validate_workflow(workflow=sample_workflow, original_task='Test task for validation')

    # Print report
    validator.print_validation_report(result)

    # Check if corrections were made
    if result.corrected_workflow:
        print('\n' + '=' * 80)
        print('CORRECTED WORKFLOW')
        print('=' * 80)
        print(result.corrected_workflow.model_dump_json(indent=2, exclude_none=True))

    return result


if __name__ == '__main__':
    asyncio.run(test_validator())
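The script above prints the report but asserts nothing. Because LLM output varies between runs, one tolerant way to turn it into a check is to assert that the deliberately planted issue types are a subset of those reported, and that the corrected workflow contains no agent steps. The following is a minimal sketch. `EXPECTED_ISSUE_TYPES`, `missing_expected_issues`, and `agent_step_count` are hypothetical, and how to read an issue's type from the real result object is an assumption, since the validator's result model is not part of the files shown here.

```python
# Issue types the sample workflow is built to trigger (the navigation step is a control).
EXPECTED_ISSUE_TYPES = {"agent_step", "hardcoded_value", "vague_extraction"}


def missing_expected_issues(reported_types: list[str], expected: set[str] = EXPECTED_ISSUE_TYPES) -> set[str]:
    """Return the expected issue types that the validator did not report."""
    return expected - set(reported_types)


def agent_step_count(workflow_dict: dict) -> int:
    """Count agent steps in a dumped workflow, e.g. corrected_workflow.model_dump()."""
    return sum(1 for step in workflow_dict.get("steps", []) if step.get("type") == "agent")


# Extra reported types (here "missing_wait") are tolerated; only omissions count.
print(missing_expected_issues(["agent_step", "hardcoded_value", "vague_extraction", "missing_wait"]))  # set()
print(agent_step_count({"steps": [{"type": "click"}, {"type": "input"}]}))  # 0
```

Checking for a subset rather than exact equality avoids flaky failures when the model also reports reasonable extra warnings or suggestions.
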
