Commit d4c66b7

first version of cartesia line form fill integration
1 parent a71521d commit d4c66b7

12 files changed: +1142 -0 lines changed
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
agent-id = 'agent_NKsQKSxugbsoA3ByZrJVQY'
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# Gemini API Key for language model
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Browserbase API credentials for cloud browser automation
# If not set, will use local browser
BROWSERBASE_API_KEY=your_browserbase_api_key_here
BROWSERBASE_PROJECT_ID=your_browserbase_project_id_here

# Optional: Model configuration
MODEL_NAME=google/gemini-2.0-flash-exp
MODEL_API_KEY=your_model_api_key_here
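
The comments in this template imply a fallback: use Browserbase when its credentials are present, otherwise a local browser. A minimal sketch of that decision follows. The helper names (`is_set`, `choose_browser_backend`, `require_gemini_key`) and the returned strings are hypothetical and are not taken from the commit. The check for values ending in `_here` is an assumption, used to treat the unedited placeholders above as unset.

```python
import os
from typing import Mapping

# Assumption: the placeholder values in the template all end with "_here".
PLACEHOLDER_SUFFIX = "_here"


def is_set(env: Mapping[str, str], name: str) -> bool:
    """True when the variable is non-empty and not an unedited placeholder."""
    value = env.get(name, "").strip()
    return bool(value) and not value.endswith(PLACEHOLDER_SUFFIX)


def choose_browser_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return "browserbase" only when both Browserbase variables are set."""
    if is_set(env, "BROWSERBASE_API_KEY") and is_set(env, "BROWSERBASE_PROJECT_ID"):
        return "browserbase"
    return "local"


def require_gemini_key(env: Mapping[str, str] = os.environ) -> str:
    """Fail early with a clear message when the required key is missing."""
    if not is_set(env, "GEMINI_API_KEY"):
        raise RuntimeError("GEMINI_API_KEY is required; copy .env.example to .env and fill it in")
    return env[name] if (name := "GEMINI_API_KEY") else ""
```

Passing the environment as a mapping keeps the helpers testable without touching the real process environment.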
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*.pyd
.Python

# Virtual environments
.env
.venv/
venv/
env/

virtualenv/

# Conda environments
conda-env/
envs/
.conda/
conda-meta/

# uv lock file and pinned Python version
uv.lock
.python-version

# Python package managers
poetry.lock
Pipfile.lock
pip-log.txt

# pyenv
.pyenv/

# Distribution / packaging
*.egg-info/
dist/
build/

# Editor / OS files
.DS_Store
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# Voice Agent with Real-time Web Form Filling

This project demonstrates a voice agent that conducts phone questionnaires while filling out web forms in real time using Stagehand browser automation.

## Features

- **Voice Conversations**: Natural voice interactions using Cartesia Line
- **Real-time Form Filling**: Automatically fills web forms as answers are collected
- **Browser Automation**: Uses Stagehand AI to interact with any web form
- **Intelligent Mapping**: AI-powered mapping of voice answers to form fields
- **Async Processing**: Non-blocking form filling maintains conversation flow
- **Auto-submission**: Submits forms automatically when complete

## Architecture

```
Voice Call (Cartesia) → Form Filling Node → Records Answer
          ↓
Stagehand Browser API
          ↓
Fills Web Form Field
          ↓
Continues Conversation
          ↓
Submits Form on Completion
```

## Setup

1. Install dependencies:
```bash
pip install -r requirements.txt
```

2. Set up environment variables:
```bash
cp .env.example .env
# Add your GEMINI_API_KEY
```

3. Run the agent:
```bash
python main.py
```

## Components

### StagehandFormFiller
- Manages browser automation
- Opens and controls web forms
- Maps conversation data to form fields
- Handles form submission

### FormFillingNode
- Voice-optimized reasoning node
- Integrates Stagehand browser automation
- Manages async form filling during conversation
- Provides status updates

### FormFieldMapping
- Maps YAML questions to web form fields
- Transforms voice answers to form-compatible formats
- Handles different field types (text, select, checkbox, etc.)
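
The mapping idea above can be sketched as follows. This is an illustration only, not the `FormFieldMapping` class from this commit: the `FieldMapping` dataclass, its attribute names, the `AFFIRMATIVE` word list, and the substring matching for select fields are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class FieldMapping:
    question_id: str              # hypothetical key of a question in form.yaml
    field_label: str              # hypothetical label of the target form field
    field_type: str = "text"      # "text", "select", or "checkbox"
    options: Sequence[str] = ()   # allowed values for select fields


# Assumption: spoken answers that should tick a checkbox.
AFFIRMATIVE = {"yes", "yeah", "yep", "true", "correct", "sure"}


def to_form_value(mapping: FieldMapping, spoken: str) -> Union[str, bool, None]:
    """Convert a transcribed answer into a value suited to the field type."""
    answer = spoken.strip()
    if mapping.field_type == "checkbox":
        return answer.lower().rstrip(".!") in AFFIRMATIVE
    if mapping.field_type == "select":
        lowered = answer.lower()
        # Pick the first option mentioned anywhere in the spoken answer.
        for option in mapping.options:
            if option.lower() in lowered:
                return option
        return None  # no option matched; the caller may ask for clarification
    return answer


role = FieldMapping("agent_type", "Role", "select", ("Customer Service", "Sales"))
selected: Optional[str] = to_form_value(role, "A customer service agent")
```

Here `selected` is `"Customer Service"`, which mirrors the example flow later in this README. An AI-based mapper would replace the substring match with a model call.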

## Configuration

The system can be configured through:

- `form.yaml`: Define questionnaire structure
- `FORM_URL`: Target web form to fill
- `headless`: Run browser in background (True) or visible (False)
- `enable_browser`: Toggle browser automation on/off
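
One way to gather these settings in a single place is a small dataclass, sketched below. The `FormFillingConfig` class and `load_config` helper are hypothetical. Reading `FORM_URL`, `HEADLESS`, and `ENABLE_BROWSER` from environment variables is an assumption; the project may pass them as constants or constructor arguments instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse


@dataclass
class FormFillingConfig:
    form_url: str                  # target web form to fill
    form_path: str = "form.yaml"   # questionnaire structure
    headless: bool = True          # True runs the browser in the background
    enable_browser: bool = True    # False records answers without a browser

    def __post_init__(self) -> None:
        # Reject obviously malformed URLs before any browser is launched.
        parsed = urlparse(self.form_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"FORM_URL must be an http(s) URL, got {self.form_url!r}")


def _as_bool(raw: str) -> bool:
    """Interpret common truthy strings; anything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Mapping[str, str] = os.environ) -> FormFillingConfig:
    """Build the config from environment-style key/value pairs (assumed names)."""
    return FormFillingConfig(
        form_url=env.get("FORM_URL", ""),
        headless=_as_bool(env.get("HEADLESS", "true")),
        enable_browser=_as_bool(env.get("ENABLE_BROWSER", "true")),
    )
```

For local debugging, `load_config({"FORM_URL": "https://example.com/form", "HEADLESS": "false"})` yields a config with a visible browser.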

## Example Flow

1. User calls the voice agent
2. Agent asks: "What type of voice agent are you building?"
3. User responds: "A customer service agent"
4. System:
   - Records the answer
   - Opens the browser to the form (if not already open)
   - Fills "Customer Service" in the role selection field
   - Takes a screenshot for debugging
5. Agent asks the next question
6. Process continues until all questions are answered
7. Form is automatically submitted

## Advanced Features

- **Background Processing**: Form filling happens asynchronously
- **Error Recovery**: Continues conversation even if form filling fails
- **Progress Tracking**: Monitor form completion status
- **Screenshot Debugging**: Captures screenshots after each field
- **Flexible Mapping**: AI interprets answers for different field types
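
Background processing, error recovery, and progress tracking can be combined in a few lines of `asyncio`. The sketch below is not the `FormFillingNode` implementation: `BackgroundFormFiller`, `record_answer`, and the `fill_field` callback are hypothetical names, and the callback stands in for whatever browser call actually fills the field.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Set

# Hypothetical signature of the coroutine that fills one field in the browser.
FillFn = Callable[[str, str], Awaitable[None]]


class BackgroundFormFiller:
    def __init__(self, fill_field: FillFn) -> None:
        self._fill_field = fill_field
        self._tasks: Set[asyncio.Task] = set()
        self.status: Dict[str, str] = {}  # progress per field

    def record_answer(self, field: str, value: str) -> None:
        """Schedule the fill and return at once so the call is not blocked."""
        self.status[field] = "pending"
        task = asyncio.create_task(self._fill(field, value))
        self._tasks.add(task)  # keep a reference so the task is not collected
        task.add_done_callback(self._tasks.discard)

    async def _fill(self, field: str, value: str) -> None:
        try:
            await self._fill_field(field, value)
            self.status[field] = "filled"
        except Exception as exc:
            # A failed fill is recorded but never propagates to the conversation.
            self.status[field] = f"failed: {exc}"

    async def wait_until_done(self) -> None:
        """Wait for all scheduled fills, for example before submitting."""
        await asyncio.gather(*list(self._tasks))


async def demo() -> Dict[str, str]:
    async def fake_fill(field: str, value: str) -> None:
        await asyncio.sleep(0.01)  # stands in for browser latency
        if field == "email":
            raise RuntimeError("field not found")

    filler = BackgroundFormFiller(fake_fill)
    filler.record_answer("role", "Customer Service")
    filler.record_answer("email", "a@example.com")
    await filler.wait_until_done()
    return filler.status
```

Running `asyncio.run(demo())` leaves `role` marked as filled and `email` marked as failed, while neither call to `record_answer` waits for the browser.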

## Testing

Test with different scenarios:
- Complete questionnaire flow
- Interruptions and corrections
- Various answer formats
- Multi-page forms
- Form validation errors

## Production Considerations

- Set `headless=True` for production
- Configure proper error logging
- Add retry logic for form submission
- Implement form validation checks
- Consider rate limiting for API calls
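
As a starting point for the logging and retry items, the sketch below retries a submission with exponential backoff and logs each failure. The `submit_with_retry` helper, its parameters, and the logger name are hypothetical. The `submit` argument is any coroutine function that raises on failure, not a specific Stagehand call.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("form_filling")  # hypothetical logger name


async def submit_with_retry(
    submit: Callable[[], Awaitable[None]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> bool:
    """Call submit() up to `attempts` times; return True on the first success."""
    for attempt in range(1, attempts + 1):
        try:
            await submit()
            return True
        except Exception as exc:
            logger.warning("submit attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    return False
```

Returning a boolean instead of raising lets the voice agent decide how to tell the caller that the form could not be submitted.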

examples/integrations/cartesia/__init__.py

Whitespace-only changes.
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
[app]
name = "form-filling"

[build]
cmd = "echo 'No build cmd specified'"

[run]
cmd = "echo 'No run cmd specified'"
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
import os

# Model ID; override with the MODEL_ID environment variable.
DEFAULT_MODEL_ID = os.getenv("MODEL_ID", "gemini-2.5-flash")

# Sampling temperature for the language model.
DEFAULT_TEMPERATURE = 0.7
SYSTEM_PROMPT = """
### You and your role
You are a friendly assistant conducting a questionnaire.
Be professional but conversational. Confirm answers when appropriate.
If a user's answer is unclear, ask for clarification.
For sensitive information, be especially tactful and professional.

IMPORTANT: When you receive a clear answer from the user, use the record_answer tool to record their response.

### Your tone
When having a conversation, you should:
- Always be polite and respectful, even when users are challenging
- Be concise and brief but never curt. Keep your responses to 1-2 sentences and fewer than 35 words
- Ask each question in a short and concise manner
- Only ask one question at a time

If the user is rude, or curses, respond with exceptional politeness and genuine curiosity.
You should always be polite.

Remember, you're on the phone, so do not use emojis or abbreviations. Spell out units and dates.
"""
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
agent-id = 'agent_NKsQKSxugbsoA3ByZrJVQY'
