A complete AI-powered outbound calling system that can make phone calls, interact naturally with HR representatives, and store conversation data (transcript + summary) in a database.
- Real-Time Voice Interaction: Speech-to-Text (Deepgram) and Text-to-Speech (ElevenLabs)
- AI Conversation Logic: Dynamic conversations powered by OpenAI GPT
- Calling Functionality: Outbound calls via Twilio Programmable Voice API with WebSocket audio streaming
- Data Management: SQLite database for storing transcripts, summaries, and extracted answers
- Dashboard: React frontend for viewing calls and transcripts
- Node.js (v16 or higher)
- npm or yarn
- Twilio account with Programmable Voice enabled
- OpenAI API key
- Deepgram API key (for STT)
- ElevenLabs API key (for TTS)
- Install dependencies (root + client):
  npm run install-all
- Configure environment variables:
  cp .env.example .env
  Edit .env and add your API keys:
  - TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN
  - TWILIO_PHONE_NUMBER (your Twilio phone number)
  - OPENAI_API_KEY
  - DEEPGRAM_API_KEY
  - ELEVENLABS_API_KEY
- Start the development server:
  npm run dev
  This will start:
  - Backend server on http://localhost:3001
  - React frontend on http://localhost:3000
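
A missing key in .env usually only surfaces once a call is already in progress. The sketch below shows one way to check the variables listed in the configuration step at startup. The helper name `findMissingEnvVars` is hypothetical and not part of this project, and it assumes .env has already been loaded into `process.env`.

```javascript
// Variable names are the ones listed in the .env configuration step.
const REQUIRED_ENV_VARS = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE_NUMBER",
  "OPENAI_API_KEY",
  "DEEPGRAM_API_KEY",
  "ELEVENLABS_API_KEY",
];

// Hypothetical helper: returns the names of required variables
// that are unset or contain only whitespace.
function findMissingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter(
    (name) => !env[name] || String(env[name]).trim() === ""
  );
}

// Example startup guard (assumes .env is already loaded into process.env):
// const missing = findMissingEnvVars();
// if (missing.length > 0) {
//   console.error(`Missing environment variables: ${missing.join(", ")}`);
//   process.exit(1);
// }
```

Failing fast at startup keeps a misconfigured server from placing calls it cannot complete.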
.
├── server/
│   ├── index.js                  # Express server entry point
│   ├── database/
│   │   └── db.js                 # SQLite database setup
│   ├── models/
│   │   └── Call.js               # Call data model
│   ├── services/
│   │   ├── aiService.js          # OpenAI conversation logic
│   │   ├── sttService.js         # Deepgram STT integration
│   │   ├── ttsService.js         # ElevenLabs TTS integration
│   │   ├── callService.js        # Twilio calling logic
│   │   └── websocketService.js   # WebSocket audio streaming
│   └── routes/
│       └── calls.js              # API routes for calls
├── client/                       # React frontend
└── data/                         # SQLite database (auto-created)
POST /api/calls/initiate
Body: { "phone_number": "+1234567890" }
GET /api/calls?limit=50&offset=0
GET /api/calls/:id
DELETE /api/calls/:id

- Via API (curl example):
  curl -X POST http://localhost:3001/api/calls/initiate \
    -H "Content-Type: application/json" \
    -d '{"phone_number": "+1234567890"}'
- Via Dashboard:
  - Open http://localhost:3000
  - Enter phone number and click "Initiate Call"
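
The same request can be made from JavaScript. The sketch below builds the arguments for `fetch()` against `POST /api/calls/initiate`. The helper name `buildInitiateCallRequest` and the E.164 check are assumptions added for illustration, and the response shape is not documented here, so the usage lines only print the status code.

```javascript
const API_BASE = "http://localhost:3001"; // backend address from the setup steps

// Hypothetical helper: builds the fetch() arguments for POST /api/calls/initiate.
function buildInitiateCallRequest(phoneNumber, apiBase = API_BASE) {
  // Assumption: numbers use E.164 form ("+" followed by up to 15 digits),
  // as in the curl example.
  if (!/^\+[1-9]\d{1,14}$/.test(phoneNumber)) {
    throw new Error(`Not an E.164 phone number: ${phoneNumber}`);
  }
  return {
    url: `${apiBase}/api/calls/initiate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ phone_number: phoneNumber }),
    },
  };
}

// Usage (needs a running backend; the global fetch requires Node 18+):
// const { url, options } = buildInitiateCallRequest("+1234567890");
// const response = await fetch(url, options);
// console.log(response.status);
```

Validating the number before sending avoids a round trip to Twilio for input that cannot be dialed.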
The AI agent follows this conversation flow:
- Greeting: Warm, professional greeting
- Introduction: Brief introduction and purpose
- Job Inquiry: Asks about hiring status for fresh graduates
- Data Collection: Collects responses naturally
- Closing: Polite closing
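
One way to keep the agent on this path is to track the current stage explicitly and pass it to the model on each turn. The sketch below is illustrative only: the stage identifiers, `nextStage`, and `stageInstruction` are hypothetical, and the actual logic in server/services/aiService.js may instead rely on a single system prompt.

```javascript
// Stage names follow the five-step flow listed above; the structure is illustrative.
const STAGES = ["greeting", "introduction", "job_inquiry", "data_collection", "closing"];

// Hypothetical per-stage goals, paraphrased from the flow description.
const STAGE_GOALS = {
  greeting: "Greet the HR representative warmly and professionally.",
  introduction: "Briefly introduce yourself and the purpose of the call.",
  job_inquiry: "Ask about hiring status for fresh graduates.",
  data_collection: "Collect the responses naturally, one question at a time.",
  closing: "Thank them and close the call politely.",
};

// Returns the stage that follows `current`, or null after the closing stage.
function nextStage(current) {
  const index = STAGES.indexOf(current);
  if (index === -1) throw new Error(`Unknown stage: ${current}`);
  return index + 1 < STAGES.length ? STAGES[index + 1] : null;
}

// Builds a prompt fragment telling the model which step it is on.
function stageInstruction(stage) {
  if (!(stage in STAGE_GOALS)) throw new Error(`Unknown stage: ${stage}`);
  return `Current stage: ${stage}. Goal: ${STAGE_GOALS[stage]}`;
}
```
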
CREATE TABLE calls (
id INTEGER PRIMARY KEY AUTOINCREMENT,
phone_number TEXT NOT NULL,
transcript TEXT,
summary TEXT,
extracted_answers TEXT,
status TEXT DEFAULT 'initiated',
duration INTEGER,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
)

- Sign up at Twilio
- Get a phone number with Voice capabilities
- Add credentials to .env
- In the Twilio Console, configure a Voice webhook (TwiML App or number) to https://<YOUR_DOMAIN>/api/calls/twiml/{CALL_ID}. During local development, expose your server with ngrok and update BASE_URL.
- On trial accounts, ensure the phone number being called is verified; Twilio trial accounts can only place calls to verified numbers.
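
The TwiML webhook is where Twilio learns which WebSocket to stream the call audio to. The sketch below shows what such a response might look like, using Twilio's `<Connect><Stream>` verbs and the `/api/calls/stream/{CALL_ID}` path described in this README. The helper name `buildStreamTwiml` is hypothetical, and the project's actual route may build its TwiML differently.

```javascript
// Hypothetical helper: builds TwiML that connects the call to a WebSocket media stream.
// `baseUrl` is the public address from BASE_URL, e.g. an ngrok https:// URL.
function buildStreamTwiml(baseUrl, callId) {
  // Media Streams need a WebSocket URL, so https:// becomes wss://
  // (Twilio requires wss://, so BASE_URL should be an https:// address).
  const wsBase = baseUrl.replace(/^http/, "ws").replace(/\/$/, "");
  const wsUrl = `${wsBase}/api/calls/stream/${encodeURIComponent(callId)}`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${wsUrl}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}

// Usage inside an Express handler (illustrative):
// res.type("text/xml").send(buildStreamTwiml(process.env.BASE_URL, req.params.id));
```
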
- Get API key from OpenAI
- Add to .env as OPENAI_API_KEY
- Sign up at Deepgram
- Get API key and add to .env
- (Optional) Adjust STT model options in server/services/sttService.js
- Sign up at ElevenLabs
- Get API key and add to .env
- Optionally configure ELEVENLABS_VOICE_ID for different voices
The React dashboard provides:
- List of all calls with status
- View call transcripts
- View AI-generated summaries
- View extracted answers
- Initiate new calls
- Streaming Flow: Twilio streams audio to /api/calls/stream/{CALL_ID}. Each chunk is transcribed, routed through GPT, and returned as synthesized speech.
- Latency Considerations: This prototype transcribes chunked audio (transcribeBuffer). For lower latency and interim transcripts, upgrade to full-duplex streaming (transcribeStream) and Twilio's bidirectional streams.
- Public Reachability: Twilio must be able to reach your server. Deploy to a public host or use ngrok and update BASE_URL.
- Audio Encoding: Twilio expects μ-law 8kHz audio when streaming responses. The current implementation includes a placeholder; add audio transcoding (e.g., with FFmpeg/prism-media) before enabling live backchannel audio.
- Usage Costs: Monitor usage across Twilio, OpenAI, Deepgram, and ElevenLabs; each charges per use.
- Compliance: Confirm calling, recording, and data retention regulations in the regions where you operate.
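
For the audio encoding note above, the μ-law step itself is small enough to write by hand. The sketch below encodes 16-bit PCM as G.711 μ-law and wraps the result in the JSON `media` message that Twilio bidirectional streams accept. It assumes the TTS audio has already been decoded and resampled to 8 kHz mono PCM (for example with FFmpeg), which this sketch does not do. The function names are hypothetical and not part of this project.

```javascript
const BIAS = 0x84;  // standard G.711 bias added before locating the segment
const CLIP = 32635; // largest magnitude that still fits after biasing

// Encodes one signed 16-bit PCM sample as an 8-bit G.711 mu-law byte.
function linearToMulaw(sample) {
  const sign = sample < 0 ? 0x80 : 0x00;
  const magnitude = Math.min(Math.abs(sample), CLIP) + BIAS;
  // Find the segment (exponent): position of the highest set bit, 7 down to 0.
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  // mu-law bytes are stored bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Converts a Buffer of 16-bit little-endian PCM (8 kHz mono) to mu-law bytes.
function pcm16ToMulaw(pcm) {
  const out = Buffer.alloc(Math.floor(pcm.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = linearToMulaw(pcm.readInt16LE(i * 2));
  }
  return out;
}

// Wraps mu-law audio in the "media" message format of Twilio bidirectional streams.
// `streamSid` comes from the "start" message Twilio sends when the stream opens.
function toTwilioMediaMessage(streamSid, mulaw) {
  return JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: mulaw.toString("base64") },
  });
}
```

Silence (a PCM sample of 0) encodes to the byte 0xFF, which is a quick sanity check when debugging outbound audio.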
- WebSocket server for real-time audio streaming
- Support for multiple languages
- Call recording and playback
- Advanced analytics dashboard
- Integration with CRM systems
- Scheduled calling
- Call quality metrics
MIT