VoiceIQ Backend is a Node.js/Express REST API that powers AI-driven call center compliance analysis. It accepts MP3 audio recordings of Indian call center conversations (Hinglish / Tanglish), transcribes them using Groq Whisper, and runs multi-stage NLP analysis using Llama 3.3 70B to produce structured compliance reports.
Built for HCL GUVI Intern Hiring Hackathon 2026 - Track 3: Call Center Compliance.
- Description
- Tech Stack
- API Endpoint
- How It Works
- Project Structure
- Setup Instructions
- Deployed URL
- Database Schema
- SOP Validation Logic
- Payment & Rejection Classification
- Known Limitations
- License
| Layer | Technology |
|---|---|
| Runtime | Node.js (ESM) |
| Framework | Express.js |
| STT | Groq Whisper large-v3 |
| LLM | Groq Llama 3.3 70B Versatile |
| Database | Supabase (PostgreSQL + Storage) |
| Auth | Supabase JWT + x-api-key header |
| Deploy | Render |
Accepts a Base64-encoded MP3 audio file and returns a full compliance analysis.
Authentication: x-api-key header required.
Request:
```bash
curl -X POST https://voiceiq-backend-8f4q.onrender.com/api/call-analytics \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "language": "Tamil",
    "audioFormat": "mp3",
    "audioBase64": "SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU2LjM2LjEwMAAAAAAA..."
  }'
```

Request Body:
| Field | Type | Description |
|---|---|---|
| `language` | string | `Tamil` or `Hindi` |
| `audioFormat` | string | Always `mp3` |
| `audioBase64` | string | Base64-encoded MP3 audio |
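The same request can be issued from Node.js (18+, which has a global `fetch`). A minimal client sketch; `buildRequestBody` and `analyzeCall` are hypothetical helper names, not part of this codebase:

```javascript
// Hypothetical client sketch (Node 18+, global fetch). The caller supplies
// the raw MP3 as a Buffer; this module does not touch the filesystem.
function buildRequestBody(language, mp3Buffer) {
  return {
    language,                                  // "Tamil" or "Hindi"
    audioFormat: "mp3",                        // always mp3
    audioBase64: mp3Buffer.toString("base64"), // Base64-encode the audio
  };
}

async function analyzeCall(mp3Buffer, language, apiKey) {
  const res = await fetch(
    "https://voiceiq-backend-8f4q.onrender.com/api/call-analytics",
    {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-api-key": apiKey },
      body: JSON.stringify(buildRequestBody(language, mp3Buffer)),
    }
  );
  if (!res.ok) throw new Error(`Request failed: HTTP ${res.status}`);
  return res.json(); // the structured compliance report
}
```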
Response:
```json
{
  "status": "success",
  "language": "Tamil",
  "transcript": "Agent: Vanakkam, ungaloda outstanding EMI amount 5000 iruku...",
  "summary": "Agent discussed outstanding EMI of ₹5000. Customer requested partial payment due to budget constraints.",
  "sop_validation": {
    "greeting": true,
    "identification": false,
    "problemStatement": true,
    "solutionOffering": true,
    "closing": true,
    "complianceScore": 0.8,
    "adherenceStatus": "NOT_FOLLOWED",
    "explanation": "The agent did not verify customer identity. All other SOP stages were followed correctly."
  },
  "analytics": {
    "paymentPreference": "PARTIAL_PAYMENT",
    "rejectionReason": "BUDGET_CONSTRAINTS",
    "sentiment": "Neutral"
  },
  "keywords": [
    "outstanding EMI", "partial payment", "budget", "5000", "today",
    "payment plan", "due amount", "installment", "settlement", "callback"
  ]
}
```

```
POST /api/call-analytics (Base64 MP3 + language)
        ↓
x-api-key authentication check
        ↓
Base64 decoded → written to temp file
        ↓
Groq Whisper large-v3 transcribes audio
   → auto-detects Hinglish / Tanglish
   → generates timestamped segments [M:SS–M:SS]
        ↓
Timestamped transcript passed to Llama 3.3 70B
   → SOP validation (greeting / identification / problemStatement / solutionOffering / closing)
   → Payment preference classification (EMI / FULL_PAYMENT / PARTIAL_PAYMENT / DOWN_PAYMENT)
   → Rejection reason extraction (HIGH_INTEREST / BUDGET_CONSTRAINTS / ALREADY_PAID / NOT_INTERESTED / NONE)
   → Sentiment analysis (Positive / Neutral / Negative)
   → Keyword extraction (top 10 domain-specific terms)
        ↓
Enum validation + complianceScore recalculation from booleans
        ↓
Structured JSON response returned
        ↓
Temp file deleted
```
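The first step of the pipeline, the `x-api-key` check, can be sketched as a tiny Express-style middleware. This is an illustrative sketch only: the helper name `requireApiKey` and the exact error payload are assumptions; the real check is wired up in `src/routes/callAnalyticsRoute.js`.

```javascript
// Sketch of the x-api-key gate described in the flow above. Compares the
// request header against the HACKATHON_API_KEY environment variable and
// rejects with 401 on mismatch. Names and error shape are assumptions.
function requireApiKey(req, res, next) {
  const key = req.headers["x-api-key"];
  if (!key || key !== process.env.HACKATHON_API_KEY) {
    return res
      .status(401)
      .json({ status: "error", message: "Invalid or missing API key" });
  }
  next(); // key matched: hand off to the analytics controller
}
```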
```
voiceiq-backend/
├── src/
│   ├── controllers/
│   │   ├── callController.js            # Dashboard upload flow
│   │   ├── callAnalyticsController.js   # Hackathon evaluation endpoint
│   │   ├── analyticsController.js       # Dashboard analytics
│   │   ├── transcriptController.js      # Transcript search
│   │   ├── agentController.js           # Agent management
│   │   └── sopController.js             # SOP rules CRUD
│   ├── routes/
│   │   ├── callAnalyticsRoute.js        # POST /api/call-analytics (x-api-key auth)
│   │   ├── callRoutes.js
│   │   ├── analyticsRoutes.js
│   │   ├── transcriptRoutes.js
│   │   ├── agentRoutes.js
│   │   └── sopRoutes.js
│   ├── services/
│   │   ├── groqService.js               # Whisper + Llama 3.3 70B analysis
│   │   ├── whisperService.js            # Audio transcription
│   │   └── supabaseService.js           # DB + Storage client
│   ├── middleware/
│   │   ├── authMiddleware.js            # Supabase JWT verification
│   │   └── uploadMiddleware.js          # Multer file upload
│   └── utils/
│       └── responseHelper.js            # success/error response helpers
├── .env.example
├── package.json
└── server.js
```
```bash
git clone https://github.com/JexanJoel/VoiceIQ-backend.git
cd VoiceIQ-backend
npm install
cp .env.example .env
```

Fill in your `.env`:

```env
GROQ_API_KEY=your_groq_api_key
SUPABASE_URL=your_supabase_project_url
SUPABASE_SERVICE_KEY=your_supabase_service_role_key
JWT_SECRET=your_jwt_secret
HACKATHON_API_KEY=your_chosen_api_key
PORT=5000
```

```bash
# Development
npm run dev

# Production
npm start
```

Server starts at http://localhost:5000
https://voiceiq-backend-8f4q.onrender.com
Hackathon evaluation endpoint:
POST https://voiceiq-backend-8f4q.onrender.com/api/call-analytics
- `calls` → recordings, transcripts, compliance results, sentiment, agent assignment
- `agents` → agent profiles per user account
- `sop_rules` → custom SOP rules per user (max 10, with categories)
All tables have Row Level Security (RLS) enabled.
The API evaluates 5 stages of a standard call center script:
| Stage | Description |
|---|---|
| `greeting` | Agent opened with Hello / Vanakkam / Namaste |
| `identification` | Agent verified customer name or account |
| `problemStatement` | Agent clearly stated call purpose/issue |
| `solutionOffering` | Agent offered a solution or product |
| `closing` | Agent ended with a closing statement |
`complianceScore` = number of `true` steps / 5 (recalculated from the booleans, not LLM-generated)

`adherenceStatus` = `FOLLOWED` only if all 5 are `true`, otherwise `NOT_FOLLOWED`
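The recalculation rule above can be sketched in a few lines. `scoreSop` is a hypothetical helper name; the actual function lives in the service layer:

```javascript
// Sketch of the score recalculation: derive complianceScore and
// adherenceStatus purely from the five SOP booleans, never trusting a
// score the LLM may have emitted. Helper name is an assumption.
function scoreSop(sop) {
  const steps = [
    "greeting",
    "identification",
    "problemStatement",
    "solutionOffering",
    "closing",
  ];
  const passed = steps.filter((s) => sop[s] === true).length;
  return {
    complianceScore: passed / steps.length,          // e.g. 4/5 = 0.8
    adherenceStatus:
      passed === steps.length ? "FOLLOWED" : "NOT_FOLLOWED",
  };
}
```

Applied to the sample response above (identification `false`, everything else `true`), this yields `complianceScore: 0.8` and `adherenceStatus: "NOT_FOLLOWED"`.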
Payment Preference:
| Value | Meaning |
|---|---|
| `EMI` | Customer wants installment payments |
| `FULL_PAYMENT` | Customer will pay the full amount at once |
| `PARTIAL_PAYMENT` | Customer will pay part now, rest later |
| `DOWN_PAYMENT` | Customer will pay an initial deposit |
| `NONE` | No payment discussed |
Rejection Reason:
| Value | Meaning |
|---|---|
| `HIGH_INTEREST` | Customer complained about interest/rate |
| `BUDGET_CONSTRAINTS` | Customer cited lack of funds |
| `ALREADY_PAID` | Customer claims prior payment |
| `NOT_INTERESTED` | Customer declined the product/service |
| `NONE` | No rejection: payment agreed or N/A |
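The enum-validation step mentioned in the pipeline clamps whatever the LLM emits to these allowed values. A minimal sketch, assuming a fallback of `NONE` for anything unrecognized (`clampEnum` and the constant names are hypothetical):

```javascript
// Sketch of enum validation: normalize LLM output (case, whitespace) and
// fall back to "NONE" if the value is not in the allowed set.
const PAYMENT_PREFERENCES = [
  "EMI", "FULL_PAYMENT", "PARTIAL_PAYMENT", "DOWN_PAYMENT", "NONE",
];
const REJECTION_REASONS = [
  "HIGH_INTEREST", "BUDGET_CONSTRAINTS", "ALREADY_PAID", "NOT_INTERESTED", "NONE",
];

function clampEnum(value, allowed) {
  const v = String(value || "").trim().toUpperCase();
  return allowed.includes(v) ? v : "NONE";
}
```

This guards against a model returning lowercase values, extra whitespace, or an invented category.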
- Audio files above ~25MB may hit Whisper API limits
- Very noisy or low-quality recordings reduce transcription accuracy
- Rejection reason detection works best when customer explicitly states their reason
- Whisper language detection is automatic: the `language` field in the request is used for LLM context, not to force the STT language