launchdarkly-labs/scarlett-feature-extraction
Sales Call Transcript Extractor

Transform sales call transcripts into actionable CSV data with AI-powered sentiment analysis, business insights, and deal prediction.

Built with LaunchDarkly AI Configs and Vercel AI Gateway for dynamic schema management. Uses the official @launchdarkly/server-sdk-ai-vercel SDK with intelligent tool selection. Includes ML model training for deal prediction using CatBoost.

What You Get

📊 40-65 structured data fields per call including sentiment scores, deal signals, engagement metrics, and call-specific insights - all exportable as CSV for your CRM or analytics tools.


Quick Start

Manual Setup

# 1. Install dependencies
npm install
python3 -m venv venv && source venv/bin/activate
pip install catboost scikit-learn pandas numpy joblib requests python-dotenv

# 2. Configure environment variables
cp .env.example .env
# Add your keys to .env:
#   LAUNCHDARKLY_SDK_KEY=sdk-xxxxx
#   LD_API_KEY=api-xxxxx (for bootstrap only)
#   LD_PROJECT_KEY=your-project (for bootstrap only)

# 3. Get Vercel AI Gateway token
npx vercel env pull

# 4. Bootstrap LaunchDarkly AI Config (one-time)
python bootstrap/create_unified_config.py

# 5. Run
npm run dev  # → http://localhost:3000

Docker Compose

# 1. Configure environment
cp .env.example .env  # Edit with your keys
npx vercel env pull   # Get OIDC token

# 2. Start services
docker-compose up

# 3. Setup LaunchDarkly (one-time, in another terminal)
docker-compose exec python sh -c ". /app/venv/bin/activate && python bootstrap/create_unified_config.py"

# → http://localhost:3000

How It Works

Unified AI Extraction

Upload Transcripts (.txt or .md)
    ↓
AI analyzes content and selects appropriate schema
    ↓
Extracts 40-65 fields in single LLM call
    ↓
CSV downloads automatically

6 Extraction Schemas

The AI automatically selects the best schema based on transcript content:

| Schema | Type             | Fields | Use Case                                                        |
|--------|------------------|--------|-----------------------------------------------------------------|
| A      | Prospecting      | ~43    | Cold outreach, first contact, gatekeeper conversations          |
| B      | Discovery        | ~48    | BANT qualification, needs assessment, pain point identification |
| C      | Demo             | ~58    | Product demonstrations, feature walkthroughs, POC discussions   |
| D      | Proposal         | ~53    | Pricing negotiations, contract terms, commercial discussions    |
| E      | Technical        | ~63    | Architecture reviews, integration planning, technical deep-dives|
| F      | Customer Success | ~53    | QBRs, renewal discussions, expansion opportunities              |

Core Fields (in all schemas):

  • Identity: transcript_id, customer_company_name, salesperson_name, call_date
  • Business: deal_stage, customer_size, industry, estimated_deal_value
  • Sentiment: overall_sentiment_score, sentiment_about_product, sentiment_about_pricing (-1 to +1)
  • Engagement: customer_engagement_score, urgency_score, budget_confidence_score (0 to 1)
  • Statistics: transcript_word_count, customer_question_count, competitor_mention_count
  • Signals: next_steps_defined, competitors_mentioned, decision_makers_present
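The extraction pipeline (lib/pipeline.ts) assembles these fields into CSV rows for download. A minimal Python sketch of that post-processing step is shown below; `clamp_scores` and `rows_to_csv` are hypothetical helpers for illustration (the real implementation is in TypeScript), and the score ranges come from the field list above:

```python
import csv
import io

# Documented ranges: sentiment scores are -1..+1, engagement-style
# scores are 0..1 (see "Core Fields" above).
SCORE_RANGES = {
    "overall_sentiment_score": (-1.0, 1.0),
    "sentiment_about_product": (-1.0, 1.0),
    "sentiment_about_pricing": (-1.0, 1.0),
    "customer_engagement_score": (0.0, 1.0),
    "urgency_score": (0.0, 1.0),
    "budget_confidence_score": (0.0, 1.0),
}

def clamp_scores(row: dict) -> dict:
    """Clamp any out-of-range score the model returns into its documented range."""
    out = dict(row)
    for field, (lo, hi) in SCORE_RANGES.items():
        if field in out and isinstance(out[field], (int, float)):
            out[field] = min(max(float(out[field]), lo), hi)
    return out

def rows_to_csv(rows: list[dict]) -> str:
    """Serialize extracted rows to CSV, using the union of all keys as the header."""
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(clamp_scores(row))
    return buf.getvalue()
```

Because schemas A-F emit different field sets, taking the union of keys keeps one consistent header even when transcripts match different schemas.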

Configuration

1. Bootstrap LaunchDarkly

export LD_API_KEY="api-xxxxx"
export LD_PROJECT_KEY="your-project"
python bootstrap/create_unified_config.py

What it creates:

  • ✅ 1 AI Config: transcript-extraction-unified
  • ✅ 6 extraction tools (schemas A-F)
  • ✅ 1 unified variation with all tools attached
  • ✅ Default model: claude-3-7-sonnet-latest

Important: The script deletes all existing AI Configs and tools in the target LaunchDarkly project to ensure a clean state - run it only against a project you control.

2. Environment Variables

# LaunchDarkly
LAUNCHDARKLY_SDK_KEY=sdk-xxxxx       # Runtime SDK key
LD_API_KEY=api-xxxxx                 # For bootstrap script only
LD_PROJECT_KEY=your-project          # For bootstrap script only

# Vercel AI Gateway (choose one)
VERCEL_OIDC_TOKEN=eyJhbGc...         # Preferred - auto-refreshed
AI_GATEWAY_API_KEY=vck_xxxxx         # Alternative - manual

Get OIDC token:

npx vercel env pull  # Refresh every 12 hours

Usage

Feature Extraction

  1. Start dev server: npm run dev
  2. Upload .txt or .md transcripts
  3. Click "Extract Features"
  4. Watch real-time progress
  5. CSV downloads automatically

Features:

  • Real-time progress tracking with SSE
  • Per-file error reporting (continues on failures)
  • Empty file detection
  • UTF-8 and special character support
  • Large file handling

ML Model Training

  1. Click "Train Model" in the ML section
  2. Choose demo data or use extracted CSV
  3. View metrics: Precision, Recall, F1, RMSE, R²
  4. Model saved as deal_model.pkl
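The metrics reported in step 3 are the standard definitions; the actual training script (ml/train_and_return_metrics.py) computes them during CatBoost training, but a pure-Python sketch of what each number means looks like this:

```python
import math

def classification_metrics(y_true: list, y_pred: list) -> dict:
    """Precision, recall, and F1 for a binary won/lost label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

def regression_metrics(y_true: list, y_pred: list) -> dict:
    """RMSE and R² for a continuous target such as deal value."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return {"rmse": rmse, "r2": r2}
```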

Deployment

Local Development

npm run dev  # → http://localhost:3000

Production (Vercel)

npx vercel --prod

Vercel Setup:

  1. Add LAUNCHDARKLY_SDK_KEY in project settings
  2. VERCEL_OIDC_TOKEN auto-provided in production
  3. Build configured with .npmrc and next.config.js

Production URL: https://vercel-tan-beta-71.vercel.app


Dynamic Configuration

All schemas live in LaunchDarkly - update without code changes!

Modify Schemas

  1. LaunchDarkly → AI Tools → Select tool
  2. Edit JSON Schema parameters
  3. Save → Changes apply immediately

A/B Testing

  1. Create new variation with modified schema
  2. Set targeting rules (e.g., customer_segment = "Enterprise")
  3. Monitor metrics in LaunchDarkly dashboard

Benefits:

  • No code deployment to change fields
  • Gradual rollout and canary testing
  • Performance tracking per variation
  • Instant rollback capability

Customization

Add New Schema

  1. Define schema in LAUNCHDARKLY_TOOLS.json:
{
  "variation_g_custom": {
    "function": {
      "name": "extract_custom_features",
      "description": "Extract features from custom call type",
      "parameters": {
        "type": "object",
        "properties": {
          "custom_field_1": { "type": "string" },
          "custom_field_2": { "type": "number" }
        },
        "required": ["custom_field_1"]
      }
    }
  }
}
  2. Update bootstrap in bootstrap/create_unified_config.py:
tool_mapping = {
    # ... existing tools ...
    "extract_custom_features": ("Extract Custom Features (G)", "variation_g_custom"),
}
  3. Run bootstrap:
python bootstrap/create_unified_config.py

Adjust Model Selection

Change model in bootstrap/create_unified_config.py:

model_config_key="Anthropic.claude-3-7-sonnet-latest"  # Premium, accurate
# or
model_config_key="Gemini.gemini-2.0-flash"  # Fast, cheaper

Project Structure

├── app/                              # Next.js app
│   ├── page.tsx                      # Upload UI
│   └── api/
│       ├── extract-stream/           # SSE endpoint
│       ├── extract/                  # Non-streaming endpoint
│       └── train-model/              # ML training endpoint
├── lib/
│   ├── pipeline.ts                   # Unified extraction pipeline
│   └── launchdarkly-client.ts        # SDK integration with tool calling
├── bootstrap/
│   ├── create_unified_config.py      # Setup unified AI config
│   └── create_configs.py             # Legacy (do not use)
├── ml/
│   └── train_and_return_metrics.py   # CatBoost model training
├── LAUNCHDARKLY_TOOLS.json           # Schema definitions
├── docker-compose.yml
└── .env

Troubleshooting

"OIDC token has expired" (401 errors)

npx vercel env pull  # Refreshes token
# Restart dev server

"Unified extraction config not found"

python bootstrap/create_unified_config.py
# Check LAUNCHDARKLY_SDK_KEY is set

"No transcripts were successfully processed"

  • Check OIDC token is fresh
  • Verify LaunchDarkly config exists in dashboard
  • Check console for specific errors

LaunchDarkly connection issues

# Verify SDK key
echo $LAUNCHDARKLY_SDK_KEY

# Test initialization
npm run dev
# Look for "LaunchDarkly client initialized successfully"

Build errors

  • Build workarounds are already configured in next.config.js
  • The build uses webpack externals for the LaunchDarkly SDK
  • Check that .npmrc has legacy-peer-deps=true

API Reference

POST /api/extract-stream

Streams extraction progress via Server-Sent Events.

Request:

{
  "transcripts": [
    { "name": "call1.txt", "content": "..." },
    { "name": "call2.txt", "content": "..." }
  ]
}

Response (SSE):

event: progress
data: {"type":"extraction","current":1,"total":2,"percentage":50}

event: complete
data: {"type":"done","csvData":"...","results":[...]}
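A minimal client-side parser for this stream can be sketched as follows; `parse_sse` is illustrative only and assumes the endpoint emits just the two SSE fields shown above (`event:` and `data:`):

```python
import json

def parse_sse(stream: str) -> list:
    """Parse a Server-Sent Events payload into (event, data) pairs.
    Handles only the `event:` and `data:` fields this endpoint emits."""
    events = []
    event_name = "message"  # SSE default when no event: field is given
    for line in stream.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            events.append((event_name, json.loads(line[len("data:"):].strip())))
            event_name = "message"
    return events
```

In the browser the built-in EventSource (or a fetch-based reader, since this is a POST endpoint) does this parsing for you; the sketch just makes the wire format explicit.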

POST /api/extract

Non-streaming extraction (for testing).

Returns: { success: true, csvData: "...", results: [...] }
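Building the request body for either endpoint from a folder of transcripts can be sketched like this; `build_extract_payload` is a hypothetical helper, and the filtering mirrors the documented behavior (.txt/.md only, empty files skipped):

```python
import json
from pathlib import Path

def build_extract_payload(paths: list) -> str:
    """Build the JSON body for POST /api/extract from .txt/.md transcripts,
    skipping empty files (the server reports those as per-file errors anyway)."""
    transcripts = []
    for path in paths:
        if path.suffix not in (".txt", ".md"):
            continue  # only .txt and .md transcripts are supported
        content = path.read_text(encoding="utf-8")
        if content.strip():
            transcripts.append({"name": path.name, "content": content})
    return json.dumps({"transcripts": transcripts})
```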

