Commit 6aeb23a

feat: audio recording foundation - types, deps, IPC setup
- Add Recording types and IPC API definitions
- Install Vercel AI SDK dependencies
- Add Recordings tab to UI
- Enable media capture permissions
- Stub IPC handlers (implementation in next PR)
1 parent c966904 commit 6aeb23a

11 files changed: +801 -19 lines changed

docs/prd-audio-transcription.md

Lines changed: 397 additions & 0 deletions
@@ -0,0 +1,397 @@
# PRD: Audio Transcription & Task Extraction

## Overview

Enable users to record audio (calls, meetings, voice notes), transcribe it automatically using AI, and extract actionable tasks that can be created directly in PostHog.

## Problem

Users have valuable conversations (calls, meetings, brainstorms) where tasks and action items are discussed, but:
- Manual note-taking is incomplete and distracting
- Context is lost between conversation and task creation
- Transcribing and extracting tasks manually is time-consuming
- There is no seamless way to turn a conversation into actionable tasks

## Solution

Browser-based audio recording with AI-powered transcription and intelligent task extraction:
1. **Record** audio directly in the app
2. **Transcribe** using OpenAI Whisper via Vercel AI SDK
3. **Generate** a concise summary of the conversation
4. **Extract** actionable tasks automatically using GPT
5. **Create** tasks in PostHog with one click

## V1 Requirements (Ship This Week)

### Functional Requirements

#### 1. Audio Recording
- ✅ Record audio from system/microphone via browser MediaRecorder API
- ✅ Real-time duration counter during recording
- ✅ Start/stop recording controls
- ✅ Save recordings to local storage (`userData/recordings/`)
- ✅ Audio format: WebM (cross-platform, compressed)
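
The real-time duration counter needs a way to turn elapsed seconds into a display string. A minimal sketch, assuming an `MM:SS` display that switches to `H:MM:SS` past one hour (the function name and the exact format are illustrative, not specified by this PRD):

```typescript
// Hypothetical helper for the recording duration counter.
// Formats elapsed seconds as MM:SS, or H:MM:SS once an hour has passed.
function formatDuration(totalSeconds: number): string {
  // Guard against negative or fractional input from timers.
  const s = Math.max(0, Math.floor(totalSeconds));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  const pad = (n: number): string => (n < 10 ? `0${n}` : `${n}`);
  return hours > 0
    ? `${hours}:${pad(minutes)}:${pad(seconds)}`
    : `${pad(minutes)}:${pad(seconds)}`;
}
```

The same helper could be reused for the duration shown on each recording card, since `Recording.duration` is stored in seconds.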

#### 2. Recording Management
- ✅ List all recordings with metadata (date, duration)
- ✅ Play back recordings in-app
- ✅ Delete recordings
- ✅ Persist metadata (duration, created_at, transcription status)

#### 3. Transcription
- ✅ Transcribe recording using OpenAI Whisper (via Vercel AI SDK)
- ✅ Show transcription status (processing/completed/error)
- ✅ Display full transcript text
- ✅ Generate 3-7 word summary title
- ✅ Handle transcription errors gracefully
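
Because the summary title comes from a model, the 3-7 word requirement is worth checking before the title is shown. A small sketch of such a check, assuming words are separated by whitespace (both function names are hypothetical):

```typescript
// Count whitespace-separated words; an empty or blank string has zero words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// True when a generated title satisfies the 3-7 word requirement.
function isValidSummaryTitle(title: string): boolean {
  const words = countWords(title);
  return words >= 3 && words <= 7;
}
```

What to do with a title that fails the check (retry, truncate, or fall back to the recording date) is left open here.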

#### 4. Task Extraction
- ✅ Automatically extract actionable tasks from transcript
- ✅ Use GPT-4o-mini for task identification
- ✅ Extract: feature requests, bug reports, requirements, action items
- ✅ Format: `{title: string, description: string}` with context
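
Model output is not guaranteed to match the `{title, description}` shape, so a defensive parse step is useful before tasks reach the UI. The sketch below assumes the model is asked to return a JSON array and that malformed entries are dropped silently; `parseExtractedTasks` is an illustrative name, not an existing function:

```typescript
interface ExtractedTask {
  title: string;
  description: string;
}

// Parse raw model output into well-formed tasks.
// Returns an empty list for invalid JSON or a non-array payload,
// and skips entries that lack a non-empty string title or a string description.
function parseExtractedTasks(raw: string): ExtractedTask[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];

  const tasks: ExtractedTask[] = [];
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const { title, description } = item as Record<string, unknown>;
    if (
      typeof title === "string" &&
      typeof description === "string" &&
      title.trim() !== ""
    ) {
      tasks.push({ title: title.trim(), description: description.trim() });
    }
  }
  return tasks;
}
```

Returning an empty list instead of throwing keeps the recording card usable when extraction fails; surfacing the failure through the transcription `error` field is an alternative.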

#### 5. Task Creation
- ✅ Bulk create tasks in PostHog from extracted tasks
- ✅ One-click "Create All Tasks" button
- ✅ Link transcript context to task descriptions
- ✅ Success/error feedback
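
Bulk creation with success/error feedback could be sketched as follows. The `createTask` callback is a hypothetical stand-in for whatever call creates a task in PostHog; it is injected so the sketch makes no assumption about that API:

```typescript
interface ExtractedTask {
  title: string;
  description: string;
}

interface BulkCreateResult {
  created: string[]; // titles of tasks that were created
  failed: Array<{ title: string; error: string }>;
}

// Create tasks one at a time and collect per-task outcomes,
// so one failure does not abort the rest of the batch.
async function createAllTasks(
  tasks: ExtractedTask[],
  createTask: (task: ExtractedTask) => Promise<void>
): Promise<BulkCreateResult> {
  const result: BulkCreateResult = { created: [], failed: [] };
  for (const task of tasks) {
    try {
      await createTask(task);
      result.created.push(task.title);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      result.failed.push({ title: task.title, error: message });
    }
  }
  return result;
}
```

Sequential creation keeps error reporting simple; running the calls in parallel is a possible optimization if the task API tolerates it.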

### Non-Functional Requirements

- **Performance**: Transcription completes within 2x audio duration
- **Storage**: Audio files compressed to ~1MB per 10 minutes
- **Reliability**: Handle network failures, large files (25MB Whisper limit)
- **Privacy**: All recordings stored locally, API keys encrypted
- **UX**: Clear status indicators, error messages, loading states
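
The storage and reliability targets above interact: at roughly 1MB per 10 minutes, the 25MB limit corresponds to about 250 minutes of audio. The sketch below works through that arithmetic and adds a pre-upload size check. It assumes 1MB means 1024 × 1024 bytes and treats the storage target as an exact rate, which real Opus output will not match; all names are illustrative:

```typescript
// Assumption: "25MB" and "1MB" are interpreted as binary megabytes.
const WHISPER_MAX_BYTES = 25 * 1024 * 1024;
const BYTES_PER_TEN_MINUTES = 1024 * 1024; // storage target from this PRD
const TEN_MINUTES_IN_SECONDS = 600;

// Rough size estimate for a recording of the given length.
function estimateRecordingBytes(durationSeconds: number): number {
  return Math.ceil((durationSeconds * BYTES_PER_TEN_MINUTES) / TEN_MINUTES_IN_SECONDS);
}

// True when a file is too large to send for transcription in one request.
function exceedsWhisperLimit(byteLength: number): boolean {
  return byteLength > WHISPER_MAX_BYTES;
}

// Longest recording (in seconds) that stays under the limit at the target rate.
function maxDurationSecondsUnderLimit(): number {
  return Math.floor((WHISPER_MAX_BYTES * TEN_MINUTES_IN_SECONDS) / BYTES_PER_TEN_MINUTES);
}
```

A check like `exceedsWhisperLimit` could run before the Transcribe action so the user sees the size error without waiting for a failed upload.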

### User Stories

**As a user, I want to:**
1. Record a call/meeting without leaving the app
2. Get an automatic transcript without manual typing
3. See a summary of what was discussed
4. Get a list of action items extracted automatically
5. Create PostHog tasks from those action items with one click
6. Access my recording history and transcripts

### UI/UX

**Recordings Tab (new)**
- Main view accessible from tab bar
- "Record" button (prominent, red when recording)
- Real-time duration counter during recording
- List of recordings (newest first)
- Each recording card shows:
  - Date/time
  - Duration
  - Play button
  - Transcribe button (if not transcribed)
  - Delete button
  - Transcript (if available)
  - Summary (if available)
  - Extracted tasks (if available)
  - "Create Tasks" button (if tasks extracted)

**Design System**
- Use Radix UI components (Button, Card, Flex, Text, etc.)
- Follow existing app styling patterns
- Responsive layout
- Keyboard shortcuts (future)

## Technical Architecture

### Core Technologies

**Vercel AI SDK**
- **Why**: Provider-agnostic abstraction, makes swapping providers trivial
- **Usage**: OpenAI provider for Whisper transcription and GPT task extraction
- **Future**: Swap to Anthropic, AssemblyAI, or local models by changing config

**Browser MediaRecorder API**
- **Why**: No native dependencies, cross-platform, works today
- **Format**: WebM with Opus audio codec
- **Fallback**: Error handling for unsupported browsers

**React Query**
- **Why**: Handles data fetching, caching, optimistic updates
- **Usage**: Recording list, transcription status polling
- **Benefit**: No new Zustand store needed
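
For transcription status polling, one option is to poll only while at least one recording is still processing. The sketch below is a pure function whose result (a number of milliseconds or `false`) matches what React Query's `refetchInterval` option accepts; the 2-second interval and all names are assumptions, and how the function is wired into the query depends on the React Query version in use:

```typescript
type TranscriptionStatus = "processing" | "completed" | "error";

interface RecordingStatusView {
  id: string;
  transcription?: { status: TranscriptionStatus };
}

// Assumed polling cadence; tune against real transcription times.
const POLL_INTERVAL_MS = 2000;

// Poll while any transcription is in flight, otherwise stop polling.
function transcriptionPollInterval(
  recordings: RecordingStatusView[] | undefined
): number | false {
  if (!recordings) return false;
  const anyProcessing = recordings.some(
    (r) => r.transcription?.status === "processing"
  );
  return anyProcessing ? POLL_INTERVAL_MS : false;
}
```

Stopping the poll when nothing is processing avoids repeated IPC round trips while the Recordings tab sits idle.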

### Architecture Diagram

```
┌─────────────────────────────────────────────────┐
│                Renderer Process                 │
│                                                 │
│  RecordingsView.tsx                             │
│         ↓                                       │
│  useRecordings.ts (React Query)                 │
│         ↓                                       │
│  window.electronAPI.recording*()                │
└──────────────────┬──────────────────────────────┘
                   │ IPC
┌──────────────────┴──────────────────────────────┐
│                  Main Process                   │
│                                                 │
│  recording.ts (Service)                         │
│    ├─ MediaRecorder handling                    │
│    ├─ File system storage                       │
│    ├─ Vercel AI SDK                             │
│    │   ├─ OpenAI Whisper (transcription)        │
│    │   └─ GPT-4o-mini (summary + extraction)    │
│    └─ IPC handlers                              │
│                                                 │
│  transcription-prompts.ts                       │
│    ├─ SUMMARY_PROMPT                            │
│    └─ TASK_EXTRACTION_PROMPT                    │
└─────────────────────────────────────────────────┘
```

### File Structure

**New Files:**
```
src/
├── main/
│   └── services/
│       ├── recording.ts              # Core recording service
│       └── transcription-prompts.ts  # Prompt configuration
└── renderer/
    ├── components/
    │   └── RecordingsView.tsx        # Main UI
    └── hooks/
        └── useRecordings.ts          # Recording state hook
```

**Modified Files:**
```
src/
├── main/
│   ├── index.ts                      # Register IPC
│   └── preload.ts                    # Expose API
├── renderer/
│   └── components/
│       └── MainLayout.tsx            # Add tab
└── shared/
    └── types.ts                      # Recording types
```

### Data Models

```typescript
interface Recording {
  id: string                // Filename
  filename: string
  duration: number          // Seconds
  created_at: string        // ISO 8601
  file_path: string         // Absolute path
  transcription?: {
    status: 'processing' | 'completed' | 'error'
    text: string
    summary?: string
    extracted_tasks?: Array<{
      title: string
      description: string
    }>
    error?: string
  }
}
```
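
The conditional controls on each recording card ("Transcribe" if not transcribed, "Create Tasks" if tasks were extracted) can be derived from this model. A sketch under two assumptions: a failed transcription may be retried, and the action names are illustrative:

```typescript
interface Recording {
  id: string;
  filename: string;
  duration: number;
  created_at: string;
  file_path: string;
  transcription?: {
    status: "processing" | "completed" | "error";
    text: string;
    summary?: string;
    extracted_tasks?: Array<{ title: string; description: string }>;
    error?: string;
  };
}

type CardAction = "play" | "transcribe" | "delete" | "createTasks";

// Decide which controls a recording card should show.
function availableActions(recording: Recording): CardAction[] {
  const actions: CardAction[] = ["play"];
  const t = recording.transcription;

  // Offer transcription when none exists yet, or to retry after an error (assumption).
  if (!t || t.status === "error") actions.push("transcribe");
  actions.push("delete");

  // Offer task creation only for a completed transcription with at least one task.
  if (t?.status === "completed" && (t.extracted_tasks?.length ?? 0) > 0) {
    actions.push("createTasks");
  }
  return actions;
}
```

Keeping this logic in a pure function makes the card states easy to unit test without rendering the component.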
201+
202+
### IPC API
203+
204+
```typescript
205+
// Recording lifecycle
206+
window.electronAPI.recordingStart()
207+
Promise<{recordingId: string, startTime: string}>
208+
209+
window.electronAPI.recordingStop(recordingId, audioData, duration)
210+
Promise<Recording>
211+
212+
// Recording management
213+
window.electronAPI.recordingList()
214+
Promise<Recording[]>
215+
216+
window.electronAPI.recordingDelete(recordingId)
217+
Promise<boolean>
218+
219+
window.electronAPI.recordingGetFile(recordingId)
220+
Promise<ArrayBuffer>
221+
222+
// Transcription
223+
window.electronAPI.recordingTranscribe(recordingId, openaiApiKey)
224+
Promise<TranscriptionResult>
225+
```
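
Since the handlers are stubbed in this commit, an in-memory fake of the API can stand in while the renderer is developed and tested. The sketch below covers the lifecycle and management calls only; it assumes `audioData` is an `ArrayBuffer` (the PRD does not state its type), leaves out `recordingTranscribe` because `TranscriptionResult` is not defined here, and uses a made-up `file_path`:

```typescript
interface Recording {
  id: string;
  filename: string;
  duration: number;
  created_at: string;
  file_path: string;
}

// Assumed typing of the lifecycle and management calls listed above.
interface RecordingAPI {
  recordingStart(): Promise<{ recordingId: string; startTime: string }>;
  recordingStop(recordingId: string, audioData: ArrayBuffer, duration: number): Promise<Recording>;
  recordingList(): Promise<Recording[]>;
  recordingDelete(recordingId: string): Promise<boolean>;
  recordingGetFile(recordingId: string): Promise<ArrayBuffer>;
}

// In-memory fake: nothing touches disk or IPC.
function createFakeRecordingAPI(): RecordingAPI {
  const recordings = new Map<string, Recording>();
  const files = new Map<string, ArrayBuffer>();
  let counter = 0;

  return {
    async recordingStart() {
      counter += 1;
      return {
        recordingId: `recording-${counter}.webm`,
        startTime: new Date().toISOString(),
      };
    },
    async recordingStop(recordingId, audioData, duration) {
      const recording: Recording = {
        id: recordingId,
        filename: recordingId,
        duration,
        created_at: new Date().toISOString(),
        file_path: `/fake/userData/recordings/${recordingId}`, // placeholder path
      };
      recordings.set(recordingId, recording);
      files.set(recordingId, audioData);
      return recording;
    },
    async recordingList() {
      return Array.from(recordings.values());
    },
    async recordingDelete(recordingId) {
      files.delete(recordingId);
      return recordings.delete(recordingId);
    },
    async recordingGetFile(recordingId) {
      const file = files.get(recordingId);
      if (!file) throw new Error(`Unknown recording: ${recordingId}`);
      return file;
    },
  };
}
```

A hook such as `useRecordings.ts` could accept a `RecordingAPI` argument that defaults to `window.electronAPI`, which would let tests pass the fake instead.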

### Key Technical Decisions

| Decision | Rationale |
|----------|-----------|
| **Vercel AI SDK** | Provider-agnostic, production-ready, easy provider swaps |
| **Browser Recording** | No native deps, cross-platform, works today |
| **Prompts in Code** | Ship fast, easy to move to editable file later |
| **React Query** | Standard data fetching, no new state layer |
| **Local Storage** | Privacy-first, no upload costs, instant access |
| **WebM Format** | Cross-platform, compressed, MediaRecorder default |

## V1 Out of Scope

- ❌ Meeting auto-detection (Zoom/Meet)
- ❌ Multiple transcription provider UI
- ❌ User-editable prompts UI
- ❌ Real-time streaming transcription
- ❌ Recording-only mode (no PostHog)
- ❌ Cloud storage/sync
- ❌ Speaker diarization
- ❌ Custom vocabulary/terminology
249+
## Future Improvements (Priority Order)
250+
251+
### P0: Based on User Feedback
252+
1. **Meeting Auto-Detection**
253+
- Detect active Zoom/Google Meet sessions
254+
- Show notification to start recording
255+
- Auto-name recordings by meeting title
256+
- *Build when*: Users manually start recordings often
257+
258+
2. **User-Editable Prompts**
259+
- Move prompts to `~/.array/prompts.json`
260+
- UI to edit summary and extraction prompts
261+
- Prompt templates (standup, brainstorm, bug triage)
262+
- *Build when*: Users want custom task extraction logic
263+
264+
### P1: Enhanced Functionality
265+
3. **Provider Selection**
266+
- UI to choose transcription provider
267+
- Support: OpenAI, Anthropic, AssemblyAI, local Whisper
268+
- Provider-specific settings (language, model)
269+
- *Build when*: Users request specific providers or cost concerns
270+
271+
4. **Real-Time Streaming**
272+
- Live transcription while recording
273+
- Show transcript building in real-time
274+
- Early task extraction
275+
- *Build when*: Users record long sessions (>30 min)
276+
277+
5. **Recording-Only Mode**
278+
- Use Array without PostHog (pure recorder)
279+
- Export tasks to Markdown/CSV
280+
- Standalone app mode
281+
- *Build when*: Users want tool without PostHog
282+
283+
### P2: Power User Features
284+
6. **Advanced Editing**
285+
- Edit transcripts before task creation
286+
- Manually add/remove extracted tasks
287+
- Timestamp navigation
288+
- *Build when*: Transcription accuracy issues reported
289+
290+
7. **Organization & Search**
291+
- Tag recordings
292+
- Search transcripts
293+
- Folders/projects
294+
- Export transcripts
295+
- *Build when*: Users have 50+ recordings
296+
297+
8. **Team Features**
298+
- Share recordings with team
299+
- Shared prompt templates
300+
- Team transcription usage/costs
301+
- *Build when*: Teams using Array together
302+
303+
9. **Integration Enhancements**
304+
- Auto-tag tasks by meeting type
305+
- Link to calendar events
306+
- Attach to GitHub issues
307+
- Slack notifications
308+
- *Build when*: Integration requests from users

## Success Metrics

### V1 Goals (First Month)
- **Adoption**: 30% of active users make at least one recording
- **Usage**: Average 3 recordings per user per week
- **Conversion**: 70% of transcriptions lead to at least one created task
- **NPS**: +40 from users who transcribe recordings

### Key Metrics to Track
- Recordings created per user
- Transcription completion rate
- Tasks created from transcriptions
- Time saved (estimated)
- Feature request themes
- Error rates (transcription failures, API issues)

### User Feedback Questions
1. How often do you record calls/meetings?
2. Is the transcription accurate enough?
3. Are extracted tasks relevant and actionable?
4. What other providers would you want?
5. What features are missing?

## Open Questions

### Pre-Launch
- [ ] Should we require OpenAI API key upfront or use Array's key with usage limits?
- [ ] How do we handle 25MB Whisper file size limit? (error message vs auto-chunking)
- [ ] Should recordings auto-transcribe or wait for user action? (cost concern)

### Post-Launch (answer based on data)
- [ ] Do users want meeting auto-detection?
- [ ] Do users record mostly short (<10 min) or long (>30 min) sessions?
- [ ] Is OpenAI Whisper accuracy good enough or do we need alternatives?
- [ ] Do users want to edit prompts or are defaults sufficient?
- [ ] Should we add speaker diarization (who said what)?

## Launch Plan

### Phase 1: Internal Testing (Week 1)
- Build and test locally
- Record real meetings
- Validate transcription quality
- Iterate on UI/UX

### Phase 2: Beta Users (Week 2)
- Ship to 10-20 beta users
- Gather feedback on accuracy and usefulness
- Monitor error rates and API costs
- Fix critical bugs

### Phase 3: General Availability (Week 3)
- Announce in app
- Update docs
- Monitor metrics
- Plan next iteration based on feedback

## Risk Mitigation

| Risk | Mitigation |
|------|------------|
| **Poor transcription accuracy** | Use OpenAI Whisper (best in class), allow editing |
| **High API costs** | Manual transcription trigger, warn on large files |
| **Browser recording fails** | Clear error messages, fallback instructions |
| **Large file handling** | Show 25MB limit error, suggest shorter recordings |
| **Privacy concerns** | Local storage, encrypted keys, clear data policy |
| **Low adoption** | User education, in-app tutorial, example use cases |

## Appendix

### Competitive Analysis
- **Otter.ai**: Strong transcription, expensive, no task extraction
- **Fireflies.ai**: Meeting bot, not local, privacy concerns
- **Notion AI**: Meeting notes, requires Notion, no direct recording
- **Array advantage**: Integrated with task management, local-first, extensible

### Research & References
- [Vercel AI SDK Docs](https://sdk.vercel.ai/docs)
- [OpenAI Whisper API](https://platform.openai.com/docs/guides/speech-to-text)
- [MDN MediaRecorder API](https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder)
- [Electron safeStorage](https://www.electronjs.org/docs/latest/api/safe-storage)

---

**Document Status**: Draft v1.0
**Last Updated**: 2025-10-15
**Owner**: Product / Engineering
**Next Review**: Post-launch + 2 weeks
