Commit c700890

jkomoros, claude, happy-otter, and bfollington authored
Add ct-voice-input component with real-time audio visualization (commontoolsinc#2138)
* Add ct-voice-input component design document

  This design document specifies a voice recording and transcription component for CommonTools patterns. The component enables:

  - Voice recording with MediaRecorder API
  - Real-time waveform visualization
  - Automatic transcription via API integration
  - Reactive cell binding for pattern integration
  - Two-component architecture (ct-voice-input + ct-audio-visualizer)

  The design includes a phased implementation roadmap starting with v1 MVP focusing on core recording and transcription functionality, with future versions adding visual polish and advanced features.

  Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

  Co-Authored-By: Claude <[email protected]>
  Co-Authored-By: Happy <[email protected]>

* Add ct-voice-input and ct-audio-visualizer components

  Implements voice recording and transcription components for CommonTools patterns, following the design in docs/specs/ct-voice-input-design.md.

  **ct-voice-input** - Main voice input component featuring:

  - MediaRecorder integration for audio capture
  - Microphone permission handling
  - Cell binding via CellController for reactive integration
  - Recording modes (hold-to-record and toggle)
  - Duration tracking with max duration limits
  - Error handling for permission and device issues
  - Transcription API integration (placeholder)
  - Event emission for all recording lifecycle events

  **ct-audio-visualizer** - Waveform visualization component:

  - Real-time audio frequency visualization using Web Audio API
  - SVG-based rendering for flexibility and styling
  - Configurable bar count, color, and height
  - Automatic resource cleanup on disconnect

  Both components follow Common UI v2 patterns:

  - Extend BaseElement
  - Use theme context for styling
  - Include box-sizing resets
  - Export types separately
  - Proper JSDoc documentation

  This is the v1 MVP implementation focusing on core functionality. Future enhancements will include:

  - Discord-style expansion animations
  - Audio format conversion (WebM → WAV)
  - Real transcription API integration
  - Playback functionality
  - Message bubble states

  packages/ui/src/v2/components/ct-voice-input/ct-voice-input.ts:614
  packages/ui/src/v2/components/ct-audio-visualizer/ct-audio-visualizer.ts:169

  Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

  Co-Authored-By: Claude <[email protected]>
  Co-Authored-By: Happy <[email protected]>

* Add voice-note test pattern and audio conversion utilities

  **voice-note.tsx pattern** - Demo pattern showcasing voice input:

  - Hold-to-record voice transcription
  - Saves transcribed notes to a list
  - Shows latest transcription with duration
  - Delete functionality for saved notes
  - Clean UI with ct-card and ct-vstack layout

  **audio-conversion.ts** - Complete WebM to WAV conversion:

  - Uses Web Audio API for decoding
  - Resamples to target sample rate (16kHz for transcription)
  - Converts stereo to mono
  - Encodes as 16-bit PCM WAV format
  - Proper WAV header generation
  - Linear interpolation for resampling

  **ct-voice-input updates**:

  - Integrated real audio conversion utility
  - Graceful fallback if conversion fails
  - Optimized for FAL AI Wizper transcription API

  The transcription API endpoint already exists at /api/ai/voice/transcribe and is fully functional with caching, FAL AI Wizper integration, and support for timestamped chunks.

  Component is now fully functional end-to-end!

  packages/patterns/voice-note.tsx:166
  packages/ui/src/v2/utils/audio-conversion.ts:203
  packages/ui/src/v2/components/ct-voice-input/ct-voice-input.ts:618

  Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

  Co-Authored-By: Claude <[email protected]>
  Co-Authored-By: Happy <[email protected]>

* Add JSX type definitions for voice input components

  - Add CTVoiceInputElement and CTAudioVisualizerElement interfaces
  - Add CTVoiceInputAttributes with all properties and event handlers
  - Add CTAudioVisualizerAttributes for the visualizer subcomponent
  - Add IntrinsicElements entries for both components
  - Create voice-note-simple.tsx demo pattern
  - Update voice-note.tsx with handler fixes

  This enables TypeScript compilation of patterns using the new ct-voice-input and ct-audio-visualizer components in JSX.

  Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

  Co-Authored-By: Claude <[email protected]>
  Co-Authored-By: Happy <[email protected]>

* Fix ct-audio-visualizer animation during recording

  The audio visualizer wasn't animating during voice recording, providing no visual feedback to users. This fixes the issue by:

  1. Using Lit's ref directive instead of querySelector for reliable element access
  2. Waiting for updateComplete to ensure the element is rendered before accessing it
  3. Skipping the first 2 frequency bins to avoid low-frequency noise dominating the visualization

  This results in a smooth, balanced waveform animation that provides clear real-time feedback during recording.

  Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

  Co-Authored-By: Claude <[email protected]>
  Co-Authored-By: Happy <[email protected]>

* Move ct-voice-input spec to component directory

  Following the pattern used by ct-outliner, move the design spec into the component directory itself rather than docs/specs/.

  Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

  Co-Authored-By: Claude <[email protected]>
  Co-Authored-By: Happy <[email protected]>

* Address PR review feedback

  - Reuse AudioContext per element instead of creating new ones (jsantell)
  - Remove unused renderMode property from ct-audio-visualizer
  - Fix Content-Type header to use actual blob type when WAV conversion fails
  - Update spec.md to reflect actual implementation (duration parameter)
  - Fix delete button in voice-note pattern by passing noteId through context

  Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

  Co-Authored-By: Claude <[email protected]>
  Co-Authored-By: Happy <[email protected]>

* Add comprehensive tests for audio conversion utilities

  Tests cover:

  - WAV file format structure validation
  - Float to 16-bit PCM conversion with clamping
  - Audio resampling with linear interpolation
  - Stereo to mono channel mixing
  - DataView string writing for WAV headers

  All tests pass successfully.

  Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

  Co-Authored-By: Claude <[email protected]>
  Co-Authored-By: Happy <[email protected]>

* Improve audio conversion tests to use actual exported functions

  Address cubic-dev-ai feedback:

  - Export helper functions (resample, floatTo16BitPCM, createWavFile) for testing
  - Update tests to call actual functions instead of reimplementing logic
  - Fix "sample rates match" test to actually call resample() function
  - All tests now provide regression coverage for the real utilities

  All 12 test steps still pass.

  Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

  Co-Authored-By: Claude <[email protected]>
  Co-Authored-By: Happy <[email protected]>

* Lint and format

* Remove `spec.md`

* Fix compile errors

* Remove `MouseEvent` type

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Happy <[email protected]>
Co-authored-by: Ben Follington <[email protected]>
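The commit message describes the WebM to WAV path as: mix to mono, resample with linear interpolation, clamp to 16-bit PCM, and prepend a WAV header. The following is a minimal sketch of those steps, not the contents of audio-conversion.ts. The names `resample`, `floatTo16BitPCM`, and `createWavFile` appear in the commit message, but their signatures here are assumptions, and `mixToMono` and `writeString` are hypothetical helpers. Decoding the WebM input (done with the Web Audio API in the commit) is left out so the sketch runs without a browser.

```typescript
// Sketch only: signatures are assumed, not copied from audio-conversion.ts.

/** Average all channels into a single mono channel (hypothetical helper). */
function mixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) mono[i] += channel[i] / channels.length;
  }
  return mono;
}

/** Resample by linear interpolation between the two nearest input samples. */
function resample(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  if (fromRate === toRate || input.length === 0) return input;
  const ratio = fromRate / toRate;
  const output = new Float32Array(Math.round(input.length / ratio));
  for (let i = 0; i < output.length; i++) {
    const pos = i * ratio;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

/** Clamp floats to [-1, 1] and write them as little-endian signed 16-bit PCM. */
function floatTo16BitPCM(
  view: DataView,
  offset: number,
  samples: Float32Array,
): void {
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(offset + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
}

/** Write an ASCII tag such as "RIFF" into the header (hypothetical helper). */
function writeString(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

/** Build a mono 16-bit PCM WAV file: 44-byte header followed by sample data. */
function createWavFile(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(view, 36, "data");
  view.setUint32(40, dataSize, true);
  floatTo16BitPCM(view, 44, samples);
  return buffer;
}
```

A caller would typically chain these as `createWavFile(resample(mixToMono(channels), inputRate, 16000), 16000)`. Linear interpolation is cheap but applies no low-pass filter, so downsampling this way can alias; that trade-off may be acceptable for speech sent to a transcription API.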
1 parent 963575c commit c700890

10 files changed: +1581 -0 lines changed


packages/html/src/jsx.d.ts

Lines changed: 42 additions & 0 deletions
@@ -2906,6 +2906,8 @@ interface CTDraggableElement extends CTHTMLElement {}
 interface CTPlaidLinkElement extends CTHTMLElement {}
 interface CTCharmElement extends CTHTMLElement {}
 interface CTIFrameElement extends CTHTMLElement {}
+interface CTVoiceInputElement extends CTHTMLElement {}
+interface CTAudioVisualizerElement extends CTHTMLElement {}
 
 interface CTDraggableAttributes<T> extends CTHTMLAttributes<T> {
   "key"?: number;
@@ -2941,6 +2943,38 @@ interface CTCharmAttributes<T> extends CTHTMLAttributes<T> {
   "space-name"?: string;
 }
 
+interface CTVoiceInputAttributes<T> extends CTHTMLAttributes<T> {
+  "$transcription"?: CellLike<any>;
+  "recordingMode"?: "hold" | "toggle";
+  "autoTranscribe"?: boolean;
+  "maxDuration"?: number;
+  "showWaveform"?: boolean;
+  "disabled"?: boolean;
+  "barCount"?: number;
+  "barWidth"?: number;
+  "barGap"?: number;
+  "minHeight"?: number;
+  "maxHeight"?: number;
+  "visualizerColor"?: string;
+  "smoothing"?: number;
+  "onct-transcription-complete"?: EventHandler<any>;
+  "onct-transcription-error"?: EventHandler<any>;
+  "onct-recording-start"?: EventHandler<any>;
+  "onct-recording-stop"?: EventHandler<any>;
+  "onct-error"?: EventHandler<any>;
+  "onct-change"?: EventHandler<any>;
+}
+
+interface CTAudioVisualizerAttributes<T> extends CTHTMLAttributes<T> {
+  "barCount"?: number;
+  "barWidth"?: number;
+  "barGap"?: number;
+  "minHeight"?: number;
+  "maxHeight"?: number;
+  "color"?: string;
+  "smoothing"?: number;
+}
+
 interface CTChatAttributes<T> extends CTHTMLAttributes<T> {
   "$messages"?: CellLike<any>;
   "pending"?: boolean;
@@ -3889,6 +3923,14 @@ declare global {
         CTCharmAttributes<CTCharmElement>,
         CTCharmElement
       >;
+      "ct-voice-input": CTDOM.DetailedHTMLProps<
+        CTVoiceInputAttributes<CTVoiceInputElement>,
+        CTVoiceInputElement
+      >;
+      "ct-audio-visualizer": CTDOM.DetailedHTMLProps<
+        CTAudioVisualizerAttributes<CTAudioVisualizerElement>,
+        CTAudioVisualizerElement
+      >;
       "ct-fragment": CTDOM.DetailedHTMLProps<
         CTHTMLAttributes<CTFragmentElement>,
         CTFragmentElement
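The visualizer attributes typed above (`barCount`, `minHeight`, `maxHeight`) and the commit message's note about skipping the first 2 frequency bins suggest how frequency data could become bar heights. The sketch below is one plausible mapping, not the code in ct-audio-visualizer.ts: `binsToBarHeights`, `BarOptions`, and `skipBins` are hypothetical names, and the input is assumed to be the 0 to 255 byte values that the Web Audio `AnalyserNode.getByteFrequencyData` call produces.

```typescript
// Hypothetical helper: maps byte frequency data to bar heights in pixels.
interface BarOptions {
  barCount: number; // number of bars to draw
  minHeight: number; // height of a silent bar
  maxHeight: number; // height of a bar at full level
  skipBins: number; // low-frequency bins to ignore (the commit skips 2)
}

function binsToBarHeights(bins: Uint8Array, opts: BarOptions): number[] {
  const heights: number[] = [];
  if (opts.barCount <= 0) return heights;
  // Drop the lowest bins so low-frequency noise does not dominate the bars.
  const usable = bins.subarray(Math.min(opts.skipBins, bins.length));
  const binsPerBar = Math.max(1, Math.floor(usable.length / opts.barCount));
  for (let bar = 0; bar < opts.barCount; bar++) {
    const start = bar * binsPerBar;
    const end = Math.min(start + binsPerBar, usable.length);
    let sum = 0;
    for (let i = start; i < end; i++) sum += usable[i];
    // Average the bar's bins and normalize the byte range to 0..1.
    const level = end > start ? sum / (end - start) / 255 : 0;
    heights.push(opts.minHeight + level * (opts.maxHeight - opts.minHeight));
  }
  return heights;
}
```

Keeping the mapping a pure function of the bin array makes it easy to unit test separately from the animation loop and the SVG rendering.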
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+/// <cts-enable />
+import {
+  Cell,
+  cell,
+  computed,
+  type Default,
+  NAME,
+  recipe,
+  UI,
+} from "commontools";
+
+// Type definition for transcription data (from ct-voice-input component)
+interface TranscriptionChunk {
+  timestamp: [number, number];
+  text: string;
+}
+
+interface TranscriptionData {
+  id: string;
+  text: string;
+  chunks?: TranscriptionChunk[];
+  audioData?: string;
+  duration: number;
+  timestamp: number;
+}
+
+type Input = {
+  title?: Cell<Default<string, "Voice Note Test">>;
+};
+
+type Output = {
+  transcription: Default<TranscriptionData | null, null>;
+};
+
+const VoiceNoteSimple = recipe<Input, Output>(
+  "Voice Note Simple",
+  ({ title }) => {
+    const transcription = cell<TranscriptionData | null>(null);
+    const hasTranscription = computed(() => transcription.get() !== null);
+    const transcriptionText = computed(() => transcription.get()?.text || "");
+    const transcriptionDuration = computed(
+      () => transcription.get()?.duration || 0,
+    );
+    const transcriptionTimestamp = computed(
+      () => transcription.get()?.timestamp || Date.now(),
+    );
+
+    return {
+      [NAME]: title,
+      [UI]: (
+        <ct-screen>
+          <div slot="header">
+            <ct-input
+              $value={title}
+              placeholder="Voice Note Test"
+            />
+          </div>
+
+          <ct-vstack gap="3" style="padding: 1rem; max-width: 600px;">
+            <ct-card>
+              <div style={{ padding: "1rem" }}>
+                <h3 style={{ marginTop: 0 }}>Voice Input Component Test</h3>
+                <p style={{ color: "var(--ct-color-gray-600)" }}>
+                  Hold the microphone button to record. Release to transcribe.
+                </p>
+
+                <ct-voice-input
+                  $transcription={transcription}
+                  recordingMode="hold"
+                  autoTranscribe
+                  maxDuration={60}
+                  showWaveform
+                />
+              </div>
+            </ct-card>
+
+            {hasTranscription && (
+              <ct-card>
+                <div style={{ padding: "1rem" }}>
+                  <h3 style={{ marginTop: 0 }}>Latest Transcription</h3>
+                  <p style={{ margin: "1rem 0" }}>{transcriptionText}</p>
+                  <div
+                    style={{
+                      display: "flex",
+                      gap: "1rem",
+                      fontSize: "0.875rem",
+                      color: "var(--ct-color-gray-600)",
+                    }}
+                  >
+                    <span>
+                      Duration: {transcriptionDuration.toFixed(1)}s
+                    </span>
+                    <span>
+                      Recorded:{" "}
+                      {new Date(transcriptionTimestamp).toLocaleTimeString()}
+                    </span>
+                  </div>
+                </div>
+              </ct-card>
+            )}
+          </ct-vstack>
+        </ct-screen>
+      ),
+      transcription,
+    };
+  },
+);
+
+export default VoiceNoteSimple;
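The pattern above passes `recordingMode="hold"` and `maxDuration={60}` to the component. How those two settings might interact can be sketched as a small state reducer. This is an illustration under assumptions, not the logic in ct-voice-input.ts: `stepRecorder`, `RecorderState`, and `RecorderEvent` are hypothetical names, and `maxDuration` is assumed to be in seconds.

```typescript
// Hypothetical model of hold vs. toggle recording with a duration cap.
type RecordingMode = "hold" | "toggle";

type RecorderEvent =
  | { type: "press" }
  | { type: "release" }
  | { type: "tick"; seconds: number };

interface RecorderState {
  recording: boolean;
  elapsed: number; // seconds recorded so far
}

function stepRecorder(
  state: RecorderState,
  event: RecorderEvent,
  mode: RecordingMode,
  maxDuration: number,
): RecorderState {
  if (event.type === "press") {
    if (!state.recording) return { recording: true, elapsed: 0 };
    // A second press only stops the recording in toggle mode.
    return mode === "toggle" ? { ...state, recording: false } : state;
  }
  if (event.type === "release") {
    // Releasing the button only stops the recording in hold mode.
    return mode === "hold" && state.recording
      ? { ...state, recording: false }
      : state;
  }
  // Tick: advance the timer and stop once the cap is reached.
  if (!state.recording) return state;
  const elapsed = Math.min(state.elapsed + event.seconds, maxDuration);
  return { recording: elapsed < maxDuration, elapsed };
}
```

In hold mode the sequence press, tick, release ends the recording on release; in toggle mode release is ignored and a second press ends it. In both modes a tick that reaches `maxDuration` stops the recording.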

packages/patterns/voice-note.tsx

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
+/// <cts-enable />
+import {
+  Cell,
+  cell,
+  computed,
+  type Default,
+  handler,
+  NAME,
+  recipe,
+  UI,
+} from "commontools";
+
+// Type definition for transcription data (from ct-voice-input component)
+interface TranscriptionChunk {
+  timestamp: [number, number];
+  text: string;
+}
+
+interface TranscriptionData {
+  id: string;
+  text: string;
+  chunks?: TranscriptionChunk[];
+  audioData?: string;
+  duration: number;
+  timestamp: number;
+}
+
+type Input = {
+  title?: Cell<Default<string, "Voice Note">>;
+};
+
+type Output = {
+  transcription: Default<TranscriptionData | null, null>;
+  notes: Default<TranscriptionData[], []>;
+};
+
+const handleTranscriptionComplete = handler<
+  { detail: { transcription: TranscriptionData } },
+  { notes: Cell<TranscriptionData[]> }
+>(({ detail }, { notes }) => {
+  // Add the transcription to our notes list
+  notes.push(detail.transcription);
+});
+
+const handleDeleteNote = handler<
+  undefined,
+  { noteId: string; notes: Cell<TranscriptionData[]> }
+>((_, { noteId, notes }) => {
+  const currentNotes = notes.get();
+  const filtered = currentNotes.filter((note) => note.id !== noteId);
+  notes.set(filtered);
+});
+
+const VoiceNote = recipe<Input, Output>(
+  "Voice Note",
+  ({ title }) => {
+    const transcription = cell<TranscriptionData | null>(null);
+    const notes = cell<TranscriptionData[]>([]);
+
+    // Computed values for type-safe JSX access
+    const hasTranscription = computed(() => transcription.get() !== null);
+    const transcriptionText = computed(() => transcription.get()?.text || "");
+    const transcriptionDuration = computed(
+      () => transcription.get()?.duration || 0,
+    );
+    const notesCount = computed(() => notes.get().length);
+    const hasNotes = computed(() => notes.get().length > 0);
+
+    return {
+      [NAME]: title,
+      [UI]: (
+        <ct-screen>
+          <div slot="header">
+            <ct-input
+              $value={title}
+              placeholder="Voice Note"
+              readonly
+            />
+          </div>
+
+          <ct-vstack gap="3">
+            <ct-card>
+              <div style={{ padding: "1rem" }}>
+                <h3 style={{ marginTop: 0 }}>Record a Voice Note</h3>
+                <p style={{ color: "var(--ct-color-gray-600)" }}>
+                  Hold the microphone button to record. Release to transcribe.
+                </p>
+
+                <ct-voice-input
+                  $transcription={transcription}
+                  recordingMode="hold"
+                  autoTranscribe
+                  maxDuration={120}
+                  showWaveform
+                  onct-transcription-complete={handleTranscriptionComplete({
+                    notes,
+                  })}
+                />
+
+                {hasTranscription && (
+                  <div
+                    style={{
+                      marginTop: "1rem",
+                      padding: "1rem",
+                      backgroundColor: "var(--ct-color-blue-50)",
+                      borderRadius: "0.375rem",
+                    }}
+                  >
+                    <strong>Latest Transcription:</strong>
+                    <p>{transcriptionText}</p>
+                    <small style={{ color: "var(--ct-color-gray-600)" }}>
+                      Duration: {transcriptionDuration.toFixed(1)}s
+                    </small>
+                  </div>
+                )}
+              </div>
+            </ct-card>
+
+            <ct-card>
+              <div style={{ padding: "1rem" }}>
+                <h3 style={{ marginTop: 0 }}>
+                  Saved Notes ({notesCount})
+                </h3>
+
+                {!hasNotes
+                  ? (
+                    <p style={{ color: "var(--ct-color-gray-500)" }}>
+                      No voice notes yet. Record one above!
+                    </p>
+                  )
+                  : (
+                    <ct-vstack gap="2">
+                      {notes.map((note) => (
+                        <div
+                          style={{
+                            padding: "0.75rem",
+                            border: "1px solid var(--ct-color-gray-200)",
+                            borderRadius: "0.375rem",
+                            position: "relative",
+                          }}
+                        >
+                          <div
+                            style={{
+                              display: "flex",
+                              justifyContent: "space-between",
+                              alignItems: "flex-start",
+                              gap: "0.5rem",
+                            }}
+                          >
+                            <div style={{ flex: 1 }}>
+                              <p style={{ margin: "0 0 0.5rem 0" }}>
+                                {note.text}
+                              </p>
+                              <small
+                                style={{
+                                  color: "var(--ct-color-gray-600)",
+                                  display: "block",
+                                }}
+                              >
+                                {new Date(note.timestamp).toLocaleString()} ·
+                                {" "}
+                                {note.duration.toFixed(1)}s
+                              </small>
+                            </div>
+                            <ct-button
+                              variant="ghost"
+                              size="sm"
+                              onClick={handleDeleteNote({
+                                noteId: note.id,
+                                notes,
+                              })}
+                            >
+                              ×
+                            </ct-button>
+                          </div>
+                        </div>
+                      ))}
+                    </ct-vstack>
+                  )}
+              </div>
+            </ct-card>
+          </ct-vstack>
+        </ct-screen>
+      ),
+      transcription,
+      notes,
+    };
+  },
+);
+
+export default VoiceNote;
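The delete handler above reads the current list, filters out the note whose `id` matches the `noteId` bound in the handler context, and writes the filtered copy back. The same logic can be sketched without the `Cell` wrapper as a plain function over an array. `removeNoteById` is a hypothetical name used only for illustration and is not part of the commit.

```typescript
// Hypothetical stand-alone version of the filter step in handleDeleteNote.
// It returns a new array and leaves the input untouched, mirroring how the
// handler calls notes.set() with a filtered copy instead of mutating in place.
function removeNoteById<T extends { id: string }>(
  notes: readonly T[],
  noteId: string,
): T[] {
  return notes.filter((note) => note.id !== noteId);
}
```

Binding `noteId` per list item, as the `onClick` above does, means the handler does not need to inspect the click event to find out which note to remove.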
