Commit e64b1c9

feat(tools): added speech to text with openai whisper, elevenlabs, and deepgram (#2068)
* feat(tools): added speech to text with openai whisper, elevenlabs, and deepgram
* added new file icons, implemented ffmpeg
* updated docs
* revert environment
1 parent 7c5d625 commit e64b1c9

27 files changed: +1884 -18 lines changed

apps/docs/components/icons.tsx

Lines changed: 24 additions & 0 deletions
@@ -4084,3 +4084,27 @@ export function CalendlyIcon(props: SVGProps<SVGSVGElement>) {
     </svg>
   )
 }
+
+export function AudioWaveformIcon(props: SVGProps<SVGSVGElement>) {
+  return (
+    <svg
+      {...props}
+      xmlns='http://www.w3.org/2000/svg'
+      width='24'
+      height='24'
+      viewBox='0 0 24 24'
+      fill='none'
+      stroke='currentColor'
+      strokeWidth='2'
+      strokeLinecap='round'
+      strokeLinejoin='round'
+    >
+      <path d='M2 10v3' />
+      <path d='M6 6v11' />
+      <path d='M10 3v18' />
+      <path d='M14 8v7' />
+      <path d='M18 5v13' />
+      <path d='M22 10v3' />
+    </svg>
+  )
+}

apps/docs/components/ui/icon-mapping.ts

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,7 @@ import {
   ApolloIcon,
   ArxivIcon,
   AsanaIcon,
+  AudioWaveformIcon,
   BrainIcon,
   BrowserUseIcon,
   CalendlyIcon,
@@ -100,6 +101,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
   telegram: TelegramIcon,
   tavily: TavilyIcon,
   supabase: SupabaseIcon,
+  stt: AudioWaveformIcon,
   stripe: StripeIcon,
   stagehand_agent: StagehandIcon,
   stagehand: StagehandIcon,
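
As a rough illustration of how the new `stt` entry might be consumed, the sketch below looks up an icon in a map shaped like `blockTypeToIconMap`. It is not code from this commit: `IconComponent` is simplified to a function returning a string so the snippet runs without React, and `resolveIcon` and `FallbackIcon` are hypothetical names introduced only for this example.

```typescript
// Simplified stand-in: in the real file, IconComponent is a React SVG component.
type IconComponent = (props: { className?: string }) => string

const AudioWaveformIcon: IconComponent = () => '<svg data-icon="audio-waveform" />'
const StripeIcon: IconComponent = () => '<svg data-icon="stripe" />'
// Hypothetical fallback; the commit does not show how unknown types are handled.
const FallbackIcon: IconComponent = () => '<svg data-icon="fallback" />'

// Two entries copied from the diff; the real map has many more.
const blockTypeToIconMap: Record<string, IconComponent> = {
  stt: AudioWaveformIcon,
  stripe: StripeIcon,
}

// Hypothetical helper: returns the mapped icon, or the fallback for unknown block types.
function resolveIcon(blockType: string): IconComponent {
  // Own-property check so inherited keys such as "constructor" never match.
  if (Object.prototype.hasOwnProperty.call(blockTypeToIconMap, blockType)) {
    return blockTypeToIconMap[blockType]
  }
  return FallbackIcon
}

const sttIcon = resolveIcon('stt')
const unknownIcon = resolveIcon('not_a_block')
```

Under these assumptions, `resolveIcon('stt')` yields `AudioWaveformIcon`, which is the only behavior the one-line map change in this file needs to enable.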

apps/docs/content/docs/en/tools/calendly.mdx

Lines changed: 14 additions & 0 deletions
@@ -10,6 +10,20 @@ import { BlockInfoCard } from "@/components/ui/block-info-card"
   color="#FFFFFF"
 />
 
+{/* MANUAL-CONTENT-START:intro */}
+[Calendly](https://calendly.com/) is a popular scheduling automation platform that helps you book meetings, events, and appointments with ease. With Calendly, teams and individuals can streamline scheduling, reduce back-and-forth emails, and automate tasks around events.
+
+With the Sim Calendly integration, your agents can:
+
+- **Retrieve information about your account and scheduled events**: Use tools to fetch user info, event types, and scheduled events for analysis or automation.
+- **Manage event types and scheduling**: Access and list available event types for users or organizations, retrieve details about specific event types, and monitor scheduled meetings and invitee data.
+- **Automate follow-ups and workflows**: When users schedule, reschedule, or cancel meetings, Sim agents can automatically trigger corresponding workflows—such as sending reminders, updating CRMs, or notifying participants.
+- **Integrate easily using webhooks**: Set up Sim workflows to respond to real-time Calendly webhook events, including when invitees schedule, cancel, or interact with routing forms.
+
+Whether you want to automate meeting prep, manage invites, or run custom workflows in response to scheduling activity, the Calendly tools in Sim give you flexible and secure access. Unlock new automation by reacting instantly to scheduling changes—streamlining your team's operations and communications.
+{/* MANUAL-CONTENT-END */}
+
+
 ## Usage Instructions
 
 Integrate Calendly into your workflow. Manage event types, scheduled events, invitees, and webhooks. Can also trigger workflows based on Calendly webhook events (invitee scheduled, invitee canceled, routing form submitted). Requires Personal Access Token.

apps/docs/content/docs/en/tools/meta.json

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@
     "stagehand",
     "stagehand_agent",
     "stripe",
+    "stt",
     "supabase",
     "tavily",
     "telegram",
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+---
+title: Speech-to-Text
+description: Convert speech to text using AI
+---
+
+import { BlockInfoCard } from "@/components/ui/block-info-card"
+
+<BlockInfoCard
+  type="stt"
+  color="#181C1E"
+/>
+
+{/* MANUAL-CONTENT-START:intro */}
+Transcribe speech to text using state-of-the-art AI models from leading providers. The Sim Speech-to-Text (STT) tools allow you to convert audio and video files into accurate transcripts, supporting multiple languages, timestamps, and optional translation.
+
+Supported providers:
+
+- **[OpenAI Whisper](https://platform.openai.com/docs/guides/speech-to-text/overview)**: Advanced open-source STT model from OpenAI. Supports models such as `whisper-1` and handles a wide variety of languages and audio formats.
+- **[Deepgram](https://deepgram.com/)**: Real-time and batch STT API with deep learning models like `nova-3`, `nova-2`, and `whisper-large`. Offers features like diarization, intent recognition, and industry-specific tuning.
+- **[ElevenLabs](https://elevenlabs.io/)**: Known for high-quality speech AI, ElevenLabs provides STT models focused on accuracy and natural language understanding for numerous languages and dialects.
+
+Choose the provider and model best suited to your task—whether fast, production-grade transcription (Deepgram), highly accurate multi-language capability (Whisper), or advanced understanding and language coverage (ElevenLabs).
+{/* MANUAL-CONTENT-END */}
+
+
+## Usage Instructions
+
+Transcribe audio and video files to text using leading AI providers. Supports multiple languages, timestamps, and speaker diarization.
+
+
+
+## Tools
+
+### `stt_whisper`
+
+Transcribe audio to text using OpenAI Whisper
+
+#### Input
+
+| Parameter | Type | Required | Description |
+| --------- | ---- | -------- | ----------- |
+| `provider` | string | Yes | STT provider \(whisper\) |
+| `apiKey` | string | Yes | OpenAI API key |
+| `model` | string | No | Whisper model to use \(default: whisper-1\) |
+| `audioFile` | file | No | Audio or video file to transcribe |
+| `audioFileReference` | file | No | Reference to audio/video file from previous blocks |
+| `audioUrl` | string | No | URL to audio or video file |
+| `language` | string | No | Language code \(e.g., "en", "es", "fr"\) or "auto" for auto-detection |
+| `timestamps` | string | No | Timestamp granularity: none, sentence, or word |
+| `translateToEnglish` | boolean | No | Translate audio to English |
+
+#### Output
+
+| Parameter | Type | Description |
+| --------- | ---- | ----------- |
+| `transcript` | string | Full transcribed text |
+| `segments` | array | Timestamped segments |
+| `language` | string | Detected or specified language |
+| `duration` | number | Audio duration in seconds |
+| `confidence` | number | Overall confidence score |
+
+### `stt_deepgram`
+
+Transcribe audio to text using Deepgram
+
+#### Input
+
+| Parameter | Type | Required | Description |
+| --------- | ---- | -------- | ----------- |
+| `provider` | string | Yes | STT provider \(deepgram\) |
+| `apiKey` | string | Yes | Deepgram API key |
+| `model` | string | No | Deepgram model to use \(nova-3, nova-2, whisper-large, etc.\) |
+| `audioFile` | file | No | Audio or video file to transcribe |
+| `audioFileReference` | file | No | Reference to audio/video file from previous blocks |
+| `audioUrl` | string | No | URL to audio or video file |
+| `language` | string | No | Language code \(e.g., "en", "es", "fr"\) or "auto" for auto-detection |
+| `timestamps` | string | No | Timestamp granularity: none, sentence, or word |
+| `diarization` | boolean | No | Enable speaker diarization |
+
+#### Output
+
+| Parameter | Type | Description |
+| --------- | ---- | ----------- |
+| `transcript` | string | Full transcribed text |
+| `segments` | array | Timestamped segments with speaker labels |
+| `language` | string | Detected or specified language |
+| `duration` | number | Audio duration in seconds |
+| `confidence` | number | Overall confidence score |
+
+### `stt_elevenlabs`
+
+Transcribe audio to text using ElevenLabs
+
+#### Input
+
+| Parameter | Type | Required | Description |
+| --------- | ---- | -------- | ----------- |
+| `provider` | string | Yes | STT provider \(elevenlabs\) |
+| `apiKey` | string | Yes | ElevenLabs API key |
+| `model` | string | No | ElevenLabs model to use \(scribe_v1, scribe_v1_experimental\) |
+| `audioFile` | file | No | Audio or video file to transcribe |
+| `audioFileReference` | file | No | Reference to audio/video file from previous blocks |
+| `audioUrl` | string | No | URL to audio or video file |
+| `language` | string | No | Language code \(e.g., "en", "es", "fr"\) or "auto" for auto-detection |
+| `timestamps` | string | No | Timestamp granularity: none, sentence, or word |
+
+#### Output
+
+| Parameter | Type | Description |
+| --------- | ---- | ----------- |
+| `transcript` | string | Full transcribed text |
+| `segments` | array | Timestamped segments |
+| `language` | string | Detected or specified language |
+| `duration` | number | Audio duration in seconds |
+| `confidence` | number | Overall confidence score |
+
+
+
+## Notes
+
+- Category: `tools`
+- Type: `stt`
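
The input tables in the new docs page list three optional audio inputs (`audioFile`, `audioFileReference`, `audioUrl`) without saying how they interact. The sketch below models the documented `stt_whisper` parameters as a TypeScript interface and shows one possible way to pick a source. It rests on assumptions not confirmed by this diff: that at least one source must be supplied, that files take precedence over URLs, and that a file can be represented by the hypothetical `UploadedFile` shape. `pickAudioSource` is likewise a hypothetical helper.

```typescript
// Hypothetical file shape; the diff only says the parameter type is "file".
interface UploadedFile {
  name: string
}

// Granularity values taken from the documented `timestamps` description.
type TimestampGranularity = 'none' | 'sentence' | 'word'

// Field names and optionality mirror the documented stt_whisper input table.
interface SttWhisperParams {
  provider: 'whisper'
  apiKey: string
  model?: string // docs state the default is whisper-1
  audioFile?: UploadedFile
  audioFileReference?: UploadedFile
  audioUrl?: string
  language?: string // e.g. "en", "es", "fr", or "auto"
  timestamps?: TimestampGranularity
  translateToEnglish?: boolean
}

type AudioSource =
  | { kind: 'file'; file: UploadedFile }
  | { kind: 'url'; url: string }

// Assumed precedence: uploaded file, then referenced file, then URL.
function pickAudioSource(params: SttWhisperParams): AudioSource {
  const file = params.audioFile ?? params.audioFileReference
  if (file) return { kind: 'file', file }
  if (params.audioUrl) return { kind: 'url', url: params.audioUrl }
  throw new Error('Provide audioFile, audioFileReference, or audioUrl')
}

const fromUrl = pickAudioSource({
  provider: 'whisper',
  apiKey: 'sk-placeholder', // placeholder, not a real key
  audioUrl: 'https://example.com/meeting.mp3',
  timestamps: 'sentence',
})
```

The same shape would apply to `stt_deepgram` and `stt_elevenlabs` with `diarization` added or `translateToEnglish` removed, as their tables indicate.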

apps/sim/app/api/files/upload/route.ts

Lines changed: 20 additions & 4 deletions
@@ -13,21 +13,37 @@ import {
 } from '@/app/api/files/utils'
 
 const ALLOWED_EXTENSIONS = new Set([
+  // Documents
   'pdf',
   'doc',
   'docx',
   'txt',
   'md',
-  'png',
-  'jpg',
-  'jpeg',
-  'gif',
   'csv',
   'xlsx',
   'xls',
   'json',
   'yaml',
   'yml',
+  // Images
+  'png',
+  'jpg',
+  'jpeg',
+  'gif',
+  // Audio
+  'mp3',
+  'm4a',
+  'wav',
+  'webm',
+  'ogg',
+  'flac',
+  'aac',
+  'opus',
+  // Video
+  'mp4',
+  'mov',
+  'avi',
+  'mkv',
 ])
 
 function validateFileExtension(filename: string): boolean {
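
The hunk ends at the signature of `validateFileExtension`, so its body is not visible in this diff. The following is only a guess at how a check against `ALLOWED_EXTENSIONS` could work, assuming the extension is taken after the last dot and compared case-insensitively. The set here is a small subset of the entries shown above, and the body is an assumption rather than the repository's implementation.

```typescript
// Subset of the extensions listed in the diff, for illustration only.
const ALLOWED_EXTENSIONS = new Set(['pdf', 'png', 'mp3', 'wav', 'webm', 'mp4', 'mkv'])

// Assumed behavior: accept a filename only if the text after its last dot,
// lowercased, is in the allow-list.
function validateFileExtension(filename: string): boolean {
  const dot = filename.lastIndexOf('.')
  // No dot, a leading dot only (".mp3"), or a trailing dot ("clip.") means no usable extension.
  if (dot <= 0 || dot === filename.length - 1) return false
  const extension = filename.slice(dot + 1).toLowerCase()
  return ALLOWED_EXTENSIONS.has(extension)
}

const acceptsUppercaseAudio = validateFileExtension('Interview.MP3')
const rejectsArchive = validateFileExtension('backup.tar.gz')
```

With this approach, the audio and video additions in the commit take effect without any change to the function itself, since only the set grows.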
