| 1 | +--- |
| 2 | +title: Speech-to-Text |
| 3 | +description: Convert speech to text using AI |
| 4 | +--- |
| 5 | + |
| 6 | +import { BlockInfoCard } from "@/components/ui/block-info-card" |
| 7 | + |
| 8 | +<BlockInfoCard |
| 9 | + type="stt" |
| 10 | + color="#181C1E" |
| 11 | +/> |
| 12 | + |
| 13 | +{/* MANUAL-CONTENT-START:intro */} |
| 14 | +Transcribe speech to text using AI models from leading providers. The Sim Speech-to-Text (STT) tools convert audio and video files into transcripts, with support for multiple languages and timestamps, plus optional translation to English (Whisper) and speaker diarization (Deepgram). |
| 15 | + |
| 16 | +Supported providers: |
| 17 | + |
| 18 | +- **[OpenAI Whisper](https://platform.openai.com/docs/guides/speech-to-text/overview)**: Advanced open-source STT model from OpenAI. Supports models such as `whisper-1` and handles a wide variety of languages and audio formats. |
| 19 | +- **[Deepgram](https://deepgram.com/)**: Real-time and batch STT API with deep learning models like `nova-3`, `nova-2`, and `whisper-large`. Offers features like diarization, intent recognition, and industry-specific tuning. |
| 20 | +- **[ElevenLabs](https://elevenlabs.io/)**: Known for high-quality speech AI, ElevenLabs provides STT models focused on accuracy and natural language understanding for numerous languages and dialects. |
| 21 | + |
| 22 | +Choose the provider and model best suited to your task: fast, production-grade transcription (Deepgram), accurate multi-language transcription (Whisper), or broad language and dialect coverage (ElevenLabs). |
| 23 | +{/* MANUAL-CONTENT-END */} |
| 24 | + |
| 25 | + |
| 26 | +## Usage Instructions |
| 27 | + |
| 28 | +Transcribe audio and video files to text using leading AI providers. All three tools support multiple languages and timestamps; speaker diarization is available with Deepgram, and translation to English with Whisper. |
| 29 | + |
| 30 | + |
| 31 | + |
| 32 | +## Tools |
| 33 | + |
| 34 | +### `stt_whisper` |
| 35 | + |
| 36 | +Transcribe audio to text using OpenAI Whisper |
| 37 | + |
| 38 | +#### Input |
| 39 | + |
| 40 | +| Parameter | Type | Required | Description | |
| 41 | +| --------- | ---- | -------- | ----------- | |
| 42 | +| `provider` | string | Yes | STT provider \(whisper\) | |
| 43 | +| `apiKey` | string | Yes | OpenAI API key | |
| 44 | +| `model` | string | No | Whisper model to use \(default: whisper-1\) | |
| 45 | +| `audioFile` | file | No | Audio or video file to transcribe | |
| 46 | +| `audioFileReference` | file | No | Reference to audio/video file from previous blocks | |
| 47 | +| `audioUrl` | string | No | URL to audio or video file | |
| 48 | +| `language` | string | No | Language code \(e.g., "en", "es", "fr"\) or "auto" for auto-detection | |
| 49 | +| `timestamps` | string | No | Timestamp granularity: none, sentence, or word | |
| 50 | +| `translateToEnglish` | boolean | No | Translate audio to English | |
| 51 | + |
| 52 | +#### Output |
| 53 | + |
| 54 | +| Parameter | Type | Description | |
| 55 | +| --------- | ---- | ----------- | |
| 56 | +| `transcript` | string | Full transcribed text | |
| 57 | +| `segments` | array | Timestamped segments | |
| 58 | +| `language` | string | Detected or specified language | |
| 59 | +| `duration` | number | Audio duration in seconds | |
| 60 | +| `confidence` | number | Overall confidence score | |
| 61 | + |
| 62 | +### `stt_deepgram` |
| 63 | + |
| 64 | +Transcribe audio to text using Deepgram |
| 65 | + |
| 66 | +#### Input |
| 67 | + |
| 68 | +| Parameter | Type | Required | Description | |
| 69 | +| --------- | ---- | -------- | ----------- | |
| 70 | +| `provider` | string | Yes | STT provider \(deepgram\) | |
| 71 | +| `apiKey` | string | Yes | Deepgram API key | |
| 72 | +| `model` | string | No | Deepgram model to use \(nova-3, nova-2, whisper-large, etc.\) | |
| 73 | +| `audioFile` | file | No | Audio or video file to transcribe | |
| 74 | +| `audioFileReference` | file | No | Reference to audio/video file from previous blocks | |
| 75 | +| `audioUrl` | string | No | URL to audio or video file | |
| 76 | +| `language` | string | No | Language code \(e.g., "en", "es", "fr"\) or "auto" for auto-detection | |
| 77 | +| `timestamps` | string | No | Timestamp granularity: none, sentence, or word | |
| 78 | +| `diarization` | boolean | No | Enable speaker diarization | |
| 79 | + |
| 80 | +#### Output |
| 81 | + |
| 82 | +| Parameter | Type | Description | |
| 83 | +| --------- | ---- | ----------- | |
| 84 | +| `transcript` | string | Full transcribed text | |
| 85 | +| `segments` | array | Timestamped segments with speaker labels | |
| 86 | +| `language` | string | Detected or specified language | |
| 87 | +| `duration` | number | Audio duration in seconds | |
| 88 | +| `confidence` | number | Overall confidence score | |
| 89 | + |
| 90 | +### `stt_elevenlabs` |
| 91 | + |
| 92 | +Transcribe audio to text using ElevenLabs |
| 93 | + |
| 94 | +#### Input |
| 95 | + |
| 96 | +| Parameter | Type | Required | Description | |
| 97 | +| --------- | ---- | -------- | ----------- | |
| 98 | +| `provider` | string | Yes | STT provider \(elevenlabs\) | |
| 99 | +| `apiKey` | string | Yes | ElevenLabs API key | |
| 100 | +| `model` | string | No | ElevenLabs model to use \(scribe_v1, scribe_v1_experimental\) | |
| 101 | +| `audioFile` | file | No | Audio or video file to transcribe | |
| 102 | +| `audioFileReference` | file | No | Reference to audio/video file from previous blocks | |
| 103 | +| `audioUrl` | string | No | URL to audio or video file | |
| 104 | +| `language` | string | No | Language code \(e.g., "en", "es", "fr"\) or "auto" for auto-detection | |
| 105 | +| `timestamps` | string | No | Timestamp granularity: none, sentence, or word | |
| 106 | + |
| 107 | +#### Output |
| 108 | + |
| 109 | +| Parameter | Type | Description | |
| 110 | +| --------- | ---- | ----------- | |
| 111 | +| `transcript` | string | Full transcribed text | |
| 112 | +| `segments` | array | Timestamped segments | |
| 113 | +| `language` | string | Detected or specified language | |
| 114 | +| `duration` | number | Audio duration in seconds | |
| 115 | +| `confidence` | number | Overall confidence score | |
| 116 | + |
| 117 | + |
| 118 | + |
| 119 | +## Notes |
| 120 | + |
| 121 | +- Category: `tools` |
| 122 | +- Type: `stt` |
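
The Input tables above mark all three audio sources (`audioFile`, `audioFileReference`, `audioUrl`) as optional and list `translateToEnglish` and `diarization` under one provider each. A minimal sketch of how a caller might check parameters before invoking a tool follows. The `SttParams` type and `validateSttParams` function are hypothetical names made up for illustration, not part of Sim, and the rule that at least one audio source must be present is an assumption the tables do not state.

```typescript
// Hypothetical shapes mirroring the Input tables; not taken from the Sim codebase.
type SttProvider = "whisper" | "deepgram" | "elevenlabs";
type TimestampGranularity = "none" | "sentence" | "word";

interface SttParams {
  provider: SttProvider;
  apiKey: string;
  model?: string;
  audioFile?: unknown;
  audioFileReference?: unknown;
  audioUrl?: string;
  language?: string;
  timestamps?: TimestampGranularity;
  translateToEnglish?: boolean; // listed for stt_whisper only
  diarization?: boolean; // listed for stt_deepgram only
}

// Returns a list of human-readable problems; an empty list means the params look usable.
function validateSttParams(p: SttParams): string[] {
  const errors: string[] = [];

  if (!p.apiKey) {
    errors.push("apiKey is required");
  }

  // Assumption: at least one of the three audio sources must be supplied.
  const hasSource =
    p.audioFile != null || p.audioFileReference != null || Boolean(p.audioUrl);
  if (!hasSource) {
    errors.push("provide audioFile, audioFileReference, or audioUrl");
  }

  // These options appear only in one provider's Input table.
  if (p.translateToEnglish && p.provider !== "whisper") {
    errors.push("translateToEnglish is only listed for whisper");
  }
  if (p.diarization && p.provider !== "deepgram") {
    errors.push("diarization is only listed for deepgram");
  }

  return errors;
}
```

For example, calling `validateSttParams` with `provider: "elevenlabs"`, an API key, `diarization: true`, and no audio source would report two problems under these assumptions. How the real tools treat an unsupported option (ignore it or reject it) is not documented here.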