Commit 3c920e2

Merge pull request #3240 from eric-urban/eur/whisper
whisper overview edits
2 parents 5b74de6 + 54f4b1a commit 3c920e2

1 file changed: +12 -10 lines changed

articles/ai-services/speech-service/whisper-overview.md

Lines changed: 12 additions & 10 deletions
@@ -13,7 +13,9 @@ ms.author: eur
 
 # What is the Whisper model?
 
-The Whisper model is a speech to text model from OpenAI that you can use to transcribe audio files. The model is trained on a large dataset of English audio and text. The model is optimized for transcribing audio files that contain speech in English. The model can also be used to transcribe audio files that contain speech in other languages. The output of the model is English text.
+The Whisper model is a speech to text model from OpenAI that you can use to transcribe or translate audio files. The model is trained on a large dataset of English audio and text.
+- The model is optimized for transcribing audio files that contain speech in English.
+- The model can also be used to translate audio files that contain speech in other languages. The output of the transcription is English text.
 
 Whisper models are available via the Azure OpenAI Service or via Azure AI Speech. The features differ for those offerings. In [Azure AI Speech (batch transcription)](./batch-transcription-create.md#use-a-whisper-model), Whisper is just one of several models that you can use for speech to text.
 
@@ -38,20 +40,20 @@ Either the Whisper model or the Azure AI Speech models are appropriate depending
 | Contact center voice agent: Call routing and interactive voice response for call centers. | Available | Recommended |
 | Voice assistant: Application specific voice assistant for a set-top box, mobile app, in-car, and other scenarios. | Available | Recommended |
 | Pronunciation assessment: Assess the pronunciation of a speaker's voice. | Not available | Recommended |
-| Translate live audio from one language to another. | Not available | Recommended via the [speech translation API](./speech-translation.md) |
-| Translate prerecorded audio from other languages into English. | Recommended | Available via the [speech translation API](./speech-translation.md) |
-| Translate prerecorded audio into languages other than English. | Not available | Recommended via the [speech translation API](./speech-translation.md) |
+| Translate live audio from one language to another. | Not available | Recommended via the [speech translation API](./speech-translation.md). |
+| Translate prerecorded audio from other languages into English. | Recommended | Also available via the [speech translation API](./speech-translation.md). |
+| Translate prerecorded audio into languages other than English. | Not available | Recommended via the [speech translation API](./speech-translation.md). |
 
 ## Whisper model via Azure AI Speech or via Azure OpenAI Service?
 
-If you decide to use the Whisper model, you have two options. You can choose whether to use the Whisper Model via [Azure OpenAI](../openai/whisper-quickstart.md) or via [Azure AI Speech (batch transcription)](./batch-transcription-create.md#use-a-whisper-model). In either case, the readability of the transcribed text is the same. You can input mixed language audio and the output is in English.
+If you decide to use the Whisper model, you have two options. You can choose whether to use the Whisper Model via [Azure OpenAI Service](../openai/whisper-quickstart.md) or via [Azure AI Speech (batch transcription)](./batch-transcription-create.md#use-a-whisper-model). In either case, the readability of the transcribed text is the same.
 
 Whisper Model via Azure OpenAI Service might be best for:
-- Quickly transcribing audio files one at a time
-- Translate audio from other languages into English
-- Provide a prompt to the model to guide the output
-- Supported file formats: mp3, mp4, mpweg, mpga, m4a, wav, and webm
-- Only ASCII character supported for filename
+- Quickly transcribing audio files one at a time.
+- Translate audio from other languages into English. You can input mixed language audio and the output is in English.
+- Provide a prompt to the model to guide the output.
+- Supported file formats: mp3, mp4, mpweg, mpga, m4a, wav, and webm.
+- Only ASCII character supported for filename.
 
 Whisper Model via Azure AI Speech batch transcription might be best for:
 - Transcribing files larger than 25MB (up to 1GB). The file size limit for the Azure OpenAI Whisper model is 25 MB.
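
Reviewer note: the constraints listed in this hunk for the Azure OpenAI route (file format, ASCII-only filename, 25 MB limit) can be checked locally before any upload. The following is a minimal sketch, not part of the commit and not an Azure API. The function name and constants are hypothetical, the "mpweg" entry in the diff is assumed to mean "mpeg", and 25 MB is assumed to mean 25 × 1024 × 1024 bytes.

```python
from pathlib import Path
from typing import List

# Values copied from the list in the diff; names are hypothetical.
# Assumption: "mpweg" in the diff is a typo for "mpeg".
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: "25 MB" is interpreted as binary megabytes.
AZURE_OPENAI_MAX_BYTES = 25 * 1024 * 1024


def azure_openai_whisper_issues(filename: str, size_bytes: int) -> List[str]:
    """Return the reasons a file does not fit the Azure OpenAI Whisper limits.

    An empty list means none of the three listed constraints is violated.
    """
    issues = []
    name = Path(filename).name
    extension = Path(filename).suffix.lower().lstrip(".")

    if extension not in SUPPORTED_EXTENSIONS:
        issues.append(f"unsupported file format: {extension or '(none)'}")
    if not name.isascii():
        issues.append("filename contains non-ASCII characters")
    if size_bytes > AZURE_OPENAI_MAX_BYTES:
        issues.append("file is larger than 25 MB; consider batch transcription")
    return issues


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    for name, size in [("call.mp3", 1024), ("big.wav", 26 * 1024 * 1024)]:
        print(name, azure_openai_whisper_issues(name, size))
```

A file that only fails the size check is the case the last list item describes: it points to Azure AI Speech batch transcription, which the diff says handles files up to 1GB.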