Commit 671ce66

Merge pull request #354 from eric-urban/eur/whisper-update
clarify batch usage
2 parents: 3423058 + 608aa62 (commit 671ce66)

1 file changed: +10 / -11 lines

articles/ai-services/speech-service/whisper-overview.md

Lines changed: 10 additions & 11 deletions
```diff
@@ -6,21 +6,21 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: overview
-ms.date: 8/20/2024
+ms.date: 9/18/2024
 ms.author: eur
 ---
 
 # What is the Whisper model?
 
 The Whisper model is a speech to text model from OpenAI that you can use to transcribe audio files. The model is trained on a large dataset of English audio and text. The model is optimized for transcribing audio files that contain speech in English. The model can also be used to transcribe audio files that contain speech in other languages. The output of the model is English text.
 
-Whisper models are available via the Azure OpenAI Service or via Azure AI Speech. The features differ for those offerings. In Azure AI Speech, Whisper is just one of several models that you can use for speech to text.
+Whisper models are available via the Azure OpenAI Service or via Azure AI Speech. The features differ for those offerings. In [Azure AI Speech (batch transcription)](./batch-transcription-create.md#use-a-whisper-model), Whisper is just one of several models that you can use for speech to text.
 
 You might ask:
 
 - Is the Whisper Model a good choice for my scenario, or is an Azure AI Speech model better? What are the API comparisons between the two types of models?
 
-- If I want to use the Whisper Model, should I use it via the Azure OpenAI Service or via Azure AI Speech? What are the scenarios that guide me to use one or the other?
+- If I want to use the Whisper Model, should I use it via the Azure OpenAI Service or via Azure AI Speech ? What are the scenarios that guide me to use one or the other?
 
 ## Whisper model or Azure AI Speech models
 
@@ -29,7 +29,7 @@ Either the Whisper model or the Azure AI Speech models are appropriate depending
 | Scenario | Whisper model | Azure AI Speech models |
 |---------|---------------|------------------------|
 | Real-time transcriptions, captions, and subtitles for audio and video. | Not available | Recommended |
-| Transcriptions, captions, and subtitles for prerecorded audio and video. | The Whisper model via [Azure OpenAI](../openai/whisper-quickstart.md) is recommended for fast processing of individual audio files. The Whisper model via [Azure AI Speech](./batch-transcription-create.md#use-a-whisper-model) is recommended for batch processing of large files. For more information, see [Whisper model via Azure AI Speech or via Azure OpenAI Service?](#whisper-model-via-azure-ai-speech-or-via-azure-openai-service) | Recommended for batch processing of large files, diarization, and word level timestamps. |
+| Transcriptions, captions, and subtitles for prerecorded audio and video. | The Whisper model via [Azure OpenAI](../openai/whisper-quickstart.md) is recommended for fast processing of individual audio files. The Whisper model via [Azure AI Speech (batch transcription)](./batch-transcription-create.md#use-a-whisper-model) is recommended for batch processing of large files. For more information, see [Whisper model via Azure AI Speech batch transcription or via Azure OpenAI Service?](#whisper-model-via-azure-ai-speech-or-via-azure-openai-service) | Recommended for batch processing of large files, diarization, and word level timestamps. |
 | Transcript of phone call recordings and analytics such as call summary, sentiment, key topics, and custom insights. | Available | Recommended |
 | Real-time transcription and analytics to assist call center agents with customer questions. | Not available | Recommended |
 | Transcript of meeting recordings and analytics such as meeting summary, meeting chapters, and action item extraction. | Available | Recommended |
@@ -43,27 +43,26 @@ Either the Whisper model or the Azure AI Speech models are appropriate depending
 
 ## Whisper model via Azure AI Speech or via Azure OpenAI Service?
 
-If you decide to use the Whisper model, you have two options. You can choose whether to use the Whisper Model via [Azure OpenAI](../openai/whisper-quickstart.md) or via [Azure AI Speech](./batch-transcription-create.md#use-a-whisper-model). In either case, the readability of the transcribed text is the same. You can input mixed language audio and the output is in English.
+If you decide to use the Whisper model, you have two options. You can choose whether to use the Whisper Model via [Azure OpenAI](../openai/whisper-quickstart.md) or via [Azure AI Speech (batch transcription)](./batch-transcription-create.md#use-a-whisper-model). In either case, the readability of the transcribed text is the same. You can input mixed language audio and the output is in English.
 
 Whisper Model via Azure OpenAI Service might be best for:
 - Quickly transcribing audio files one at a time
 - Translate audio from other languages into English
 - Provide a prompt to the model to guide the output
 - Supported file formats: mp3, mp4, mpweg, mpga, m4a, wav, and webm
 
-Whisper Model via Azure AI Speech might be best for:
+Whisper Model via Azure AI Speech batch transcription might be best for:
 - Transcribing files larger than 25MB (up to 1GB). The file size limit for the Azure OpenAI Whisper model is 25 MB.
-- Transcribing large batches of audio files
+- Transcribing large batches of audio files.
 - Diarization to distinguish between the different speakers participating in the conversation. The Speech service provides information about which speaker was speaking a particular part of transcribed speech. The Whisper model via Azure OpenAI doesn't support diarization.
 - Word-level timestamps
-- Supported file formats: mp3, wav, and ogg
-- Customization of the Whisper base model to improve accuracy for your scenario (coming soon)
+- Supported file formats: mp3, wav, and ogg.
 
 Regional support is another consideration.
-- The Whisper model via Azure OpenAI Service is available in the following regions: East US 2, India South, North Central, Norway East, Sweden Central, and West Europe.
+- The Whisper model via Azure OpenAI Service is available in the following regions: East US 2, India South, North Central, Norway East, Sweden Central, Switzerland North, and West Europe.
 - The Whisper model via Azure AI Speech is available in the following regions: Australia East, East US, North Central US, South Central US, Southeast Asia, UK South, and West Europe.
 
-## Next steps
+## Related content
 
 - [Use Whisper models via the Azure AI Speech batch transcription API](./batch-transcription-create.md#use-a-whisper-model)
 - [Try the speech to text quickstart for Whisper via Azure OpenAI](../openai/whisper-quickstart.md)
```
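This commit documents several criteria for choosing between the two Whisper offerings: the 25 MB Azure OpenAI file limit, the 1 GB batch transcription limit, the file formats each offering supports, diarization support, and single-file versus batch use. These criteria can be condensed into a small routing helper. The following is a minimal Python sketch and is not part of the article or any Azure SDK. `choose_whisper_offering`, its parameters, and its return labels are hypothetical, and the helper calls no Azure API. It reads "MB" and "GB" as binary units and assumes that the article's "mpweg" means "mpeg". Regional availability is left out.

```python
from pathlib import Path

# Limits and formats as stated in whisper-overview.md at this commit.
# Assumption: "MB"/"GB" are read as binary units (MiB/GiB).
AOAI_MAX_BYTES = 25 * 1024**2          # Azure OpenAI Whisper: 25 MB per file
SPEECH_BATCH_MAX_BYTES = 1 * 1024**3   # Azure AI Speech batch transcription: up to 1 GB
# The article's list reads "mpweg"; "mpeg" is assumed to be intended.
AOAI_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
SPEECH_BATCH_FORMATS = {"mp3", "wav", "ogg"}


def choose_whisper_offering(filename: str, size_bytes: int,
                            needs_diarization: bool = False,
                            file_count: int = 1) -> str:
    """Hypothetical helper: pick a Whisper offering from the documented criteria.

    Returns "azure-openai", "azure-ai-speech-batch", or "unsupported".
    It calls no Azure API; it only encodes the limits listed in the article.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    # Azure OpenAI Whisper: files up to 25 MB, its listed formats, no diarization.
    aoai_ok = (ext in AOAI_FORMATS
               and size_bytes <= AOAI_MAX_BYTES
               and not needs_diarization)
    # Speech batch transcription: files up to 1 GB, mp3/wav/ogg, diarization available.
    speech_ok = ext in SPEECH_BATCH_FORMATS and size_bytes <= SPEECH_BATCH_MAX_BYTES

    if aoai_ok and file_count == 1:
        return "azure-openai"           # fast processing of an individual file
    if speech_ok:
        return "azure-ai-speech-batch"  # large files, large batches, diarization
    if aoai_ok:
        return "azure-openai"           # batch requested, but only Azure OpenAI fits format/size
    return "unsupported"


if __name__ == "__main__":
    print(choose_whisper_offering("interview.wav", 10 * 1024**2))   # azure-openai
    print(choose_whisper_offering("all-hands.mp3", 300 * 1024**2))  # azure-ai-speech-batch
    print(choose_whisper_offering("call.wav", 5 * 1024**2,
                                  needs_diarization=True))          # azure-ai-speech-batch
```

The order of the checks follows the scenario table. Azure OpenAI is preferred for fast processing of individual files. Batch transcription is preferred for large files, large batches, and diarization. Azure OpenAI serves as a fallback only when the file's format or size rules out batch transcription.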
