# Azure AI Speech Transcription client library for Java

The Azure AI Speech Transcription client library provides a simple and efficient way to convert audio to text using Azure Cognitive Services. This library enables you to transcribe audio with features like speaker diarization, profanity filtering, and phrase hints for improved accuracy.

## Documentation

Various documentation is available to help you get started:

- [API reference documentation][docs]
- [Product documentation][product_documentation]
- [Azure Speech Service documentation](https://learn.microsoft.com/azure/ai-services/speech-service/)

## Getting started

### Prerequisites

- [Java Development Kit (JDK)][jdk] with version 8 or above
- [Azure Subscription][azure_subscription]
- An [Azure Speech resource](https://learn.microsoft.com/azure/ai-services/speech-service/overview#try-the-speech-service-for-free) or [Cognitive Services multi-service resource](https://learn.microsoft.com/azure/ai-services/multi-service-resource)

### Adding the package to your product

[//]: # ({x-version-update-start;com.azure:azure-ai-speech-transcription;current})
```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-speech-transcription</artifactId>
    <version>1.0.0-beta.1</version>
</dependency>
```
[//]: # ({x-version-update-end})

#### Optional: For Entra ID Authentication

If you plan to use Entra ID authentication (recommended for production), also add the `azure-identity` dependency:

```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.18.1</version>
</dependency>
```

### Authentication

Azure Speech Transcription supports two authentication methods:

#### Option 1: API Key Authentication (Subscription Key)

You can find your Speech resource's API key in the [Azure Portal](https://portal.azure.com) or by using the Azure CLI:

```bash
az cognitiveservices account keys list --name <your-resource-name> --resource-group <your-resource-group>
```

Once you have an API key, you can authenticate using `KeyCredential`:

```java
import com.azure.core.credential.KeyCredential;

TranscriptionClient client = new TranscriptionClientBuilder()
    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
    .credential(new KeyCredential("<your-api-key>"))
    .buildClient();
```

#### Option 2: Entra ID OAuth2 Authentication (Recommended for Production)

For production scenarios, it's recommended to use Entra ID authentication with managed identities or service principals. This provides better security and easier credential management.

```java
import com.azure.identity.DefaultAzureCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;

// Use DefaultAzureCredential, which works with managed identities, service principals, Azure CLI, etc.
DefaultAzureCredential credential = new DefaultAzureCredentialBuilder().build();

TranscriptionClient client = new TranscriptionClientBuilder()
    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
    .credential(credential)
    .buildClient();
```

**Note:** To use Entra ID authentication, you need to:
1. Add the `azure-identity` dependency to your project
2. Assign the appropriate role (e.g., "Cognitive Services User") to your managed identity or service principal
3. Ensure your Cognitive Services resource has Entra ID authentication enabled

For more information on Entra ID authentication, see:
- [Authenticate with Azure Identity](https://learn.microsoft.com/azure/developer/java/sdk/identity)
- [Azure Cognitive Services authentication](https://learn.microsoft.com/azure/ai-services/authentication)

## Key concepts

### TranscriptionClient

The `TranscriptionClient` is the primary interface for interacting with the Speech Transcription service. It provides methods to transcribe audio to text.

### TranscriptionAsyncClient

The `TranscriptionAsyncClient` provides asynchronous methods for transcribing audio, allowing non-blocking operations that return reactive types.
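
A minimal sketch of the async flow, assuming the builder exposes a `buildAsyncClient()` method and that `transcribe` returns a Reactor `Mono<TranscriptionResult>`, per the usual Azure SDK for Java conventions (treat the exact signatures as assumptions, not confirmed by this README):

```java
import com.azure.core.credential.KeyCredential;
import com.azure.core.util.BinaryData;

import java.nio.file.Files;
import java.nio.file.Paths;

// Assumed: buildAsyncClient() mirrors buildClient(), per Azure SDK conventions.
TranscriptionAsyncClient asyncClient = new TranscriptionClientBuilder()
    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
    .credential(new KeyCredential("<your-api-key>"))
    .buildAsyncClient();

byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));
TranscriptionOptions options = new TranscriptionOptions(new AudioFileDetails(BinaryData.fromBytes(audioData)));

// transcribe(...) is assumed to return Mono<TranscriptionResult>; subscribing keeps the call non-blocking.
asyncClient.transcribe(options)
    .subscribe(
        result -> result.getCombinedPhrases().forEach(phrase -> System.out.println(phrase.getText())),
        error -> System.err.println("Transcription failed: " + error.getMessage()));
```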

### Audio Formats

The service supports various audio formats including WAV, MP3, OGG, and more. Audio must be:

- Shorter than 2 hours in duration
- Smaller than 250 MB in size
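
As a quick local sanity check before uploading, you can validate the size limit yourself; a minimal sketch (the duration limit depends on the audio format and is not checked here):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Validate the documented 250 MB service limit before sending the file.
Path audioPath = Paths.get("path/to/audio.wav");
long sizeInBytes = Files.size(audioPath);
if (sizeInBytes > 250L * 1024 * 1024) {
    throw new IllegalArgumentException("Audio file exceeds the 250 MB service limit.");
}
```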

### Transcription Options

You can customize transcription with options such as the following (a combined sketch appears after this list):

- **Profanity filtering**: Control how profanity is handled in transcriptions
- **Speaker diarization**: Identify different speakers in multi-speaker audio
- **Phrase lists**: Provide domain-specific phrases to improve accuracy
- **Language detection**: Automatically detect the spoken language
- **Enhanced mode**: Improve transcription quality with custom prompts, translation, and task-specific configurations
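
Multiple settings can be combined on one `TranscriptionOptions` instance; a minimal sketch using only the setters this README itself demonstrates, and assuming they are fluent (as the individual examples below suggest):

```java
byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));
AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

// Combine a phrase list with enhanced mode on one options object.
// Assumes fluent setters, as the individual examples below suggest.
TranscriptionOptions options = new TranscriptionOptions(audioFileDetails)
    .setPhraseListOptions(new PhraseListOptions()
        .setPhrases(java.util.Arrays.asList("Azure", "Cognitive Services")))
    .setEnhancedModeOptions(new EnhancedModeOptions()
        .setTask("transcribe"));

TranscriptionResult result = client.transcribe(options);
```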

## Examples

### Transcribe an audio file

```java com.azure.ai.speech.transcription.readme
TranscriptionClient client = new TranscriptionClientBuilder()
    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
    .credential(new KeyCredential("<your-api-key>"))
    .buildClient();

try {
    // Read audio file
    byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

    // Create audio file details
    AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

    // Create transcription options
    TranscriptionOptions options = new TranscriptionOptions(audioFileDetails);

    // Transcribe audio
    TranscriptionResult result = client.transcribe(options);

    // Process results
    System.out.println("Duration: " + result.getDuration() + " ms");
    result.getCombinedPhrases().forEach(phrase -> {
        System.out.println("Channel " + phrase.getChannel() + ": " + phrase.getText());
    });
} catch (Exception e) {
    System.err.println("Error during transcription: " + e.getMessage());
}
```

### Transcribe using audio URL

You can transcribe audio directly from a URL without downloading the file first:

```java readme-sample-transcribeWithAudioUrl
TranscriptionClient client = new TranscriptionClientBuilder()
    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
    .credential(new KeyCredential("<your-api-key>"))
    .buildClient();

// Create transcription options with audio URL
TranscriptionOptions options = new TranscriptionOptions("https://example.com/audio.wav");

// Transcribe audio
TranscriptionResult result = client.transcribe(options);

// Process results
result.getCombinedPhrases().forEach(phrase -> {
    System.out.println(phrase.getText());
});
```

### Transcribe with multi-language support

The service can automatically detect and transcribe multiple languages within the same audio file.

```java com.azure.ai.speech.transcription.transcriptionoptions.multilanguage
byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

// Configure transcription WITHOUT specifying locales
// This allows the service to auto-detect and transcribe multiple languages
TranscriptionOptions options = new TranscriptionOptions(audioFileDetails);

TranscriptionResult result = client.transcribe(options);

result.getPhrases().forEach(phrase -> {
    System.out.println("Language: " + phrase.getLocale());
    System.out.println("Text: " + phrase.getText());
});
```

### Transcribe with enhanced mode

Enhanced mode provides advanced features to improve transcription accuracy with custom prompts. Enhanced mode is automatically enabled when you create an `EnhancedModeOptions` instance.

```java com.azure.ai.speech.transcription.transcriptionoptions.enhancedmode
byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

// Enhanced mode is automatically enabled
EnhancedModeOptions enhancedMode = new EnhancedModeOptions()
    .setTask("transcribe")
    .setPrompts(java.util.Arrays.asList("Output must be in lexical format."));

TranscriptionOptions options = new TranscriptionOptions(audioFileDetails)
    .setEnhancedModeOptions(enhancedMode);

TranscriptionResult result = client.transcribe(options);

System.out.println("Transcription: " + result.getCombinedPhrases().get(0).getText());
```

### Transcribe with phrase list

You can use a phrase list to improve recognition accuracy for specific terms.

```java com.azure.ai.speech.transcription.transcriptionoptions.phraselist
byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

PhraseListOptions phraseListOptions = new PhraseListOptions()
    .setPhrases(java.util.Arrays.asList("Azure", "Cognitive Services"))
    .setBiasingWeight(5.0);

TranscriptionOptions options = new TranscriptionOptions(audioFileDetails)
    .setPhraseListOptions(phraseListOptions);

TranscriptionResult result = client.transcribe(options);

result.getCombinedPhrases().forEach(phrase -> {
    System.out.println(phrase.getText());
});
```

### Service API versions

The client library targets the latest service API version by default.
The service client builder accepts an optional service API version parameter to specify which API version the client uses to communicate with the service.

#### Select a service API version

You can explicitly select a supported service API version when initializing a service client via the service client builder.
This ensures that the client communicates with the service using the specified API version.

When selecting an API version, verify that there are no breaking changes compared to the latest API version.
If there are significant differences, API calls may fail due to incompatibility.

Always ensure that the chosen API version is fully supported and operational for your use case and that it aligns with the service's versioning policy.
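
For example, a hedged sketch assuming the builder follows the common Azure SDK for Java pattern of a `serviceVersion` method paired with a `TranscriptionServiceVersion` enum (neither is confirmed by this README, and the enum constant below is a placeholder):

```java
// Hypothetical: pin the client to a specific service API version.
// TranscriptionServiceVersion and serviceVersion(...) are assumed from
// Azure SDK for Java conventions; V1 is a placeholder constant.
TranscriptionClient client = new TranscriptionClientBuilder()
    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
    .credential(new KeyCredential("<your-api-key>"))
    .serviceVersion(TranscriptionServiceVersion.V1)
    .buildClient();
```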

## Troubleshooting

### Enable client logging

You can enable logging to debug issues with the client library. The Azure client libraries for Java use the SLF4J logging facade. You can configure logging by adding a logging dependency and configuration file. For more information, see the [logging documentation](https://learn.microsoft.com/azure/developer/java/sdk/logging-overview).

### Common issues

#### Authentication errors

- Verify that your API key is correct
- Ensure your endpoint URL matches your Azure resource region

#### Audio format errors

- Verify your audio file is in a supported format
- Ensure the audio file size is under 250 MB and the duration is under 2 hours

### Getting help

If you encounter issues:

- Check the [troubleshooting guide](https://learn.microsoft.com/azure/ai-services/speech-service/troubleshooting)
- Search for existing issues or create a new one on [GitHub](https://github.com/Azure/azure-sdk-for-java/issues)
- Ask questions on [Stack Overflow](https://stackoverflow.com/questions/tagged/azure-java-sdk) with the `azure-java-sdk` tag

## Next steps

- Explore the [samples](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/transcription/azure-ai-speech-transcription/src/samples) for more examples
- Learn more about [Azure Speech Service](https://learn.microsoft.com/azure/ai-services/speech-service/)
- Review the [API reference documentation][docs] for detailed information about classes and methods

## Contributing

For details on contributing to this repository, see the [contributing guide](https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md).

1. Fork it
1. Create your feature branch (`git checkout -b my-new-feature`)
1. Commit your changes (`git commit -am 'Add some feature'`)
1. Push to the branch (`git push origin my-new-feature`)
1. Create a new Pull Request

<!-- LINKS -->
[product_documentation]: https://learn.microsoft.com/azure/ai-services/speech-service/
[docs]: https://azure.github.io/azure-sdk-for-java/
[jdk]: https://learn.microsoft.com/azure/developer/java/fundamentals/
[azure_subscription]: https://azure.microsoft.com/free/