Commit 18de5f6

Merge branch 'wangamber/transcription' of https://github.com/amber-yujueWang/azure-sdk-for-java into wangamber/transcription

2 parents: 89c75ab + 82ed087

26 files changed: +365 −235 lines

.vscode/cspell.json

Lines changed: 0 additions & 4 deletions

@@ -223,17 +223,13 @@
     "autodetection",
     "awps",
     "azconfig",
-    "azuread",
     "azsdk",
     "azsynapse",
     "azurecr",
     "azurestackhci",
     "backoff",
     "boringssl",
-    "BYOS",
     "BYOD",
-    "dexec",
-    "diarization",
     "Dgpg",
     "Dskip",
     "mvnw",

eng/common/pipelines/templates/jobs/prepare-pipelines.yml

Lines changed: 1 addition & 1 deletion

@@ -82,7 +82,7 @@ jobs:
 $generateUnifiedWeekly = 'false'

 $testServiceConnections = '"Azure" "azure-sdk-tests" "azure-sdk-tests-preview" "azure-sdk-tests-public" "Azure SDK Test Resources - LiveTestSecrets"'
-$internalServiceConnections = '"Azure" "Azure SDK Artifacts" "Azure SDK Engineering System" "opensource-api-connection" "AzureSDKEngKeyVault Secrets" "Azure SDK PME Managed Identity"'
+$internalServiceConnections = '"Azure" "Azure SDK Artifacts" "Azure SDK Engineering System" "opensource-api-connection" "AzureSDKEngKeyVault Secrets" "Azure SDK PME Managed Identity" "APIView prod deployment"'

 # Map the language to the appropriate variable groups
 switch ($lang)

sdk/agrifood/azure-verticals-agrifood-farming/README.md

Lines changed: 3 additions & 2 deletions

@@ -103,8 +103,9 @@ Farm hierarchy is a collection of below entities.

 ```java readme-sample-createFarmHierarchy
 // Create Party
-JSONObject object = new JSONObject().appendField("name", "party1");
-BinaryData party = BinaryData.fromObject(object);
+Map<String, String> partyData = new HashMap<>();
+partyData.put("name", "party1");
+BinaryData party = BinaryData.fromObject(partyData);
 partiesClient.createOrUpdateWithResponse("contoso-party", party, null).block();

 // Get Party
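The hunk above swaps a `JSONObject` for a plain `Map` before wrapping it in `BinaryData`. A minimal, self-contained sketch of just the payload construction (the `BinaryData.fromObject(partyData)` call is elided here because it needs azure-core on the classpath; class and method names below are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

public class PartyPayload {
    // Build the party payload as a plain Map, as the updated sample does;
    // BinaryData.fromObject(partyData) would then serialize it to JSON.
    static Map<String, String> buildPartyData() {
        Map<String, String> partyData = new HashMap<>();
        partyData.put("name", "party1");
        return partyData;
    }

    public static void main(String[] args) {
        Map<String, String> partyData = buildPartyData();
        System.out.println(partyData.get("name")); // prints "party1"
    }
}
```

Using a standard `Map` avoids the extra `net.minidev` JSON dependency while serializing to the same `{"name":"party1"}` body.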

sdk/cognitiveservices/azure-ai-speech-transcription/CHANGELOG.md

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 # Release History

-## 1.0.0-beta.1 (2025-11-14)
+## 1.0.0-beta.1 (2025-12-12)

 ### Features Added

sdk/cognitiveservices/azure-ai-speech-transcription/README.md

Lines changed: 16 additions & 51 deletions

@@ -1,6 +1,6 @@
 # Azure AI Speech Transcription client library for Java

-The Azure AI Speech Transcription client library provides a simple and efficient way to convert audio to text using Azure Cognitive Services. This library enables you to transcribe audio files with features like speaker diarization, profanity filtering, and phrase hints for improved accuracy.
+The Azure AI Speech Transcription client library provides a simple and efficient way to convert audio to text using Azure Cognitive Services. This library enables you to transcribe audio with features like speaker diarization, profanity filtering, and phrase hints for improved accuracy.

 ## Documentation

@@ -30,9 +30,9 @@ Various documentation is available to help you get started:
 ```
 [//]: # ({x-version-update-end})

-#### Optional: For Azure AD Authentication
+#### Optional: For Entra ID Authentication

-If you plan to use Azure AD authentication (recommended for production), also add the `azure-identity` dependency:
+If you plan to use Entra ID authentication (recommended for production), also add the `azure-identity` dependency:

 ```xml
 <dependency>
@@ -65,11 +65,9 @@ TranscriptionClient client = new TranscriptionClientBuilder()
     .buildClient();
 ```

-#### Option 2: Azure AD OAuth2 Authentication (Recommended for Production)
+#### Option 2: Entra ID OAuth2 Authentication (Recommended for Production)

-For production scenarios, it's recommended to use Azure Active Directory (Azure AD) authentication with managed identities or service principals. This provides better security and easier credential management.
-
-The OAuth2 scope for Azure Cognitive Services is: `https://cognitiveservices.azure.com/.default`
+For production scenarios, it's recommended to use Entra ID authentication with managed identities or service principals. This provides better security and easier credential management.

 ```java
 import com.azure.identity.DefaultAzureCredential;
@@ -84,28 +82,28 @@ TranscriptionClient client = new TranscriptionClientBuilder()
     .buildClient();
 ```

-**Note:** To use Azure AD authentication, you need to:
+**Note:** To use Entra ID authentication, you need to:
 1. Add the `azure-identity` dependency to your project
 2. Assign the appropriate role (e.g., "Cognitive Services User") to your managed identity or service principal
-3. Ensure your Cognitive Services resource has Azure AD authentication enabled
+3. Ensure your Cognitive Services resource has Entra ID authentication enabled

-For more information on Azure AD authentication, see:
+For more information on Entra ID authentication, see:
 - [Authenticate with Azure Identity](https://learn.microsoft.com/azure/developer/java/sdk/identity)
 - [Azure Cognitive Services authentication](https://learn.microsoft.com/azure/ai-services/authentication)

 ## Key concepts

 ### TranscriptionClient

-The `TranscriptionClient` is the primary interface for interacting with the Speech Transcription service. It provides methods to transcribe audio files to text.
+The `TranscriptionClient` is the primary interface for interacting with the Speech Transcription service. It provides methods to transcribe audio to text.

 ### TranscriptionAsyncClient

 The `TranscriptionAsyncClient` provides asynchronous methods for transcribing audio, allowing non-blocking operations that return reactive types.

 ### Audio Formats

-The service supports various audio formats including WAV, MP3, OGG, and more. Audio files must be:
+The service supports various audio formats including WAV, MP3, OGG, and more. Audio must be:

 - Shorter than 2 hours in duration
 - Smaller than 250 MB in size
@@ -135,8 +133,7 @@ try {
     byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

     // Create audio file details
-    AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-        .setFilename("audio.wav");
+    AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

     // Create transcription options
     TranscriptionOptions options = new TranscriptionOptions(audioFileDetails);
@@ -176,45 +173,14 @@ result.getCombinedPhrases().forEach(phrase -> {
 });
 ```

-### Transcribe using AudioFileDetails constructor
-
-You can also create `TranscriptionOptions` directly with `AudioFileDetails`:
-
-```java readme-sample-transcribeWithAudioFileDetails
-TranscriptionClient client = new TranscriptionClientBuilder()
-    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
-    .credential(new KeyCredential("<your-api-key>"))
-    .buildClient();
-
-// Read audio file
-byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));
-
-// Create audio file details
-AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-    .setFilename("audio.wav")
-    .setContentType("audio/wav");
-
-// Create transcription options with AudioFileDetails
-TranscriptionOptions options = new TranscriptionOptions(audioFileDetails);
-
-// Transcribe audio
-TranscriptionResult result = client.transcribe(options);
-
-// Process results
-result.getCombinedPhrases().forEach(phrase -> {
-    System.out.println(phrase.getText());
-});
-```
-
 ### Transcribe with multi-language support

 The service can automatically detect and transcribe multiple languages within the same audio file.

 ```java com.azure.ai.speech.transcription.transcriptionoptions.multilanguage
 byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

-AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-    .setFilename("audio.wav");
+AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

 // Configure transcription WITHOUT specifying locales
 // This allows the service to auto-detect and transcribe multiple languages
@@ -235,10 +201,10 @@ Enhanced mode provides advanced features to improve transcription accuracy with
 ```java com.azure.ai.speech.transcription.transcriptionoptions.enhancedmode
 byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

-AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-    .setFilename("audio.wav");
+AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

 EnhancedModeOptions enhancedMode = new EnhancedModeOptions()
+    .setEnabled(true)
     .setTask("transcribe")
     .setPrompts(java.util.Arrays.asList("Output must be in lexical format."));

@@ -257,8 +223,7 @@ You can use a phrase list to improve recognition accuracy for specific terms.
 ```java com.azure.ai.speech.transcription.transcriptionoptions.phraselist
 byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

-AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-    .setFilename("audio.wav");
+AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

 PhraseListOptions phraseListOptions = new PhraseListOptions()
     .setPhrases(java.util.Arrays.asList("Azure", "Cognitive Services"))
@@ -299,7 +264,7 @@ You can enable logging to debug issues with the client library. The Azure client

 #### Authentication errors

-- Verify that your API key is correct and has not expired
+- Verify that your API key is correct
 - Ensure your endpoint URL matches your Azure resource region

 #### Audio format errors
Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-{"AssetsRepo":"Azure/azure-sdk-assets","AssetsRepoPrefixPath":"java","TagPrefix":"java/cognitiveservices/azure-ai-speech-transcription","Tag": "java/cognitiveservices/azure-ai-speech-transcription_5be01e978d"}
+{"AssetsRepo":"Azure/azure-sdk-assets","AssetsRepoPrefixPath":"java","TagPrefix":"java/cognitiveservices/azure-ai-speech-transcription","Tag": "java/cognitiveservices/azure-ai-speech-transcription_3b50d0dce8"}
Lines changed: 16 additions & 0 deletions

@@ -0,0 +1,16 @@
+{
+  "version": "0.2",
+  "language": "en",
+  "words": [
+    "azuread",
+    "BYOD",
+    "BYOS",
+    "dexec",
+    "diarization",
+    "doméstica",
+    "empleada",
+    "habitación",
+    "misrecognized",
+    "Mundo"
+  ]
+}

sdk/cognitiveservices/azure-ai-speech-transcription/customization/src/main/java/SpeechTranscriptionCustomization.java

Lines changed: 60 additions & 0 deletions

@@ -59,6 +59,14 @@ public void customize(LibraryCustomization customization, Logger logger) {
         logger.info("Customizing TranscriptionDiarizationOptions.toJson()");
         customizeDiarizationOptionsToJson(models);

+        // Customize EnhancedModeOptions to add setter for enabled property
+        logger.info("Customizing EnhancedModeOptions to add setEnabled() method");
+        customizeEnhancedModeOptions(models);
+
+        // Customize AudioFileDetails.getFilename() to auto-generate filename from contentType if not set
+        logger.info("Customizing AudioFileDetails.getFilename() to auto-generate filename");
+        customizeAudioFileDetailsGetFilename(models);
+
         // Add AudioFileDetails field and constructors to TranscriptionOptions, make setAudioUrl private, remove no-arg constructor
         logger
             .info("Customizing TranscriptionOptions to add AudioFileDetails support and remove no-arg constructor");
@@ -111,6 +119,58 @@ private void customizeDiarizationOptionsToJson(PackageCustomization packageCusto
         });
     }

+    /**
+     * Customize AudioFileDetails.getFilename() to auto-generate a filename from contentType if not explicitly set.
+     * This allows developers to omit setFilename() and have the SDK automatically provide a sensible default.
+     *
+     * @param packageCustomization the package customization
+     */
+    private void customizeAudioFileDetailsGetFilename(PackageCustomization packageCustomization) {
+        packageCustomization.getClass("AudioFileDetails").customizeAst(ast -> {
+            ast.getClassByName("AudioFileDetails").ifPresent(clazz -> {
+                clazz.getMethodsByName("getFilename").forEach(method -> {
+                    method.setBody(parseBlock(
+                        "{ if (this.filename != null && !this.filename.isEmpty()) { return this.filename; } "
+                            + "if (\"audio/wav\".equalsIgnoreCase(this.contentType)) { return \"audio.wav\"; } "
+                            + "if (\"audio/mpeg\".equalsIgnoreCase(this.contentType) || \"audio/mp3\".equalsIgnoreCase(this.contentType)) { return \"audio.mp3\"; } "
+                            + "if (\"audio/ogg\".equalsIgnoreCase(this.contentType)) { return \"audio.ogg\"; } "
+                            + "if (\"audio/flac\".equalsIgnoreCase(this.contentType)) { return \"audio.flac\"; } "
+                            + "if (\"audio/webm\".equalsIgnoreCase(this.contentType)) { return \"audio.webm\"; } "
+                            + "if (\"audio/opus\".equalsIgnoreCase(this.contentType)) { return \"audio.opus\"; } "
+                            + "return \"audio\"; }"));
+                    method.setJavadocComment(
+                        new Javadoc(parseText("Get the filename property: The filename of the file. "
+                            + "If not explicitly set, a filename will be auto-generated from the contentType."))
+                                .addBlockTag("return", "the filename value, or an auto-generated filename if not set."));
+                });
+            });
+        });
+    }
+
+    /**
+     * Customize the EnhancedModeOptions to add a setter for the enabled property.
+     * The enabled property must be explicitly set by users to enable enhanced mode features.
+     *
+     * @param packageCustomization the package customization
+     */
+    private void customizeEnhancedModeOptions(PackageCustomization packageCustomization) {
+        packageCustomization.getClass("EnhancedModeOptions").customizeAst(ast -> {
+            ast.getClassByName("EnhancedModeOptions").ifPresent(clazz -> {
+                // Add setEnabled method following the fluent pattern
+                com.github.javaparser.ast.body.MethodDeclaration setEnabledMethod
+                    = clazz.addMethod("setEnabled", Modifier.Keyword.PUBLIC);
+                setEnabledMethod.addParameter("Boolean", "enabled");
+                setEnabledMethod.setType("EnhancedModeOptions");
+                setEnabledMethod.setBody(parseBlock("{ this.enabled = enabled; return this; }"));
+                setEnabledMethod.setJavadocComment(
+                    new Javadoc(parseText(
+                        "Set the enabled property: Enable enhanced mode for transcription. Must be set to true to use enhanced mode features."))
+                            .addBlockTag("param", "enabled the enabled value to set.")
+                            .addBlockTag("return", "the EnhancedModeOptions object itself."));
+            });
+        });
+    }
+
     /**
      * Customize TranscriptionOptions to:
      * 1. Add AudioFileDetails field (final)
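The diff above injects two behaviors into the generated models via AST rewriting: a content-type-based default filename on `AudioFileDetails`, and a fluent `setEnabled` setter on `EnhancedModeOptions`. A runnable, self-contained sketch of the resulting behavior (these are simplified stand-in classes, not the SDK source; the real classes live in the generated `com.azure.ai.speech.transcription.models` package):

```java
import java.util.Locale;

// Sketch of the customized getFilename(): an explicit filename wins,
// otherwise a default name is derived from the content type.
class AudioFileDetailsSketch {
    String filename;
    String contentType;

    String getFilename() {
        if (filename != null && !filename.isEmpty()) {
            return filename;
        }
        if (contentType != null) {
            // Same mapping as the injected method body, written as a switch.
            switch (contentType.toLowerCase(Locale.ROOT)) {
                case "audio/wav": return "audio.wav";
                case "audio/mpeg":
                case "audio/mp3": return "audio.mp3";
                case "audio/ogg": return "audio.ogg";
                case "audio/flac": return "audio.flac";
                case "audio/webm": return "audio.webm";
                case "audio/opus": return "audio.opus";
                default: break;
            }
        }
        return "audio"; // fallback when the content type is unknown or unset
    }
}

// Sketch of the injected fluent setter: assign and return this so calls chain.
class EnhancedModeOptionsSketch {
    Boolean enabled;

    EnhancedModeOptionsSketch setEnabled(Boolean enabled) {
        this.enabled = enabled;
        return this;
    }
}

public class CustomizationSketch {
    public static void main(String[] args) {
        AudioFileDetailsSketch details = new AudioFileDetailsSketch();
        details.contentType = "audio/mpeg";
        System.out.println(details.getFilename()); // prints "audio.mp3"

        EnhancedModeOptionsSketch options = new EnhancedModeOptionsSketch().setEnabled(true);
        System.out.println(options.enabled); // prints "true"
    }
}
```

Together these two customizations are what let the README samples drop `.setFilename("audio.wav")` and add `.setEnabled(true)` in this commit.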

sdk/cognitiveservices/azure-ai-speech-transcription/src/main/java/com/azure/ai/speech/transcription/TranscriptionAsyncClient.java

Lines changed: 1 addition & 1 deletion

@@ -48,7 +48,7 @@ public final class TranscriptionAsyncClient {
     /**
      * Transcribes the provided audio stream.
      * <p><strong>Response Body Schema</strong></p>
-     * 
+     *
      * <pre>
      * {@code
      * {

sdk/cognitiveservices/azure-ai-speech-transcription/src/main/java/com/azure/ai/speech/transcription/TranscriptionClient.java

Lines changed: 1 addition & 1 deletion

@@ -46,7 +46,7 @@ public final class TranscriptionClient {
     /**
      * Transcribes the provided audio stream.
      * <p><strong>Response Body Schema</strong></p>
-     * 
+     *
      * <pre>
      * {@code
      * {
