
Commit 82ed087

update readme and modify enabled property for EnhancedModeOptions
1 parent edd2285

19 files changed: +158 −147 lines

sdk/agrifood/azure-verticals-agrifood-farming/README.md

Lines changed: 3 additions & 2 deletions
@@ -103,8 +103,9 @@ Farm hierarchy is a collection of below entities.
 
 ```java readme-sample-createFarmHierarchy
 // Create Party
-JSONObject object = new JSONObject().appendField("name", "party1");
-BinaryData party = BinaryData.fromObject(object);
+Map<String, String> partyData = new HashMap<>();
+partyData.put("name", "party1");
+BinaryData party = BinaryData.fromObject(partyData);
 partiesClient.createOrUpdateWithResponse("contoso-party", party, null).block();
 
 // Get Party
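The change above swaps the `JSONObject` helper for a plain `Map`, relying on `BinaryData.fromObject` to serialize the map into the same `{"name":"party1"}` JSON body. A minimal standalone sketch of that payload shape — the hand-rolled `toJson` below is illustrative only (it assumes a flat String-to-String map with no escaping); the SDK uses its own serializer:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PartyPayloadSketch {
    // Illustrative stand-in for BinaryData.fromObject: render a flat
    // String->String map as a JSON object. The real SDK serializer handles
    // escaping and nested types; this sketch assumes plain ASCII values.
    static String toJson(Map<String, String> data) {
        return data.entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
            .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        Map<String, String> partyData = new HashMap<>();
        partyData.put("name", "party1");
        System.out.println(toJson(partyData)); // {"name":"party1"}
    }
}
```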

sdk/cognitiveservices/azure-ai-speech-transcription/README.md

Lines changed: 16 additions & 51 deletions
@@ -1,6 +1,6 @@
 # Azure AI Speech Transcription client library for Java
 
-The Azure AI Speech Transcription client library provides a simple and efficient way to convert audio to text using Azure Cognitive Services. This library enables you to transcribe audio files with features like speaker diarization, profanity filtering, and phrase hints for improved accuracy.
+The Azure AI Speech Transcription client library provides a simple and efficient way to convert audio to text using Azure Cognitive Services. This library enables you to transcribe audio with features like speaker diarization, profanity filtering, and phrase hints for improved accuracy.
 
 ## Documentation
 
@@ -30,9 +30,9 @@ Various documentation is available to help you get started:
 ```
 [//]: # ({x-version-update-end})
 
-#### Optional: For Azure AD Authentication
+#### Optional: For Entra ID Authentication
 
-If you plan to use Azure AD authentication (recommended for production), also add the `azure-identity` dependency:
+If you plan to use Entra ID authentication (recommended for production), also add the `azure-identity` dependency:
 
 ```xml
 <dependency>
@@ -65,11 +65,9 @@ TranscriptionClient client = new TranscriptionClientBuilder()
     .buildClient();
 ```
 
-#### Option 2: Azure AD OAuth2 Authentication (Recommended for Production)
+#### Option 2: Entra ID OAuth2 Authentication (Recommended for Production)
 
-For production scenarios, it's recommended to use Azure Active Directory (Azure AD) authentication with managed identities or service principals. This provides better security and easier credential management.
-
-The OAuth2 scope for Azure Cognitive Services is: `https://cognitiveservices.azure.com/.default`
+For production scenarios, it's recommended to use Entra ID authentication with managed identities or service principals. This provides better security and easier credential management.
 
 ```java
 import com.azure.identity.DefaultAzureCredential;
@@ -84,28 +82,28 @@ TranscriptionClient client = new TranscriptionClientBuilder()
     .buildClient();
 ```
 
-**Note:** To use Azure AD authentication, you need to:
+**Note:** To use Entra ID authentication, you need to:
 1. Add the `azure-identity` dependency to your project
 2. Assign the appropriate role (e.g., "Cognitive Services User") to your managed identity or service principal
-3. Ensure your Cognitive Services resource has Azure AD authentication enabled
+3. Ensure your Cognitive Services resource has Entra ID authentication enabled
 
-For more information on Azure AD authentication, see:
+For more information on Entra ID authentication, see:
 - [Authenticate with Azure Identity](https://learn.microsoft.com/azure/developer/java/sdk/identity)
 - [Azure Cognitive Services authentication](https://learn.microsoft.com/azure/ai-services/authentication)
 
 ## Key concepts
 
 ### TranscriptionClient
 
-The `TranscriptionClient` is the primary interface for interacting with the Speech Transcription service. It provides methods to transcribe audio files to text.
+The `TranscriptionClient` is the primary interface for interacting with the Speech Transcription service. It provides methods to transcribe audio to text.
 
 ### TranscriptionAsyncClient
 
 The `TranscriptionAsyncClient` provides asynchronous methods for transcribing audio, allowing non-blocking operations that return reactive types.
 
 ### Audio Formats
 
-The service supports various audio formats including WAV, MP3, OGG, and more. Audio files must be:
+The service supports various audio formats including WAV, MP3, OGG, and more. Audio must be:
 
 - Shorter than 2 hours in duration
 - Smaller than 250 MB in size
@@ -135,8 +133,7 @@ try {
 byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));
 
 // Create audio file details
-AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-    .setFilename("audio.wav");
+AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));
 
 // Create transcription options
 TranscriptionOptions options = new TranscriptionOptions(audioFileDetails);
@@ -176,45 +173,14 @@ result.getCombinedPhrases().forEach(phrase -> {
 });
 ```
 
-### Transcribe using AudioFileDetails constructor
-
-You can also create `TranscriptionOptions` directly with `AudioFileDetails`:
-
-```java readme-sample-transcribeWithAudioFileDetails
-TranscriptionClient client = new TranscriptionClientBuilder()
-    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
-    .credential(new KeyCredential("<your-api-key>"))
-    .buildClient();
-
-// Read audio file
-byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));
-
-// Create audio file details
-AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-    .setFilename("audio.wav")
-    .setContentType("audio/wav");
-
-// Create transcription options with AudioFileDetails
-TranscriptionOptions options = new TranscriptionOptions(audioFileDetails);
-
-// Transcribe audio
-TranscriptionResult result = client.transcribe(options);
-
-// Process results
-result.getCombinedPhrases().forEach(phrase -> {
-    System.out.println(phrase.getText());
-});
-```
-
 ### Transcribe with multi-language support
 
 The service can automatically detect and transcribe multiple languages within the same audio file.
 
 ```java com.azure.ai.speech.transcription.transcriptionoptions.multilanguage
 byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));
 
-AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-    .setFilename("audio.wav");
+AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));
 
 // Configure transcription WITHOUT specifying locales
 // This allows the service to auto-detect and transcribe multiple languages
@@ -235,10 +201,10 @@ Enhanced mode provides advanced features to improve transcription accuracy with
 ```java com.azure.ai.speech.transcription.transcriptionoptions.enhancedmode
 byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));
 
-AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-    .setFilename("audio.wav");
+AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));
 
 EnhancedModeOptions enhancedMode = new EnhancedModeOptions()
+    .setEnabled(true)
     .setTask("transcribe")
     .setPrompts(java.util.Arrays.asList("Output must be in lexical format."));
 
@@ -257,8 +223,7 @@ You can use a phrase list to improve recognition accuracy for specific terms.
 ```java com.azure.ai.speech.transcription.transcriptionoptions.phraselist
 byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));
 
-AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-    .setFilename("audio.wav");
+AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));
 
 PhraseListOptions phraseListOptions = new PhraseListOptions()
     .setPhrases(java.util.Arrays.asList("Azure", "Cognitive Services"))
@@ -299,7 +264,7 @@ You can enable logging to debug issues with the client library. The Azure client
 
 #### Authentication errors
 
-- Verify that your API key is correct and has not expired
+- Verify that your API key is correct
 - Ensure your endpoint URL matches your Azure resource region
 
 #### Audio format errors

sdk/cognitiveservices/azure-ai-speech-transcription/customization/src/main/java/SpeechTranscriptionCustomization.java

Lines changed: 60 additions & 0 deletions
@@ -59,6 +59,14 @@ public void customize(LibraryCustomization customization, Logger logger) {
         logger.info("Customizing TranscriptionDiarizationOptions.toJson()");
         customizeDiarizationOptionsToJson(models);
 
+        // Customize EnhancedModeOptions to add setter for enabled property
+        logger.info("Customizing EnhancedModeOptions to add setEnabled() method");
+        customizeEnhancedModeOptions(models);
+
+        // Customize AudioFileDetails.getFilename() to auto-generate filename from contentType if not set
+        logger.info("Customizing AudioFileDetails.getFilename() to auto-generate filename");
+        customizeAudioFileDetailsGetFilename(models);
+
         // Add AudioFileDetails field and constructors to TranscriptionOptions, make setAudioUrl private, remove no-arg constructor
         logger
             .info("Customizing TranscriptionOptions to add AudioFileDetails support and remove no-arg constructor");
@@ -111,6 +119,58 @@ private void customizeDiarizationOptionsToJson(PackageCustomization packageCusto
         });
     }
 
+    /**
+     * Customize AudioFileDetails.getFilename() to auto-generate a filename from contentType if not explicitly set.
+     * This allows developers to omit setFilename() and have the SDK automatically provide a sensible default.
+     *
+     * @param packageCustomization the package customization
+     */
+    private void customizeAudioFileDetailsGetFilename(PackageCustomization packageCustomization) {
+        packageCustomization.getClass("AudioFileDetails").customizeAst(ast -> {
+            ast.getClassByName("AudioFileDetails").ifPresent(clazz -> {
+                clazz.getMethodsByName("getFilename").forEach(method -> {
+                    method.setBody(parseBlock(
+                        "{ if (this.filename != null && !this.filename.isEmpty()) { return this.filename; } "
+                            + "if (\"audio/wav\".equalsIgnoreCase(this.contentType)) { return \"audio.wav\"; } "
+                            + "if (\"audio/mpeg\".equalsIgnoreCase(this.contentType) || \"audio/mp3\".equalsIgnoreCase(this.contentType)) { return \"audio.mp3\"; } "
+                            + "if (\"audio/ogg\".equalsIgnoreCase(this.contentType)) { return \"audio.ogg\"; } "
+                            + "if (\"audio/flac\".equalsIgnoreCase(this.contentType)) { return \"audio.flac\"; } "
+                            + "if (\"audio/webm\".equalsIgnoreCase(this.contentType)) { return \"audio.webm\"; } "
+                            + "if (\"audio/opus\".equalsIgnoreCase(this.contentType)) { return \"audio.opus\"; } "
+                            + "return \"audio\"; }"));
+                    method.setJavadocComment(
+                        new Javadoc(parseText("Get the filename property: The filename of the file. "
+                            + "If not explicitly set, a filename will be auto-generated from the contentType."))
+                                .addBlockTag("return", "the filename value, or an auto-generated filename if not set."));
+                });
+            });
+        });
+    }
+
+    /**
+     * Customize the EnhancedModeOptions to add a setter for the enabled property.
+     * The enabled property must be explicitly set by users to enable enhanced mode features.
+     *
+     * @param packageCustomization the package customization
+     */
+    private void customizeEnhancedModeOptions(PackageCustomization packageCustomization) {
+        packageCustomization.getClass("EnhancedModeOptions").customizeAst(ast -> {
+            ast.getClassByName("EnhancedModeOptions").ifPresent(clazz -> {
+                // Add setEnabled method following the fluent pattern
+                com.github.javaparser.ast.body.MethodDeclaration setEnabledMethod
+                    = clazz.addMethod("setEnabled", Modifier.Keyword.PUBLIC);
+                setEnabledMethod.addParameter("Boolean", "enabled");
+                setEnabledMethod.setType("EnhancedModeOptions");
+                setEnabledMethod.setBody(parseBlock("{ this.enabled = enabled; return this; }"));
+                setEnabledMethod.setJavadocComment(
+                    new Javadoc(parseText(
+                        "Set the enabled property: Enable enhanced mode for transcription. Must be set to true to use enhanced mode features."))
+                            .addBlockTag("param", "enabled the enabled value to set.")
+                            .addBlockTag("return", "the EnhancedModeOptions object itself."));
+            });
+        });
+    }
+
     /**
      * Customize TranscriptionOptions to:
      * 1. Add AudioFileDetails field (final)

sdk/cognitiveservices/azure-ai-speech-transcription/src/main/java/com/azure/ai/speech/transcription/TranscriptionAsyncClient.java

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ public final class TranscriptionAsyncClient {
     /**
      * Transcribes the provided audio stream.
      * <p><strong>Response Body Schema</strong></p>
-     *
+     *
      * <pre>
      * {@code
      * {

sdk/cognitiveservices/azure-ai-speech-transcription/src/main/java/com/azure/ai/speech/transcription/TranscriptionClient.java

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ public final class TranscriptionClient {
     /**
      * Transcribes the provided audio stream.
      * <p><strong>Response Body Schema</strong></p>
-     *
+     *
     * <pre>
     * {@code
     * {

sdk/cognitiveservices/azure-ai-speech-transcription/src/main/java/com/azure/ai/speech/transcription/models/AudioFileDetails.java

Lines changed: 25 additions & 3 deletions
@@ -52,13 +52,35 @@ public BinaryData getContent() {
     }
 
     /**
-     * Get the filename property: The filename of the file.
+     * Get the filename property: The filename of the file. If not explicitly set, a filename will be auto-generated
+     * from the contentType.
      *
-     * @return the filename value.
+     * @return the filename value, or an auto-generated filename if not set.
      */
     @Generated
     public String getFilename() {
-        return this.filename;
+        if (this.filename != null && !this.filename.isEmpty()) {
+            return this.filename;
+        }
+        if ("audio/wav".equalsIgnoreCase(this.contentType)) {
+            return "audio.wav";
+        }
+        if ("audio/mpeg".equalsIgnoreCase(this.contentType) || "audio/mp3".equalsIgnoreCase(this.contentType)) {
+            return "audio.mp3";
+        }
+        if ("audio/ogg".equalsIgnoreCase(this.contentType)) {
+            return "audio.ogg";
+        }
+        if ("audio/flac".equalsIgnoreCase(this.contentType)) {
+            return "audio.flac";
+        }
+        if ("audio/webm".equalsIgnoreCase(this.contentType)) {
+            return "audio.webm";
+        }
+        if ("audio/opus".equalsIgnoreCase(this.contentType)) {
+            return "audio.opus";
+        }
+        return "audio";
     }
 
     /**
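The customized `getFilename()` is a pure contentType-to-extension mapping, so it can be sketched and checked outside the SDK. The helper below mirrors the logic of the diff (the class and method names here are illustrative; in the SDK the logic lives inside `AudioFileDetails` and reads instance fields):

```java
public class FilenameSketch {
    // Mirrors the customized AudioFileDetails.getFilename(): an explicitly
    // set filename wins; otherwise a default name is derived from the MIME type.
    static String filenameFor(String filename, String contentType) {
        if (filename != null && !filename.isEmpty()) {
            return filename;
        }
        if ("audio/wav".equalsIgnoreCase(contentType)) {
            return "audio.wav";
        }
        if ("audio/mpeg".equalsIgnoreCase(contentType) || "audio/mp3".equalsIgnoreCase(contentType)) {
            return "audio.mp3";
        }
        if ("audio/ogg".equalsIgnoreCase(contentType)) {
            return "audio.ogg";
        }
        if ("audio/flac".equalsIgnoreCase(contentType)) {
            return "audio.flac";
        }
        if ("audio/webm".equalsIgnoreCase(contentType)) {
            return "audio.webm";
        }
        if ("audio/opus".equalsIgnoreCase(contentType)) {
            return "audio.opus";
        }
        // Unknown or null contentType falls back to a bare name
        return "audio";
    }

    public static void main(String[] args) {
        System.out.println(filenameFor(null, "audio/WAV"));     // audio.wav (case-insensitive match)
        System.out.println(filenameFor("my.mp3", "audio/wav")); // my.mp3 (explicit name wins)
        System.out.println(filenameFor(null, null));            // audio
    }
}
```

Note that calling `equalsIgnoreCase` on the string literal keeps the comparison null-safe when no contentType was set.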

sdk/cognitiveservices/azure-ai-speech-transcription/src/main/java/com/azure/ai/speech/transcription/models/EnhancedModeOptions.java

Lines changed: 12 additions & 0 deletions
@@ -171,4 +171,16 @@ public static EnhancedModeOptions fromJson(JsonReader jsonReader) throws IOExcep
             return deserializedEnhancedModeOptions;
         });
     }
+
+    /**
+     * Set the enabled property: Enable enhanced mode for transcription. Must be set to true to use enhanced mode
+     * features.
+     *
+     * @param enabled the enabled value to set.
+     * @return the EnhancedModeOptions object itself.
+     */
+    public EnhancedModeOptions setEnabled(Boolean enabled) {
+        this.enabled = enabled;
+        return this;
+    }
 }
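The new `setEnabled` follows the SDK's fluent-setter convention: mutate the field, then return `this` so configuration reads as one chained expression. A standalone sketch of the pattern (the class and fields below are simplified stand-ins for the real model):

```java
public class FluentOptionsSketch {
    // Simplified stand-in for EnhancedModeOptions: each setter returns `this`
    // so configuration reads as a single chained expression.
    static class EnhancedModeOptions {
        private Boolean enabled;
        private String task;

        public EnhancedModeOptions setEnabled(Boolean enabled) {
            this.enabled = enabled;
            return this;
        }

        public EnhancedModeOptions setTask(String task) {
            this.task = task;
            return this;
        }

        public Boolean isEnabled() {
            return enabled;
        }

        public String getTask() {
            return task;
        }
    }

    public static void main(String[] args) {
        // The chain mirrors the README samples: setEnabled(true).setTask("transcribe")
        EnhancedModeOptions options = new EnhancedModeOptions()
            .setEnabled(true)
            .setTask("transcribe");
        System.out.println(options.isEnabled() + " " + options.getTask()); // true transcribe
    }
}
```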

sdk/cognitiveservices/azure-ai-speech-transcription/src/samples/java/com/azure/ai/speech/transcription/EnhancedModeSample.java

Lines changed: 2 additions & 2 deletions
@@ -117,12 +117,12 @@ private static TranscriptionResult transcribeWithFullEnhancedMode(
         String filename
     ) throws Exception {
         // Create audio file details
-        AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData))
-            .setFilename(filename);
+        AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));
 
         // Configure comprehensive LLM-enhanced mode settings
         // Always include lexical format prompt for best results
         EnhancedModeOptions enhancedMode = new EnhancedModeOptions()
+            .setEnabled(true)
             .setTask("transcribe")
             .setPrompts(Arrays.asList(
                 "Output must be in lexical format."

sdk/cognitiveservices/azure-ai-speech-transcription/src/samples/java/com/azure/ai/speech/transcription/README.md

Lines changed: 7 additions & 7 deletions
@@ -10,9 +10,9 @@ To run these samples, you need:
 2. **Azure AI Speech Service Resource**: Create one in the [Azure Portal](https://portal.azure.com)
 3. **Authentication**: Choose one of the following authentication methods:
 
-### Option 1: Azure AD Authentication (Recommended for Production)
-
-Set the endpoint and configure Azure AD credentials:
+### Option 1: Entra ID Authentication (Recommended for Production)
+
+Set the endpoint and configure Entra ID credentials:
 
 ```bash
 set SPEECH_ENDPOINT=https://your-resource-name.cognitiveservices.azure.com/
@@ -32,7 +32,7 @@ To run these samples, you need:
     --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<speech-resource-name>
 ```
 
-**Required dependency** for Azure AD authentication:
+**Required dependency** for Entra ID authentication:
 
 ```xml
 <dependency>
@@ -64,7 +64,7 @@ To run these samples, you need:
     --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<speech-resource-name>
 ```
 
-**Required dependency** for Azure AD authentication:
+**Required dependency** for Entra ID authentication:
 
 ```xml
 <dependency>
@@ -80,10 +80,10 @@ To run these samples, you need:
 
 All samples in this directory support **both authentication methods**:
 
-- **Azure AD (TokenCredential)**: Uses `DefaultAzureCredential` from azure-identity
+- **Entra ID (TokenCredential)**: Uses `DefaultAzureCredential` from azure-identity
 - **API Key (KeyCredential)**: Uses the `SPEECH_API_KEY` environment variable
 
-The samples will automatically detect which authentication method to use based on the environment variables you've set. If `SPEECH_API_KEY` is set, it will use API Key authentication; otherwise, it will attempt Azure AD authentication.
+The samples will automatically detect which authentication method to use based on the environment variables you've set. If `SPEECH_API_KEY` is set, it will use API Key authentication; otherwise, it will attempt Entra ID authentication.
 
 ## Available Samples
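The credential-selection rule the samples README describes — API key if `SPEECH_API_KEY` is set, otherwise Entra ID — can be sketched as a small chooser. The method name and string return values below are illustrative only; the actual samples construct a `KeyCredential` or `DefaultAzureCredential` at this decision point:

```java
import java.util.Map;

public class CredentialChooserSketch {
    // Pick the auth mode the way the samples describe: an API key present in
    // the environment wins; otherwise fall back to Entra ID
    // (DefaultAzureCredential from azure-identity).
    static String chooseAuth(Map<String, String> env) {
        String key = env.get("SPEECH_API_KEY");
        return (key != null && !key.isEmpty()) ? "api-key" : "entra-id";
    }

    public static void main(String[] args) {
        System.out.println(chooseAuth(Map.of("SPEECH_API_KEY", "abc123"))); // api-key
        System.out.println(chooseAuth(Map.of())); // entra-id
    }
}
```

Taking the environment as a `Map` parameter (rather than calling `System.getenv()` directly) keeps the selection logic testable.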
