
Commit aced742

Merge pull request #263220 from MicrosoftDocs/main
Publish to live, Monday 4 AM PST, 1/15
2 parents 3d1c9b8 + 2b2b199 commit aced742

29 files changed: +581 −807 lines

articles/ai-services/openai/how-to/integrate-synapseml.md

Lines changed: 76 additions & 7 deletions

@@ -35,9 +35,9 @@ This tutorial shows how to apply large language models at a distributed scale by
 - To install SynapseML for your Apache Spark cluster, see [Install SynapseML](#install-synapseml).
 
 > [!NOTE]
-> This article is designed to work with the [Azure OpenAI Service legacy models](/azure/ai-services/openai/concepts/legacy-models) like `Text-Davinci-003`, which support prompt-based completions. Newer models like the current `GPT-3.5 Turbo` and `GPT-4` model series are designed to work with the new chat completion API that expects a specially formatted array of messages as input.
+> The `OpenAICompletion()` transformer is designed to work with the [Azure OpenAI Service legacy models](/azure/ai-services/openai/concepts/legacy-models) like `Text-Davinci-003`, which support prompt-based completions. Newer models like the current `GPT-3.5 Turbo` and `GPT-4` model series are designed to work with the new chat completion API that expects a specially formatted array of messages as input. If you're working with embeddings or chat completion models, see the [Chat Completion](#chat-completion) and [Generating Text Embeddings](#generating-text-embeddings) sections below.
 >
-> The Azure OpenAI SynapseML integration supports the latest models via the [OpenAIChatCompletion()](https://github.com/microsoft/SynapseML/blob/0836e40efd9c48424e91aa10c8aa3fbf0de39f31/cognitive/src/main/scala/com/microsoft/azure/synapse/ml/cognitive/openai/OpenAIChatCompletion.scala#L24) transformer, which isn't demonstrated in this article. After the [release of the GPT-3.5 Turbo Instruct model](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/announcing-updates-to-azure-openai-service-models/ba-p/3866757), the newer model will be the preferred model to use with this article.
+> The Azure OpenAI SynapseML integration supports the latest models via the [OpenAIChatCompletion()](https://github.com/microsoft/SynapseML/blob/0836e40efd9c48424e91aa10c8aa3fbf0de39f31/cognitive/src/main/scala/com/microsoft/azure/synapse/ml/cognitive/openai/OpenAIChatCompletion.scala#L24) transformer.
 
 We recommend that you [create an Azure Synapse workspace](../../../synapse-analytics/get-started-create-workspace.md). However, you can also use Azure Databricks, Azure HDInsight, Spark on Kubernetes, or the Python environment with the `pyspark` package.

@@ -187,15 +187,87 @@ The following image shows example output with completions in Azure Synapse Analy
 
 Here are some other use cases for working with Azure OpenAI Service and large datasets.
 
-### Improve throughput with request batching
+### Generating Text Embeddings
+
+In addition to completing text, we can also embed text for use in downstream algorithms or vector retrieval architectures. Creating embeddings allows you to search and retrieve documents from large collections and can be used when prompt engineering isn't sufficient for the task. For more information on using [OpenAIEmbedding](https://mmlspark.blob.core.windows.net/docs/0.11.1/pyspark/_modules/synapse/ml/cognitive/openai/OpenAIEmbedding.html), see our [embedding guide](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/OpenAI/Quickstart%20-%20OpenAI%20Embedding/).
+
+```python
+from synapse.ml.services.openai import OpenAIEmbedding
+
+embedding = (
+    OpenAIEmbedding()
+    .setSubscriptionKey(key)
+    .setDeploymentName(deployment_name_embeddings)
+    .setCustomServiceName(service_name)
+    .setTextCol("prompt")
+    .setErrorCol("error")
+    .setOutputCol("embeddings")
+)
+
+display(embedding.transform(df))
+```
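Embeddings produced this way are typically consumed by a similarity-search step downstream. The following is a minimal pure-Python sketch of cosine-similarity retrieval over embedding vectors; it is independent of SynapseML, and the toy vectors and helper names (`cosine_similarity`, `top_k`) are illustrative assumptions, not part of the API above.

```python
import math


def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    # Rank documents by similarity to the query embedding and keep the best k.
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings" standing in for real OpenAIEmbedding output.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, docs))
```

In practice, the vectors would come from the `embeddings` output column above, and the ranking would run at scale (for example, with a vector index) rather than with a linear scan.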
+
+### Chat Completion
+
+Models such as ChatGPT and GPT-4 are capable of understanding chats instead of single prompts. The [OpenAIChatCompletion](https://mmlspark.blob.core.windows.net/docs/0.11.1/pyspark/_modules/synapse/ml/cognitive/openai/OpenAIChatCompletion.html) transformer exposes this functionality at scale.
+
+```python
+from synapse.ml.services.openai import OpenAIChatCompletion
+from pyspark.sql import Row
+from pyspark.sql.types import *
+
+
+def make_message(role, content):
+    return Row(role=role, content=content, name=role)
+
+
+chat_df = spark.createDataFrame(
+    [
+        (
+            [
+                make_message(
+                    "system", "You are an AI chatbot with red as your favorite color"
+                ),
+                make_message("user", "Whats your favorite color"),
+            ],
+        ),
+        (
+            [
+                make_message("system", "You are very excited"),
+                make_message("user", "How are you today"),
+            ],
+        ),
+    ]
+).toDF("messages")
+
+chat_completion = (
+    OpenAIChatCompletion()
+    .setSubscriptionKey(key)
+    .setDeploymentName(deployment_name)
+    .setCustomServiceName(service_name)
+    .setMessagesCol("messages")
+    .setErrorCol("error")
+    .setOutputCol("chat_completions")
+)
+
+display(
+    chat_completion.transform(chat_df).select(
+        "messages", "chat_completions.choices.message.content"
+    )
+)
+```
+
+### Improve throughput with request batching from OpenAICompletion
 
 You can use Azure OpenAI Service with large datasets to improve throughput with request batching. In the previous example, you make several requests to the service, one for each prompt. To complete multiple prompts in a single request, you can use batch mode.
 
-In the `OpenAICompletion` object definition, you specify the `"batchPrompt"` value to configure the dataframe to use a **batchPrompt** column. Create the dataframe with a list of prompts for each row.
+In the [OpenAICompletion](https://mmlspark.blob.core.windows.net/docs/0.11.1/pyspark/_modules/synapse/ml/cognitive/openai/OpenAICompletion.html) object definition, you specify the `"batchPrompt"` value to configure the dataframe to use a **batchPrompt** column. Create the dataframe with a list of prompts for each row.
 
 > [!NOTE]
 > There's currently a limit of 20 prompts in a single request and a limit of 2048 tokens, or approximately 1500 words.
 
+> [!NOTE]
+> Currently, request batching is not supported by the `OpenAIChatCompletion()` transformer.
 
 ```python
 batch_df = spark.createDataFrame(
     [
@@ -227,9 +299,6 @@ completed_batch_df = batch_completion.transform(batch_df).cache()
 display(completed_batch_df)
 ```
 
-> [!NOTE]
-> There's currently a limit of 20 prompts in a single request and a limit of 2048 tokens, or approximately 1500 words.
 
 ### Use an automatic mini-batcher
 
 You can use Azure OpenAI Service with large datasets to transpose the data format. If your data is in column format, you can transpose it to row format by using the SynapseML `FixedMiniBatcherTransformer` object.
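The fixed-size grouping that a mini-batcher performs can be illustrated outside Spark. This is a hedged pure-Python sketch of the concept only; the `fixed_mini_batches` helper is an illustrative stand-in, not the SynapseML `FixedMiniBatcherTransformer` API.

```python
def fixed_mini_batches(rows, batch_size):
    """Group a flat sequence of rows into fixed-size batches, mirroring
    what a mini-batcher does before issuing one batched request."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


prompts = ["Hello my name is", "The best code is code that's", "SynapseML is "]
print(fixed_mini_batches(prompts, 2))
# Note: the final batch may be smaller than batch_size.
```

In the Spark version, each resulting batch becomes one row whose column holds a list of prompts, which `OpenAICompletion` can then complete in a single request.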

articles/ai-services/speech-service/includes/language-support/pronunciation-assessment.md

Lines changed: 1 addition & 0 deletions

@@ -11,6 +11,7 @@ ms.author: eur
 |Arabic (Saudi Arabia)|`ar-SA` |
 |Chinese (Cantonese, Traditional)|`zh-HK`<sup>1</sup>|
 |Chinese (Mandarin, Simplified)|`zh-CN`|
+|Dutch (Netherlands)|`nl-NL`<sup>1</sup>|
 |English (Australia)|`en-AU`|
 |English (Canada)|`en-CA` |
 |English (India)|`en-IN` |

articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md

Lines changed: 1 addition & 25 deletions

@@ -105,31 +105,7 @@ Speech to text supports two new locales as shown in the following table. Refer t
 
 - Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 3 additional languages generally available in English (Canada), English (India), and French (Canada), with 3 additional languages available in preview. For more information, see the full [language list for Pronunciation Assessment](../../language-support.md?tabs=pronunciation-assessment).
 
-| Language | Locale (BCP-47) |
-|--|--|
-|Arabic (Saudi Arabia)|`ar-SA`<sup>1</sup> |
-|Chinese (Mandarin, Simplified)|`zh-CN`|
-|English (Australia)|`en-AU`|
-|English (Canada)|`en-CA` |
-|English (India)|`en-IN` |
-|English (United Kingdom)|`en-GB`|
-|English (United States)|`en-US`|
-|French (Canada)|`fr-CA`|
-|French (France)|`fr-FR`|
-|German (Germany)|`de-DE`|
-|Italian (Italy)|`it-IT`<sup>1</sup>|
-|Japanese (Japan)|`ja-JP`|
-|Korean (Korea)|`ko-KR`<sup>1</sup>|
-|Malay (Malaysia)|`ms-MY`<sup>1</sup>|
-|Norwegian Bokmål (Norway)|`nb-NO`<sup>1</sup>|
-|Portuguese (Brazil)|`pt-BR`<sup>1</sup>|
-|Russian (Russia)|`ru-RU`<sup>1</sup>|
-|Spanish (Mexico)|`es-MX` |
-|Spanish (Spain)|`es-ES` |
-|Tamil (India)|`ta-IN`<sup>1</sup> |
-|Vietnamese (Vietnam)|`vi-VN`<sup>1</sup> |
-
-<sup>1</sup> The language is in public preview for pronunciation assessment.
 
 ### May 2023 release

articles/ai-services/speech-service/language-support.md

Lines changed: 1 addition & 1 deletion

@@ -111,7 +111,7 @@ With the cross-lingual feature, you can transfer your custom neural voice model
 
 # [Pronunciation assessment](#tab/pronunciation-assessment)
 
-The table in this section summarizes the 24 locales supported for pronunciation assessment, and each language is available on all [Speech to text regions](regions.md#speech-service). Latest update extends support from English to 23 additional languages and quality enhancements to existing features, including accuracy, fluency and miscue assessment. You should specify the language that you're learning or practicing improving pronunciation. The default language is set as `en-US`. If you know your target learning language, [set the locale](how-to-pronunciation-assessment.md#get-pronunciation-assessment-results) accordingly. For example, if you're learning British English, you should specify the language as `en-GB`. If you're teaching a broader language, such as Spanish, and are uncertain about which locale to select, you can run various accent models (`es-ES`, `es-MX`) to determine the one that achieves the highest score to suit your specific scenario.
+The table in this section summarizes the 25 locales supported for pronunciation assessment; each language is available in all [Speech to text regions](regions.md#speech-service). The latest update extends support from English to 24 additional languages and brings quality enhancements to existing features, including accuracy, fluency, and miscue assessment. Specify the language whose pronunciation you're learning or practicing. The default language is `en-US`. If you know your target learning language, [set the locale](how-to-pronunciation-assessment.md#get-pronunciation-assessment-results) accordingly. For example, if you're learning British English, specify the language as `en-GB`. If you're teaching a broader language, such as Spanish, and are uncertain about which locale to select, you can run various accent models (`es-ES`, `es-MX`) to determine which one achieves the highest score for your scenario.
 
 [!INCLUDE [Language support include](includes/language-support/pronunciation-assessment.md)]

articles/backup/backup-support-matrix-iaas.md

Lines changed: 2 additions & 2 deletions

@@ -191,8 +191,8 @@ Adding a disk to a protected VM | Supported.
 Resizing a disk on a protected VM | Supported.
 Shared storage| Backing up VMs by using Cluster Shared Volumes (CSV) or Scale-Out File Server isn't supported. CSV writers are likely to fail during backup. On restore, disks that contain CSV volumes might not come up.
 [Shared disks](../virtual-machines/disks-shared-enable.md) | Not supported.
-<a name="ultra-disk-backup">Ultra disks</a> | Supported with [Enhanced policy](backup-azure-vms-enhanced-policy.md). The support is currently in preview. <br><br> [Supported regions](../virtual-machines/disks-types.md#ultra-disk-limitations). <br><br> To enroll your subscription for this feature, [fill this form](https://forms.office.com/r/1GLRnNCntU). <br><br> - Configuration of Ultra disk protection is supported via Recovery Services vault only. This configuration is currently not supported via virtual machine blade. <br><br> - Cross-region restore is currently not supported for machines using Ultra disks. <br><br> - GRS type vaults cannot be used for enabling backup.
-<a name="premium-ssd-v2-backup">Premium SSD v2</a> | Supported with [Enhanced policy](backup-azure-vms-enhanced-policy.md). The support is currently in preview. <br><br> [Supported regions](../virtual-machines/disks-types.md#regional-availability). <br><br> To enroll your subscription for this feature, [fill this form](https://forms.office.com/r/h56TpTc773). <br><br> - Configuration of Premium v2 disk protection is supported via Recovery Services vault only. This configuration is currently not supported via virtual machine blade. <br><br> - Cross-region restore is currently not supported for machines using Premium v2 disks. <br><br> - GRS type vaults cannot be used for enabling backup.
+<a name="ultra-disk-backup">Ultra disks</a> | Supported with [Enhanced policy](backup-azure-vms-enhanced-policy.md). The support is currently in preview. <br><br> [Supported regions](../virtual-machines/disks-types.md#ultra-disk-limitations). <br><br> - The preview can be tested on any subscription and no enrollment is required. <br><br> - Configuration of Ultra disk protection is supported via Recovery Services vault and via virtual machine blade. <br><br> - Cross-region restore is currently not supported for machines using Ultra disks. <br><br> - GRS type vaults cannot be used for enabling backup. <br><br> - File-level restore is currently not supported for machines using Ultra disks.
+<a name="premium-ssd-v2-backup">Premium SSD v2</a> | Supported with [Enhanced policy](backup-azure-vms-enhanced-policy.md). The support is currently in preview. <br><br> [Supported regions](../virtual-machines/disks-types.md#regional-availability). <br><br> - The preview can be tested on any subscription and no enrollment is required. <br><br> - Configuration of Premium SSD v2 disk protection is supported via Recovery Services vault and via virtual machine blade. <br><br> - Cross-region restore is currently not supported for machines using Premium v2 disks. <br><br> - GRS type vaults cannot be used for enabling backup. <br><br> - File-level restore is currently not supported for machines using Premium SSD v2 disks.
 [Temporary disks](../virtual-machines/managed-disks-overview.md#temporary-disk) | Azure Backup doesn't back up temporary disks.
 NVMe/[ephemeral disks](../virtual-machines/ephemeral-os-disks.md) | Not supported.
 [Resilient File System (ReFS)](/windows-server/storage/refs/refs-overview) restore | Supported. Volume Shadow Copy Service (VSS) supports app-consistent backups on ReFS.

articles/communication-services/concepts/interop/teams-user-calling.md

Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@ The Azure Communication Services Calling SDK enables Teams user devices to drive
 
 Key features of the Calling SDK:
 
-- **Addressing** - Azure Communication Services is using [Microsoft Entra user identifier](/powershell/module/azuread/get-azureaduser) to address communication endpoints. Clients use Microsoft Entra identities to authenticate to the service and communicate with each other. These identities are used in Calling APIs that provide clients visibility into who is connected to a call (the roster). And are also used in [Microsoft Graph API](/graph/api/user-get).
+- **Addressing** - Azure Communication Services uses the [Microsoft Entra user identifier](/powershell/module/microsoft.graph.users/get-mguser) to address communication endpoints. Clients use Microsoft Entra identities to authenticate to the service and communicate with each other. These identities are used in Calling APIs that provide clients visibility into who is connected to a call (the roster), and are also used in the [Microsoft Graph API](/graph/api/user-get).
 - **Encryption** - The Calling SDK encrypts traffic and prevents tampering on the wire.
 - **Device Management and Media** - The Calling SDK provides facilities for binding to audio and video devices, encodes content for efficient transmission over the communications data plane, and renders content to output devices and views that you specify. APIs are also provided for screen and application sharing.
 - **Notifications** - The Calling SDK provides APIs that allow clients to be notified of an incoming call. In situations where your app is not running in the foreground, patterns are available to [fire pop-up notifications](../notifications.md) ("toasts") to inform users of an incoming call.
