
Commit 7070169

Merge pull request #481 from MicrosoftDocs/release-preview-phi3-5-moe-serverless
Release preview phi3 5 moe serverless -> main -- 9/26 at 8:30 AM PT
2 parents 634a121 + e5d6839 commit 7070169

11 files changed: +74 −2,332 lines

articles/ai-studio/.openpublishing.redirection.ai-studio.json

Lines changed: 5 additions & 0 deletions
````diff
@@ -25,6 +25,11 @@
     "redirect_url": "/azure/ai-studio/quickstarts/assistants",
     "redirect_document_id": true
   },
+  {
+    "source_path_from_root": "/articles/ai-studio/how-to/deploy-models-phi-3-5-moe.md",
+    "redirect_url": "/azure/ai-studio/how-to/deploy-models-phi-3",
+    "redirect_document_id": true
+  },
   {
     "source_path_from_root": "/articles/ai-studio/how-to/cli-install.md",
     "redirect_url": "/azure/ai-studio/how-to/develop/sdk-overview",
````

articles/ai-studio/how-to/deploy-models-phi-3-5-moe.md

Lines changed: 0 additions & 1153 deletions
This file was deleted.

articles/ai-studio/how-to/deploy-models-phi-3.md

Lines changed: 30 additions & 10 deletions
````diff
@@ -5,7 +5,7 @@ description: Learn how to use Phi-3 family chat models with Azure AI Studio.
 ms.service: azure-ai-studio
 manager: scottpolly
 ms.topic: how-to
-ms.date: 09/13/2024
+ms.date: 09/18/2024
 ms.reviewer: kritifaujdar
 reviewer: fkriti
 ms.author: mopeakande
````

````diff
@@ -31,7 +31,11 @@ The Phi-3 family chat models include the following models:
 
 # [Phi-3.5](#tab/phi-3-5)
 
-Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.
+Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties.
+
+Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.
+
+Phi-3.5 MoE (mixture-of-expert) uses 16x3.8B parameters with 6.6B active parameters when using 2 experts. The model is a mixture-of-expert decoder-only transformer model, using a tokenizer with vocabulary size of 32,064.
 
 The models underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters.
 
````

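Since the new MoE paragraph leads with parameter counts, here is a rough sketch of the accounting, using illustrative symbols that are not in the commit. Only the routed experts run for a given token, so with 16 experts and top-2 routing the per-token active count is approximately:

```latex
P_{\text{active}} \approx P_{\text{shared}} + k \, P_{\text{expert}}, \qquad k = 2,\; E = 16
```

Here P_shared covers the attention and embedding weights shared across experts, and P_expert is a single expert's feed-forward block. That is how a model assembled from 16 expert blocks of 3.8B-parameter scale can keep the active count near the quoted 6.6B.
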
````diff
@@ -41,6 +45,7 @@ The Phi-3.5 models come in the following variants, with the variants having a co
 The following models are available:
 
 * [Phi-3.5-Mini-Instruct](https://aka.ms/azureai/landing/Phi-3.5-Mini-Instruct)
+* [Phi-3.5-MoE-Instruct](https://aka.ms/azureai/landing/Phi-3.5-MoE-Instruct)
 
 
 # [Phi-3](#tab/phi-3)
````

````diff
@@ -184,7 +189,7 @@ response = client.complete(
 ```
 
 > [!NOTE]
-> Phi-3.5-Mini-Instruct, Phi-3-mini-4k-Instruct, Phi-3-mini-128k-Instruct, Phi-3-small-8k-Instruct, Phi-3-small-128k-Instruct and Phi-3-medium-128k-Instruct don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
+> Phi-3.5-Mini-Instruct, Phi-3.5-MoE-Instruct, Phi-3-mini-4k-Instruct, Phi-3-mini-128k-Instruct, Phi-3-small-8k-Instruct, Phi-3-small-128k-Instruct and Phi-3-medium-128k-Instruct don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
 
 The response is as follows, where you can see the model's usage statistics:
 
````

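As a companion to the note, a minimal sketch of the pattern it describes, written against the `azure-ai-inference` Python package that this article's Python tab uses; the two environment variable names are placeholders, not values from the commit:

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder variables: point these at your serverless endpoint and its key.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

# For the models listed in the note (now including Phi-3.5-MoE-Instruct), the
# service rewrites this system message as a user message, so check that the
# reply actually follows the instruction.
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant. Answer in one sentence."),
        UserMessage(content="How many languages are in the world?"),
    ],
)
print(response.choices[0].message.content)
```
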
````diff
@@ -256,7 +261,7 @@ print_stream(result)
 Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
 
 ```python
-from azure.ai.inference.models import ChatCompletionsResponseFormatText
+from azure.ai.inference.models import ChatCompletionsResponseFormat
 
 response = client.complete(
     messages=[
````

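We read this one-line fix as tracking a rename in the `azure-ai-inference` package: current builds expose the `ChatCompletionsResponseFormat` family rather than a `ChatCompletionsResponseFormatText` class, with the chosen type passed through the `response_format` parameter of `client.complete`. If you are following along, check your installed package version for the exact type to instantiate.
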
````diff
@@ -354,7 +359,11 @@ The Phi-3 family chat models include the following models:
 
 # [Phi-3.5](#tab/phi-3-5)
 
-Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.
+Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties.
+
+Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.
+
+Phi-3.5 MoE (mixture-of-expert) uses 16x3.8B parameters with 6.6B active parameters when using 2 experts. The model is a mixture-of-expert decoder-only transformer model, using a tokenizer with vocabulary size of 32,064.
 
 The models underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters.
 
@@ -364,6 +373,7 @@ The Phi-3.5 models come in the following variants, with the variants having a co
 The following models are available:
 
 * [Phi-3.5-Mini-Instruct](https://aka.ms/azureai/landing/Phi-3.5-Mini-Instruct)
+* [Phi-3.5-MoE-Instruct](https://aka.ms/azureai/landing/Phi-3.5-MoE-Instruct)
 
 
 # [Phi-3](#tab/phi-3)
````

````diff
@@ -507,7 +517,7 @@ var response = await client.path("/chat/completions").post({
 ```
 
 > [!NOTE]
-> Phi-3.5-Mini-Instruct, Phi-3-mini-4k-Instruct, Phi-3-mini-128k-Instruct, Phi-3-small-8k-Instruct, Phi-3-small-128k-Instruct and Phi-3-medium-128k-Instruct don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
+> Phi-3.5-Mini-Instruct, Phi-3.5-MoE-Instruct, Phi-3-mini-4k-Instruct, Phi-3-mini-128k-Instruct, Phi-3-small-8k-Instruct, Phi-3-small-128k-Instruct and Phi-3-medium-128k-Instruct don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
 
 The response is as follows, where you can see the model's usage statistics:
 
````

````diff
@@ -700,7 +710,11 @@ The Phi-3 family chat models include the following models:
 
 # [Phi-3.5](#tab/phi-3-5)
 
-Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.
+Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties.
+
+Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.
+
+Phi-3.5 MoE (mixture-of-expert) uses 16x3.8B parameters with 6.6B active parameters when using 2 experts. The model is a mixture-of-expert decoder-only transformer model, using a tokenizer with vocabulary size of 32,064.
 
 The models underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters.
 
@@ -710,6 +724,7 @@ The Phi-3.5 models come in the following variants, with the variants having a co
 The following models are available:
 
 * [Phi-3.5-Mini-Instruct](https://aka.ms/azureai/landing/Phi-3.5-Mini-Instruct)
+* [Phi-3.5-MoE-Instruct](https://aka.ms/azureai/landing/Phi-3.5-MoE-Instruct)
 
 
 # [Phi-3](#tab/phi-3)
````

````diff
@@ -867,7 +882,7 @@ Response<ChatCompletions> response = client.Complete(requestOptions);
 ```
 
 > [!NOTE]
-> Phi-3.5-Mini-Instruct, Phi-3-mini-4k-Instruct, Phi-3-mini-128k-Instruct, Phi-3-small-8k-Instruct, Phi-3-small-128k-Instruct and Phi-3-medium-128k-Instruct don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
+> Phi-3.5-Mini-Instruct, Phi-3.5-MoE-Instruct, Phi-3-mini-4k-Instruct, Phi-3-mini-128k-Instruct, Phi-3-small-8k-Instruct, Phi-3-small-128k-Instruct and Phi-3-medium-128k-Instruct don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
 
 The response is as follows, where you can see the model's usage statistics:
 
````

````diff
@@ -1058,7 +1073,11 @@ The Phi-3 family chat models include the following models:
 
 # [Phi-3.5](#tab/phi-3-5)
 
-Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.
+Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties.
+
+Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.
+
+Phi-3.5 MoE (mixture-of-expert) uses 16x3.8B parameters with 6.6B active parameters when using 2 experts. The model is a mixture-of-expert decoder-only transformer model, using a tokenizer with vocabulary size of 32,064.
 
 The models underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters.
 
@@ -1068,6 +1087,7 @@ The Phi-3.5 models come in the following variants, with the variants having a co
 The following models are available:
 
 * [Phi-3.5-Mini-Instruct](https://aka.ms/azureai/landing/Phi-3.5-Mini-Instruct)
+* [Phi-3.5-MoE-Instruct](https://aka.ms/azureai/landing/Phi-3.5-MoE-Instruct)
 
 
 # [Phi-3](#tab/phi-3)
````

````diff
@@ -1180,7 +1200,7 @@ The following example shows how you can create a basic chat completions request
 ```
 
 > [!NOTE]
-> Phi-3.5-Mini-Instruct, Phi-3-mini-4k-Instruct, Phi-3-mini-128k-Instruct, Phi-3-small-8k-Instruct, Phi-3-small-128k-Instruct and Phi-3-medium-128k-Instruct don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
+> Phi-3.5-Mini-Instruct, Phi-3.5-MoE-Instruct, Phi-3-mini-4k-Instruct, Phi-3-mini-128k-Instruct, Phi-3-small-8k-Instruct, Phi-3-small-128k-Instruct and Phi-3-medium-128k-Instruct don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
 
 The response is as follows, where you can see the model's usage statistics:
 
````

articles/ai-studio/how-to/model-catalog-overview.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -68,7 +68,7 @@ Llama family models | Llama-2-7b <br> Llama-2-7b-chat <br> Llama-2-13b <br> Llam
 Mistral family models | mistralai-Mixtral-8x22B-v0-1 <br> mistralai-Mixtral-8x22B-Instruct-v0-1 <br> mistral-community-Mixtral-8x22B-v0-1 <br> mistralai-Mixtral-8x7B-v01 <br> mistralai-Mistral-7B-Instruct-v0-2 <br> mistralai-Mistral-7B-v01 <br> mistralai-Mixtral-8x7B-Instruct-v01 <br> mistralai-Mistral-7B-Instruct-v01 | Mistral-large (2402) <br> Mistral-large (2407) <br> Mistral-small <br> Mistral-NeMo
 Cohere family models | Not available | Cohere-command-r-plus-08-2024 <br> Cohere-command-r-08-2024 <br> Cohere-command-r-plus <br> Cohere-command-r <br> Cohere-embed-v3-english <br> Cohere-embed-v3-multilingual <br> Cohere-rerank-v3-english <br> Cohere-rerank-v3-multilingual
 JAIS | Not available | jais-30b-chat
-Phi-3 family models | Phi-3-mini-4k-Instruct <br> Phi-3-mini-128k-Instruct <br> Phi-3-small-8k-Instruct <br> Phi-3-small-128k-Instruct <br> Phi-3-medium-4k-instruct <br> Phi-3-medium-128k-instruct <br> Phi-3-vision-128k-Instruct <br> Phi-3.5-mini-Instruct <br> Phi-3.5-vision-Instruct <br> Phi-3.5-MoE-Instruct | Phi-3-mini-4k-Instruct <br> Phi-3-mini-128k-Instruct <br> Phi-3-small-8k-Instruct <br> Phi-3-small-128k-Instruct <br> Phi-3-medium-4k-instruct <br> Phi-3-medium-128k-instruct <br> <br> Phi-3.5-mini-Instruct <br> Phi-3.5-vision-Instruct
+Phi-3 family models | Phi-3-mini-4k-Instruct <br> Phi-3-mini-128k-Instruct <br> Phi-3-small-8k-Instruct <br> Phi-3-small-128k-Instruct <br> Phi-3-medium-4k-instruct <br> Phi-3-medium-128k-instruct <br> Phi-3-vision-128k-Instruct <br> Phi-3.5-mini-Instruct <br> Phi-3.5-vision-Instruct <br> Phi-3.5-MoE-Instruct | Phi-3-mini-4k-Instruct <br> Phi-3-mini-128k-Instruct <br> Phi-3-small-8k-Instruct <br> Phi-3-small-128k-Instruct <br> Phi-3-medium-4k-instruct <br> Phi-3-medium-128k-instruct <br> <br> Phi-3.5-mini-Instruct <br> Phi-3.5-vision-Instruct <br> Phi-3.5-MoE-Instruct
 Nixtla | Not available | TimeGEN-1
 Other models | Available | Not available
 
````

articles/ai-studio/includes/region-availability-maas.md

Lines changed: 2 additions & 1 deletion
````diff
@@ -7,7 +7,7 @@ ms.author: scottpolly
 ms.service: azure-ai-studio
 ms.topic: include
 ms.date: 08/05/2024
-ms.custom: include
+ms.custom: include, references_regions
 ---
 
 ### Cohere models
@@ -47,6 +47,7 @@ Llama 3.1 405B Instruct | [Microsoft Managed Countries](/partner-center/marketp
 |Model |Offer Availability Region | Hub/Project Region for Deployment | Hub/Project Region for Fine tuning |
 |---------|---------|---------|---------|
 Phi-3.5-vision-Instruct | Not applicable | East US 2 <br> Sweden Central | Not available |
+Phi-3.5-MoE-Instruct | Not applicable | East US 2 <br> Sweden Central | Not available |
 Phi-3.5-Mini-Instruct | Not applicable | East US 2 <br> Sweden Central | Not available |
 Phi-3-Mini-4k-Instruct <br> Phi-3-Mini-128K-Instruct | Not applicable | East US 2 <br> Sweden Central | East US 2 |
 Phi-3-Small-8K-Instruct <br> Phi-3-Small-128K-Instruct | Not applicable | East US 2 <br> Sweden Central | Not available |
````

articles/ai-studio/toc.yml

Lines changed: 0 additions & 2 deletions
````diff
@@ -131,8 +131,6 @@ items:
   items:
   - name: Phi-3 family chat models
     href: how-to/deploy-models-phi-3.md
-  - name: Phi-3.5 MoE chat model
-    href: how-to/deploy-models-phi-3-5-moe.md
   - name: Phi-3 chat model with vision
     href: how-to/deploy-models-phi-3-vision.md
   - name: Phi-3.5 chat model with vision
````

articles/machine-learning/.openpublishing.redirection.machine-learning.json

Lines changed: 5 additions & 0 deletions
````diff
@@ -4295,6 +4295,11 @@
     "redirect_url": "/azure/machine-learning/prompt-flow/troubleshoot-guidance",
     "redirect_document_id": true
   },
+  {
+    "source_path_from_root": "/articles/machine-learning/how-to-deploy-models-phi-3-5-moe.md",
+    "redirect_url": "/azure/machine-learning/how-to-deploy-models-phi-3",
+    "redirect_document_id": true
+  },
   {
     "source_path_from_root": "/articles/machine-learning/prompt-flow/tools-reference/vector-db-lookup-tool.md",
     "redirect_url": "/azure/machine-learning/prompt-flow/tools-reference/index-lookup-tool",
````

articles/machine-learning/concept-model-catalog.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -59,7 +59,7 @@ Llama family models | Llama-2-7b <br> Llama-2-7b-chat <br> Llama-2-13b <br> Lla
 Mistral family models | mistralai-Mixtral-8x22B-v0-1 <br> mistralai-Mixtral-8x22B-Instruct-v0-1 <br> mistral-community-Mixtral-8x22B-v0-1 <br> mistralai-Mixtral-8x7B-v01 <br> mistralai-Mistral-7B-Instruct-v0-2 <br> mistralai-Mistral-7B-v01 <br> mistralai-Mixtral-8x7B-Instruct-v01 <br> mistralai-Mistral-7B-Instruct-v01 | Mistral-large (2402) <br> Mistral-large (2407) <br> Mistral-small <br> Mistral-Nemo
 Cohere family models | Not available | Cohere-command-r-plus <br> Cohere-command-r <br> Cohere-embed-v3-english <br> Cohere-embed-v3-multilingual <br> Cohere-rerank-3-english <br> Cohere-rerank-3-multilingual
 JAIS | Not available | jais-30b-chat
-Phi-3 family models | Phi-3-mini-4k-Instruct <br> Phi-3-mini-128k-Instruct <br> Phi-3-small-8k-Instruct <br> Phi-3-small-128k-Instruct <br> Phi-3-medium-4k-instruct <br> Phi-3-medium-128k-instruct <br> Phi-3-vision-128k-Instruct <br> Phi-3.5-mini-Instruct <br> Phi-3.5-vision-Instruct <br> Phi-3.5-MoE-Instruct | Phi-3-mini-4k-Instruct <br> Phi-3-mini-128k-Instruct <br> Phi-3-small-8k-Instruct <br> Phi-3-small-128k-Instruct <br> Phi-3-medium-4k-instruct <br> Phi-3-medium-128k-instruct <br> <br> Phi-3.5-mini-Instruct <br> Phi-3.5-vision-Instruct
+Phi-3 family models | Phi-3-mini-4k-Instruct <br> Phi-3-mini-128k-Instruct <br> Phi-3-small-8k-Instruct <br> Phi-3-small-128k-Instruct <br> Phi-3-medium-4k-instruct <br> Phi-3-medium-128k-instruct <br> Phi-3-vision-128k-Instruct <br> Phi-3.5-mini-Instruct <br> Phi-3.5-vision-Instruct <br> Phi-3.5-MoE-Instruct | Phi-3-mini-4k-Instruct <br> Phi-3-mini-128k-Instruct <br> Phi-3-small-8k-Instruct <br> Phi-3-small-128k-Instruct <br> Phi-3-medium-4k-instruct <br> Phi-3-medium-128k-instruct <br> <br> Phi-3.5-mini-Instruct <br> Phi-3.5-vision-Instruct <br> Phi-3.5-MoE-Instruct
 Nixtla | Not available | TimeGEN-1
 Other models | Available | Not available
 
````
