articles/ai-studio/how-to/deploy-models-phi-3-5-vision.md (10 additions, 10 deletions)
@@ -27,7 +27,7 @@ The Phi-3.5 small language models (SLMs) are a collection of instruction-tuned g
## Phi-3.5 chat model with vision
- Phi-3.5 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly-available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
+ Phi-3.5 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
You can learn more about the models in their respective model card:
@@ -298,7 +298,7 @@ import IPython.display as Disp
Disp.Image(requests.get(image_url).content)
```
- :::image type="content" source="../media/how-to/sdks/slms-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/slms-chart-example.jpg":::
+ :::image type="content" source="../media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/small-language-models-chart-example.jpg":::
Now, create a chat completion request with the image:
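Such a request pairs a text content part with an `image_url` part carrying the data URL. As a hedged sketch only (the prompt text and parameter values are illustrative, following the OpenAI-style chat-completions schema the Azure AI model inference endpoints accept), the request body looks like:

```
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Which conclusion can be drawn from the chart?" },
        { "type": "image_url", "image_url": { "url": "data:image/jpg;base64,..." } }
      ]
    }
  ],
  "temperature": 0,
  "max_tokens": 2048
}
```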
@@ -347,7 +347,7 @@ Usage:
## Phi-3.5 chat model with vision
- Phi-3.5 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly-available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
+ Phi-3.5 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
You can learn more about the models in their respective model card:
@@ -632,7 +632,7 @@ img.src = data_url;
document.body.appendChild(img);
```
- :::image type="content" source="../media/how-to/sdks/slms-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/slms-chart-example.jpg":::
+ :::image type="content" source="../media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/small-language-models-chart-example.jpg":::
Now, create a chat completion request with the image:
@@ -690,7 +690,7 @@ Usage:
## Phi-3.5 chat model with vision
- Phi-3.5 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly-available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
+ Phi-3.5 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
You can learn more about the models in their respective model card:
@@ -741,7 +741,7 @@ using Azure.Identity;
using Azure.AI.Inference;
```
- This example also use the following namespaces but you may not always need them:
+ This example also uses the following namespaces but you may not always need them:
- :::image type="content" source="../media/how-to/sdks/slms-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/slms-chart-example.jpg":::
+ :::image type="content" source="../media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/small-language-models-chart-example.jpg":::
Now, create a chat completion request with the image:
@@ -1030,7 +1030,7 @@ Usage:
## Phi-3.5 chat model with vision
- Phi-3.5 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly-available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
+ Phi-3.5 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
You can learn more about the models in their respective model card:
@@ -1333,11 +1333,11 @@ Phi-3.5-vision-Instruct can reason across text and images and generate text comp
To see this capability, download an image and encode the information as a `base64` string. The resulting data should be inside a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):
> [!TIP]
- > You will need to construct the data URL using a scripting or programming language. This tutorial uses [this sample image](../media/how-to/sdks/slms-chart-example.jpg) in JPEG format. A data URL has a format as follows: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.
+ > You will need to construct the data URL using a scripting or programming language. This tutorial uses [this sample image](../media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has a format as follows: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.
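The construction described in the tip can be sketched in Python. This is a minimal sketch: the helper name is ours, and the stand-in bytes merely illustrate; in practice you would read the sample JPEG from disk as shown in the comment.

```python
import base64

def to_data_url(image_bytes: bytes, mime_type: str = "image/jpg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# In practice, read the sample JPEG from disk:
#   with open("chart.jpg", "rb") as f:
#       data_url = to_data_url(f.read())
data_url = to_data_url(b"\xff\xd8\xff\xe0")  # JPEG magic bytes as a stand-in
print(data_url)  # data:image/jpg;base64,/9j/4A==
```

Note that `image/jpeg` is the IANA-registered MIME type for JPEG; the `image/jpg` variant shown in the tip is widely accepted but nonstandard.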
Visualize the image:
- :::image type="content" source="../media/how-to/sdks/slms-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/slms-chart-example.jpg":::
+ :::image type="content" source="../media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/small-language-models-chart-example.jpg":::
Now, create a chat completion request with the image:
articles/ai-studio/how-to/deploy-models-phi-3-vision.md (10 additions, 10 deletions)
@@ -27,7 +27,7 @@ The Phi-3 family of small language models (SLMs) is a collection of instruction-
## Phi-3 chat model with vision
- Phi-3 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly-available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
+ Phi-3 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
You can learn more about the models in their respective model card:
@@ -298,7 +298,7 @@ import IPython.display as Disp
Disp.Image(requests.get(image_url).content)
```
- :::image type="content" source="../media/how-to/sdks/slms-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/slms-chart-example.jpg":::
+ :::image type="content" source="../media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/small-language-models-chart-example.jpg":::
Now, create a chat completion request with the image:
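The request pairs a text content part with an `image_url` part carrying the data URL. As a hedged sketch only (the prompt text and parameter values are illustrative, following the OpenAI-style chat-completions schema the Azure AI model inference endpoints accept), the request body looks like:

```
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Which conclusion can be drawn from the chart?" },
        { "type": "image_url", "image_url": { "url": "data:image/jpg;base64,..." } }
      ]
    }
  ],
  "temperature": 0,
  "max_tokens": 2048
}
```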
@@ -347,7 +347,7 @@ Usage:
## Phi-3 chat model with vision
- Phi-3 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly-available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
+ Phi-3 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
You can learn more about the models in their respective model card:
@@ -632,7 +632,7 @@ img.src = data_url;
document.body.appendChild(img);
```
- :::image type="content" source="../media/how-to/sdks/slms-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/slms-chart-example.jpg":::
+ :::image type="content" source="../media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/small-language-models-chart-example.jpg":::
Now, create a chat completion request with the image:
@@ -690,7 +690,7 @@ Usage:
## Phi-3 chat model with vision
- Phi-3 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly-available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
+ Phi-3 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
You can learn more about the models in their respective model card:
@@ -741,7 +741,7 @@ using Azure.Identity;
using Azure.AI.Inference;
```
- This example also use the following namespaces but you may not always need them:
+ This example also uses the following namespaces but you may not always need them:
- :::image type="content" source="../media/how-to/sdks/slms-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/slms-chart-example.jpg":::
+ :::image type="content" source="../media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/small-language-models-chart-example.jpg":::
Now, create a chat completion request with the image:
@@ -1030,7 +1030,7 @@ Usage:
## Phi-3 chat model with vision
- Phi-3 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly-available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
+ Phi-3 Vision is a lightweight, state-of-the-art, open multimodal model. The model was built upon datasets that include synthetic data and filtered, publicly available websites - with a focus on high-quality, reasoning-dense data, both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) that it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
You can learn more about the models in their respective model card:
@@ -1333,11 +1333,11 @@ Phi-3-vision-128k-Instruct can reason across text and images and generate text c
To see this capability, download an image and encode the information as a `base64` string. The resulting data should be inside a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):
> [!TIP]
- > You will need to construct the data URL using a scripting or programming language. This tutorial uses [this sample image](../media/how-to/sdks/slms-chart-example.jpg) in JPEG format. A data URL has a format as follows: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.
+ > You will need to construct the data URL using a scripting or programming language. This tutorial uses [this sample image](../media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has a format as follows: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.
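A minimal Python sketch of the construction described in the tip (the helper name is ours; the file name in the comment is illustrative):

```python
import base64

def build_data_url(image_bytes: bytes, mime_type: str = "image/jpg") -> str:
    """Return a data URL wrapping the base64-encoded image bytes."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{payload}"

# Typical usage (file name is illustrative):
#   with open("chart.jpg", "rb") as f:
#       data_url = build_data_url(f.read())
```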
Visualize the image:
- :::image type="content" source="../media/how-to/sdks/slms-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/slms-chart-example.jpg":::
+ :::image type="content" source="../media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../media/how-to/sdks/small-language-models-chart-example.jpg":::
Now, create a chat completion request with the image: