Commit 7f6f072

Merge branch 'main' into release-2025-openai-march-latest
2 parents 85f6c20 + 75d0349 commit 7f6f072

File tree

68 files changed: +970 −932 lines


articles/ai-foundry/concepts/models-featured.md

Lines changed: 1 addition & 9 deletions

@@ -19,15 +19,7 @@ The Azure AI model catalog offers a large selection of models from a wide range
 
 [!INCLUDE [models-preview](../includes/models-preview.md)]
 
-To perform inferencing with the models, some models such as [Nixtla's TimeGEN-1](#nixtla) and [Cohere rerank](#cohere-rerank) require you to use custom APIs from the model providers. Others that belong to the following model types support inferencing using the [Azure AI model inference](../model-inference/overview.md):
-
-- [Chat completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context)
-- [Chat completion (with reasoning content)](../model-inference/how-to/use-chat-reasoning.md?context=/azure/ai-foundry/context/context)
-- [Chat completion (with image and audio content)](../model-inference/how-to/use-chat-multi-modal.md?context=/azure/ai-foundry/context/context)
-- [Embeddings](../model-inference/how-to/use-embeddings.md?context=/azure/ai-foundry/context/context)
-- [Image embeddings](../model-inference/how-to/use-image-embeddings.md?context=/azure/ai-foundry/context/context)
-
-You can find more details about individual models by reviewing their model cards in the [model catalog for Azure AI Foundry portal](https://ai.azure.com/explore/models).
+To perform inferencing with the models, some models such as [Nixtla's TimeGEN-1](#nixtla) and [Cohere rerank](#cohere-rerank) require you to use custom APIs from the model providers. Others support inferencing using the [Azure AI model inference](../model-inference/overview.md). You can find more details about individual models by reviewing their model cards in the [model catalog for Azure AI Foundry portal](https://ai.azure.com/explore/models).
 
 :::image type="content" source="../media/models-featured/models-catalog.gif" alt-text="An animation showing Azure AI studio model catalog section and the models available." lightbox="../media/models-featured/models-catalog.gif":::

articles/ai-foundry/includes/region-availability-maas.md

Lines changed: 28 additions & 28 deletions
Large diffs are not rendered by default.

articles/ai-foundry/model-inference/concepts/deployment-types.md

Lines changed: 4 additions & 4 deletions

@@ -29,15 +29,15 @@ To learn more about deployment options for Azure OpenAI models see [Azure OpenAI
 
 ## Deployment types for Models-as-a-Service models
 
-Models from third-party model providers with pay-as-you-go billing (collectively called Models-as-a-Service), makes models available in Azure AI model inference under **standard** deployments with a Global processing option (`Global-Standard`).
+Models with pay-as-you-go billing (collectively called Models-as-a-Service) are available in Azure AI model inference under **standard** deployments with a Global processing option (`Global-Standard`).
+
+> [!TIP]
+> Models-as-a-Service offers regional deployment options under [Serverless API endpoints](../../../ai-studio/how-to/deploy-models-serverless.md) in Azure AI Foundry. However, those deployments can't be accessed using the Azure AI model inference endpoint in Azure AI Services, and they need to be created within a project.
 
 ### Global-Standard
 
 Global deployments leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources. Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure location. Learn more about [data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
 
-> [!NOTE]
-> Models-as-a-Service offers regional deployment options under [Serverless API endpoints](../../../ai-studio/how-to/deploy-models-serverless.md) in Azure AI Foundry. Prompts and outputs are processed within the geography specified during deployment. However, those deployments can't be accessed using the Azure AI model inference endpoint in Azure AI Services.
-
 ## Control deployment options
 
 Administrators can control which model deployment types are available to their users by using Azure Policies. Learn more about [How to control AI model deployment with custom policies](../../../ai-studio/how-to/custom-policy-model-deployment.md).

articles/ai-foundry/model-inference/includes/use-chat-completions/csharp.md

Lines changed: 4 additions & 70 deletions

@@ -7,7 +7,7 @@ author: mopeakande
 reviewer: santiagxf
 ms.service: azure-ai-model-inference
 ms.topic: how-to
-ms.date: 1/21/2025
+ms.date: 03/20/2025
 ms.author: mopeakande
 ms.reviewer: fasantia
 ms.custom: references_regions, tool_generated
@@ -26,7 +26,7 @@ To use chat completion models in your application, you need:
 
 [!INCLUDE [how-to-prerequisites-csharp](../how-to-prerequisites-csharp.md)]
 
-* A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
+* A chat completions model deployment. If you don't have one, read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
 * This example uses `mistral-large-2407`.
 
@@ -42,7 +42,7 @@ ChatCompletionsClient client = new ChatCompletionsClient(
 );
 ```
 
-If you have configured the resource to with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
+If you've configured the resource with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
 
 ```csharp
@@ -181,7 +181,7 @@ response = client.Complete(requestOptions);
 Console.WriteLine($"Response: {response.Value.Content}");
 ```
 
-Some models don't support JSON output formatting. You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+Some models don't support JSON output formatting. You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -400,69 +400,3 @@ catch (RequestFailedException ex)
 
 > [!TIP]
 > To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
-
-## Use chat completions with images
-
-Some models can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Some models for vision in a chat fashion:
-
-> [!IMPORTANT]
-> Some models support only one image for each turn in the chat conversation and only the last image is retained in context. If you add multiple images, it results in an error.
-
-To see this capability, download an image and encode the information as `base64` string. The resulting data should be inside of a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):
-
-```csharp
-string imageUrl = "https://news.microsoft.com/source/wp-content/uploads/2024/04/The-Phi-3-small-language-models-with-big-potential-1-1900x1069.jpg";
-string imageFormat = "jpeg";
-HttpClient httpClient = new HttpClient();
-httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0");
-byte[] imageBytes = httpClient.GetByteArrayAsync(imageUrl).Result;
-string imageBase64 = Convert.ToBase64String(imageBytes);
-string dataUrl = $"data:image/{imageFormat};base64,{imageBase64}";
-```
-
-Visualize the image:
-
-:::image type="content" source="../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg":::
-
-Now, create a chat completion request with the image:
-
-```csharp
-ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
-{
-    Messages = {
-        new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
-        new ChatRequestUserMessage([
-            new ChatMessageTextContentItem("Which conclusion can be extracted from the following chart?"),
-            new ChatMessageImageContentItem(new Uri(dataUrl))
-        ]),
-    },
-    MaxTokens = 2048,
-    Model = "phi-3.5-vision-instruct",
-};
-
-var response = client.Complete(requestOptions);
-Console.WriteLine(response.Value.Content);
-```
-
-The response is as follows, where you can see the model's usage statistics:
-
-```csharp
-Console.WriteLine($"{response.Value.Role}: {response.Value.Content}");
-Console.WriteLine($"Model: {response.Value.Model}");
-Console.WriteLine("Usage:");
-Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
-Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
-Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
-```
-
-```console
-ASSISTANT: The chart illustrates that larger models tend to perform better in quality, as indicated by their size in billions of parameters. However, there are exceptions to this trend, such as Phi-3-medium and Phi-3-small, which outperform smaller models in quality. This suggests that while larger models generally have an advantage, there might be other factors at play that influence a model's performance.
-Model: phi-3.5-vision-instruct
-Usage:
-    Prompt tokens: 2380
-    Completion tokens: 126
-    Total tokens: 2506
-```
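The section removed above encodes downloaded image bytes as a base64 data URL before attaching them to the chat request. The encoding step itself is language-agnostic; as a minimal Node.js sketch of just that step, using placeholder JPEG header bytes instead of a real download (the bytes and variable names are illustrative, not part of the docs):

```javascript
// Sketch of the base64 data-URL encoding step from the removed section.
// The bytes below are placeholder JPEG header bytes, not a real image.
const imageFormat = "jpeg";
const imageBytes = Buffer.from([0xff, 0xd8, 0xff, 0xe0]);
const imageBase64 = imageBytes.toString("base64");
const dataUrl = `data:image/${imageFormat};base64,${imageBase64}`;
console.log(dataUrl); // data:image/jpeg;base64,/9j/4A==
```

A real request would fetch the image and base64-encode the full response body in the same way.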

articles/ai-foundry/model-inference/includes/use-chat-completions/java.md

Lines changed: 4 additions & 4 deletions

@@ -7,7 +7,7 @@ author: mopeakande
 reviewer: santiagxf
 ms.service: azure-ai-model-inference
 ms.topic: how-to
-ms.date: 1/21/2025
+ms.date: 03/20/2025
 ms.author: mopeakande
 ms.reviewer: fasantia
 ms.custom: references_regions, tool_generated
@@ -26,7 +26,7 @@ To use chat completion models in your application, you need:
 
 [!INCLUDE [how-to-prerequisites-java](../how-to-prerequisites-java.md)]
 
-* A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
+* A chat completions model deployment. If you don't have one, read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
 * This example uses `mistral-large-2407`.
 
@@ -41,7 +41,7 @@ ChatCompletionsClient client = new ChatCompletionsClientBuilder()
     .buildClient();
 ```
 
-If you have configured the resource to with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
+If you've configured the resource with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
 
 ```java
 TokenCredential defaultCredential = new DefaultAzureCredentialBuilder().build();
@@ -120,7 +120,7 @@ client.completeStream(new ChatCompletionsOptions(chatMessages))
 #### Explore more parameters supported by the inference client
 
 Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
-Some models don't support JSON output formatting. You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+Some models don't support JSON output formatting. You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
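The "aren't guaranteed to be valid JSON" caveat edited in several of these pages is worth grounding: a prompted model can wrap JSON in prose, so callers should parse defensively. A hedged sketch of such a check (the reply strings are hypothetical, not real model output; shown in JavaScript rather than Java for brevity):

```javascript
// Defensive JSON parsing for prompted model replies, which may carry extra prose.
function tryParseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, value: null };
  }
}

// Hypothetical replies: one clean, one wrapped in prose.
console.log(tryParseJson('{"city": "Seattle"}').ok);                   // true
console.log(tryParseJson('Sure! Here it is: {"city": "Seattle"}').ok); // false
```

Models deployed with native JSON output formatting avoid the second case; for the rest, a check like this keeps invalid replies from propagating.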

articles/ai-foundry/model-inference/includes/use-chat-completions/javascript.md

Lines changed: 4 additions & 83 deletions

@@ -7,7 +7,7 @@ author: mopeakande
 reviewer: santiagxf
 ms.service: azure-ai-model-inference
 ms.topic: how-to
-ms.date: 1/21/2025
+ms.date: 03/20/2025
 ms.author: mopeakande
 ms.reviewer: fasantia
 ms.custom: references_regions, tool_generated
@@ -26,7 +26,7 @@ To use chat completion models in your application, you need:
 
 [!INCLUDE [how-to-prerequisites-javascript](../how-to-prerequisites-javascript.md)]
 
-* A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
+* A chat completions model deployment. If you don't have one, read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
 ## Use chat completions
 
@@ -44,7 +44,7 @@ const client = new ModelClient(
 );
 ```
 
-If you have configured the resource to with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
+If you've configured the resource with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
 
 ```javascript
@@ -177,7 +177,7 @@ var response = await client.path("/chat/completions").post({
 });
 ```
 
-Some models don't support JSON output formatting. You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+Some models don't support JSON output formatting. You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -388,82 +388,3 @@ catch (error) {
 
 > [!TIP]
 > To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
-
-## Use chat completions with images
-
-Some models can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Some models for vision in a chat fashion:
-
-> [!IMPORTANT]
-> Some models support only one image for each turn in the chat conversation and only the last image is retained in context. If you add multiple images, it results in an error.
-
-To see this capability, download an image and encode the information as `base64` string. The resulting data should be inside of a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):
-
-```javascript
-const image_url = "https://news.microsoft.com/source/wp-content/uploads/2024/04/The-Phi-3-small-language-models-with-big-potential-1-1900x1069.jpg";
-const image_format = "jpeg";
-
-const response = await fetch(image_url, { headers: { "User-Agent": "Mozilla/5.0" } });
-const image_data = await response.arrayBuffer();
-const image_data_base64 = Buffer.from(image_data).toString("base64");
-const data_url = `data:image/${image_format};base64,${image_data_base64}`;
-```
-
-Visualize the image:
-
-```javascript
-const img = document.createElement("img");
-img.src = data_url;
-document.body.appendChild(img);
-```
-
-:::image type="content" source="../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg":::
-
-Now, create a chat completion request with the image:
-
-```javascript
-var messages = [
-    { role: "system", content: "You are a helpful assistant that can generate responses based on images." },
-    { role: "user", content:
-        [
-            { type: "text", text: "Which conclusion can be extracted from the following chart?" },
-            { type: "image_url", image:
-                {
-                    url: data_url
-                }
-            }
-        ]
-    }
-];
-
-var response = await client.path("/chat/completions").post({
-    body: {
-        messages: messages,
-        temperature: 0,
-        top_p: 1,
-        max_tokens: 2048,
-    }
-});
-```
-
-The response is as follows, where you can see the model's usage statistics:
-
-```javascript
-console.log(response.body.choices[0].message.role + ": " + response.body.choices[0].message.content);
-console.log("Model:", response.body.model);
-console.log("Usage:");
-console.log("\tPrompt tokens:", response.body.usage.prompt_tokens);
-console.log("\tCompletion tokens:", response.body.usage.completion_tokens);
-console.log("\tTotal tokens:", response.body.usage.total_tokens);
-```
-
-```console
-ASSISTANT: The chart illustrates that larger models tend to perform better in quality, as indicated by their size in billions of parameters. However, there are exceptions to this trend, such as Phi-3-medium and Phi-3-small, which outperform smaller models in quality. This suggests that while larger models generally have an advantage, there might be other factors at play that influence a model's performance.
-Model: mistral-large-2407
-Usage:
-    Prompt tokens: 2380
-    Completion tokens: 126
-    Total tokens: 2506
-```
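The removed JavaScript section builds a user message that mixes text and image content. That payload shape, as it appeared in the removed snippet, can be assembled and sanity-checked on its own (here `dataUrl` is a placeholder rather than real encoded image bytes, and no request is sent):

```javascript
// Assemble the mixed text + image chat payload from the removed section.
// `dataUrl` is a placeholder; a real request embeds base64-encoded image bytes.
const dataUrl = "data:image/jpeg;base64,PLACEHOLDER";
const messages = [
  { role: "system", content: "You are a helpful assistant that can generate responses based on images." },
  {
    role: "user",
    content: [
      { type: "text", text: "Which conclusion can be extracted from the following chart?" },
      { type: "image_url", image: { url: dataUrl } },
    ],
  },
];

// The user turn carries exactly two content items: one text, one image.
console.log(messages[1].content.map((item) => item.type)); // [ 'text', 'image_url' ]
```

The note about single-image turns applies here: the `content` array holds one image item per user message.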
