
Commit f5816cc

Merge pull request #277195 from ssalgadodev/patch-113
Update how-to-deploy-models-llama.md
2 parents: aa6fe7a + fa7ce43

3 files changed: 24 additions, 3 deletions

articles/ai-studio/how-to/deploy-models-llama.md

Lines changed: 11 additions & 0 deletions
@@ -524,6 +524,17 @@ Follow these steps to deploy a model such as `Llama-2-7b-chat` to a real-time en

Unchanged context:

For reference about how to invoke Llama models deployed to managed compute, see the model's card in the Azure AI Studio [model catalog](../how-to/model-catalog-overview.md). Each model's card has an overview page that includes a description of the model, samples for code-based inferencing, fine-tuning, and model evaluation.

Added (new lines 527-537):

##### More inference examples

| **Package** | **Sample Notebook** |
|-------------|---------------------|
| CLI using CURL and Python web requests - Command R | [command-r.ipynb](https://aka.ms/samples/cohere-command-r/webrequests) |
| CLI using CURL and Python web requests - Command R+ | [command-r-plus.ipynb](https://aka.ms/samples/cohere-command-r-plus/webrequests) |
| OpenAI SDK (experimental) | [openaisdk.ipynb](https://aka.ms/samples/cohere-command/openaisdk) |
| LangChain | [langchain.ipynb](https://aka.ms/samples/cohere/langchain) |
| Cohere SDK | [cohere-sdk.ipynb](https://aka.ms/samples/cohere-python-sdk) |
| LiteLLM SDK | [litellm.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/litellm.ipynb) |

Unchanged context:

## Cost and quotas

### Cost and quota considerations for Llama models deployed as a service
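The "CLI using CURL and Python web requests" rows in the table above call the endpoint over plain HTTP. As a rough sketch of that pattern (not the notebooks' exact code): a model deployed to managed compute is scored through its REST endpoint. The URL, key, and payload below are placeholders, and the exact request schema for each model is shown on its model card.

```python
import requests

# Placeholder values: the real scoring URI and key come from the
# endpoint's Consume tab in the studio.
ENDPOINT_URL = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
API_KEY = "<your-endpoint-key>"

# Hypothetical payload; the input schema varies per model and is
# documented on the model card.
payload = {
    "input_data": {
        "input_string": [{"role": "user", "content": "Say hello in one sentence."}],
        "parameters": {"max_new_tokens": 96, "temperature": 0.7},
    }
}

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())
```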

articles/ai-studio/how-to/model-catalog-overview.md

Lines changed: 3 additions & 3 deletions
@@ -89,16 +89,16 @@ Models available for deployment to a Managed compute can be deployed to Azure Ma

Unchanged context:

* [Deploy Meta Llama models](deploy-models-llama.md)
* [Deploy Open models Created by Azure AI](deploy-models-open.md)

Changed (line 92): `### Build Generative AI Apps with Managed computes` → `### Build Generative AI Apps with Managed compute`

Unchanged context:

Prompt flow offers a great experience for prototyping. You can use models deployed with Managed computes in Prompt Flow with the [Open Model LLM tool](../../machine-learning/prompt-flow/tools-reference/open-model-llm-tool.md). You can also use the REST API exposed by managed compute in popular LLM tools like LangChain with the [Azure Machine Learning extension](https://python.langchain.com/docs/integrations/chat/azureml_chat_endpoint/).
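For the LangChain route mentioned in the paragraph above, a minimal sketch, assuming the `AzureMLChatOnlineEndpoint` integration from the `langchain-community` package that the linked extension page documents (endpoint URL and key are placeholders):

```python
from langchain_community.chat_models.azureml_endpoint import (
    AzureMLChatOnlineEndpoint,
    AzureMLEndpointApiType,
    CustomOpenAIChatContentFormatter,
)
from langchain_core.messages import HumanMessage

# Placeholder details for a model deployed to managed compute
# (a "dedicated" endpoint in LangChain's terms).
chat = AzureMLChatOnlineEndpoint(
    endpoint_url="https://<your-endpoint>.<region>.inference.ml.azure.com/score",
    endpoint_api_type=AzureMLEndpointApiType.dedicated,
    endpoint_api_key="<your-endpoint-key>",
    content_formatter=CustomOpenAIChatContentFormatter(),
)

reply = chat.invoke([HumanMessage(content="What does managed compute mean here?")])
print(reply.content)
```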
Changed (line 97): `### Content safety for models deployed as Managed Computes` → `### Content safety for models deployed as Managed compute`

Unchanged context:

[Azure AI Content Safety (AACS)](../../ai-services/content-safety/overview.md) service is available for use with Managed computes to screen for various categories of harmful content such as sexual content, violence, hate, and self-harm and advanced threats such as Jailbreak risk detection and Protected material text detection. You can refer to this notebook for reference integration with AACS for [Llama 2](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/inference/text-generation/llama-safe-online-deployment.ipynb) or use the Content Safety (Text) tool in Prompt Flow to pass responses from the model to AACS for screening. You are billed separately as per [AACS pricing](https://azure.microsoft.com/pricing/details/cognitive-services/content-safety/) for such use.
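As a companion to the paragraph above, a minimal sketch of screening a model response with the `azure-ai-contentsafety` Python SDK (the AACS resource endpoint and key are placeholders; the linked notebook shows the full online-deployment integration):

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

# Placeholder AACS resource; usage is billed separately per AACS pricing.
client = ContentSafetyClient(
    endpoint="https://<your-aacs-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-aacs-key>"),
)

model_output = "Text returned by the deployed model."
result = client.analyze_text(AnalyzeTextOptions(text=model_output))

# Each analyzed category (hate, self-harm, sexual, violence) reports a severity.
for item in result.categories_analysis:
    print(item.category, item.severity)
```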
Changed (line 101): `### Serverless APIs with Pay-as-you-go billing` → `## Serverless APIs with Pay-as-you-go billing` (the heading is promoted from H3 to H2)

Unchanged context:

Certain models in the Model Catalog can be deployed as serverless APIs with pay-as-you-go billing; this method of deployment is called Models-as-a Service (MaaS), providing a way to consume them as an API without hosting them on your subscription. Models available through MaaS are hosted in infrastructure managed by Microsoft, which enables API-based access to the model provider's model. API based access can dramatically reduce the cost of accessing a model and significantly simplify the provisioning experience. Most MaaS models come with token-based pricing.
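To make the consumption model in the paragraph above concrete: a MaaS deployment exposes an HTTPS endpoint, so calling it is a plain web request. A hedged sketch, assuming the OpenAI-compatible `/v1/chat/completions` route and bearer-key auth used in the pay-as-you-go samples (URL and key are placeholders):

```python
import requests

# Placeholder serverless API details from the deployment's details page.
URL = "https://<your-deployment>.<region>.models.ai.azure.com/v1/chat/completions"
API_KEY = "<your-api-key>"

resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "messages": [{"role": "user", "content": "One sentence on token-based pricing."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
body = resp.json()
print(body["choices"][0]["message"]["content"])
# Token-based pricing is metered from the usage field of each response.
print(body.get("usage"))
```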

articles/machine-learning/how-to-deploy-models-llama.md

Lines changed: 10 additions & 0 deletions
@@ -544,6 +544,16 @@ For more information on how to deploy models to managed compute using the studio

Unchanged context:

For reference about how to invoke Meta Llama 3 models deployed to real-time endpoints, see the model's card in Azure Machine Learning studio [model catalog](concept-model-catalog.md). Each model's card has an overview page that includes a description of the model, samples for code-based inferencing, fine-tuning, and model evaluation.

Added (new lines 547-556):

#### Additional inference examples

| **Package** | **Sample Notebook** |
|-------------|---------------------|
| CLI using CURL and Python web requests | [cohere-embed.ipynb](https://aka.ms/samples/embed-v3/webrequests) |
| OpenAI SDK (experimental) | [openaisdk.ipynb](https://aka.ms/samples/cohere-embed/openaisdk) |
| LangChain | [langchain.ipynb](https://aka.ms/samples/cohere-embed/langchain) |
| Cohere SDK | [cohere-sdk.ipynb](https://aka.ms/samples/cohere-embed/cohere-python-sdk) |
| LiteLLM SDK | [litellm.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/litellm.ipynb) |

Unchanged context:

## Cost and quotas

### Cost and quota considerations for Meta Llama models deployed as a serverless API
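The LiteLLM row appears in both added tables and points at the same notebook. As a rough sketch of that style of call, assuming LiteLLM's `azure_ai/` provider prefix for Azure AI endpoints (model name, base URL, and key are placeholders):

```python
import litellm

# Placeholder deployment details; LiteLLM forwards the request to the
# Azure AI endpoint and returns an OpenAI-shaped response.
response = litellm.completion(
    model="azure_ai/<your-deployment-name>",
    api_base="https://<your-deployment>.<region>.models.ai.azure.com",
    api_key="<your-api-key>",
    messages=[{"role": "user", "content": "Hello from LiteLLM."}],
)
print(response.choices[0].message.content)
```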
