
Commit f5816cc

Merge pull request #277195 from ssalgadodev/patch-113
Update how-to-deploy-models-llama.md
2 parents: aa6fe7a + fa7ce43

3 files changed: 24 additions, 3 deletions

articles/ai-studio/how-to/deploy-models-llama.md

Lines changed: 11 additions & 0 deletions
@@ -524,6 +524,17 @@ Follow these steps to deploy a model such as `Llama-2-7b-chat` to a real-time en

Unchanged context:

For reference about how to invoke Llama models deployed to managed compute, see the model's card in the Azure AI Studio [model catalog](../how-to/model-catalog-overview.md). Each model's card has an overview page that includes a description of the model, samples for code-based inferencing, fine-tuning, and model evaluation.

Added (new lines 527-537):

##### More inference examples

| **Package** | **Sample Notebook** |
|-------------|---------------------|
| CLI using CURL and Python web requests - Command R | [command-r.ipynb](https://aka.ms/samples/cohere-command-r/webrequests) |
| CLI using CURL and Python web requests - Command R+ | [command-r-plus.ipynb](https://aka.ms/samples/cohere-command-r-plus/webrequests) |
| OpenAI SDK (experimental) | [openaisdk.ipynb](https://aka.ms/samples/cohere-command/openaisdk) |
| LangChain | [langchain.ipynb](https://aka.ms/samples/cohere/langchain) |
| Cohere SDK | [cohere-sdk.ipynb](https://aka.ms/samples/cohere-python-sdk) |
| LiteLLM SDK | [litellm.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/litellm.ipynb) |

Unchanged context:

## Cost and quotas

### Cost and quota considerations for Llama models deployed as a service
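The "CLI using CURL and Python web requests" rows in the table above call the endpoint over plain HTTP. As a rough sketch of that pattern (not the notebooks' exact code): a model deployed to managed compute is scored through its REST endpoint. The URL, key, and payload below are placeholders, and the exact request schema for each model is shown on its model card.

```python
import requests

# Placeholder values: the real scoring URI and key come from the
# endpoint's Consume tab in the studio.
ENDPOINT_URL = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
API_KEY = "<your-endpoint-key>"

# Hypothetical payload; the input schema varies per model and is
# documented on the model card.
payload = {
    "input_data": {
        "input_string": [{"role": "user", "content": "Say hello in one sentence."}],
        "parameters": {"max_new_tokens": 96, "temperature": 0.7},
    }
}

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())
```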

articles/ai-studio/how-to/model-catalog-overview.md

Lines changed: 3 additions & 3 deletions
@@ -89,16 +89,16 @@ Models available for deployment to a Managed compute can be deployed to Azure Ma

Unchanged context:

* [Deploy Meta Llama models](deploy-models-llama.md)
* [Deploy Open models Created by Azure AI](deploy-models-open.md)

Changed (line 92): `### Build Generative AI Apps with Managed computes` → `### Build Generative AI Apps with Managed compute`

Unchanged context:

Prompt flow offers a great experience for prototyping. You can use models deployed with Managed computes in Prompt Flow with the [Open Model LLM tool](../../machine-learning/prompt-flow/tools-reference/open-model-llm-tool.md). You can also use the REST API exposed by managed compute in popular LLM tools like LangChain with the [Azure Machine Learning extension](https://python.langchain.com/docs/integrations/chat/azureml_chat_endpoint/).
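For the LangChain route mentioned in the paragraph above, a minimal sketch, assuming the `AzureMLChatOnlineEndpoint` integration from the `langchain-community` package that the linked extension page documents (endpoint URL and key are placeholders):

```python
from langchain_community.chat_models.azureml_endpoint import (
    AzureMLChatOnlineEndpoint,
    AzureMLEndpointApiType,
    CustomOpenAIChatContentFormatter,
)
from langchain_core.messages import HumanMessage

# Placeholder details for a model deployed to managed compute
# (a "dedicated" endpoint in LangChain's terms).
chat = AzureMLChatOnlineEndpoint(
    endpoint_url="https://<your-endpoint>.<region>.inference.ml.azure.com/score",
    endpoint_api_type=AzureMLEndpointApiType.dedicated,
    endpoint_api_key="<your-endpoint-key>",
    content_formatter=CustomOpenAIChatContentFormatter(),
)

reply = chat.invoke([HumanMessage(content="What does managed compute mean here?")])
print(reply.content)
```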
Changed (line 97): `### Content safety for models deployed as Managed Computes` → `### Content safety for models deployed as Managed compute`

Unchanged context:

[Azure AI Content Safety (AACS)](../../ai-services/content-safety/overview.md) service is available for use with Managed computes to screen for various categories of harmful content such as sexual content, violence, hate, and self-harm and advanced threats such as Jailbreak risk detection and Protected material text detection. You can refer to this notebook for reference integration with AACS for [Llama 2](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/inference/text-generation/llama-safe-online-deployment.ipynb) or use the Content Safety (Text) tool in Prompt Flow to pass responses from the model to AACS for screening. You are billed separately as per [AACS pricing](https://azure.microsoft.com/pricing/details/cognitive-services/content-safety/) for such use.
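As a companion to the paragraph above, a minimal sketch of screening a model response with the `azure-ai-contentsafety` Python SDK (the AACS resource endpoint and key are placeholders; the linked notebook shows the full online-deployment integration):

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

# Placeholder AACS resource; usage is billed separately per AACS pricing.
client = ContentSafetyClient(
    endpoint="https://<your-aacs-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-aacs-key>"),
)

model_output = "Text returned by the deployed model."
result = client.analyze_text(AnalyzeTextOptions(text=model_output))

# Each analyzed category (hate, self-harm, sexual, violence) reports a severity.
for item in result.categories_analysis:
    print(item.category, item.severity)
```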
Changed (line 101): `### Serverless APIs with Pay-as-you-go billing` → `## Serverless APIs with Pay-as-you-go billing` (the heading is promoted from H3 to H2)

Unchanged context:

Certain models in the Model Catalog can be deployed as serverless APIs with pay-as-you-go billing; this method of deployment is called Models-as-a Service (MaaS), providing a way to consume them as an API without hosting them on your subscription. Models available through MaaS are hosted in infrastructure managed by Microsoft, which enables API-based access to the model provider's model. API based access can dramatically reduce the cost of accessing a model and significantly simplify the provisioning experience. Most MaaS models come with token-based pricing.
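To make the consumption model in the paragraph above concrete: a MaaS deployment exposes an HTTPS endpoint, so calling it is a plain web request. A hedged sketch, assuming the OpenAI-compatible `/v1/chat/completions` route and bearer-key auth used in the pay-as-you-go samples (URL and key are placeholders):

```python
import requests

# Placeholder serverless API details from the deployment's details page.
URL = "https://<your-deployment>.<region>.models.ai.azure.com/v1/chat/completions"
API_KEY = "<your-api-key>"

resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "messages": [{"role": "user", "content": "One sentence on token-based pricing."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
body = resp.json()
print(body["choices"][0]["message"]["content"])
# Token-based pricing is metered from the usage field of each response.
print(body.get("usage"))
```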

articles/machine-learning/how-to-deploy-models-llama.md

Lines changed: 10 additions & 0 deletions
@@ -544,6 +544,16 @@ For more information on how to deploy models to managed compute using the studio

Unchanged context:

For reference about how to invoke Meta Llama 3 models deployed to real-time endpoints, see the model's card in Azure Machine Learning studio [model catalog](concept-model-catalog.md). Each model's card has an overview page that includes a description of the model, samples for code-based inferencing, fine-tuning, and model evaluation.

Added (new lines 547-556):

#### Additional inference examples

| **Package** | **Sample Notebook** |
|-------------|---------------------|
| CLI using CURL and Python web requests | [cohere-embed.ipynb](https://aka.ms/samples/embed-v3/webrequests) |
| OpenAI SDK (experimental) | [openaisdk.ipynb](https://aka.ms/samples/cohere-embed/openaisdk) |
| LangChain | [langchain.ipynb](https://aka.ms/samples/cohere-embed/langchain) |
| Cohere SDK | [cohere-sdk.ipynb](https://aka.ms/samples/cohere-embed/cohere-python-sdk) |
| LiteLLM SDK | [litellm.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/litellm.ipynb) |

Unchanged context:

## Cost and quotas

### Cost and quota considerations for Meta Llama models deployed as a serverless API
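The LiteLLM row appears in both added tables and points at the same notebook. As a rough sketch of that style of call, assuming LiteLLM's `azure_ai/` provider prefix for Azure AI endpoints (model name, base URL, and key are placeholders):

```python
import litellm

# Placeholder deployment details; LiteLLM forwards the request to the
# Azure AI endpoint and returns an OpenAI-shaped response.
response = litellm.completion(
    model="azure_ai/<your-deployment-name>",
    api_base="https://<your-deployment>.<region>.models.ai.azure.com",
    api_key="<your-api-key>",
    messages=[{"role": "user", "content": "Hello from LiteLLM."}],
)
print(response.choices[0].message.content)
```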
