
Commit b4eb7e6

committed
updates from Cohere - revised to fix formatting issues
1 parent 6840b44 commit b4eb7e6

File tree

1 file changed

+183
-47
lines changed


articles/ai-studio/how-to/deploy-models-cohere-rerank.md

Lines changed: 183 additions & 47 deletions
description: Learn to deploy and use Cohere Rerank models with Azure AI Foundry.
manager: scottpolly
ms.service: azure-ai-foundry
ms.topic: how-to
ms.date: 02/18/2025
ms.reviewer: shubhiraj
ms.author: mopeakande
author: msakande
## Cohere Rerank models

Cohere offers rerank models in [Azure AI Foundry](https://ai.azure.com). These models are available in the model catalog for deployment as serverless APIs:

- Cohere Rerank v3.5
- Cohere Rerank v3 - English
- Cohere Rerank v3 - Multilingual

You can browse the Cohere family of models in the [Model Catalog](model-catalog.md) by filtering on the Cohere collection.

### Cohere Rerank v3.5

Cohere Rerank 3.5 provides a significant boost to the relevancy of search results. This model, also known as a cross-encoder, precisely sorts lists of documents according to their semantic similarity to a provided query. This allows information retrieval systems to go beyond keyword search and also outperform traditional embedding models, surfacing the most contextually relevant data within end-user applications.

Businesses use Cohere Rerank 3.5 to improve their enterprise search and retrieval-augmented generation (RAG) applications across more than 100 languages. With just a few lines of code, you can add the model to existing systems to boost the accuracy of search results. The model is also uniquely performant at searching across complex enterprise data such as JSON, code, and tables. Further, it can reason through hard questions that other search systems fail to understand.

- Context window of the model is 4,096 tokens
- Max query length is 4,096 tokens
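The pipeline position described above (retrieve candidates first, then rerank them) can be sketched in plain Python. This is a minimal sketch: `rerank_fn` and `toy_score` are hypothetical stand-ins for a call to a deployed rerank model, not part of any Cohere SDK.

```python
def rerank_pipeline(query, documents, rerank_fn, top_n=3):
    """Score every candidate with rerank_fn, sort descending, keep top_n."""
    scored = [(rerank_fn(query, doc), i, doc) for i, doc in enumerate(documents)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [{"index": i, "relevance_score": s, "document": d} for s, i, d in scored[:top_n]]

def toy_score(query, doc):
    # Hypothetical stand-in scorer: fraction of query words found in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

results = rerank_pipeline(
    "capital of the United States",
    [
        "Paris is the capital of France.",
        "Washington, D.C. is the capital of the United States.",
        "Bananas are rich in potassium.",
    ],
    toy_score,
    top_n=2,
)
print([r["index"] for r in results])  # → [1, 0]
```

In a real system, `toy_score` would be replaced by a call to a deployed Cohere Rerank endpoint, which returns the same shape of index and relevance-score results.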
### Cohere Rerank v3 - English

Cohere Rerank English is a reranking model used for semantic search and retrieval-augmented generation (RAG). Rerank enables you to significantly improve search quality by augmenting traditional keyword-based search systems with a semantic-based reranking system that can contextualize the meaning of a user's query beyond keyword relevance. Cohere's Rerank delivers higher quality results than embedding-based search, lexical search, and even hybrid search, and it requires only adding a single line of code into your application.

Use Rerank as a ranker after initial retrieval.

Rerank supports JSON objects as documents where users can specify, at query time, the fields (keys) to use for semantic search. Some other attributes of Rerank include:

- Context window of the model is 4,096 tokens
- The max query length is 2,048 tokens

Rerank English works well for code retrieval, semi-structured data retrieval, and long context.

Use Rerank as a ranker after initial retrieval.

Rerank supports JSON objects as documents where users can specify, at query time, the fields (keys) to use for semantic search. Some other attributes of Rerank Multilingual include:

- Context window of the model is 4,096 tokens
- The max query length is 2,048 tokens

Rerank Multilingual performs well on multilingual benchmarks such as MIRACL.
- Azure role-based access controls are used to grant access to operations in Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned the __Azure AI Developer role__ on the resource group. For more information on permissions, see [Role-based access control in Azure AI Foundry portal](../concepts/rbac-ai-studio.md).

### Create a new deployment

The following steps demonstrate the deployment of Cohere Rerank v3.5, but you can use the same steps to deploy the other Cohere rerank models by replacing the model name.

To create a deployment:

[!INCLUDE [open-catalog](../includes/open-catalog.md)]

4. Select the model card of the model you want to deploy. In this article, you select **Cohere-rerank-v3-5** to open the Model Details page.
1. Select **Deploy** to open a serverless API deployment window for the model.
1. Alternatively, you can initiate a deployment from your project in the Azure AI Foundry portal as follows:

    1. From the left sidebar of your project, select **Models + Endpoints**.
    1. Select **+ Deploy model** > **Deploy base model**.
    1. Search for and select **Cohere-rerank-v3-5** to open the Model Details page.
    1. Select **Confirm** to open a serverless API deployment window for the model.

1. In the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use.
1. Select the **Pricing and terms** tab to learn about pricing for the selected model.
1. Select the **Subscribe and Deploy** button. If this is your first time deploying the model in the project, you have to subscribe your project for the particular offering.
To learn about billing for the Cohere models deployed as a serverless API with pay-as-you-go token-based billing, see [Cost and quota considerations for Cohere models deployed as a service](#cost-and-quota-considerations-for-models-deployed-as-a-service).

### Consume the Cohere Rerank model as a service

Cohere Rerank models deployed as serverless APIs can be consumed using the Rerank API.
1. Copy the **Target** URL and the **Key** value.

1. Cohere currently exposes `v2/rerank` for inference with the Rerank v3.5, Rerank v3 - English, and Rerank v3 - Multilingual models. For more information on using the APIs, see the [reference](#rerank-api-reference-for-cohere-rerank-models-deployed-as-a-service) section.
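A deployment can also be called without the Cohere SDK. The sketch below assumes the serverless endpoint accepts the `v2/rerank` route under the **Target** URL, with the **Key** passed as a bearer token; the URL and key values are placeholders you copy from the deployment page.

```python
import json
import urllib.request

def rerank(target_url, api_key, query, documents, top_n=3):
    """POST to the deployment's v2/rerank route and return the parsed JSON response."""
    body = json.dumps({
        "model": "rerank-v3.5",
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{target_url}/v2/rerank",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example call (requires a live deployment; values are placeholders):
# results = rerank("https://<DEPLOYMENT_URI>", "<KEY>",
#                  "What is the capital of the United States?", ["doc1", "doc2"])
```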
## Rerank API reference for Cohere Rerank models deployed as a service

The native **Cohere Rerank API v2** endpoint on `https://api.cohere.com/v2/rerank` supports inference with Cohere Rerank v3.5, Cohere Rerank v3 - English, and Cohere Rerank v3 - Multilingual.

The native **Cohere Rerank API v1** endpoint on `https://api.cohere.com/v1/rerank` supports inference with Cohere Rerank v3 - English and Cohere Rerank v3 - Multilingual.

### v2/rerank request schema

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | Required | The search query. |
| `documents` | List of strings | Required | A list of texts to compare to the query. For optimal performance, we recommend against sending more than 1,000 documents in a single request.<br><br>Note: long documents are automatically truncated to the value of `max_tokens_per_doc`.<br><br>Note: structured data should be formatted as YAML strings for best performance. |
| `top_n` | integer | Optional | Limits the number of returned rerank results to the specified value. If not passed, all the rerank results are returned. |
| `return_documents` | boolean | `FALSE` | If `FALSE`, returns results without the doc text - the API returns a list of {`index`, `relevance_score`} where index is inferred from the list passed into the request.<br><br>If `TRUE`, returns results with the doc text passed in - the API returns an ordered list of {`index`, `text`, `relevance_score`} where index + text refers to the list passed into the request. |
| `max_tokens_per_doc` | integer | Optional | Defaults to 4096. Long documents are automatically truncated to the specified number of tokens. |
| `model` | string | Required | The identifier of the model to use, for example `rerank-v3.5`. |
### v2/rerank response schema

Response fields are fully documented on [Cohere's Rerank API reference](https://docs.cohere.com/reference/rerank). The response payload is a dictionary with the following fields:

| Key | Type | Description |
| --- | --- | --- |
| `id` | string | An optional identifier for the response. |
| `results` | List of objects | An ordered list of ranked documents. |
| `meta` | object | An object that includes `api_version` (with `version` and, optionally, `is_deprecated` and `is_experimental`). |
| `billed_units` | object | An object with optional fields `images`, `input_tokens`, `output_tokens`, `search_units`, and `classifications`. |
| `tokens` | object | An object with optional fields `input_tokens` (the number of tokens used as input to the model) and `output_tokens` (the number of tokens produced by the model). |
| `warnings` | List of strings | Optional warning messages. |

The `results` object is a dictionary with the following fields:

| Key | Type | Description |
| --- | --- | --- |
| `index` | integer | Corresponds to the index in the original list of documents to which the ranked document belongs. For example, if the first value in the `results` object has an index value of 3, the document at index=3 in the list passed in had the highest relevance. |
| `relevance_score` | double | Relevance scores are normalized to be in the range [0, 1]. Scores close to 1 indicate high relevance to the query, and scores closer to 0 indicate low relevance. A score of 0.9 doesn't mean that the document is twice as relevant as a document with a score of 0.45. |
| `document` | object | If `return_documents` is set to `false`, this field is none; if `true`, it contains the documents passed in. |
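As a quick illustration of the fields above, the following sketch filters a trimmed response payload by `relevance_score`; the 0.5 threshold is illustrative, not a recommendation.

```python
# A trimmed response payload in the shape documented above.
response = {
    "results": [
        {"index": 3, "relevance_score": 0.999071},
        {"index": 4, "relevance_score": 0.7867867},
        {"index": 0, "relevance_score": 0.32713068},
    ],
    "id": "07734bd2-2473-4f07-94e1-0d9f0e6843cf",
    "meta": {"api_version": {"version": "2"}, "billed_units": {"search_units": 1}},
}

# Keep only hits above a relevance threshold; results arrive already sorted.
THRESHOLD = 0.5
relevant = [r["index"] for r in response["results"] if r["relevance_score"] >= THRESHOLD]
print(relevant)  # → [3, 4]
```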
### Examples using Cohere Rerank API v2

#### Request example

```python
import cohere

co = cohere.ClientV2()

docs = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Capitalization or capitalisation in English grammar is the use of a capital letter at the start of a word. English usage varies from capitalization in other languages.",
    "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
    "Capital punishment has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.",
]

response = co.rerank(
    model="rerank-v3.5",
    query="What is the capital of the United States?",
    documents=docs,
    top_n=3,
)

print(response)
```
#### Response example

```json
{
  "results": [
    {
      "index": 3,
      "relevance_score": 0.999071
    },
    {
      "index": 4,
      "relevance_score": 0.7867867
    },
    {
      "index": 0,
      "relevance_score": 0.32713068
    }
  ],
  "id": "07734bd2-2473-4f07-94e1-0d9f0e6843cf",
  "meta": {
    "api_version": {
      "version": "2",
      "is_experimental": false
    },
    "billed_units": {
      "search_units": 1
    }
  }
}
```
### v1/rerank request

```http
POST /v1/rerank HTTP/1.1
Host: <DEPLOYMENT_URI>
Authorization: Bearer <TOKEN>
Content-type: application/json
```
### v1/rerank request schema

Cohere Rerank v3 - English and Rerank v3 - Multilingual accept the following parameters for a `v1/rerank` API call:

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | Required | The search query. |
| `documents` | array | None | A list of document objects or strings to rerank. |
| `top_n` | integer | Length of `documents` | The number of most relevant documents or indices to return. |
| `return_documents` | boolean | `FALSE` | If `FALSE`, returns results without the doc text - the API returns a list of {`index`, `relevance_score`} where index is inferred from the list passed into the request.<br><br>If `TRUE`, returns results with the doc text passed in - the API returns an ordered list of {`index`, `text`, `relevance_score`} where index + text refers to the list passed into the request. |
| `max_chunks_per_doc` | integer | None | The maximum number of chunks to produce internally from a document. |
| `rank_fields` | array of strings | None | If a JSON object is provided, you can specify which keys to consider for reranking. The model reranks based on the order of the fields passed in (for example, `rank_fields=['title','author','text']` reranks using the values in `title`, `author`, and `text` in that sequence. If the length of title, author, and text exceeds the context length of the model, the chunking won't reconsider earlier fields).<br><br>If not provided, the model uses the default text field for ranking. |
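To illustrate `rank_fields` with JSON documents, the following sketch builds a `v1/rerank` request body. The model name, field names, and document contents are illustrative placeholders, not values from a real deployment.

```python
import json

# Request body for v1/rerank with JSON documents and rank_fields
# (the field names "title", "author", and "text" are illustrative).
body = {
    "model": "<MODEL_NAME>",  # placeholder for your deployed model's name
    "query": "renaissance architecture",
    "documents": [
        {"title": "Brunelleschi's Dome", "author": "R. King",
         "text": "How a goldsmith engineered the dome of Florence's cathedral."},
        {"title": "Clean Code", "author": "R. Martin",
         "text": "A handbook of agile software craftsmanship."},
    ],
    "rank_fields": ["title", "author", "text"],
    "return_documents": True,
}

payload = json.dumps(body)
print(json.loads(payload)["rank_fields"])  # → ['title', 'author', 'text']
```

The model considers the fields in the order listed, so put the most informative field first if documents may exceed the context length.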

### v1/rerank response schema

Response fields are fully documented on [Cohere's Rerank API reference](https://docs.cohere.com/reference/rerank). The response payload is a dictionary with the following fields:

| Key | Type | Description |
| --- | --- | --- |
| `id` | string | An identifier for the response. |
| `results` | array of objects | An ordered list of ranked documents, where each document is described by an object that includes `index` and `relevance_score` and, optionally, `text`. |
| `meta` | array of objects | An optional meta object containing a list of warning strings. |

The `results` object is a dictionary with the following fields:

| Key | Type | Description |
| --- | --- | --- |
| `document` | object | The document objects or strings that were reranked. |
| `index` | integer | The index in the original list of documents to which the ranked document belongs. For example, if the first value in the `results` object has an index value of 3, it means that in the list of documents passed in, the document at index=3 had the highest relevance. |
| `relevance_score` | float | Relevance scores are normalized to be in the range [0, 1]. Scores close to one indicate a high relevance to the query, and scores close to zero indicate low relevance. A score of 0.9 _doesn't_ necessarily mean that a document is twice as relevant as another with a score of 0.45. |
### Examples using Cohere Rerank API v1

#### Request example

#### More inference examples

| Package | Sample Notebook |
| --- | --- |
| CLI using CURL and Python web requests | [cohere-rerank.ipynb](https://aka.ms/samples/cohere-rerank/webrequests) |
| LangChain | [langchain.ipynb](https://aka.ms/samples/cohere-rerank/langchain) |
| Cohere SDK | [cohere-sdk.ipynb](https://aka.ms/samples/cohere-rerank/cohere-python-sdk) |

## Cost and quota considerations for models deployed as a service

Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.

Cohere models deployed as serverless APIs with pay-as-you-go billing are offered by Cohere through Azure Marketplace and integrated with Azure AI Foundry for use. You can find Azure Marketplace pricing when deploying the model.

Each time a project subscribes to a given offer from Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.

For more information on how to track costs, see [monitor costs for models offered through Azure Marketplace](./costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).

## Related content

- [What is Azure AI Foundry?](../what-is-ai-studio.md)
