description: Learn to deploy and use Cohere Rerank models with Azure AI Foundry.
manager: scottpolly
ms.service: azure-ai-foundry
ms.topic: how-to
ms.date: 02/18/2025
ms.reviewer: shubhiraj
ms.author: mopeakande
author: msakande

In this article, you learn about the Cohere Rerank models, how to use Azure AI Foundry to deploy them as serverless APIs, and how to consume the deployed models.

## Cohere Rerank models

Cohere offers rerank models in [Azure AI Foundry](https://ai.azure.com). These models are available in the model catalog for deployment as serverless APIs:

- Cohere Rerank v3.5
- Cohere Rerank v3 - English
- Cohere Rerank v3 - Multilingual

You can browse the Cohere family of models in the [Model Catalog](model-catalog.md) by filtering on the Cohere collection.

### Cohere Rerank v3.5

Cohere Rerank 3.5 provides a significant boost to the relevancy of search results. This AI model, also known as a cross-encoder, precisely sorts lists of documents according to their semantic similarity to a provided query. This capability lets information retrieval systems go beyond keyword search and outperform traditional embedding models, surfacing the most contextually relevant data within end-user applications.

Businesses use Cohere Rerank 3.5 to improve their enterprise search and retrieval-augmented generation (RAG) applications across more than 100 languages. With just a few lines of code, you can add the model to existing systems to boost the accuracy of search results. The model is also uniquely performant at searching across complex enterprise data such as JSON, code, and tables. Further, it can reason through hard questions that other search systems fail to understand.

- Context window of the model is 4,096 tokens
- Max query length is 4,096 tokens

### Cohere Rerank v3 - English

Cohere Rerank English is a reranking model used for semantic search and retrieval-augmented generation (RAG). Rerank enables you to significantly improve search quality by augmenting traditional keyword-based search systems with a semantic-based reranking system that can contextualize the meaning of a user's query beyond keyword relevance. Cohere's Rerank delivers higher quality results than embedding-based search, lexical search, and even hybrid search, and it requires adding only a single line of code to your application.

Use Rerank as a ranker after initial retrieval. In other words, after an initial retrieval step returns a set of candidate documents, use Rerank to reorder those candidates by their relevance to the query.

Rerank supports JSON objects as documents where users can specify, at query time, the fields (keys) to use for semantic search, as shown in the sketch after this list. Some other attributes of Rerank include:

- Context window of the model is 4,096 tokens
- The max query length is 2,048 tokens

Rerank English works well for code retrieval, semi-structured data retrieval, and long context.
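
The following minimal sketch illustrates reranking semi-structured documents with Cohere's Python SDK. The client setup, key, and model identifier are illustrative placeholders (they use Cohere's public model name rather than an Azure deployment), and the `rank_fields` parameter is documented in the reference section later in this article:

```python
import cohere

# Illustrative setup: replace with your own key. For an Azure serverless
# deployment, you would target your deployment's endpoint instead.
co = cohere.Client(api_key="<YOUR_KEY>")

# Semi-structured documents: rerank on the `title` and `abstract` fields only.
docs = [
    {"title": "Quarterly report", "abstract": "Revenue grew 12% quarter over quarter."},
    {"title": "Onboarding guide", "abstract": "Steps for setting up a developer laptop."},
]

response = co.rerank(
    model="rerank-english-v3.0",  # Cohere's public model name; shown for illustration
    query="How did revenue change last quarter?",
    documents=docs,
    rank_fields=["title", "abstract"],
    top_n=1,
)
print(response.results[0].index)  # position of the most relevant document in `docs`
```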

### Cohere Rerank v3 - Multilingual

Use Rerank as a ranker after initial retrieval. In other words, after an initial retrieval step returns a set of candidate documents, use Rerank to reorder those candidates by their relevance to the query.

Rerank supports JSON objects as documents where users can specify, at query time, the fields (keys) to use for semantic search. Some other attributes of Rerank Multilingual include:

- Context window of the model is 4,096 tokens
- The max query length is 2,048 tokens

Rerank multilingual performs well on multilingual benchmarks such as MIRACL.
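
As a short sketch (again with a placeholder key and Cohere's public model name, not an Azure-specific setup), the multilingual model can rank documents written in one language against a query in another:

```python
import cohere

co = cohere.Client(api_key="<YOUR_KEY>")  # placeholder credentials

# Query in English; candidate documents in German and French.
docs = [
    "Berlin ist die Hauptstadt von Deutschland.",
    "Paris est la capitale de la France.",
]

response = co.rerank(
    model="rerank-multilingual-v3.0",  # public model name, shown for illustration
    query="What is the capital of France?",
    documents=docs,
    top_n=1,
)
print(docs[response.results[0].index])  # expected: the French document
```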

You can deploy the previously mentioned Cohere models as a service with pay-as-you-go billing.

- Azure role-based access controls are used to grant access to operations in Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned the __Azure AI Developer role__ on the resource group. For more information on permissions, see [Role-based access control in Azure AI Foundry portal](../concepts/rbac-ai-studio.md).

### Create a new deployment

The following steps demonstrate the deployment of Cohere Rerank v3.5, but you can use the same steps to deploy the other Cohere rerank models by replacing the model name.

4. Select the model card of the model you want to deploy. In this article, you select **Cohere-rerank-v3-5** to open the Model Details page.
1. Select **Deploy** to open a serverless API deployment window for the model.
1. Alternatively, you can initiate a deployment from your project in the Azure AI Foundry portal as follows:

    1. From the left sidebar of your project, select **Models + Endpoints**.
    1. Select **+ Deploy model** > **Deploy base model**.
    1. Search for and select **Cohere-rerank-v3-5** to open the Model Details page.
    1. Select **Confirm** to open a serverless API deployment window for the model.

1. In the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use.
1. Select the **Pricing and terms** tab to learn about pricing for the selected model.
1. Select the **Subscribe and Deploy** button. If this is your first time deploying the model in the project, you have to subscribe your project for the particular offering.

To learn about billing for the Cohere models deployed as a serverless API with pay-as-you-go token-based billing, see [Cost and quota considerations for Cohere models deployed as a service](#cost-and-quota-considerations-for-models-deployed-as-a-service).

### Consume the Cohere Rerank model as a service

Cohere Rerank models deployed as serverless APIs can be consumed using the Rerank API.

1. Copy the **Target** URL and the **Key** value.

Cohere currently exposes `v2/rerank` for inference with the Rerank v3.5, Rerank v3 - English, and Rerank v3 - Multilingual models. For more information on using the APIs, see the [reference](#rerank-api-reference-for-cohere-rerank-models-deployed-as-a-service) section.
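
For example, a minimal sketch with Python's `requests` library might look like the following. The endpoint and key placeholders stand in for the **Target** URL and **Key** values you copied, and the exact model identifier your deployment expects is an assumption here:

```python
import requests

ENDPOINT = "https://<DEPLOYMENT_URI>"  # the Target URL you copied (placeholder)
API_KEY = "<YOUR_KEY>"                 # the Key you copied (placeholder)

payload = {
    "model": "rerank-v3.5",  # assumed identifier; use the one your deployment expects
    "query": "What is the capital of the United States?",
    "documents": [
        "Carson City is the capital city of the American state of Nevada.",
        "Washington, D.C. is the capital of the United States.",
    ],
    "top_n": 1,
}

response = requests.post(
    f"{ENDPOINT}/v2/rerank",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
)
response.raise_for_status()
for result in response.json()["results"]:
    print(result["index"], result["relevance_score"])
```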

## Rerank API reference for Cohere Rerank models deployed as a service

The native **Cohere Rerank API v2** endpoint on `https://api.cohere.com/v2/rerank` supports inference with Cohere Rerank v3.5, Cohere Rerank v3 - English, and Cohere Rerank v3 - Multilingual.

The native **Cohere Rerank API v1** endpoint on `https://api.cohere.com/v1/rerank` supports inference with Cohere Rerank v3 - English and Cohere Rerank v3 - Multilingual.

### v2/rerank request schema

| Property | Type | Default | Description |
| --- | --- | --- | --- |
|`query`| string | Required | The search query. |
|`documents`| List of strings | Required | A list of texts that will be compared to the query. For optimal performance, we recommend against sending more than 1,000 documents in a single request.<br><br>Note: long documents will automatically be truncated to the value of **max_tokens_per_doc**.<br><br>Note: structured data should be formatted as YAML strings for best performance. |
|`top_n`| integer | Optional | Limits the number of returned rerank results to the specified value. If not passed, all the rerank results will be returned. |
|`return_documents`| boolean |`FALSE`| If `FALSE`, returns results without the document text - the API returns a list of {`index`, `relevance_score`} where index is inferred from the list passed into the request.<br><br>If `TRUE`, returns results with the document text passed in - the API returns an ordered list of {`index`, `text`, `relevance_score`} where index + text refers to the list passed into the request. |
|`max_tokens_per_doc`| integer | Optional | Defaults to 4096. Long documents are automatically truncated to the specified number of tokens. |
|`model`| string | Required | The identifier of the model to use, for example, `rerank-v3.5`. |

### v2/rerank response schema

Response fields are fully documented on [Cohere's Rerank API reference](https://docs.cohere.com/reference/rerank). The response payload is a dictionary with the following fields:

| Key | Type | Description |
| --- | --- | --- |
|`id`| string | An optional identifier for the response. |
|`results`| List of objects | An ordered list of ranked documents. |
|`meta`| object | An object that includes `api_version` (an object with `version` and, optionally, `is_deprecated` and `is_experimental`) and, optionally, `billed_units`, `tokens`, and `warnings`. |
|`billed_units`| object | An object whose fields, all optional, are `images`, `input_tokens`, `output_tokens`, `search_units`, and `classifications`. |
|`tokens`| object | An object with optional fields `input_tokens`, the number of tokens used as input to the model, and `output_tokens`, the number of tokens produced by the model. |
|`warnings`| List of strings | An optional list of warning strings. |

The `results` object is a dictionary with the following fields:

| Key | Type | Description |
| --- | --- | --- |
|`index`| integer | The index in the original list of documents to which the ranked document belongs. For example, if the first value in the `results` list has an index value of 3, the document at index=3 in the list of documents passed in had the highest relevance. |
|`relevance_score`| double | Relevance scores are normalized to be in the range \[0, 1\]. Scores close to 1 indicate a high relevance to the query, and scores closer to 0 indicate low relevance. A score of 0.9 doesn't necessarily mean that a document is twice as relevant as a document with a score of 0.45. |
|`document`| object | If `return_documents` is set to `FALSE`, this field is `none`; if `TRUE`, it contains the documents passed in. |
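
Because `index` points back into the request's document list, callers typically join the ranked results to the original documents. A minimal sketch of that join, using a hypothetical parsed payload:

```python
# Hypothetical input list and parsed response payload, for illustration.
docs = [
    "Carson City is the capital city of the American state of Nevada.",
    "Washington, D.C. is the capital of the United States.",
]
response = {
    "results": [
        {"index": 1, "relevance_score": 0.999},
        {"index": 0, "relevance_score": 0.327},
    ]
}

# Results are already ordered by relevance; `index` points into the input list.
for rank, result in enumerate(response["results"], start=1):
    doc = docs[result["index"]]
    print(f"{rank}. score={result['relevance_score']:.3f} doc={doc!r}")
```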

### Examples using Cohere Rerank API v2

#### Request example

```python
import cohere

co = cohere.ClientV2()

docs = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Capitalization or capitalisation in English grammar is the use of a capital letter at the start of a word. English usage varies from capitalization in other languages.",
    "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
    "Capital punishment has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.",
]

response = co.rerank(
    model="rerank-v3.5",
    query="What is the capital of the United States?",
    documents=docs,
    top_n=3,
)

print(response)
```

#### Response example

```json
{
  "results": [
    {
      "index": 3,
      "relevance_score": 0.999071
    },
    {
      "index": 4,
      "relevance_score": 0.7867867
    },
    {
      "index": 0,
      "relevance_score": 0.32713068
    }
  ],
  "id": "07734bd2-2473-4f07-94e1-0d9f0e6843cf",
  "meta": {
    "api_version": {
      "version": "2",
      "is_experimental": false
    },
    "billed_units": {
      "search_units": 1
    }
  }
}
```

### v1/rerank request

```http
POST /v1/rerank HTTP/1.1
Host: <DEPLOYMENT_URI>
Authorization: Bearer <TOKEN>
Content-type: application/json
```

### v1/rerank request schema

Cohere Rerank v3 - English and Rerank v3 - Multilingual accept the following parameters for a `v1/rerank` API call:

| Property | Type | Default | Description |
| --- | --- | --- | --- |
|`query`| string | Required | The search query. |
|`documents`| array | None | A list of document objects or strings to rerank. |
|`top_n`| integer | Length of `documents` | The number of most relevant documents or indices to return. |
|`return_documents`| boolean |`FALSE`| If `FALSE`, returns results without the document text - the API returns a list of {`index`, `relevance_score`} where index is inferred from the list passed into the request.<br><br>If `TRUE`, returns results with the document text passed in - the API returns an ordered list of {`index`, `text`, `relevance_score`} where index + text refers to the list passed into the request. |
|`max_chunks_per_doc`| integer | None | The maximum number of chunks to produce internally from a document. |
|`rank_fields`| array of strings | None | If a JSON object is provided, you can specify which keys you would like to consider for reranking. The model reranks based on the order of the fields passed in (for example, `rank_fields=['title','author','text']` reranks using the values in `title`, `author`, and `text` in that sequence. If the length of title, author, and text exceeds the context length of the model, the chunking won't reconsider earlier fields).<br><br>If not provided, the model uses the default text field for ranking. |

### v1/rerank response schema

Response fields are fully documented on [Cohere's Rerank API reference](https://docs.cohere.com/reference/rerank). The response payload is a dictionary with the following fields:

| Key | Type | Description |
| --- | --- | --- |
|`id`| string | An identifier for the response. |
|`results`| array of objects | An ordered list of ranked documents, where each document is described by an object that includes `index` and `relevance_score` and, optionally, `text`. |
|`meta`| array of objects | An optional meta object containing a list of warning strings. |

The `results` object is a dictionary with the following fields:

| Key | Type | Description |
| --- | --- | --- |
|`document`| object | The document objects or strings that were reranked. |
|`index`| integer | The index in the original list of documents to which the ranked document belongs. For example, if the first value in the `results` object has an index value of 3, it means in the list of documents passed in, the document at index=3 had the highest relevance. |
|`relevance_score`| float | Relevance scores are normalized to be in the range \[0, 1\]. Scores close to one indicate a high relevance to the query, and scores close to zero indicate low relevance. A score of 0.9 _doesn't_ necessarily mean that a document is twice as relevant as another with a score of 0.45. |

### Examples using Cohere Rerank API v1

#### Request example
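
A minimal illustrative `v1/rerank` call with Python's `requests`, using placeholder endpoint and key values and the `rank_fields` parameter described earlier (the field names and documents are hypothetical):

```python
import requests

ENDPOINT = "https://<DEPLOYMENT_URI>"  # placeholder deployment endpoint
API_KEY = "<YOUR_KEY>"                 # placeholder key

payload = {
    "query": "How did revenue change last quarter?",
    "rank_fields": ["title", "text"],  # rerank on these JSON keys, in order
    "return_documents": True,
    "top_n": 1,
    "documents": [
        {"title": "Quarterly report", "text": "Revenue grew 12% quarter over quarter."},
        {"title": "Onboarding guide", "text": "Steps for setting up a developer laptop."},
    ],
}

response = requests.post(
    f"{ENDPOINT}/v1/rerank",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
)
print(response.json())
```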

#### More inference examples

| Package | Sample Notebook |
|---|---|
|CLI using CURL and Python web requests|[cohere-rerank.ipynb](https://aka.ms/samples/cohere-rerank/webrequests)|

## Cost and quota considerations for models deployed as a service

Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.

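
If your traffic can burst past these limits, a common client-side pattern is to back off and retry when the service signals throttling. A minimal sketch, assuming the endpoint returns standard HTTP 429 responses when throttled:

```python
import time

import requests


def rerank_with_retry(url: str, headers: dict, payload: dict, max_retries: int = 5):
    """POST to a rerank endpoint, backing off exponentially on HTTP 429."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... between attempts
    raise RuntimeError("Rate limited after retries")
```
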
Cohere models deployed as serverless APIs with pay-as-you-go billing are offered by Cohere through Azure Marketplace and integrated with Azure AI Foundry for use. You can find Azure Marketplace pricing when deploying the model.

Each time a project subscribes to a given offer from Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.

For more information on how to track costs, see [Monitor costs for models offered through Azure Marketplace](./costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).

## Related content

- [What is Azure AI Foundry?](../what-is-ai-studio.md)