In this article, you learn about the Cohere Rerank models, how to use Azure AI Studio to deploy them as serverless APIs with pay-as-you-go token-based billing, and how to work with the deployed models.
## Cohere Rerank models
Cohere offers two Rerank models in [Azure AI Studio](https://ai.azure.com). These models are available in the model catalog for deployment as serverless APIs:
* Cohere Rerank 3 - English
* Cohere Rerank 3 - Multilingual
You can browse the Cohere family of models in the [Model Catalog](model-catalog.md) by filtering on the Cohere collection.
### Cohere Rerank 3 - English
Cohere Rerank English is a reranking model used for semantic search and retrieval-augmented generation (RAG). Rerank enables you to significantly improve search quality by augmenting traditional keyword-based search systems with a semantic-based reranking system that can contextualize the meaning of a user's query beyond keyword relevance. Cohere's Rerank delivers higher quality results than embedding-based search, lexical search, and even hybrid search, and it requires only adding a single line of code into your application.
Use Rerank as a ranker after initial retrieval, that is, after an initial search system finds the top 100 most relevant documents from a larger corpus.
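To make the two-stage flow concrete, here's an illustrative sketch in Python. The first-stage scorer below is a toy keyword-overlap function standing in for a real retriever such as BM25, and the rerank step takes a pluggable scoring function as a placeholder for a call to a deployed Rerank endpoint; none of it is Cohere's implementation.

```python
# Illustrative two-stage retrieval sketch (not Cohere's implementation).
# Stage 1: a toy keyword-overlap scorer standing in for BM25 or similar.
# Stage 2: reorder the candidates by scores from a rerank scoring function.

def first_stage_retrieve(query, corpus, k=100):
    """Return the top-k documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def rerank(query, candidates, top_n, score_fn):
    """Order candidates by the scores from score_fn (the rerank step)."""
    scores = score_fn(query, candidates)
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]

corpus = [
    "Carson City is the capital city of Nevada.",
    "Washington, D.C. is the capital of the United States.",
    "Capital punishment has existed in the United States for centuries.",
]

candidates = first_stage_retrieve("capital of the United States", corpus, k=100)

# Placeholder scoring: in practice these scores would come from the Rerank API.
fake_scores = lambda q, docs: [0.97, 0.05, 0.10][: len(docs)]
top = rerank("capital of the United States", candidates, top_n=1, score_fn=fake_scores)
```

In a real application, only the second stage changes: `score_fn` becomes a call to the deployed `v1/rerank` endpoint over the stage-one candidates.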
Rerank supports JSON objects as documents where users can specify, at query time, the fields (keys) to use for semantic search. Some other attributes of Rerank include:
* The context window of the model is 4,096 tokens
* The max query length is 2,048 tokens
Rerank English works well for code retrieval, semi-structured data retrieval, and long context.
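As an illustration of the JSON-document support described above, the following sketch builds a `v1/rerank` request payload in Python. The field names `Title` and `Content`, the query, and the documents are all hypothetical; only the `query`, `documents`, `rank_fields`, and `top_n` parameters come from the Rerank API.

```python
import json

# Hypothetical JSON documents. The keys listed in rank_fields tell the model
# which fields to use for semantic reranking (field names here are illustrative).
documents = [
    {"Title": "Quarterly report", "Content": "Revenue grew 12% year over year."},
    {"Title": "Travel policy", "Content": "Employees must book flights 14 days ahead."},
]

payload = {
    "query": "What was revenue growth last quarter?",
    "documents": documents,
    "rank_fields": ["Title", "Content"],  # fields considered, in this order
    "top_n": 1,
}

# Serialize to the JSON body you would POST to the v1/rerank endpoint.
body = json.dumps(payload)
```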
### Cohere Rerank 3 - Multilingual
Cohere Rerank Multilingual is a reranking model used for semantic search and retrieval-augmented generation (RAG). Rerank Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Rerank enables you to significantly improve search quality by augmenting traditional keyword-based search systems with a semantic-based reranking system that can contextualize the meaning of a user's query beyond keyword relevance. Cohere's Rerank delivers higher quality results than embedding-based search, lexical search, and even hybrid search, and it requires only adding a single line of code into your application.
Use Rerank as a ranker after initial retrieval, that is, after an initial search system finds the top 100 most relevant documents from a larger corpus.
Rerank supports JSON objects as documents where users can specify, at query time, the fields (keys) to use for semantic search. Some other attributes of Rerank Multilingual include:
* The context window of the model is 4,096 tokens
* The max query length is 2,048 tokens
Rerank Multilingual performs well on multilingual benchmarks such as MIRACL.
## Deploy Cohere Rerank models as serverless APIs
Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.
You can deploy the previously mentioned Cohere models as a service with pay-as-you-go billing. Cohere offers these models through the Microsoft Azure Marketplace and can change or update the terms of use and pricing of these models.
### Prerequisites
For a list of regions that are available for each of the models supporting serverless API endpoint deployments, see [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md).
- An [Azure AI Studio project](../how-to/create-projects.md).
- Azure role-based access controls are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the __Azure AI Developer role__ on the resource group. For more information on permissions, see [Role-based access control in Azure AI Studio](../concepts/rbac-ai-studio.md).
To create a deployment:
1. Give the deployment a name. This name becomes part of the deployment API URL. This URL must be unique in each Azure region.
1. Select **Deploy**. Wait until the deployment is ready and you're redirected to the Deployments page.
1. On the Deployments page, select the deployment, and note the endpoint's **Target** URL and the Secret **Key**. For more information on using the APIs, see the [reference](#rerank-api-reference-for-cohere-rerank-models-deployed-as-a-service) section.
1. You can always find the endpoint's details, URL, and access keys by navigating to your **Project overview** page. Then, from the left sidebar of your project, select **Components** > **Deployments**.
To learn about billing for the Cohere models deployed as a serverless API with pay-as-you-go token-based billing, see [Cost and quota considerations for Cohere models deployed as a service](#cost-and-quota-considerations-for-models-deployed-as-a-service).
### Consume the Cohere Rerank models as a service
Cohere Rerank models deployed as serverless APIs can be consumed using the Rerank API.
1. From your **Project overview** page, go to the left sidebar and select **Components** > **Deployments**.
1. Find and select the deployment you created.
1. Copy the **Target** URL and the **Key** value.
1. Cohere currently exposes `v1/rerank` for inference with the Rerank 3 - English and Rerank 3 - Multilingual models. For more information on using the APIs, see the [reference](#rerank-api-reference-for-cohere-rerank-models-deployed-as-a-service) section.
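As a sketch, a request to the deployed endpoint can be assembled with Python's standard library. The endpoint URL and key below are placeholders for your deployment's **Target** URL and **Key** values, and bearer-token authorization is an assumption; check your deployment's details for the exact scheme.

```python
import json
import urllib.request

# Placeholders: substitute your deployment's Target URL and Key.
endpoint = "https://<your-deployment>.<region>.inference.ai.azure.com/v1/rerank"
api_key = "<your-key>"

payload = {
    "query": "What is the capital of the United States?",
    "documents": [
        "Carson City is the capital city of Nevada.",
        "Washington, D.C. is the capital of the United States.",
    ],
    "top_n": 1,
}

# Build the POST request; the Key goes in the Authorization header (assumed Bearer scheme).
request = urllib.request.Request(
    endpoint,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    method="POST",
)

# Sending is omitted here; urllib.request.urlopen(request) would perform the call.
```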
## Rerank API reference for Cohere Rerank models deployed as a service
Cohere Rerank 3 - English and Rerank 3 - Multilingual accept the native Cohere Rerank API on `v1/rerank`. This section contains details about the Cohere Rerank API.
#### v1/rerank request
```json
POST /v1/rerank HTTP/1.1
Host: <DEPLOYMENT_URI>
Authorization: Bearer <TOKEN>
Content-type: application/json
```
#### v1/rerank request schema
Cohere Rerank 3 - English and Rerank 3 - Multilingual accept the following parameters for a `v1/rerank` API call:
| Property | Type | Default | Description |
| --- | --- | --- | --- |
|`query`|`string`|Required |The search query.|
|`documents`|`array`|None |A list of document objects or strings to rerank. |
|`top_n`|`integer`|Length of `documents`|The number of most relevant documents or indices to return. |
|`return_documents`|`boolean`|`FALSE`|If `FALSE`, returns results without the doc text: the API returns a list of {`index`, `relevance_score`}, where index is inferred from the list passed into the request. </br>If `TRUE`, returns results with the doc text passed in: the API returns an ordered list of {`index`, `text`, `relevance_score`}, where index + text refers to the list passed into the request. |
|`max_chunks_per_doc`|`integer`|None |The maximum number of chunks to produce internally from a document.|
|`rank_fields`|`array of strings`|None |If a JSON object is provided, you can specify which keys you would like to consider for reranking. The model reranks based on the order of the fields passed in (for example, `rank_fields=['title','author','text']` reranks using the values in `title`, `author`, and `text` in that sequence; if the length of title, author, and text exceeds the context length of the model, the chunking won't reconsider earlier fields).<br> If not provided, the model uses the default text field for ranking. |
#### v1/rerank response schema
Response fields are fully documented on [Cohere's Rerank API reference](https://docs.cohere.com/reference/rerank). The response payload is a dictionary with the following fields:
| Key | Type | Description |
| --- | --- | --- |
|`id`|`string`|An identifier for the response. |
|`results`|`array of objects`|An ordered list of ranked documents, where each document is described by an object that includes `index` and `relevance_score` and, optionally, `text`. |
|`meta`|`array of objects`| An optional meta object containing a list of warning strings.|
<br>
The `results` object is a dictionary with the following fields:
| Key | Type | Description |
| --- | --- | --- |
|`document`|`object`|The document objects or strings that were reranked. |
|`index`|`integer`|The `index` in the original list of documents to which the ranked document belongs. For example, if the first value in the `results` object has an index value of 3, it means in the list of documents passed in, the document at `index=3` had the highest relevance.|
|`relevance_score`|`float`|Relevance scores are normalized to be in the range `[0, 1]`. Scores close to one indicate a high relevance to the query, and scores close to zero indicate low relevance. A score of `0.9` _doesn't_ necessarily mean that a document is twice as relevant as another with a score of `0.45`.|
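To make the `index` mapping concrete, here's a short sketch that maps each result back to the original document list and filters on `relevance_score`. The response payload, document strings, and the 0.4 threshold are all made up for illustration.

```python
# A made-up response shaped like the schema above (return_documents FALSE),
# so each result carries only an index and a relevance_score.
response = {
    "id": "example-id",
    "results": [
        {"index": 3, "relevance_score": 0.92},
        {"index": 0, "relevance_score": 0.41},
        {"index": 2, "relevance_score": 0.05},
    ],
}

documents = ["doc A", "doc B", "doc C", "doc D"]  # the list sent in the request

# Map each ranked result back to the original document by its index.
ranked = [(documents[r["index"]], r["relevance_score"]) for r in response["results"]]

# Keep only results above a relevance threshold; the cutoff is application-specific,
# since scores are normalized but not proportional measures of relevance.
relevant = [doc for doc, score in ranked if score >= 0.4]
```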
## Examples
#### Request example
```json
{
    "query": "What is the capital of the United States?",
    "documents": [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before the United States was a country."
    ],
    "top_n": 3
}
```
#### Response example
```json
{
    "id": "generation-id",
    "results": [
        {
            "index": 2,
            "relevance_score": 0.98
        },
        {
            "index": 0,
            "relevance_score": 0.28
        },
        {
            "index": 1,
            "relevance_score": 0.04
        }
    ]
}
```
### Cost and quota considerations for models deployed as a service
Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
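As an illustration of staying within these per-deployment limits, a simple client-side budget check might look like the sketch below. The limits come from the paragraph above; the pacing logic itself is hypothetical, not an official SDK feature.

```python
import time

# Per-deployment limits stated above.
TOKENS_PER_MINUTE = 200_000
REQUESTS_PER_MINUTE = 1_000

class MinuteBudget:
    """Track token and request usage within a one-minute window (illustrative)."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._window_start = now()
        self._tokens = 0
        self._requests = 0

    def try_spend(self, tokens):
        """Return True if a request costing `tokens` fits in the current window."""
        if self._now() - self._window_start >= 60:
            # New minute: reset the counters.
            self._window_start = self._now()
            self._tokens = 0
            self._requests = 0
        if (self._tokens + tokens > TOKENS_PER_MINUTE
                or self._requests + 1 > REQUESTS_PER_MINUTE):
            return False  # caller should wait for the next window
        self._tokens += tokens
        self._requests += 1
        return True

budget = MinuteBudget()
ok = budget.try_spend(150_000)          # fits: 150k of the 200k token budget
blocked = not budget.try_spend(60_000)  # 150k + 60k exceeds the per-minute cap
```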
Cohere models deployed as serverless APIs with pay-as-you-go billing are offered by Cohere through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
For more information on how to track costs, see [monitor costs for models offered throughout the Azure Marketplace](./costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).