Commit 85a168b

Merge pull request #971 from msakande/cohere-embed-updates-2
update cohere embed model doc
2 parents ebc430d + 95a87c8 commit 85a168b

2 files changed: 52 additions and 27 deletions

articles/ai-studio/how-to/deploy-models-cohere-embed.md

Lines changed: 27 additions & 14 deletions
@@ -5,7 +5,7 @@ description: Learn how to use Cohere Embed V3 models with Azure AI Studio.
55
ms.service: azure-ai-studio
66
manager: scottpolly
77
ms.topic: how-to
8-
ms.date: 08/08/2024
8+
ms.date: 10/23/2024
99
ms.reviewer: shubhiraj
1010
reviewer: shubhirajMsft
1111
ms.author: mopeakande
@@ -31,19 +31,24 @@ The Cohere family of models for embeddings includes the following models:
3131

3232
# [Cohere Embed v3 - English](#tab/cohere-embed-v3-english)
3333

34-
Cohere Embed English is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
34+
Cohere Embed English is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
3535

36-
* Embed English has 1,024 dimensions.
36+
* Embed English has 1,024 dimensions
3737
* Context window of the model is 512 tokens
38+
* Embed English accepts images as a base64 encoded data url
3839

40+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
41+
3942

4043
# [Cohere Embed v3 - Multilingual](#tab/cohere-embed-v3-multilingual)
4144

42-
Cohere Embed Multilingual is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
45+
Cohere Embed Multilingual is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
4346

44-
* Embed Multilingual has 1,024 dimensions.
47+
* Embed Multilingual has 1,024 dimensions
4548
* Context window of the model is 512 tokens
49+
* Embed Multilingual accepts images as a base64 encoded data url
4650

51+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
4752

4853
---
4954

@@ -220,19 +225,23 @@ The Cohere family of models for embeddings includes the following models:
220225

221226
# [Cohere Embed v3 - English](#tab/cohere-embed-v3-english)
222227

223-
Cohere Embed English is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
228+
Cohere Embed English is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
224229

225-
* Embed English has 1,024 dimensions.
230+
* Embed English has 1,024 dimensions
226231
* Context window of the model is 512 tokens
232+
* Embed English accepts images as a base64 encoded data url
227233

234+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
228235

229236
# [Cohere Embed v3 - Multilingual](#tab/cohere-embed-v3-multilingual)
230237

231-
Cohere Embed Multilingual is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
238+
Cohere Embed Multilingual is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
232239

233-
* Embed Multilingual has 1,024 dimensions.
240+
* Embed Multilingual has 1,024 dimensions
234241
* Context window of the model is 512 tokens
242+
* Embed Multilingual accepts images as a base64 encoded data url
235243

244+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
236245

237246
---
238247

@@ -411,19 +420,23 @@ The Cohere family of models for embeddings includes the following models:
411420

412421
# [Cohere Embed v3 - English](#tab/cohere-embed-v3-english)
413422

414-
Cohere Embed English is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
423+
Cohere Embed English is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
415424

416-
* Embed English has 1,024 dimensions.
425+
* Embed English has 1,024 dimensions
417426
* Context window of the model is 512 tokens
427+
* Embed English accepts images as a base64 encoded data url
418428

429+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
419430

420431
# [Cohere Embed v3 - Multilingual](#tab/cohere-embed-v3-multilingual)
421432

422-
Cohere Embed Multilingual is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
433+
Cohere Embed Multilingual is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
423434

424-
* Embed Multilingual has 1,024 dimensions.
435+
* Embed Multilingual has 1,024 dimensions
425436
* Context window of the model is 512 tokens
437+
* Embed Multilingual accepts images as a base64 encoded data url
426438

439+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
427440

428441
---
429442

@@ -653,4 +666,4 @@ Quota is managed per deployment. Each deployment has a rate limit of 200,000 tok
653666
* [Deploy models as serverless APIs](deploy-models-serverless.md)
654667
* [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
655668
* [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
656-
* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
669+
* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
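
Review note: the added bullets say the models accept images "as a base64 encoded data url", but the diff doesn't show what such a value looks like. The following is a minimal Python sketch of building one from raw image bytes, using the standard `data:<mime>;base64,<payload>` layout. The helper name `to_data_url` and the default MIME type are illustrative assumptions, not part of the documented API, and the sketch doesn't cover the accepted dimensions, file sizes, or formats, which this change doesn't enumerate.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    # b64encode returns bytes; decode to ASCII so it can be embedded in a string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The 8-byte PNG file signature stands in for a real image here.
example = to_data_url(b"\x89PNG\r\n\x1a\n", "image/png")
print(example)
```

In practice the bytes would come from reading an image file in binary mode, and the MIME type should match the actual image format.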

articles/machine-learning/how-to-deploy-models-cohere-embed.md

Lines changed: 25 additions & 13 deletions
@@ -6,7 +6,7 @@ manager: scottpolly
66
ms.service: azure-machine-learning
77
ms.subservice: inferencing
88
ms.topic: how-to
9-
ms.date: 09/24/2024
9+
ms.date: 10/23/2024
1010
ms.reviewer: shubhiraj
1111
reviewer: shubhirajMsft
1212
ms.author: mopeakande
@@ -31,19 +31,23 @@ The Cohere family of models for embeddings includes the following models:
3131

3232
# [Cohere Embed v3 - English](#tab/cohere-embed-v3-english)
3333

34-
Cohere Embed English is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
34+
Cohere Embed English is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
3535

36-
* Embed English has 1,024 dimensions.
36+
* Embed English has 1,024 dimensions
3737
* Context window of the model is 512 tokens
38+
* Embed English accepts images as a base64 encoded data url
3839

40+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
3941

4042
# [Cohere Embed v3 - Multilingual](#tab/cohere-embed-v3-multilingual)
4143

42-
Cohere Embed Multilingual is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
44+
Cohere Embed Multilingual is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
4345

44-
* Embed Multilingual has 1,024 dimensions.
46+
* Embed Multilingual has 1,024 dimensions
4547
* Context window of the model is 512 tokens
48+
* Embed Multilingual accepts images as a base64 encoded data url
4649

50+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
4751

4852
---
4953

@@ -220,19 +224,23 @@ The Cohere family of models for embeddings includes the following models:
220224

221225
# [Cohere Embed v3 - English](#tab/cohere-embed-v3-english)
222226

223-
Cohere Embed English is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
227+
Cohere Embed English is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
224228

225-
* Embed English has 1,024 dimensions.
229+
* Embed English has 1,024 dimensions
226230
* Context window of the model is 512 tokens
231+
* Embed English accepts images as a base64 encoded data url
227232

233+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
228234

229235
# [Cohere Embed v3 - Multilingual](#tab/cohere-embed-v3-multilingual)
230236

231-
Cohere Embed Multilingual is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
237+
Cohere Embed Multilingual is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
232238

233-
* Embed Multilingual has 1,024 dimensions.
239+
* Embed Multilingual has 1,024 dimensions
234240
* Context window of the model is 512 tokens
241+
* Embed Multilingual accepts images as a base64 encoded data url
235242

243+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
236244

237245
---
238246

@@ -411,19 +419,23 @@ The Cohere family of models for embeddings includes the following models:
411419

412420
# [Cohere Embed v3 - English](#tab/cohere-embed-v3-english)
413421

414-
Cohere Embed English is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
422+
Cohere Embed English is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace (massive text embed) MTEB benchmark and on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
415423

416-
* Embed English has 1,024 dimensions.
424+
* Embed English has 1,024 dimensions
417425
* Context window of the model is 512 tokens
426+
* Embed English accepts images as a base64 encoded data url
418427

428+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
419429

420430
# [Cohere Embed v3 - Multilingual](#tab/cohere-embed-v3-multilingual)
421431

422-
Cohere Embed Multilingual is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
432+
Cohere Embed Multilingual is a multimodal (text and image) representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed multilingual performs well on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
423433

424-
* Embed Multilingual has 1,024 dimensions.
434+
* Embed Multilingual has 1,024 dimensions
425435
* Context window of the model is 512 tokens
436+
* Embed Multilingual accepts images as a base64 encoded data url
426437

438+
Image embeddings consume a fixed number of tokens per image—1,000 tokens per image—which translates to a price of $0.0001 per image embedded. The size or resolution of the image doesn't affect the number of tokens consumed, provided the image is within the accepted dimensions, file size, and formats.
427439

428440
---
429441
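
Review note: the new paragraph on image embeddings states a fixed 1,000 tokens per image and a price of $0.0001 per image. As a rough worked example of that arithmetic, the sketch below estimates token consumption and cost for a batch of images. The constant and function names are illustrative assumptions. The figures are taken from the added text only, so actual billing may differ.

```python
from decimal import Decimal

# Figures quoted in the added documentation paragraph.
TOKENS_PER_IMAGE = 1_000
PRICE_PER_IMAGE_USD = Decimal("0.0001")


def image_embedding_usage(num_images: int) -> tuple[int, Decimal]:
    """Return (tokens consumed, estimated cost in USD) for num_images images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    # Size and resolution don't change the count, so usage scales linearly.
    tokens = num_images * TOKENS_PER_IMAGE
    cost = num_images * PRICE_PER_IMAGE_USD
    return tokens, cost


tokens, cost = image_embedding_usage(250)
print(f"{tokens} tokens, ${cost}")
```

`Decimal` is used instead of `float` so that small per-image prices add up without binary rounding error.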
