feat(inference): newest embedding model (#3917)

tgenaitay · bene2k1 · nerda-codes · web-flow · commit ad4e3a7caf34 · 2024-10-31T11:09:12.000+01:00
* feat(inference): newest embedding

* feat(inference): edited menu

* Apply suggestions from code review

Co-authored-by: nerda-codes <87707325+nerda-codes@users.noreply.github.com>

* Update ai-data/managed-inference/reference-content/bge-multilingual-gemma2.mdx

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>

* Update ai-data/managed-inference/reference-content/bge-multilingual-gemma2.mdx

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>

* Update ai-data/managed-inference/reference-content/bge-multilingual-gemma2.mdx

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>

---------

Co-authored-by: Benedikt Rollik <brollik@scaleway.com>
Co-authored-by: nerda-codes <87707325+nerda-codes@users.noreply.github.com>
Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
diff --git a/ai-data/managed-inference/reference-content/bge-multilingual-gemma2.mdx b/ai-data/managed-inference/reference-content/bge-multilingual-gemma2.mdx
@@ -0,0 +1,69 @@
+---
+meta:
+  title: Understanding the BGE-Multilingual-Gemma2 embedding model
+  description: Deploy your own secure BGE-Multilingual-Gemma2 embedding model with Scaleway Managed Inference. Privacy-focused, fully managed.
+content:
+  h1: Understanding the BGE-Multilingual-Gemma2 embedding model
+  paragraph: This page provides information on the BGE-Multilingual-Gemma2 embedding model
+tags: embedding
+categories:
+  - ai-data
+dates:
+  validation: 2024-10-30
+  posted: 2024-10-30
+---
+
+## Model overview
+
+| Attribute       | Details                            |
+|-----------------|------------------------------------|
+| Provider        | [BAAI](https://huggingface.co/BAAI)  |
+| Compatible Instances | L4 (FP32)    |
+| Context size | 4096 tokens    |
+
+## Model name
+
+```bash
+baai/bge-multilingual-gemma2:fp32
+```
+
+## Compatible Instances
+
+| Instance type  | Max context length |
+| ------------- |-------------|
+| L4      | 4096 (FP32) | 
+
+## Model introduction
+
+BGE is short for BAAI General Embedding. This particular model is an LLM-based embedding model, built on the lightweight [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) and trained on a diverse range of languages and tasks.
+As such, it is distributed under the [Gemma terms of use](https://ai.google.dev/gemma/terms).
+
+## Why is it useful?
+
+- BGE-Multilingual-Gemma2 ranks first on the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for French and Polish, and seventh for English, at the time of writing this page (Q4 2024).
+- As its name suggests, the model's training data spans a broad range of languages, including English, Chinese, Polish, French, and more.
+- It encodes text into 3584-dimensional vectors, providing a very detailed representation of sentence semantics.
+- BGE-Multilingual-Gemma2 in its L4/FP32 configuration boasts a high context length of 4096 tokens, which is particularly useful for ingesting data and building RAG applications.
+
+## How to use it
+
+### Sending Managed Inference requests
+
+To perform inference tasks with your embedding model deployed at Scaleway, use the following command:
+
+```bash
+curl https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings \
+  -H "Authorization: Bearer <IAM API key>" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "input": "Embeddings can represent text in a numerical format.",
+    "model": "baai/bge-multilingual-gemma2:fp32"
+  }'
+```
+
+Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.
+
+### Receiving Inference responses
+
+After you send the HTTP request to the public or private endpoint exposed by your deployment, the Managed Inference server returns the inference response.
+Process the output data according to your application's needs. The response will contain the output generated by the embedding model based on the input provided in the request.
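
The response-handling step described in the page above can be sketched in Python. This is a minimal sketch, not the documented Scaleway behavior: it assumes the `/v1/embeddings` endpoint returns an OpenAI-style body with a `data` list whose items carry `index` and `embedding` fields. The helper names `extract_embeddings` and `cosine_similarity` and the shortened sample response are hypothetical.

```python
import json
import math


def extract_embeddings(response_body):
    """Return the vectors from a response body.

    ASSUMPTION: the body has the OpenAI-style shape
    {"data": [{"index": 0, "embedding": [...]}, ...]}.
    """
    items = json.loads(response_body)["data"]
    # Sort by index so the vectors line up with the order of the inputs.
    return [item["embedding"] for item in sorted(items, key=lambda i: i["index"])]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, shortened response: real vectors from this model
# have 3584 dimensions, as stated in the page.
sample = json.dumps({
    "data": [
        {"index": 1, "embedding": [0.0, 1.0, 0.0]},
        {"index": 0, "embedding": [1.0, 0.0, 0.0]},
    ]
})

vectors = extract_embeddings(sample)
print(cosine_similarity(vectors[0], vectors[1]))  # orthogonal vectors: 0.0
```

Comparing vectors with cosine similarity is one common choice for retrieval in RAG pipelines; the page itself does not prescribe a metric.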
diff --git a/menu/navigation.json b/menu/navigation.json
@@ -623,14 +623,14 @@
                     "label": "Mixtral-8x7b-instruct-v0.1 model",
                     "slug": "mixtral-8x7b-instruct-v0.1"
                   },
-                  {
-                    "label": "WizardLM-70b-v1.0 model",
-                    "slug": "wizardlm-70b-v1.0"
-                  },
                   {
                     "label": "Sentence-t5-xxl model",
                     "slug": "sentence-t5-xxl"
                   },
+                  {
+                    "label": "BGE-Multilingual-Gemma2 model",
+                    "slug": "bge-multilingual-gemma2"
+                  },
                   {
                     "label": "Pixtral-12b-2409 model",
                     "slug": "pixtral-12b-2409"