---
meta:
  title: Understanding the BGE-Multilingual-Gemma2 embedding model
  description: Deploy your own secure BGE-Multilingual-Gemma2 embedding model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
  h1: Understanding the BGE-Multilingual-Gemma2 embedding model
  paragraph: This page provides information on the BGE-Multilingual-Gemma2 embedding model
tags: embedding
categories:
  - ai-data
---

## Model overview

| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [baai](https://huggingface.co/BAAI) |
| Compatible Instances | L4 (FP32) |
| Context size | 4096 tokens |

## Model name

```bash
baai/bge-multilingual-gemma2:fp32
```

## Compatible Instances

| Instance type | Max context length |
| ------------- |-------------|
| L4 | 4096 (FP32) |

## Model introduction

BGE stands for BAAI General Embedding. This model is an LLM-based embedding model, trained on a diverse range of languages and tasks and built on the lightweight [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b).
As such, it is distributed under the [Gemma terms of use](https://ai.google.dev/gemma/terms).

## Why is it useful?

- BGE-Multilingual-Gemma2 ranks highly on the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard), scoring #1 in French, #1 in Polish, and #7 in English at the time of writing (Q4 2024).
- As its name suggests, the model's training data spans a broad range of languages, including English, Chinese, Polish, French, and more.
- It encodes text into 3584-dimensional vectors, providing a very detailed representation of sentence semantics.
- In its L4/FP32 configuration, BGE-Multilingual-Gemma2 boasts a context length of 4096 tokens, which is particularly useful for ingesting data and building RAG applications.
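Because the model maps each text to a fixed-length vector, semantic closeness between two texts can be estimated with cosine similarity between their embeddings. A minimal sketch using the standard library only (the toy three-dimensional vectors stand in for real 3584-dimensional embeddings returned by the model):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real 3584-dimensional embeddings
v1 = [0.1, 0.3, -0.2]
v2 = [0.1, 0.25, -0.15]
print(round(cosine_similarity(v1, v2), 4))
```

Values close to 1 indicate semantically similar texts; values near 0 indicate unrelated ones.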

## How to use it

### Sending Managed Inference requests

To perform inference tasks with your embedding model deployed at Scaleway, use the following command:

```bash
curl https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Embeddings can represent text in a numerical format.",
    "model": "baai/bge-multilingual-gemma2:fp32"
  }'
```

Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.
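The same request can be assembled from Python using only the standard library. A sketch that builds (but does not send) the POST request shown above; the helper name and the placeholder deployment UUID and API key are illustrative, not part of any Scaleway SDK:

```python
import json
import urllib.request

def build_embeddings_request(deployment_uuid: str, api_key: str,
                             text: str) -> urllib.request.Request:
    """Build (but do not send) the /v1/embeddings POST request."""
    url = f"https://{deployment_uuid}.ifr.fr-par.scaleway.com/v1/embeddings"
    payload = {
        "input": text,
        "model": "baai/bge-multilingual-gemma2:fp32",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder credentials for illustration only
req = build_embeddings_request(
    "my-deployment-uuid", "my-iam-key",
    "Embeddings can represent text in a numerical format.",
)
# Send with: urllib.request.urlopen(req)
print(req.full_url)
```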

### Receiving Inference responses

Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
Process the output data according to your application's needs. The response will contain the output generated by the embedding model based on the input provided in the request.
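A sketch of extracting the vector from such a response, assuming the endpoint returns the OpenAI-style embeddings payload implied by the `/v1/embeddings` route (the sample values and token counts below are made up, and the vector is truncated to three dimensions for readability):

```python
import json

# Hypothetical response body in the OpenAI-compatible embeddings format
raw = '''
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0,
     "embedding": [0.0123, -0.0456, 0.0789]}
  ],
  "model": "baai/bge-multilingual-gemma2:fp32",
  "usage": {"prompt_tokens": 9, "total_tokens": 9}
}
'''

payload = json.loads(raw)
# Each input string produces one entry in "data"; the vector is under "embedding"
vector = payload["data"][0]["embedding"]
print(len(vector), vector[0])
```

With this model, a real response vector has 3584 dimensions rather than the three shown here.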