Commit 875fd4f

update docs
1 parent 235f967 commit 875fd4f

3 files changed

+85
-0
lines changed

docs/source/bge/bge_code.rst

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
BGE-Code-v1
2+
===========
3+
4+
**`BGE-Code-v1 <https://huggingface.co/BAAI/bge-code-v1>`_** is an LLM-based code embedding model that supports code retrieval, text retrieval, and multilingual retrieval. It primarily demonstrates the following capabilities:
5+
- Superior Code Retrieval Performance: The model demonstrates exceptional code retrieval capabilities, supporting natural language queries in both English and Chinese, as well as 20 programming languages.
6+
- Robust Text Retrieval Capabilities: The model maintains strong text retrieval capabilities comparable to text embedding models of similar scale.
7+
- Extensive Multilingual Support: BGE-Code-v1 offers comprehensive multilingual retrieval capabilities, excelling in languages such as English, Chinese, Japanese, French, and more.
8+
9+
+-------------------------------------------------------------------+-----------------+------------+--------------+----------------------------------------------------------------------------------------------------+
10+
| Model | Language | Parameters | Model Size | Description |
11+
+===================================================================+=================+============+==============+====================================================================================================+
12+
| `BAAI/bge-code-v1 <https://huggingface.co/BAAI/bge-code-v1>`_ | Multilingual | 1.5B | 6.18 GB | SOTA code retrieval model, with exceptional multilingual text retrieval performance as well |
13+
+-------------------------------------------------------------------+-----------------+------------+--------------+----------------------------------------------------------------------------------------------------+
14+
15+
16+
.. code:: python
17+
from FlagEmbedding import FlagLLMModel
18+
19+
queries = [
20+
"Delete the record with ID 4 from the 'Staff' table.",
21+
'Delete all records in the "Livestock" table where age is greater than 5'
22+
]
23+
documents = [
24+
"DELETE FROM Staff WHERE StaffID = 4;",
25+
"DELETE FROM Livestock WHERE age > 5;"
26+
]
27+
28+
model = FlagLLMModel('BAAI/bge-code-v1',
29+
query_instruction_format="<instruct>{}\n<query>{}",
30+
query_instruction_for_retrieval="Given a question in text, retrieve SQL queries that are appropriate responses to the question.",
31+
trust_remote_code=True,
32+
use_fp16=True) # Setting use_fp16 to True speeds up computation at the cost of a slight drop in accuracy
33+
embeddings_1 = model.encode_queries(queries)
34+
embeddings_2 = model.encode_corpus(documents)
35+
similarity = embeddings_1 @ embeddings_2.T
36+
print(similarity)
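The two configuration strings in the example above can be easier to read once combined. The following is a minimal sketch, assuming the instruction and the query are substituted, in that order, into the two ``{}`` slots of ``query_instruction_format``; the source does not state this explicitly. The helper names ``build_query_prompt`` and ``best_match_per_query`` and the toy scores are hypothetical and not part of FlagEmbedding.

```python
# Assumption: the instruction fills the first "{}" and the query the second.
QUERY_INSTRUCTION_FORMAT = "<instruct>{}\n<query>{}"
INSTRUCTION = (
    "Given a question in text, retrieve SQL queries that are "
    "appropriate responses to the question."
)


def build_query_prompt(query):
    """Hypothetical helper: fill the template with the instruction and one query."""
    return QUERY_INSTRUCTION_FORMAT.format(INSTRUCTION, query)


def best_match_per_query(similarity):
    """For each query (row), return the index of the highest-scoring document (column)."""
    return [max(range(len(row)), key=row.__getitem__) for row in similarity]


prompt = build_query_prompt("Delete the record with ID 4 from the 'Staff' table.")
print(prompt)

# Made-up scores purely for illustration; real values come from the model.
toy_similarity = [[0.9, 0.2], [0.1, 0.8]]
print(best_match_per_query(toy_similarity))
```

With real embeddings, the same row-wise argmax applied to ``similarity`` would pick the top-ranked document for each query.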

docs/source/bge/bge_vl.rst

Lines changed: 48 additions & 0 deletions
@@ -16,6 +16,8 @@ BGE-VL contains light weight CLIP based models as well as more powerful LLAVA-Ne
1616
+----------------------------------------------------------------------+-----------+------------+--------------+-----------------------------------------------------------------------+
1717
| `BAAI/bge-vl-MLLM-S2 <https://huggingface.co/BAAI/BGE-VL-MLLM-S2>`_ | English | 7.57B | 15.14 GB | Finetune BGE-VL-MLLM-S1 with one epoch on MMEB training set |
1818
+----------------------------------------------------------------------+-----------+------------+--------------+-----------------------------------------------------------------------+
19+
| `BAAI/BGE-VL-v1.5-zs <https://huggingface.co/BAAI/BGE-VL-v1.5-zs>`_ | English | 7.57B | 15.14 GB | Better multi-modal retrieval model that performs well across all kinds of tasks |
20+
| `BAAI/BGE-VL-v1.5-mmeb <https://huggingface.co/BAAI/BGE-VL-v1.5-mmeb>`_ | English | 7.57B | 15.14 GB | Better multi-modal retrieval model, additionally fine-tuned on MMEB training set |
1921

2022

2123
BGE-VL-CLIP
@@ -107,4 +109,50 @@ The normalized last hidden state of the [EOS] token in the MLLM is used as the e
107109
print(scores)
108110
109111
112+
BGE-VL-v1.5
113+
-----------
114+
115+
The BGE-VL-v1.5 series is an updated version of BGE-VL that brings better performance on both retrieval and multi-modal understanding. The models were trained on 30M MegaPairs data plus an extra 10M of natural and synthetic data.
116+
117+
``bge-vl-v1.5-zs`` is a zero-shot model trained only on the data mentioned above. ``bge-vl-v1.5-mmeb`` is the version further fine-tuned on the MMEB training set.
118+
119+
120+
.. code:: python
121+
122+
import torch
123+
from transformers import AutoModel
124+
from PIL import Image
125+
126+
MODEL_NAME = "BAAI/BGE-VL-v1.5-mmeb"  # or "BAAI/BGE-VL-v1.5-zs"
127+
128+
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
129+
model.eval()
130+
model.cuda()
131+
132+
with torch.no_grad():
133+
model.set_processor(MODEL_NAME)
134+
135+
query_inputs = model.data_process(
136+
text="Make the background dark, as if the camera has taken the photo at night",
137+
images="../../imgs/cir_query.png",
138+
q_or_c="q",
139+
task_instruction="Retrieve the target image that best meets the combined criteria by using both the provided image and the image retrieval instructions: "
140+
)
141+
142+
candidate_inputs = model.data_process(
143+
images=["../../imgs/cir_candi_1.png", "../../imgs/cir_candi_2.png"],
144+
q_or_c="c",
145+
)
146+
147+
query_embs = model(**query_inputs, output_hidden_states=True)[:, -1, :]
148+
candi_embs = model(**candidate_inputs, output_hidden_states=True)[:, -1, :]
149+
150+
query_embs = torch.nn.functional.normalize(query_embs, dim=-1)
151+
candi_embs = torch.nn.functional.normalize(candi_embs, dim=-1)
152+
153+
scores = torch.matmul(query_embs, candi_embs.T)
154+
print(scores)
155+
156+
157+
110158
For more details, check out the repo of `MegaPairs <https://github.com/VectorSpaceLab/MegaPairs>`_
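The last two steps of the BGE-VL-v1.5 example (L2-normalizing the embeddings, then multiplying query embeddings by the transposed candidate embeddings) amount to computing cosine similarity. The following is a dependency-free sketch of that arithmetic on made-up 2-D vectors. The function names are hypothetical, and the small ``eps`` guard mirrors the default of ``torch.nn.functional.normalize``.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length; eps avoids division by zero."""
    norm = max(math.sqrt(sum(x * x for x in vec)), eps)
    return [x / norm for x in vec]


def score_matrix(queries, candidates):
    """Cosine similarity of every query (row) against every candidate (column)."""
    q = [l2_normalize(v) for v in queries]
    c = [l2_normalize(v) for v in candidates]
    return [[sum(a * b for a, b in zip(qv, cv)) for cv in c] for qv in q]


# Toy vectors for illustration only; real embeddings come from the model.
toy_query = [[3.0, 4.0]]
toy_candidates = [[6.0, 8.0], [4.0, -3.0]]
scores = score_matrix(toy_query, toy_candidates)
print(scores)
```

The first candidate points in the same direction as the query, so its score is close to 1; the second is orthogonal, so its score is close to 0. Because the vectors are normalized first, their magnitudes do not affect the ranking.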

docs/source/bge/index.rst

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ BGE
1515
bge_m3
1616
bge_icl
1717
bge_vl
18+
bge_code
1819

1920
.. toctree::
2021
:maxdepth: 1
