
Commit 99ff466

feat: Add encode and similarity of Sentence transformers (#1012)
# What does this PR do?

Fixes #975

- [x] Add `encode()` and `similarity()`
- [x] tests
- [x] new doc for sentence transformers

## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?

---------

Co-authored-by: Michael Benayoun <[email protected]>
1 parent d8ee14f commit 99ff466
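In short, the commit replaces the `NeuronModelForSentenceTransformers` wrapper with a dedicated `NeuronSentenceTransformers` class that exposes `encode()` and `similarity()`. A minimal sketch of the intended usage, pieced together from the documentation changes below (the local path and the assumption that both methods follow `sentence-transformers` semantics come from the updated docs, not from running the code):

```python
from optimum.neuron import NeuronSentenceTransformers

# Assumes a model already compiled for Inferentia2, e.g. the "bge_emb_inf2/"
# export produced in the tutorial updated by this commit.
model = NeuronSentenceTransformers.from_pretrained("bge_emb_inf2/")

sentences_1 = ["Life is pain au chocolat", "Life is galette des rois"]
sentences_2 = ["Life is eclaire au cafe", "Life is mille feuille"]

# encode() returns sentence embeddings; normalize_embeddings=True L2-normalizes
# them so that dot products are cosine similarities.
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)

# similarity() scores every sentence in the first set against every sentence
# in the second set (here a 2 x 2 matrix).
scores = model.similarity(embeddings_1, embeddings_2)
print(scores)
```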

10 files changed (+327, -122 lines)

docs/source/inference_tutorials/sentence_transformers.mdx

Lines changed: 17 additions & 27 deletions
````diff
@@ -24,59 +24,49 @@ This guide explains how to compile, load, and use [Sentence Transformers (SBERT)
 
 ### Convert Sentence Transformers model to AWS Inferentia2
 
-First, you need to convert your Sentence Transformers model to a format compatible with AWS Inferentia2. You can compile Sentence Transformers models with Optimum Neuron using the `optimum-cli` or `NeuronModelForSentenceTransformers` class. Below you will find an example for both approaches. We have to make sure `sentence-transformers` is installed. That's only needed for exporting the model.
+First, you need to convert your Sentence Transformers model to a format compatible with AWS Inferentia2. You can compile Sentence Transformers models with Optimum Neuron using the `optimum-cli` or `NeuronSentenceTransformers` class. Below you will find an example for both approaches. We have to make sure `sentence-transformers` is installed. That's only needed for exporting the model.
 
 ```bash
 pip install sentence-transformers
 ```
 
-Here we will use the `NeuronModelForSentenceTransformers`, which can be used to convert any Sentence Transformers model to a format compatible with AWS Inferentia2 or load already converted models. When exporting models with the `NeuronModelForSentenceTransformers` you need to set `export=True` and define the input shape and batch size. The input shape is defined by the `sequence_length` and the batch size by `batch_size`.
+Here we will use the `NeuronSentenceTransformers`, which can be used to convert any Sentence Transformers model to a format compatible with AWS Inferentia2 or load already converted models. When exporting models with the `NeuronSentenceTransformers` you need to set `export=True` and define the input shape and batch size. The input shape is defined by the `sequence_length` and the batch size by `batch_size`.
 
 ```python
-from optimum.neuron import NeuronModelForSentenceTransformers
+from optimum.neuron import NeuronSentenceTransformers
 
 # Sentence Transformers model from HuggingFace
 model_id = "BAAI/bge-small-en-v1.5"
 input_shapes = {"batch_size": 1, "sequence_length": 384} # mandatory shapes
 
 # Load Transformers model and export it to AWS Inferentia2
-model = NeuronModelForSentenceTransformers.from_pretrained(model_id, export=True, **input_shapes)
+model = NeuronSentenceTransformers.from_pretrained(model_id, export=True, **input_shapes)
 
 # Save model to disk
 model.save_pretrained("bge_emb_inf2/")
 ```
 
-Here we will use the `optimum-cli` to convert the model. Similar to the `NeuronModelForSentenceTransformers` we need to define our input shape and batch size. The input shape is defined by the `sequence_length` and the batch size by `batch_size`. The `optimum-cli` will automatically convert the model to a format compatible with AWS Inferentia2 and save it to the specified output directory.
+Here we will use the `optimum-cli` to convert the model. Similar to the `NeuronSentenceTransformers` we need to define our input shape and batch size. The input shape is defined by the `sequence_length` and the batch size by `batch_size`. The `optimum-cli` will automatically convert the model to a format compatible with AWS Inferentia2 and save it to the specified output directory.
 
 ```bash
 optimum-cli export neuron -m BAAI/bge-small-en-v1.5 --sequence_length 384 --batch_size 1 --task feature-extraction bge_emb_inf2/
 ```
 
 ### Load compiled Sentence Transformers model and run inference
 
-Once we have a compiled Sentence Transformers model, which we either exported ourselves or is available on the Hugging Face Hub, we can load it and run inference. For loading the model we can use the `NeuronModelForSentenceTransformers` class, which is an abstraction layer for the `SentenceTransformer` class. The `NeuronModelForSentenceTransformers` class will automatically pad the input to the specified `sequence_length` and run inference on AWS Inferentia2.
+Once we have a compiled Sentence Transformers model, which we either exported ourselves or is available on the Hugging Face Hub, we can load it and run inference. For loading the model we can use the `NeuronSentenceTransformers` class, which is an abstraction layer for the `SentenceTransformer` class. The `NeuronSentenceTransformers` class will automatically pad the input to the specified `sequence_length` and run inference on AWS Inferentia2.
 
 ```python
-from optimum.neuron import NeuronModelForSentenceTransformers
-from transformers import AutoTokenizer
+from optimum.neuron import NeuronSentenceTransformers
 
 model_id_or_path = "bge_emb_inf2/"
-tokenizer_id = "BAAI/bge-small-en-v1.5"
 
 # Load model and tokenizer
-model = NeuronModelForSentenceTransformers.from_pretrained(model_id_or_path)
-tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
+model = NeuronSentenceTransformers.from_pretrained(model_id_or_path)
 
 # Run inference
-prompt = "I like to eat apples"
-encoded_input = tokenizer(prompt, return_tensors='pt')
-outputs = model(**encoded_input)
-
-token_embeddings = outputs.token_embeddings
-sentence_embedding = outputs.sentence_embedding
-
-print(f"token embeddings: {token_embeddings.shape}") # torch.Size([1, 7, 384])
-print(f"sentence_embedding: {sentence_embedding.shape}") # torch.Size([1, 384])
+token_embeddings = model.encode(output_value="token_embeddings")
+sentence_embedding = model.encode(output_value="sentence_embedding")
 ```
 
 ### Production Usage
@@ -89,18 +79,18 @@ For deploying these models in a production environment, refer to the [Amazon Sag
 
 ### Compile CLIP for AWS Inferentia2
 
-You can compile CLIP models with Optimum Neuron either by using the `optimum-cli` or `NeuronModelForSentenceTransformers` class. Adopt one approach that you prefer:
+You can compile CLIP models with Optimum Neuron either by using the `optimum-cli` or `NeuronSentenceTransformers` class. Adopt one approach that you prefer:
 
 * With the Optimum CLI
 
 ```bash
 optimum-cli export neuron -m sentence-transformers/clip-ViT-B-32 --sequence_length 64 --text_batch_size 3 --image_batch_size 1 --num_channels 3 --height 224 --width 224 --task feature-extraction --subfolder 0_CLIPModel clip_emb/
 ```
 
-* With the `NeuronModelForSentenceTransformers` class
+* With the `NeuronSentenceTransformers` class
 
 ```python
-from optimum.neuron import NeuronModelForSentenceTransformers
+from optimum.neuron import NeuronSentenceTransformers
 
 model_id = "sentence-transformers/clip-ViT-B-32"
 
@@ -114,7 +104,7 @@ input_shapes = {
     "sequence_length": 64,
 }
 
-emb_model = NeuronModelForSentenceTransformers.from_pretrained(
+emb_model = NeuronSentenceTransformers.from_pretrained(
     model_id, subfolder="0_CLIPModel", export=True, library_name="sentence_transformers", dynamic_batch_size=False, **input_shapes
 )
 
@@ -130,10 +120,10 @@ from PIL import Image
 from sentence_transformers import util
 from transformers import CLIPProcessor
 
-from optimum.neuron import NeuronModelForSentenceTransformers
+from optimum.neuron import NeuronSentenceTransformers
 
 save_directory = "clip_emb"
-emb_model = NeuronModelForSentenceTransformers.from_pretrained(save_directory)
+emb_model = NeuronSentenceTransformers.from_pretrained(save_directory)
 
 processor = CLIPProcessor.from_pretrained(save_directory)
 inputs = processor(
@@ -154,7 +144,7 @@ print(cos_scores)
 
 **Caveat**
 
-Since compiled models with dynamic batching enabled only accept input tensors with the same batch size, we cannot set `dynamic_batch_size=True` if the input texts and images have different batch sizes. And as `NeuronModelForSentenceTransformers` class pads the inputs to the batch sizes (`text_batch_size` and `image_batch_size`) used during the compilation, you could use relatively larger batch sizes during the compilation for flexibility with the trade-off of compute.
+Since compiled models with dynamic batching enabled only accept input tensors with the same batch size, we cannot set `dynamic_batch_size=True` if the input texts and images have different batch sizes. And as `NeuronSentenceTransformers` class pads the inputs to the batch sizes (`text_batch_size` and `image_batch_size`) used during the compilation, you could use relatively larger batch sizes during the compilation for flexibility with the trade-off of compute.
 
 eg. if you want to encode 3 or 4 or 5 texts and 1 image, you could set `text_batch_size = 5 = max(3, 4, 5)` and `image_batch_size = 1` during the compilation.
 
````
docs/source/model_doc/modeling_auto.mdx

Lines changed: 2 additions & 2 deletions
```diff
@@ -33,9 +33,9 @@ The following Neuron model classes are available for natural language processing
 
 [[autodoc]] modeling.NeuronModelForFeatureExtraction
 
-### NeuronModelForSentenceTransformers
+### NeuronSentenceTransformers
 
-[[autodoc]] modeling.NeuronModelForSentenceTransformers
+[[autodoc]] modeling_sentence_transformers.NeuronSentenceTransformers
 
 ### NeuronModelForMaskedLM
```

docs/source/model_doc/sentence_transformers/overview.mdx

Lines changed: 10 additions & 9 deletions
````diff
@@ -39,16 +39,16 @@ optimum-cli export neuron -m sentence-transformers/clip-ViT-B-32 --sequence_leng
 * Example - Text embeddings
 
 ```python
-from optimum.neuron import NeuronModelForSentenceTransformers
+from optimum.neuron import NeuronSentenceTransformers
 
 # configs for compiling model
 input_shapes = {
     "batch_size": 1,
-    "sequence_length": 384,
+    "sequence_length": 512,
 }
 compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
 
-neuron_model = NeuronModelForSentenceTransformers.from_pretrained(
+neuron_model = NeuronSentenceTransformers.from_pretrained(
     "BAAI/bge-large-en-v1.5",
     export=True,
     **input_shapes,
@@ -63,12 +63,17 @@ neuron_model.push_to_hub(
     "bge_emb_neuron/", repository_id="optimum/bge-base-en-v1.5-neuronx" # Replace with your HF Hub repo id
 )
 
+sentences_1 = ["Life is pain au chocolat", "Life is galette des rois"]
+sentences_2 = ["Life is eclaire au cafe", "Life is mille feuille"]
+embeddings_1 = neuron_model.encode(sentences_1, normalize_embeddings=True)
+embeddings_2 = neuron_model.encode(sentences_2, normalize_embeddings=True)
+similarity = neuron_model.similarity(embeddings_1, embeddings_2)
 ```
 
 * Example - Image Search
 
 ```python
-from optimum.neuron import NeuronModelForSentenceTransformers
+from optimum.neuron import NeuronSentenceTransformers
 
 # configs for compiling model
 input_shapes = {
@@ -81,7 +86,7 @@ input_shapes = {
 }
 compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
 
-neuron_model = NeuronModelForSentenceTransformers.from_pretrained(
+neuron_model = NeuronSentenceTransformers.from_pretrained(
     "sentence-transformers/clip-ViT-B-32",
     subfolder="0_CLIPModel",
     export=True,
@@ -98,7 +103,3 @@ neuron_model.push_to_hub(
     "clip_emb_neuron/", repository_id="optimum/clip_vit_emb_neuronx" # Replace with your HF Hub repo id
 )
 ```
-
-## NeuronModelForSentenceTransformers
-
-[[autodoc]] modeling.NeuronModelForSentenceTransformers
````
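A brief note on the `similarity()` call added in the text-embeddings example above: assuming it mirrors `sentence_transformers.SentenceTransformer.similarity`, it returns one score per (sentences_1, sentences_2) pair, using cosine similarity by default. A hypothetical continuation of that example:

```python
# Hypothetical output, assuming similarity() follows the sentence-transformers convention.
print(similarity)        # 2 x 2 matrix: rows follow sentences_1, columns follow sentences_2
print(similarity[0, 1])  # score between sentences_1[0] and sentences_2[1]
```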

optimum/commands/neuron/cache.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -147,7 +147,7 @@ def _list_entries(self):
                     str(entry["batch_size"]),
                     str(entry["sequence_length"]),
                     str(entry.get("tp_degree", entry.get("tensor_parallel_size"))),
-                    str(entry["torch_dtype"]),
+                    str(entry.get("torch_dtype", entry.get("dtype"))),
                     str(entry["target"]),
                 )
             )
```

optimum/neuron/__init__.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -43,7 +43,6 @@
     "modeling_traced": ["NeuronTracedModel"],
     "modeling": [
         "NeuronModelForFeatureExtraction",
-        "NeuronModelForSentenceTransformers",
         "NeuronModelForMaskedLM",
         "NeuronModelForQuestionAnswering",
         "NeuronModelForSequenceClassification",
@@ -78,6 +77,7 @@
     "modeling_seq2seq": [
         "NeuronModelForSeq2SeqLM",
     ],
+    "modeling_sentence_transformers": ["NeuronSentenceTransformers"],
     "models": [],
     "accelerate": [
         "NeuronAccelerator",
@@ -115,7 +115,6 @@
         NeuronModelForObjectDetection,
         NeuronModelForQuestionAnswering,
         NeuronModelForSemanticSegmentation,
-        NeuronModelForSentenceTransformers,
         NeuronModelForSequenceClassification,
         NeuronModelForTokenClassification,
         NeuronModelForXVector,
@@ -138,6 +137,7 @@
         NeuronStableDiffusionXLInpaintPipeline,
         NeuronStableDiffusionXLPipeline,
     )
+    from .modeling_sentence_transformers import NeuronSentenceTransformers
     from .modeling_seq2seq import NeuronModelForSeq2SeqLM
     from .modeling_traced import NeuronTracedModel
```
optimum/neuron/cache/hub_cache.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -427,7 +427,7 @@ def select_hub_cached_entries(
             continue
         if torch_dtype is not None:
             target_value = DTYPE_MAPPER.pt(torch_dtype) if isinstance(torch_dtype, str) else torch_dtype
-            entry_value = DTYPE_MAPPER.pt(entry.get("torch_dtype"))
+            entry_value = DTYPE_MAPPER.pt(entry.get("torch_dtype", entry.get("dtype")))
             if target_value != entry_value:
                 continue
         selected.append(entry)
```
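The two cache changes (in `cache.py` and `hub_cache.py` above) share one idea: cache entries may store the dtype under either `torch_dtype` or `dtype`, so lookups fall back from one key to the other. A small illustration of the pattern, with hypothetical entries:

```python
# Entries as they might appear in the cache registry: one uses "torch_dtype",
# the other only "dtype" (values are made up for illustration).
entries = [{"torch_dtype": "bfloat16"}, {"dtype": "bfloat16"}]

for entry in entries:
    # Same fallback as the patched lines: prefer "torch_dtype", else "dtype".
    print(entry.get("torch_dtype", entry.get("dtype")))  # -> "bfloat16" for both
```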

optimum/neuron/modeling.py

Lines changed: 0 additions & 73 deletions
```diff
@@ -15,7 +15,6 @@
 """NeuronModelForXXX classes for inference on neuron devices using the same API as Transformers."""
 
 import logging
-from typing import TYPE_CHECKING
 
 import torch
 from transformers import (
@@ -66,8 +65,6 @@
     NEURON_OBJECT_DETECTION_EXAMPLE,
     NEURON_QUESTION_ANSWERING_EXAMPLE,
     NEURON_SEMANTIC_SEGMENTATION_EXAMPLE,
-    NEURON_SENTENCE_TRANSFORMERS_IMAGE_EXAMPLE,
-    NEURON_SENTENCE_TRANSFORMERS_TEXT_EXAMPLE,
     NEURON_SEQUENCE_CLASSIFICATION_EXAMPLE,
     NEURON_TEXT_INPUTS_DOCSTRING,
     NEURON_TOKEN_CLASSIFICATION_EXAMPLE,
@@ -76,10 +73,6 @@
 )
 
 
-if TYPE_CHECKING:
-    pass
-
-
 logger = logging.getLogger(__name__)
 
 
@@ -135,72 +128,6 @@ def forward(
         return BaseModelOutputWithPooling(last_hidden_state=last_hidden_state, pooler_output=pooler_output)
 
 
-@add_start_docstrings(
-    """
-    Neuron Model for Sentence Transformers.
-    """,
-    NEURON_MODEL_START_DOCSTRING,
-)
-class NeuronModelForSentenceTransformers(NeuronTracedModel):
-    """
-    Sentence Transformers model on Neuron devices.
-    """
-
-    auto_model_class = AutoModel
-    library_name = "sentence_transformers"
-
-    @add_start_docstrings_to_model_forward(
-        NEURON_TEXT_INPUTS_DOCSTRING.format("batch_size, sequence_length")
-        + NEURON_SENTENCE_TRANSFORMERS_TEXT_EXAMPLE.format(
-            processor_class=_TOKENIZER_FOR_DOC,
-            model_class="NeuronModelForSentenceTransformers",
-            checkpoint="optimum/bge-base-en-v1.5-neuronx",
-        )
-        + NEURON_SENTENCE_TRANSFORMERS_IMAGE_EXAMPLE.format(
-            processor_class=_GENERIC_PROCESSOR,
-            model_class="NeuronModelForSentenceTransformers",
-            checkpoint="optimum/clip_vit_emb_neuronx",
-        )
-    )
-    def forward(
-        self,
-        input_ids: torch.Tensor,
-        attention_mask: torch.Tensor,
-        pixel_values: torch.Tensor | None = None,
-        token_type_ids: torch.Tensor | None = None,
-        **kwargs,
-    ):
-        model_type = self.config.neuron["model_type"]
-        neuron_inputs = {"input_ids": input_ids}
-        if pixel_values is not None:
-            neuron_inputs["pixel_values"] = pixel_values
-        neuron_inputs["attention_mask"] = (
-            attention_mask  # The input order for clip is: input_ids, pixel_values, attention_mask.
-        )
-
-        with self.neuron_padding_manager(neuron_inputs) as inputs:
-            outputs = self.model(*inputs)
-            if "clip" in model_type:
-                text_embeds = self.remove_padding([outputs[0]], dims=[0], indices=[input_ids.shape[0]])[
-                    0
-                ]  # Remove padding on batch_size(0)
-                image_embeds = self.remove_padding([outputs[1]], dims=[0], indices=[pixel_values.shape[0]])[
-                    0
-                ]  # Remove padding on batch_size(0)
-                return ModelOutput(text_embeds=text_embeds, image_embeds=image_embeds)
-            else:
-                # token_embeddings -> (batch_size, sequencen_len, hidden_size)
-                token_embeddings = self.remove_padding(
-                    [outputs[0]], dims=[0, 1], indices=[input_ids.shape[0], input_ids.shape[1]]
-                )[0]  # Remove padding on batch_size(0), and sequence_length(1)
-                # sentence_embedding -> (batch_size, hidden_size)
-                sentence_embedding = self.remove_padding([outputs[1]], dims=[0], indices=[input_ids.shape[0]])[
-                    0
-                ]  # Remove padding on batch_size(0)
-
-                return ModelOutput(token_embeddings=token_embeddings, sentence_embedding=sentence_embedding)
-
-
 @add_start_docstrings(
     """
     Neuron Model with a MaskedLMOutput for masked language modeling tasks.
```
