
Commit 014e815

Squash merge main into disable-flash-attn-if-env

1 parent: 7ac2214

File tree: 7 files changed (+45, -38)

  .github/PULL_REQUEST_TEMPLATE.md
  .github/workflows/trufflehog.yml
  CODE_OF_CONDUCT.md
  CONTRIBUTING.md
  backends/grpc-client/src/client.rs
  backends/python/server/text_embeddings_server/utils/flash_attn.py
  backends/src/lib.rs

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 6 additions & 12 deletions

@@ -14,18 +14,13 @@ Once you're done, someone will review your PR shortly (see the section "Who can
 
 Fixes # (issue)
 
-
 ## Before submitting
-- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
-- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
-      Pull Request section?
-- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link
-      to it if that's the case.
-- [ ] Did you make sure to update the documentation with your changes? Here are the
-      [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and
-      [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
-- [ ] Did you write any new necessary tests?
 
+- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
+- [ ] Did you read the [contributor guideline](https://github.com/huggingface/text-embeddings-inference/blob/main/CONTRIBUTING.md)?
+- [ ] Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
+- [ ] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs).
+- [ ] Did you write any new necessary tests? If applicable, did you include or update the `insta` snapshots?
 
 ## Who can review?
 
@@ -34,7 +29,6 @@ members/contributors who may be interested in your PR.
 
 <!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
 
-
-@OlivierDehaene OR @Narsil
+@Narsil OR @alvarobartt
 
 -->

.github/workflows/trufflehog.yml

Lines changed: 8 additions & 6 deletions

@@ -7,9 +7,11 @@ jobs:
   trufflehog:
     runs-on: ubuntu-latest
     steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-      - name: Secret Scanning
-        uses: trufflesecurity/trufflehog@main
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Secret Scanning
+        uses: trufflesecurity/trufflehog@main
+        with:
+          extra_args: --results=verified,unknown --exclude-detectors=postgres

CODE_OF_CONDUCT.md

Lines changed: 0 additions & 1 deletion

@@ -1,4 +1,3 @@
-
 # Contributor Covenant Code of Conduct
 
 ## Our Pledge

CONTRIBUTING.md

Lines changed: 8 additions & 8 deletions

@@ -14,7 +14,7 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
 
-# Contribute to text-embeddings-inference
+# Contribute to Text Embeddings Inference (TEI)
 
 Everyone is welcome to contribute, and we value everybody's contribution. Code
 contributions are not the only way to help the community. Answering questions, helping
@@ -31,7 +31,7 @@ However you choose to contribute, please be mindful and respect our
 
 ## Ways to contribute
 
-There are several ways you can contribute to text-embeddings-inference.
+There are several ways you can contribute to Text Embeddings Inference (TEI).
 
 * Fix outstanding issues with the existing code.
 * Submit issues related to bugs or desired new features.
@@ -52,7 +52,7 @@ feedback.
 
 ### Did you find a bug?
 
-The text-embeddings-inference library is robust and reliable thanks to users who report the problems they encounter.
+The Text Embeddings Inference (TEI) solution is robust and reliable thanks to users who report the problems they encounter.
 
 Before you report an issue, we would really appreciate it if you could **make sure the bug was not
 already reported** (use the search bar on GitHub under Issues). Your issue should also be related to bugs in the
@@ -68,7 +68,7 @@ we can quickly resolve it:
 
 ### Do you want a new feature?
 
-If there is a new feature you'd like to see in text-embeddings-inference, please open an issue and describe:
+If there is a new feature you'd like to see in Text Embeddings Inference (TEI), please open an issue and describe:
 
 1. What is the *motivation* behind this feature? Is it related to a problem or frustration with the library? Is it
    a feature related to something you need for a project? Is it something you worked on and think it could benefit
@@ -94,7 +94,7 @@ New models are constantly released and if you want to implement a new model, ple
 * Link to the implementation if it is open-sourced.
 * Link to the model weights if they are available.
 
-If you are willing to contribute the model yourself, let us know so we can help you add it to text-embeddings-inference!
+If you are willing to contribute the model yourself, let us know so we can help you add it to Text Embeddings Inference (TEI)!
 
 ## Do you want to add documentation?
 
@@ -104,8 +104,8 @@ happy to make the changes or help you make a contribution if you're interested!
 
 ## I want to become a maintainer of the project. How do I get there?
 
-TGI is a project led and managed by Hugging Face as it powers our internal services. However, we are happy to have
-motivated individuals from other organizations join us as maintainers with the goal of making TGI the best inference
-service.
+Text Embeddings Inference (TEI) is a project led and managed by Hugging Face as it powers our internal services. However, we are happy to have
+motivated individuals from other organizations join us as maintainers with the goal of making TEI the best inference
+service for embedding models in production.
 
 If you are such an individual (or organization), please reach out to us and let's collaborate.

backends/grpc-client/src/client.rs

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ use grpc_metadata::InjectTelemetryContext;
 use tonic::transport::{Channel, Uri};
 use tracing::instrument;
 
-/// Text Generation Inference gRPC client
+/// Text Embeddings Inference gRPC client
 #[derive(Debug, Clone)]
 pub struct Client {
     stub: EmbeddingServiceClient<Channel>,

backends/python/server/text_embeddings_server/utils/flash_attn.py

Lines changed: 5 additions & 4 deletions

@@ -1,9 +1,10 @@
 import os
-import torch
-from text_embeddings_server.utils.device import use_ipex, is_hpu
 
+import torch
 from loguru import logger
 
+from text_embeddings_server.utils.device import is_hpu, use_ipex
+
 if os.getenv("USE_FLASH_ATTENTION", "").lower() == "false":
     raise ImportError("`USE_FLASH_ATTENTION` is false.")
 
@@ -30,7 +31,7 @@
 except ImportError:
     raise ImportError(
         "Flash Attention V2 is not installed.\n"
-        "Use the official Docker image (ghcr.io/huggingface/text-generation-inference:latest) "
+        "Use the official Docker image (ghcr.io/huggingface/text-embeddings-inference:cuda-latest) "
         "or install flash attention v2 with `cd server && make install install-flash-attention-v2`"
     )
 if not (is_sm8x or is_sm90):
@@ -45,7 +46,7 @@
 except ImportError:
     raise ImportError(
         "Flash Attention is not installed.\n"
-        "Use the official Docker image (ghcr.io/huggingface/text-generation-inference:latest) "
+        "Use the official Docker image (ghcr.io/huggingface/text-embeddings-inference:cuda-latest) "
         "or install flash attention with `cd server && make install install-flash-attention`"
     ) from e
 
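The gate at the top of this file is what the branch name disable-flash-attn-if-env refers to: the Python backend refuses to import its flash-attention path whenever USE_FLASH_ATTENTION is set to "false" (case-insensitively). As a minimal sketch of the same check in Rust, using a hypothetical helper that is not code from this repository:

    use std::env;

    /// Hypothetical helper (not from this repo): returns false only when
    /// USE_FLASH_ATTENTION is set to "false", case-insensitively; an unset
    /// variable or any other value leaves flash attention enabled, matching
    /// the Python gate above.
    fn flash_attention_enabled() -> bool {
        env::var("USE_FLASH_ATTENTION")
            .map(|v| v.to_lowercase() != "false")
            .unwrap_or(true)
    }

    fn main() {
        println!("flash attention enabled: {}", flash_attention_enabled());
    }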

backends/src/lib.rs

Lines changed: 17 additions & 6 deletions
@@ -168,7 +168,7 @@ impl Backend {
             }
         }
         for shape in shapes.iter() {
-            let batch = self.create_warmup_batch(*shape, max_token as u32);
+            let batch = self.create_warmup_batch(*shape, max_token as u32, seq_bucket_size as u32);
             match &self.model_type {
                 ModelType::Classifier => self.predict(batch).await.map(|_| ()),
                 ModelType::Embedding(_) => self.embed(batch).await.map(|_| ()),
@@ -179,19 +179,30 @@
     }
 
     #[instrument(skip_all)]
-    pub fn create_warmup_batch(&self, shape: (u32, u32), max_token: u32) -> Batch {
+    pub fn create_warmup_batch(
+        &self,
+        shape: (u32, u32),
+        max_token: u32,
+        seq_bucket_size: u32,
+    ) -> Batch {
         let (batch_size, length) = shape;
+        let min_length = length.saturating_sub(seq_bucket_size).saturating_add(1);
+        let tmp_length = if min_length < length {
+            rand::rng().random_range(min_length..length)
+        } else {
+            length
+        };
         let mut batched_input_ids = Vec::new();
         let mut batched_token_type_ids = Vec::new();
         let mut batched_position_ids = Vec::new();
         let mut cumulative_seq_lengths = Vec::with_capacity(batch_size as usize + 1);
         let mut pooled_indices = Vec::with_capacity(batch_size as usize);
         cumulative_seq_lengths.push(0);
-        let input_ids: Vec<u32> = (0..length)
+        let input_ids: Vec<u32> = (0..tmp_length)
             .map(|_| rand::rng().random_range(0..max_token))
             .collect();
-        let token_type_ids: Vec<u32> = vec![0; length as usize];
-        let position_ids: Vec<u32> = (0..length).collect();
+        let token_type_ids: Vec<u32> = vec![0; tmp_length as usize];
+        let position_ids: Vec<u32> = (0..tmp_length).collect();
         let mut current_length = 0;
         for batch_id in 0..batch_size {
             batched_input_ids.extend(input_ids.iter().cloned());
@@ -206,7 +217,7 @@ impl Backend {
             token_type_ids: batched_token_type_ids,
             position_ids: batched_position_ids,
             cumulative_seq_lengths,
-            max_length: length,
+            max_length: tmp_length,
             pooled_indices,
             raw_indices: vec![],
         }
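The core of this change is the new tmp_length: instead of always warming up with exactly `length` tokens, the backend samples a random length that still pads into the same seq_bucket_size-wide bucket, so warmup exercises more than each bucket's upper edge. A standalone sketch of that calculation, assuming the rand 0.9 API used in the diff (rand::rng() / random_range):

    use rand::Rng; // rand = "0.9", as implied by rand::rng()/random_range above

    /// Sample a warmup sequence length that falls in the same
    /// `seq_bucket_size`-wide bucket as `length`; mirrors the diff above.
    fn warmup_length(length: u32, seq_bucket_size: u32) -> u32 {
        // Smallest length that still pads up to this bucket; the saturating
        // ops keep small `length` values from underflowing.
        let min_length = length.saturating_sub(seq_bucket_size).saturating_add(1);
        if min_length < length {
            rand::rng().random_range(min_length..length)
        } else {
            length
        }
    }

    fn main() {
        // With 128-token buckets, a 512-token warmup shape yields a random
        // length in 385..512 (the half-open range excludes 512 itself).
        println!("{}", warmup_length(512, 128));
    }

Note that `random_range(min_length..length)` never returns `length` itself; presumably warming up anywhere inside the bucket is enough, since padding rounds the batch back up to the bucket boundary.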

0 commit comments