Releases: huggingface/text-embeddings-inference
v1.8.0
Notable Changes
- Qwen3 support for the 0.6B, 4B, and 8B variants on CPU and MPS, plus FlashQwen3 on CUDA and Intel HPUs
- NomicBert MoE support
- JinaAI Re-Rankers V1 support
- Matryoshka Representation Learning (MRL)
- Dense layer module support (after pooling)
Note
Some of the aforementioned changes were already released in the patch versions on top of v1.7.0, while Matryoshka Representation Learning (MRL) and Dense layer module support are new in this release.
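As an illustration of the MRL support, here is a minimal client-side sketch, assuming a router serving an MRL-capable embedding model on `localhost:8080` and that the embed request exposes a `dimensions` field for truncating the returned embeddings:

```python
import requests

# Hypothetical local deployment; adjust the URL and payload to your setup.
response = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "What is Deep Learning?", "dimensions": 256},
)
response.raise_for_status()
embedding = response.json()[0]
print(len(embedding))  # 256 instead of the model's full hidden size
```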
What's Changed
- [Docs] Update quick tour by @NielsRogge in #574
- Update `README.md` and `supported_models.md` by @alvarobartt in #572
- Back with linting. by @Narsil in #577
- [Docs] Add cloud run example by @NielsRogge in #573
- Fixup by @Narsil in #578
- Fixing the tokenization routes token (offsets are in bytes, not in chars) by @Narsil in #576
- Removing requirements file. by @Narsil in #585
- Removing candle-extensions to live on crates.io by @Narsil in #583
- Bump `sccache` to 0.10.0 and `sccache-action` to 0.0.9 by @alvarobartt in #586
- optimize the performance of FlashBert Path for HPU by @kaixuanliu in #575
- Revert "Removing requirements file. (#585)" by @Narsil in #588
- Get opentelemetry trace id from request headers by @kozistr in #425
- Add argument for configuring Prometheus port by @kozistr in #589
- Adding missing `head.` prefix in the weight name in `ModernBertClassificationHead` by @kozistr in #591
- Fixing the CI (grpc path). by @Narsil in #593
- fix xpu env issue that cannot find right libur_loader.so.0 by @kaixuanliu in #595
- enable flash mistral model for HPU device by @kaixuanliu in #594
- remove optimum-habana dependency by @kaixuanliu in #599
- Support NomicBert MoE by @kozistr in #596
- Remove duplicate short option '-p' to fix router executable by @cebtenzzre in #602
- Update `text-embeddings-router --help` output by @alvarobartt in #603
- Warmup padded models too. by @Narsil in #592
- Add support for JinaAI Re-Rankers V1 by @alvarobartt in #582
- Gte diffs by @Narsil in #604
- Fix the weight name in GTEClassificationHead by @kozistr in #606
- upgrade pytorch and ipex to 2.7 version by @kaixuanliu in #607
- upgrade HPU FW to 1.21; upgrade transformers to 4.51.3 by @kaixuanliu in #608
- Patch DistilBERT variants with different weight keys by @alvarobartt in #614
- add offline modeling for model `jinaai/jina-embeddings-v2-base-code` to avoid `auto_map` to other repository by @kaixuanliu in #612
- Add mean pooling strategy for Modernbert classifier by @kwnath in #616
- Using serde for pool validation. by @Narsil in #620
- Preparing the update to 1.7.1 by @Narsil in #623
- Adding suggestions to fixing missing ONNX files. by @Narsil in #624
- Add `Qwen3Model` by @alvarobartt in #627
- Add `HiddenAct::Silu` (remove `serde` alias) by @alvarobartt in #631
- Add CPU support for Qwen3-Embedding models by @randomm in #632
- refactor the code and add wrap_in_hpu_graph to corner case by @kaixuanliu in #625
- Support Qwen3 w/ fp32 on GPU by @kozistr in #634
- Preparing the release. by @Narsil in #639
- Default to Qwen3 in `README.md` and `docs/` examples by @alvarobartt in #641
- Fix Qwen3 by @kozistr in #646
- Add integration tests for Gaudi by @baptistecolle in #598
- Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in #648
- Fix FlashQwen3 by @kozistr in #650
- Make flake work on metal by @Narsil in #654
- Fixing metal backend. by @Narsil in #655
- Qwen3 hpu support by @kaixuanliu in #656
- change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in #659
- Update `version` to 1.7.3 by @alvarobartt in #666
- Add last token pooling support for ORT. by @tpendragon in #664
- Fix Qwen3 Embedding Float16 DType by @tpendragon in #663
- Fix `fmt` by re-running `pre-commit` by @alvarobartt in #671
- Update `version` to 1.7.4 by @alvarobartt in #677
- Support MRL (Matryoshka Representation Learning) by @kozistr in #676
- Add `Dense` layer for `2_Dense/` modules by @alvarobartt in #660
- Update `version` to 1.8.0 by @alvarobartt in #686
New Contributors
- @NielsRogge made their first contribution in #574
- @cebtenzzre made their first contribution in #602
- @kwnath made their first contribution in #616
- @randomm made their first contribution in #632
- @lance-miles made their first contribution in #648
- @tpendragon made their first contribution in #664
Full Changelog: v1.7.0...v1.8.0
v1.7.4
Notable Changes
Qwen3 was not working correctly on CPU / MPS when sending batched requests with FP16 precision, due to the FP32 minimum value being downcast (it is now manually set to the FP16 minimum value instead), which led to null values, as well as a missing `to_dtype` call on the `attention_bias` when working with batches.
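For illustration, a minimal PyTorch sketch of the failure mode (TEI implements this in Rust with candle, so this is an analogy, not the actual fix):

```python
import torch

# Downcasting the FP32 minimum to FP16 overflows to -inf:
fp32_min = torch.finfo(torch.float32).min        # ~ -3.4e38
print(torch.tensor(fp32_min).to(torch.float16))  # tensor(-inf, dtype=torch.float16)

# A fully masked attention-bias row then becomes all -inf, and its
# softmax is NaN (0 / 0), i.e. the null values seen in batched requests:
row = torch.full((4,), fp32_min).to(torch.float16)
print(torch.softmax(row, dim=0))                 # tensor([nan, nan, nan, nan])

# Using the FP16 minimum instead keeps the bias finite after the cast:
fp16_min = torch.finfo(torch.float16).min        # -65504.0
row = torch.full((4,), fp16_min, dtype=torch.float16)
print(torch.softmax(row, dim=0))                 # tensor([0.25, 0.25, 0.25, 0.25])
```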
What's Changed
- Fix Qwen3 Embedding Float16 DType by @tpendragon in #663
- Fix `fmt` by re-running `pre-commit` by @alvarobartt in #671
- Update `version` to 1.7.4 by @alvarobartt in #677
Full Changelog: v1.7.3...v1.7.4
v1.7.3
Notable Changes
Qwen3 support has been added for Intel HPU, and fixed on CPU / Metal / CUDA.
What's Changed
- Default to Qwen3 in `README.md` and `docs/` examples by @alvarobartt in #641
- Fix Qwen3 by @kozistr in #646
- Add integration tests for Gaudi by @baptistecolle in #598
- Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in #648
- Fix FlashQwen3 by @kozistr in #650
- Make flake work on metal by @Narsil in #654
- Fixing metal backend. by @Narsil in #655
- Qwen3 hpu support by @kaixuanliu in #656
- change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in #659
- Update `version` to 1.7.3 by @alvarobartt in #666
- Add last token pooling support for ORT. by @tpendragon in #664
New Contributors
- @lance-miles made their first contribution in #648
- @tpendragon made their first contribution in #664
Full Changelog: v1.7.2...v1.7.3
v1.7.2
Notable Changes
- Added support for Qwen3 embeddings
What's Changed
- Adding suggestions to fixing missing ONNX files. by @Narsil in #624
- Add `Qwen3Model` by @alvarobartt in #627
- Add `HiddenAct::Silu` (remove `serde` alias) by @alvarobartt in #631
- Add CPU support for Qwen3-Embedding models by @randomm in #632
- refactor the code and add wrap_in_hpu_graph to corner case by @kaixuanliu in #625
- Support Qwen3 w/ fp32 on GPU by @kozistr in #634
- Preparing the release. by @Narsil in #639
New Contributors
- @randomm made their first contribution in #632
Full Changelog: v1.7.1...v1.7.2
v1.7.1
What's Changed
- [Docs] Update quick tour by @NielsRogge in #574
- Update `README.md` and `supported_models.md` by @alvarobartt in #572
- Back with linting. by @Narsil in #577
- [Docs] Add cloud run example by @NielsRogge in #573
- Fixup by @Narsil in #578
- Fixing the tokenization routes token (offsets are in bytes, not in chars) by @Narsil in #576
- Removing requirements file. by @Narsil in #585
- Removing candle-extensions to live on crates.io by @Narsil in #583
- Bump `sccache` to 0.10.0 and `sccache-action` to 0.0.9 by @alvarobartt in #586
- optimize the performance of FlashBert Path for HPU by @kaixuanliu in #575
- Revert "Removing requirements file. (#585)" by @Narsil in #588
- Get opentelemetry trace id from request headers by @kozistr in #425
- Add argument for configuring Prometheus port by @kozistr in #589
- Adding missing `head.` prefix in the weight name in `ModernBertClassificationHead` by @kozistr in #591
- Fixing the CI (grpc path). by @Narsil in #593
- fix xpu env issue that cannot find right libur_loader.so.0 by @kaixuanliu in #595
- enable flash mistral model for HPU device by @kaixuanliu in #594
- remove optimum-habana dependency by @kaixuanliu in #599
- Support NomicBert MoE by @kozistr in #596
- Remove duplicate short option '-p' to fix router executable by @cebtenzzre in #602
- Update `text-embeddings-router --help` output by @alvarobartt in #603
- Warmup padded models too. by @Narsil in #592
- Add support for JinaAI Re-Rankers V1 by @alvarobartt in #582
- Gte diffs by @Narsil in #604
- Fix the weight name in GTEClassificationHead by @kozistr in #606
- upgrade pytorch and ipex to 2.7 version by @kaixuanliu in #607
- upgrade HPU FW to 1.21; upgrade transformers to 4.51.3 by @kaixuanliu in #608
- Patch DistilBERT variants with different weight keys by @alvarobartt in #614
- add offline modeling for model `jinaai/jina-embeddings-v2-base-code` to avoid `auto_map` to other repository by @kaixuanliu in #612
- Add mean pooling strategy for Modernbert classifier by @kwnath in #616
- Using serde for pool validation. by @Narsil in #620
- Preparing the update to 1.7.1 by @Narsil in #623
New Contributors
- @NielsRogge made their first contribution in #574
- @cebtenzzre made their first contribution in #602
- @kwnath made their first contribution in #616
Full Changelog: v1.7.0...v1.7.1
v1.7.0
Notable changes
- Major dependency upgrades (candle 0.5 -> 0.8 and related)
- Added ModernBert support by @kozistr!
What's Changed
- Moving cublaslt into TEI extension for easier upgrade of candle globally by @Narsil in #542
- Upgrade candle2 by @Narsil in #543
- Upgrade candle3 by @Narsil in #545
- Fixing the static-linking. by @Narsil in #547
- Fix linking bis by @Narsil in #549
- Make `sliding_window` for `Qwen2` optional by @alvarobartt in #546
- Optimize the performance of FlashBert on HPU by using fast mode softmax by @kaixuanliu in #555
- Fixing cudarc to the latest unified bindings. by @Narsil in #558
- Fix typos / formatting in CLI args in Markdown files by @alvarobartt in #552
- Use custom `serde` deserializer for JinaBERT models by @alvarobartt in #559
- Implement the `ModernBert` model by @kozistr in #459
- Fixing FlashAttention ModernBert. by @Narsil in #560
- Enable ModernBert on metal by @ivarflakstad in #562
- Fix `{Bert,DistilBert}SpladeHead` when loading from Safetensors by @alvarobartt in #564
- add related docs for intel cpu/xpu/hpu container by @kaixuanliu in #550
- Update the doc for submodule. by @Narsil in #567
- Update `docs/source/en/custom_container.md` by @alvarobartt in #568
- Preparing for release 1.7.0 (candle update + modernbert). by @Narsil in #570
New Contributors
- @ivarflakstad made their first contribution in #562
Full Changelog: v1.6.1...v1.7.0
v1.6.1
What's Changed
- Enable intel devices CPU/XPU/HPU for python backend by @yuanwu2017 in #245
- add reranker model support for python backend by @kaixuanliu in #386
- (FIX): CI Security Fix - branchname injection by @glegendre01 in #479
- Upgrade TEI. by @Narsil in #501
- Pin `cargo-chef` installation to 0.1.62 by @alvarobartt in #469
- add `TRUST_REMOTE_CODE` param to python backend. by @kaixuanliu in #485
- Enable splade embeddings for Python backend by @pi314ever in #493
- Hpu bucketing by @kaixuanliu in #489
- Optimize flash bert path for hpu device by @kaixuanliu in #509
- upgrade ipex to 2.6 version for cpu/xpu by @kaixuanliu in #510
- fix bug for `MaskedLanguageModel` class by @kaixuanliu in #513
- Fix double incrementing `te_request_count` metric by @kozistr in #486
- Add intel based images to the CI by @baptistecolle in #518
- Fix typo on intel docker image by @baptistecolle in #529
- chore: Upgrade to tokenizers 0.21.0 by @lightsofapollo in #512
- feat: add support for "model_type": "gte" by @anton-pt in #519
- Update `README.md` to include ONNX by @alvarobartt in #507
- Fusing both Gte Configs. by @Narsil in #530
- Add `HF_HUB_USER_AGENT_ORIGIN` by @alvarobartt in #534
- Use `--hf-token` instead of `--hf-api-token` by @alvarobartt in #535
- Fixing the tests. by @Narsil in #531
- Support classification head for DistilBERT by @kozistr in #487
- add CLI flag `disable-spans` to toggle span trace logging by @obloomfield in #481
- feat: support HF_ENDPOINT environment when downloading model by @StrayDragon in #505
- Small fixup. by @Narsil in #537
- Fix `VarBuilder` handling in GTE e.g. `gte-multilingual-reranker-base` by @Narsil in #538
- make a WA in case Bert model do not have `safetensor` file by @kaixuanliu in #515
- Add missing `match` on `onnx/model.onnx` download by @alvarobartt in #472
- Fixing the impure flake devShell to be able to run python code. by @Narsil in #539
- Prepare for release. by @Narsil in #540
New Contributors
- @yuanwu2017 made their first contribution in #245
- @kaixuanliu made their first contribution in #386
- @Narsil made their first contribution in #501
- @pi314ever made their first contribution in #493
- @baptistecolle made their first contribution in #518
- @lightsofapollo made their first contribution in #512
- @anton-pt made their first contribution in #519
- @obloomfield made their first contribution in #481
- @StrayDragon made their first contribution in #505
Full Changelog: v1.6.0...v1.6.1
v1.6.0
What's Changed
- feat: support multiple backends at the same time by @OlivierDehaene in #440
- feat: GTE classification head by @kozistr in #441
- feat: Implement GTE model to support the non-flash-attn version by @kozistr in #446
- feat: Implement MPNet model (#363) by @kozistr in #447
Full Changelog: v1.5.1...v1.6.0
v1.5.1
What's Changed
- Download `model.onnx_data` by @kozistr in #343
- Rename 'Sentence Transformers' to 'sentence-transformers' in docstrings by @Wauplin in #342
- fix: add serde default for truncation direction by @drbh in #399
- fix: metrics unbounded memory by @OlivierDehaene in #409
- Fix to allow health check w/o auth by @kozistr in #360
- Update `ort` crate version to `2.0.0-rc.4` to support onnx IR version 10 by @kozistr in #361
- adds curl to fix healthcheck by @WissamAntoun in #376
- fix: use num_cpus::get to check as get_physical does not check cgroups by @OlivierDehaene in #410
- fix: use status code 400 when batch is empty by @OlivierDehaene in #413
- fix: add cls pooling as default for BERT variants by @OlivierDehaene in #426
- feat: auto limit string if truncate is set by @OlivierDehaene in #428
New Contributors
- @Wauplin made their first contribution in #342
- @XciD made their first contribution in #345
- @WissamAntoun made their first contribution in #376
Full Changelog: v1.5.0...v1.5.1
v1.5.0
Notable Changes
- ONNX runtime for CPU deployments: greatly improves CPU deployment throughput
- Add `/similarity` route
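As an illustration, a minimal sketch of calling the new route, assuming a router running on `localhost:8080` and that the payload follows the sentence-similarity task shape:

```python
import requests

# Hypothetical local deployment; adjust the URL and sentences to your setup.
response = requests.post(
    "http://localhost:8080/similarity",
    json={
        "inputs": {
            "source_sentence": "What is Deep Learning?",
            "sentences": [
                "Deep learning is a subset of machine learning.",
                "The weather is nice today.",
            ],
        }
    },
)
response.raise_for_status()
print(response.json())  # one similarity score per candidate sentence
```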
What's Changed
- tokenizer max limit on input size by @ErikKaum in #324
- docs: air-gapped deployments by @OlivierDehaene in #326
- feat(onnx): add onnx runtime for better CPU perf by @OlivierDehaene in #328
- feat: add `/similarity` route by @OlivierDehaene in #331
- fix(ort): fix mean pooling by @OlivierDehaene in #332
- chore(candle): update flash attn by @OlivierDehaene in #335
- v1.5.0 by @OlivierDehaene in #336
New Contributors
- @ErikKaum made their first contribution in #324
Full Changelog: v1.4.0...v1.5.0