Release v1.8.0 · huggingface/text-embeddings-inference

Notable Changes

Qwen3 support for 0.6B, 4B and 8B on CPU, MPS, and FlashQwen3 on CUDA and Intel HPUs
NomicBert MoE support
JinaAI Re-Rankers V1 support
Matryoshka Representation Learning (MRL)
Dense layer module support (after pooling)
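
For readers unfamiliar with Matryoshka Representation Learning, the idea can be sketched as follows: a model trained with MRL produces embeddings whose leading components are usable on their own, so a client can keep only the first few values. This is a minimal sketch of the concept, not the code from this release. The function name `truncate_embedding` and its `dimensions` parameter are hypothetical, and re-normalizing after truncation is an assumption rather than something these notes state.

```python
import math


def truncate_embedding(embedding, dimensions):
    """Keep the first `dimensions` values, then rescale to unit L2 norm.

    Hypothetical helper for illustration; re-normalization is an assumption.
    """
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    head = list(embedding[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0]
    print(truncate_embedding(full, 2))
```

Truncation trades some retrieval quality for smaller vectors, so the target size is usually chosen from the sizes the model was trained to support.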

Note

Some of these changes were already released in the patch versions on top of v1.7.0, whereas Matryoshka Representation Learning (MRL) and Dense layer module support were added only recently and are released for the first time in v1.8.0.
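
As a rough illustration of what a Dense layer applied after pooling does, the sketch below projects a pooled embedding through a linear map `y = W x + b`. It is a conceptual example under stated assumptions, not the code from this release. The function name `dense` and the toy weights are hypothetical, the weight layout `(out_features, in_features)` is assumed, and any activation function is omitted.

```python
def dense(pooled, weight, bias=None):
    """Apply y = W x + b to a pooled embedding.

    Hypothetical helper: `weight` is a list of rows, one row per output
    feature, each row as long as `pooled` (an assumed layout).
    """
    output = []
    for i, row in enumerate(weight):
        if len(row) != len(pooled):
            raise ValueError("each weight row must match the input size")
        value = sum(w * x for w, x in zip(row, pooled))
        if bias is not None:
            value += bias[i]
        output.append(value)
    return output


if __name__ == "__main__":
    pooled = [1.0, 2.0]  # toy pooled embedding
    weight = [[1.0, 0.0], [0.5, 0.5], [0.0, -1.0]]  # toy 3x2 projection
    bias = [0.0, 1.0, 0.0]
    print(dense(pooled, weight, bias))
```

Because the projection runs after pooling, it changes the output dimensionality of the sentence embedding without touching the per-token hidden states.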

What's Changed

[Docs] Update quick tour by @NielsRogge in #574
Update README.md and supported_models.md by @alvarobartt in #572
Back with linting. by @Narsil in #577
[Docs] Add cloud run example by @NielsRogge in #573
Fixup by @Narsil in #578
Fixing the tokenization routes token (offsets are in bytes, not in by @Narsil in #576
Removing requirements file. by @Narsil in #585
Removing candle-extensions to live on crates.io by @Narsil in #583
Bump sccache to 0.10.0 and sccache-action to 0.0.9 by @alvarobartt in #586
optimize the performance of FlashBert Path for HPU by @kaixuanliu in #575
Revert "Removing requirements file. (#585)" by @Narsil in #588
Get opentelemetry trace id from request headers by @kozistr in #425
Add argument for configuring Prometheus port by @kozistr in #589
Adding missing head. prefix in the weight name in ModernBertClassificationHead by @kozistr in #591
Fixing the CI (grpc path). by @Narsil in #593
fix xpu env issue that cannot find right libur_loader.so.0 by @kaixuanliu in #595
enable flash mistral model for HPU device by @kaixuanliu in #594
remove optimum-habana dependency by @kaixuanliu in #599
Support NomicBert MoE by @kozistr in #596
Remove duplicate short option '-p' to fix router executable by @cebtenzzre in #602
Update text-embeddings-router --help output by @alvarobartt in #603
Warmup padded models too. by @Narsil in #592
Add support for JinaAI Re-Rankers V1 by @alvarobartt in #582
Gte diffs by @Narsil in #604
Fix the weight name in GTEClassificationHead by @kozistr in #606
upgrade pytorch and ipex to 2.7 version by @kaixuanliu in #607
upgrade HPU FW to 1.21; upgrade transformers to 4.51.3 by @kaixuanliu in #608
Patch DistilBERT variants with different weight keys by @alvarobartt in #614
add offline modeling for model jinaai/jina-embeddings-v2-base-code to avoid auto_map to other repository by @kaixuanliu in #612
Add mean pooling strategy for Modernbert classifier by @kwnath in #616
Using serde for pool validation. by @Narsil in #620
Preparing the update to 1.7.1 by @Narsil in #623
Adding suggestions to fixing missing ONNX files. by @Narsil in #624
Add Qwen3Model by @alvarobartt in #627
Add HiddenAct::Silu (remove serde alias) by @alvarobartt in #631
Add CPU support for Qwen3-Embedding models by @randomm in #632
refactor the code and add wrap_in_hpu_graph to corner case by @kaixuanliu in #625
Support Qwen3 w/ fp32 on GPU by @kozistr in #634
Preparing the release. by @Narsil in #639
Default to Qwen3 in README.md and docs/ examples by @alvarobartt in #641
Fix Qwen3 by @kozistr in #646
Add integration tests for Gaudi by @baptistecolle in #598
Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in #648
Fix FlashQwen3 by @kozistr in #650
Make flake work on metal by @Narsil in #654
Fixing metal backend. by @Narsil in #655
Qwen3 hpu support by @kaixuanliu in #656
change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in #659
Update version to 1.7.3 by @alvarobartt in #666
Add last token pooling support for ORT. by @tpendragon in #664
Fix Qwen3 Embedding Float16 DType by @tpendragon in #663
Fix fmt by re-running pre-commit by @alvarobartt in #671
Update version to 1.7.4 by @alvarobartt in #677
Support MRL (Matryoshka Representation Learning) by @kozistr in #676
Add Dense layer for 2_Dense/ modules by @alvarobartt in #660
Update version to 1.8.0 by @alvarobartt in #686

New Contributors

@NielsRogge made their first contribution in #574
@cebtenzzre made their first contribution in #602
@kwnath made their first contribution in #616
@randomm made their first contribution in #632
@lance-miles made their first contribution in #648
@tpendragon made their first contribution in #664

Full Changelog: v1.7.0...v1.8.0
