Notable Changes
- Qwen3 support for 0.6B, 4B and 8B on CPU, MPS, and FlashQwen3 on CUDA and Intel HPUs
- NomicBert MoE support
- JinaAI Re-Rankers V1 support
- Matryoshka Representation Learning (MRL)
- Dense layer module support (after pooling)
Note
Some of the changes above were already released in patch versions on top of v1.7.0, whereas Matryoshka Representation Learning (MRL) and Dense layer module support were added recently and had not been released before.
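To make the two newest items concrete, here is a minimal sketch of the general ideas behind them: a dense (linear) projection applied to an already-pooled embedding, and MRL-style truncation that keeps the leading components and re-normalizes. This is plain Python for illustration only and is not the code used by this project; the function names, the example vector, and the weights are all hypothetical.

```python
import math


def dense(vector, weight, bias):
    """Linear projection: one output value per row of `weight`, plus bias."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weight, bias)
    ]


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def truncate_mrl(vector, dimensions):
    """Keep the first `dimensions` components, then re-normalize."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    return l2_normalize(vector[:dimensions])


# Hypothetical pooled embedding and projection weights, for illustration only.
pooled = [3.0, 4.0, 12.0, 0.0]
weight = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
bias = [0.5, 0.0]

projected = dense(pooled, weight, bias)  # 4 -> 2 dimensions via the dense layer
shortened = truncate_mrl(pooled, 2)      # first 2 of 4 dimensions, unit length
```

Truncation only gives useful embeddings for models trained with a Matryoshka objective; for other models the leading components carry no special meaning.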
What's Changed
- [Docs] Update quick tour by @NielsRogge in #574
- Update `README.md` and `supported_models.md` by @alvarobartt in #572
- Back with linting. by @Narsil in #577
- [Docs] Add cloud run example by @NielsRogge in #573
- Fixup by @Narsil in #578
- Fixing the tokenization routes token (offsets are in bytes, not in by @Narsil in #576
- Removing requirements file. by @Narsil in #585
- Removing candle-extensions to live on crates.io by @Narsil in #583
- Bump `sccache` to 0.10.0 and `sccache-action` to 0.0.9 by @alvarobartt in #586
- optimize the performance of FlashBert Path for HPU by @kaixuanliu in #575
- Revert "Removing requirements file. (#585)" by @Narsil in #588
- Get opentelemetry trace id from request headers by @kozistr in #425
- Add argument for configuring Prometheus port by @kozistr in #589
- Adding missing `head.` prefix in the weight name in `ModernBertClassificationHead` by @kozistr in #591
- Fixing the CI (grpc path). by @Narsil in #593
- fix xpu env issue that cannot find right libur_loader.so.0 by @kaixuanliu in #595
- enable flash mistral model for HPU device by @kaixuanliu in #594
- remove optimum-habana dependency by @kaixuanliu in #599
- Support NomicBert MoE by @kozistr in #596
- Remove duplicate short option '-p' to fix router executable by @cebtenzzre in #602
- Update `text-embeddings-router --help` output by @alvarobartt in #603
- Warmup padded models too. by @Narsil in #592
- Add support for JinaAI Re-Rankers V1 by @alvarobartt in #582
- Gte diffs by @Narsil in #604
- Fix the weight name in GTEClassificationHead by @kozistr in #606
- upgrade pytorch and ipex to 2.7 version by @kaixuanliu in #607
- upgrade HPU FW to 1.21; upgrade transformers to 4.51.3 by @kaixuanliu in #608
- Patch DistilBERT variants with different weight keys by @alvarobartt in #614
- add offline modeling for model `jinaai/jina-embeddings-v2-base-code` to avoid `auto_map` to other repository by @kaixuanliu in #612
- Add mean pooling strategy for Modernbert classifier by @kwnath in #616
- Using serde for pool validation. by @Narsil in #620
- Preparing the update to 1.7.1 by @Narsil in #623
- Adding suggestions to fixing missing ONNX files. by @Narsil in #624
- Add `Qwen3Model` by @alvarobartt in #627
- Add `HiddenAct::Silu` (remove `serde` alias) by @alvarobartt in #631
- Add CPU support for Qwen3-Embedding models by @randomm in #632
- refactor the code and add wrap_in_hpu_graph to corner case by @kaixuanliu in #625
- Support Qwen3 w/ fp32 on GPU by @kozistr in #634
- Preparing the release. by @Narsil in #639
- Default to Qwen3 in `README.md` and `docs/` examples by @alvarobartt in #641
- Fix Qwen3 by @kozistr in #646
- Add integration tests for Gaudi by @baptistecolle in #598
- Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in #648
- Fix FlashQwen3 by @kozistr in #650
- Make flake work on metal by @Narsil in #654
- Fixing metal backend. by @Narsil in #655
- Qwen3 hpu support by @kaixuanliu in #656
- change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in #659
- Update `version` to 1.7.3 by @alvarobartt in #666
- Add last token pooling support for ORT. by @tpendragon in #664
- Fix Qwen3 Embedding Float16 DType by @tpendragon in #663
- Fix `fmt` by re-running `pre-commit` by @alvarobartt in #671
- Update `version` to 1.7.4 by @alvarobartt in #677
- Support MRL (Matryoshka Representation Learning) by @kozistr in #676
- Add `Dense` layer for `2_Dense/` modules by @alvarobartt in #660
- Update `version` to 1.8.0 by @alvarobartt in #686
New Contributors
- @NielsRogge made their first contribution in #574
- @cebtenzzre made their first contribution in #602
- @kwnath made their first contribution in #616
- @randomm made their first contribution in #632
- @lance-miles made their first contribution in #648
- @tpendragon made their first contribution in #664
Full Changelog: v1.7.0...v1.8.0