Releases · huggingface/text-embeddings-inference
v0.4.0
What's Changed
- feat: USE_FLASH_ATTENTION env var by @OlivierDehaene in #57
- docs: initial version of the TEI docs for hf.co/docs/ by @MKhalusova in #60
- feat: support roberta by @kozistr in #62
- fix: GH workflows update: added --not_python_module flag by @MKhalusova in #66
- docs: Image links updated by @MKhalusova in #72
- feat: add `normalize` option by @OlivierDehaene in #70 (see the sketch after this list)
- ci: Migrate CI to new Runners by @glegendre01 in #74
- feat: add support for classification models by @OlivierDehaene in #76
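
The new `normalize` option is easiest to see at the request level. Below is a minimal sketch in Python, assuming a TEI server already running on localhost:8080 (optionally launched with the `USE_FLASH_ATTENTION` env var from #57) and the JSON shape of the `/embed` route; the helper name `embed` and the URL are illustrative, not part of the release. Classification models from #76 are served through a separate prediction route following the same request pattern.

```python
import requests

# Assumption: a TEI server is already running locally, e.g. launched with
# the USE_FLASH_ATTENTION env var from #57 toggling Flash Attention kernels.
TEI_URL = "http://localhost:8080"  # illustrative local deployment

def embed(texts, normalize=True):
    """Request embeddings; `normalize` maps to the option added in #70."""
    response = requests.post(
        f"{TEI_URL}/embed",
        json={"inputs": texts, "normalize": normalize},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # one float vector per input text

# L2-normalized embeddings (the default), ready for cosine similarity
vectors = embed(["What is deep learning?"])
print(len(vectors[0]))  # embedding dimension

# Raw, unnormalized embeddings
raw_vectors = embed(["What is deep learning?"], normalize=False)
```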
New Contributors
- @MKhalusova made their first contribution in #60
- @kozistr made their first contribution in #62
- @glegendre01 made their first contribution in #74
Full Changelog: v0.3.0...v0.4.0
v0.3.0
v0.2.2
What's Changed
- fix: max_input_length should take into account position_offset (aec5efd)
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- fix: only use position offset for xlm-roberta (8c507c3)
Full Changelog: v0.2.0...v0.2.1
v0.2.0
v0.1.0
- No compilation step
- Dynamic shapes
- Small Docker images and fast boot times. Get ready for true serverless!
- Token-based dynamic batching (see the sketch after this list)
- Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
- Safetensors weight loading
- Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
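
Several of these launch features are visible from the client side: dynamic shapes mean inputs of different lengths need no padding or recompilation, and token-based dynamic batching groups concurrent requests on the server. Below is a minimal sketch, assuming the same local `/embed` route as above; all names are illustrative, and the batching itself happens server-side, outside client control.

```python
import concurrent.futures
import requests

TEI_URL = "http://localhost:8080"  # assumed local deployment

def embed_one(text):
    # Each request can have a different sequence length: dynamic shapes
    # mean no fixed-size padding or compilation step is needed.
    r = requests.post(f"{TEI_URL}/embed", json={"inputs": text}, timeout=30)
    r.raise_for_status()
    return r.json()[0]

texts = ["short query", "a noticeably longer passage " * 20]

# Concurrent requests are batched together by the server based on token
# counts, so throughput scales without any client-side batching logic.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    embeddings = list(pool.map(embed_one, texts))

print([len(e) for e in embeddings])  # one vector per input
```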