Releases · huggingface/text-embeddings-inference
v0.4.0
What's Changed
- feat: USE_FLASH_ATTENTION env var by @OlivierDehaene in #57
- docs: initial version of the TEI docs for hf.co/docs/ by @MKhalusova in #60
- feat: support roberta by @kozistr in #62
- fix: GH workflows update: added --not_python_module flag by @MKhalusova in #66
- docs: Image links updated by @MKhalusova in #72
- feat: add `normalize` option by @OlivierDehaene in #70 (see the sketch after this list)
- ci: Migrate CI to new Runners by @glegendre01 in #74
- feat: add support for classification models by @OlivierDehaene in #76
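
The new `normalize` option is easiest to see at the request level. Below is a minimal sketch in Python, assuming a TEI server already running on localhost:8080 (optionally launched with the `USE_FLASH_ATTENTION` env var from #57) and the JSON shape of the `/embed` route; the helper name `embed` and the URL are illustrative, not part of the release. Classification models from #76 are served through a separate prediction route following the same request pattern.

```python
import requests

# Assumption: a TEI server is already running locally, e.g. launched with
# the USE_FLASH_ATTENTION env var from #57 toggling Flash Attention kernels.
TEI_URL = "http://localhost:8080"  # illustrative local deployment

def embed(texts, normalize=True):
    """Request embeddings; `normalize` maps to the option added in #70."""
    response = requests.post(
        f"{TEI_URL}/embed",
        json={"inputs": texts, "normalize": normalize},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # one float vector per input text

# L2-normalized embeddings (the default), ready for cosine similarity
vectors = embed(["What is deep learning?"])
print(len(vectors[0]))  # embedding dimension

# Raw, unnormalized embeddings
raw_vectors = embed(["What is deep learning?"], normalize=False)
```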
New Contributors
- @MKhalusova made their first contribution in #60
- @kozistr made their first contribution in #62
- @glegendre01 made their first contribution in #74
Full Changelog: v0.3.0...v0.4.0
v0.3.0
v0.2.2
What's Changed
- fix: max_input_length should take into account position_offset (aec5efd)
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- fix: only use position offset for xlm-roberta (8c507c3)
Full Changelog: v0.2.0...v0.2.1
v0.2.0
v0.1.0
- No compilation step
- Dynamic shapes
- Small Docker images and fast boot times. Get ready for true serverless!
- Token-based dynamic batching (see the sketch after this list)
- Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
- Safetensors weight loading
- Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
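
Several of these launch features are visible from the client side: dynamic shapes mean inputs of different lengths need no padding or recompilation, and token-based dynamic batching groups concurrent requests on the server. Below is a minimal sketch, assuming the same local `/embed` route as above; all names are illustrative, and the batching itself happens server-side, outside client control.

```python
import concurrent.futures
import requests

TEI_URL = "http://localhost:8080"  # assumed local deployment

def embed_one(text):
    # Each request can have a different sequence length: dynamic shapes
    # mean no fixed-size padding or compilation step is needed.
    r = requests.post(f"{TEI_URL}/embed", json={"inputs": text}, timeout=30)
    r.raise_for_status()
    return r.json()[0]

texts = ["short query", "a noticeably longer passage " * 20]

# Concurrent requests are batched together by the server based on token
# counts, so throughput scales without any client-side batching logic.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    embeddings = list(pool.map(embed_one, texts))

print([len(e) for e in embeddings])  # one vector per input
```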