OpenVINO™ Model Server 2024.3
The 2024.3 release focuses mostly on improvements to the OpenAI API text generation implementation.
Changes and improvements
A set of improvements in OpenAI API text generation:
- Significantly better performance thanks to numerous improvements in OpenVINO Runtime and sampling algorithms
- Added config parameters `best_of_limit` and `max_tokens_limit` to avoid memory overconsumption caused by invalid requests
- Added reporting of LLM metrics in the server logs
- Added extra sampling parameters: `diversity_penalty`, `length_penalty`, and `repetition_penalty`
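As a sketch, the limits and extra sampling parameters above are fields of the request body sent to the OpenAI-compatible endpoint; the model name and parameter values below are illustrative placeholders, not defaults:

```python
import json

# Illustrative chat/completions request body for OpenVINO Model Server's
# OpenAI-compatible API. The model name and values are placeholders.
payload = {
    "model": "llama",       # hypothetical served model name
    "max_tokens": 128,      # rejected if it exceeds the server's max_tokens_limit
    "best_of": 2,           # rejected if it exceeds the server's best_of_limit
    # Extra sampling parameters added in 2024.3:
    "diversity_penalty": 1.0,
    "length_penalty": 1.0,
    "repetition_penalty": 1.1,
    "messages": [{"role": "user", "content": "Hello"}],
}

# Serialize the body as it would be POSTed to the server.
body = json.dumps(payload)
```

Server-side, `best_of_limit` and `max_tokens_limit` cap the corresponding request fields so a single malformed or abusive request cannot exhaust memory.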
Improvements in documentation and demos:
- Added RAG demo with OpenAI API
- Added K8S deployment demo for text generation scenarios
- Simplified model initialization for a set of demos with MediaPipe graphs using the pose_detection model; the TFLite models don't require any conversion (check the demo)
Breaking changes
No breaking changes.
Bug fixes
- Resolved an issue with sporadic text generation hangs via the OpenAI API endpoints
- Fixed an issue with the chat streamer mishandling incomplete UTF-8 sequences
- Corrected the format of the last streaming event in the `completions` endpoint
- Fixed an issue with requests hanging when running out of available cache
You can pull the public OpenVINO Model Server Docker images based on Ubuntu with the following commands:
`docker pull openvino/model_server:2024.3` - CPU device support, image based on Ubuntu 22.04
`docker pull openvino/model_server:2024.3-gpu` - GPU and CPU device support, image based on Ubuntu 22.04
or use provided binary packages.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.