* [v] Improve vLLM startup time to support faster switching between models.
* MultiGPU support: Enable multiple models running at the same time on different GPUs.
* Enable multiple models running at the same time on the same GPU; this requires estimating the vRAM usage of each model and managing memory accordingly.
* Add support for ROCm, Apple Silicon, and other architectures.
* Add support for loading adapter layers.
* Add support for endpoints other than chat/completion, such as embeddings and text generation.
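The same-GPU item above depends on estimating each model's vRAM footprint before loading it. A minimal sketch of such an estimate, assuming weights dominate memory use (the function name, default dtype, and overhead multiplier below are illustrative assumptions, not part of HoML):

```python
def estimate_model_vram_gib(num_params: float,
                            dtype_bytes: int = 2,
                            overhead: float = 1.2) -> float:
    """Rough vRAM estimate in GiB for loading a model's weights.

    num_params:  total parameter count (e.g. 7e9 for a 7B model)
    dtype_bytes: bytes per parameter (2 for fp16/bf16, 1 for int8)
    overhead:    fudge factor for CUDA context, allocator slack, etc.
                 (assumed value; KV cache growth is not modeled here)
    """
    return num_params * dtype_bytes * overhead / 1024**3

# A 7B model in fp16 comes out to roughly 15-16 GiB before KV cache.
print(estimate_model_vram_gib(7e9))
```

A scheduler could sum such estimates for the currently loaded models and refuse (or queue) a load that would exceed the GPU's free memory.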
## Contributing
## License
This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.