All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Support for Qwen3 models.
- Better dependency management and setup configuration.
- Model evaluation method.
- Support for MoE model architectures.
- Improved FastAPI server.
- Improved BGE embedding model serving method.
- Fixed issues in the FastAPI server and LangChain pipeline.
- Synchronized with mlx==0.23.0 and mlx-lm==0.21.4.
- Added async_generate_step to the FastAPI server.
- Added token usage information to the FastAPI server.
- Extended libra router types.
- Improved FastAPI server.
- Support for the libra confidence router.
- Improved the hidden-states generation method.
- Refactored project structure.
- LangChain integration.
- local_rag and graph_rag examples.
- Extended the generate method to support hidden-states output.
- Model management and FastAPI server.
- Unit tests.
- Synchronized with mlx-lm.
- Simplified README.
- Updated mlx_fastchat_worker to support mlx >= 0.14.
- Updated conda config.
- LoRA support for GBA low-bit models.
- Support for Phi-3.
- Conversion: use gba2mlx.py to convert models from GBA format into an MLX-compatible format.
- Generation: scripts for generating text with GBA quantized models in the MLX environment.
- Full support for GreenBitAI's MLX Model Collection.
- Initial commit.