as [tech report](https://github.com/OpenBMB/MiniCPM/blob/main/report/MiniCPM_4_Technical_Report.pdf) mentioned, it is really fast. mlx version already existed: https://huggingface.co/mlx-community/MiniCPM4-8B-4bit