Open
Labels
enhancement (New feature or request)
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
I would like multimodal support for https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-PT.
Motivation
I couldn't find any existing discussion or issue about this. It is the best open-source model I have found for OCR of handwritten Japanese and Chinese text, and it actually (kind of) works.
It's worse than OpenAI's recognition, but on the 3 test images I use to evaluate the OCR capabilities of open-source models it performed OK (and OK is better than everything else I tested).
It performed better than:
- Gemma
- Qwen
- Intern
- LFM2
- Kimi
(I think I also tested MiMo, but I can't find my setup or results, so MiMo might be OK too.)
...
and every other open model I could find.
Possible Implementation
https://ernie.baidu.com/blog/publication/ERNIE_Technical_Report.pdf
https://github.com/bigdavidone/ERNIE4_5
vllm-project/vllm#20220
m1namuci