Hi all,
I'm trying to deploy docling-serve 1.6.0 with the Granite-Docling VLM (served via vLLM) on an A10 on modal.com. The standard pipeline works fine (~30 s/document; is that runtime reasonable?). However, when I run with pipeline=vlm the process times out. I also tried an L40S GPU, but the GPU doesn't seem to be the issue: a model with fewer than 1B parameters should fit on either of those.
What GPU would you recommend using?
Any suggestions on how to deploy so that pipeline=vlm works?
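For reference, this is roughly how I'm building the conversion request (a sketch; the endpoint path and field names here are assumptions from my setup, so please check them against the docling-serve API docs for 1.6.0):

```python
import json

# Sketch of the request body sent to docling-serve's convert-from-source endpoint.
# NOTE: the field names ("options", "http_sources") and the endpoint path below
# are assumptions; verify against your docling-serve version's API reference.
payload = {
    "options": {"pipeline": "vlm"},  # the only change vs. the working standard run
    "http_sources": [{"url": "https://example.com/sample.pdf"}],  # placeholder URL
}

# e.g. POST this JSON to http://<host>:<port>/... (convert/source endpoint)
print(json.dumps(payload, indent=2))
```

With `"pipeline"` left at its default the same document converts in ~30 s, so the timeout appears only when the VLM path is selected.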