Closed as not planned
Summary
Propose adding an OCR (Optical Character Recognition) backend to enable local document text extraction capabilities within Docker Model Runner.
Motivation
- Expand Docker Model Runner beyond text generation to include vision/document processing
- Enable privacy-focused local OCR without cloud dependencies
- Leverage existing model distribution and scheduling infrastructure
Proposed Implementation
- Create a new OCR backend following existing patterns in pkg/inference/backends/
- Integrate with popular document AI models, e.g., LayoutLMv3 and Donut
- Support common input formats (PNG, JPEG, PDF)
- Expose OCR functionality through OpenAI-compatible API endpoints (see the request sketch after this list)
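To illustrate the last point, here is a minimal Go sketch of how a client might call an OCR model through an OpenAI-compatible chat-completions endpoint. The model reference (`docker/ocr-model`), the endpoint URL/port, and the convention of returning the extracted text as the assistant message are all assumptions for illustration, not existing Model Runner behaviour:

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Read and base64-encode the document image (PNG/JPEG).
	img, err := os.ReadFile("invoice.png")
	if err != nil {
		panic(err)
	}
	dataURI := "data:image/png;base64," + base64.StdEncoding.EncodeToString(img)

	// Standard OpenAI-style chat.completions payload with an image part;
	// the OCR model would return the extracted text as the assistant message.
	payload := map[string]any{
		"model": "docker/ocr-model", // hypothetical model reference
		"messages": []map[string]any{
			{
				"role": "user",
				"content": []map[string]any{
					{"type": "text", "text": "Extract all text from this document."},
					{"type": "image_url", "image_url": map[string]string{"url": dataURI}},
				},
			},
		},
	}
	body, _ := json.Marshal(payload)

	// Endpoint path and port are assumptions; adjust to the local Model Runner setup.
	resp, err := http.Post("http://localhost:12434/engines/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Choices []struct {
			Message struct {
				Content string `json:"content"`
			} `json:"message"`
		} `json:"choices"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	if len(out.Choices) > 0 {
		fmt.Println(out.Choices[0].Message.Content)
	}
}
```

PDF inputs could be rasterised to images before this call, which would keep the API surface identical to the plain-image case.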
Technical Considerations
- Follow the existing backend interface in pkg/inference/backends/llamacpp/llamacpp.go (a hypothetical skeleton is sketched after this list)
- Leverage model distribution system for OCR model downloads
- Integrate with resource management for memory allocation
- Support both CPU and GPU acceleration where available
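To make the first point concrete, below is a minimal skeleton of what an OCR backend could look like. The `Backend` interface shown is a hypothetical stand-in, not the real method set from pkg/inference, so the actual interface in llamacpp.go will differ:

```go
package ocr

import (
	"context"
	"net/http"
)

// Backend is a hypothetical contract mirroring the shape a scheduler
// might expect from a backend; the real interface may differ.
type Backend interface {
	Name() string
	Install(ctx context.Context) error              // fetch runtime / model weights
	Run(ctx context.Context, addr, model string) error // serve until ctx is done
}

type ocrBackend struct {
	modelStore string // path managed by the existing model distribution system
}

func New(modelStore string) Backend {
	return &ocrBackend{modelStore: modelStore}
}

func (b *ocrBackend) Name() string { return "ocr" }

func (b *ocrBackend) Install(ctx context.Context) error {
	// Download or verify the OCR runtime and model weights here,
	// reusing the existing distribution machinery where possible.
	return nil
}

func (b *ocrBackend) Run(ctx context.Context, addr, model string) error {
	// Serve an OpenAI-compatible chat-completions handler that would perform
	// OCR on image inputs; the actual inference step is elided in this sketch.
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "not implemented", http.StatusNotImplemented)
	})
	srv := &http.Server{Addr: addr, Handler: mux}
	go func() {
		<-ctx.Done()
		srv.Close()
	}()
	return srv.ListenAndServe()
}
```

The intent is that model downloads, scheduling, and memory accounting stay with the existing infrastructure, and the backend only wraps the OCR runtime behind the shared interface.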
Questions for Maintainers
- Preferred document AI models?
- API endpoint design preferences?
- Model packaging/distribution strategy?
Comment
It would be very helpful for my work to be able to test document AI / OCR locally!