Open
Description
Summary
Models should be able to declare which file MIME types they accept (e.g. application/pdf, image/*) so the frontend can adapt the upload UI and the backend can deliver files in the provider-native format.
Related issues
- Feature Request: Enable upload of files to chat with #482 — general file upload
- [Feature Request] Uploading PDFS/Text Files/Images? #609 — PDFs/text/images
- PDF Support #1505 — PDF support
- File Upload / Screenshot for Assistants #1652 — file upload for assistants
Current state
- `multimodal: true` + `multimodalAcceptedMimetypes` works well for images
- There's no equivalent for documents (PDF, DOCX, etc.)
- Binary files currently get base64-wrapped in XML tags, which most models can't process
Proposal
Add an acceptedFileMimetypes field to the model config:
{
  "name": "gpt-4o",
  "multimodal": true,
  "acceptedFileMimetypes": ["image/*", "application/pdf"]
}
How it works:
- Each model declares which file MIME types it accepts
- The frontend merges `acceptedFileMimetypes` with `multimodalAcceptedMimetypes` to determine which upload options to show
- The endpoint adapter delivers files in the provider-native format (e.g., OpenAI's `file` content part for PDFs, `image_url` for images)
- For models/providers that don't natively handle a file type, the existing text extraction fallback still works
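To make the merge step concrete, here is a minimal sketch of how the frontend could combine the two lists and check a candidate file against them. Only the config field names come from this proposal; `ModelConfig`, `mimeMatches`, `uploadMimetypes`, and `canUpload` are hypothetical names, and the wildcard rule (`image/*` matches any `image/` subtype) is an assumption about the intended semantics.

```typescript
// Hypothetical shape: only the field names are taken from the proposal.
interface ModelConfig {
  name: string;
  multimodal?: boolean;
  multimodalAcceptedMimetypes?: string[];
  acceptedFileMimetypes?: string[];
}

// True when `mime` matches `pattern`. A pattern is either exact
// ("application/pdf") or uses "*" for the type and/or subtype ("image/*").
function mimeMatches(pattern: string, mime: string): boolean {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const bare = (mime.split(";")[0] ?? "").trim().toLowerCase();
  const [pType, pSub] = pattern.trim().toLowerCase().split("/");
  const [mType, mSub] = bare.split("/");
  if (!pType || !pSub || !mType || !mSub) return false;
  return (pType === "*" || pType === mType) && (pSub === "*" || pSub === mSub);
}

// Union of the image list and the new file list, de-duplicated, order preserved.
function uploadMimetypes(model: ModelConfig): string[] {
  const images = model.multimodal ? model.multimodalAcceptedMimetypes ?? [] : [];
  const files = model.acceptedFileMimetypes ?? [];
  return Array.from(new Set([...images, ...files]));
}

// Whether the upload UI should accept a file with the given MIME type.
function canUpload(model: ModelConfig, mime: string): boolean {
  return uploadMimetypes(model).some((pattern) => mimeMatches(pattern, mime));
}
```

The merged list could also be joined with commas and passed to the `accept` attribute of an `<input type="file">`, which already understands patterns such as `image/*`.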
Why this approach:
- Works for any provider — OpenAI, Anthropic, self-hosted via vLLM/Ollama, HF Inference API
- Models that natively support PDFs (GPT-4o, Claude, Gemini) get native handling
- Self-hosted models can still receive extracted text as fallback
- No heavy dependencies (no LibreOffice, no server-side PDF parsing required in core)
- Backward compatible: `supportsBinaryDocs: true` can be mapped to `acceptedFileMimetypes: [...]`
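The backward-compatibility point could be handled with a small normalization step when configs are loaded. This is a sketch only: `LegacyAwareConfig` and `resolveAcceptedFileMimetypes` are hypothetical names, and because the proposal leaves the mapped list open (`[...]`), the list implied by the legacy flag is passed in by the caller rather than hard-coded.

```typescript
// Hypothetical config shape holding both the legacy flag and the new field.
interface LegacyAwareConfig {
  supportsBinaryDocs?: boolean;
  acceptedFileMimetypes?: string[];
}

// Returns the effective list of accepted file MIME types.
// - An explicit `acceptedFileMimetypes` always wins (even when empty).
// - Otherwise `supportsBinaryDocs: true` expands to `legacyMimetypes`,
//   which the caller supplies since the proposal does not fix that list.
// - Otherwise no file types are accepted.
function resolveAcceptedFileMimetypes(
  config: LegacyAwareConfig,
  legacyMimetypes: readonly string[],
): string[] {
  if (config.acceptedFileMimetypes !== undefined) {
    return [...config.acceptedFileMimetypes];
  }
  return config.supportsBinaryDocs ? [...legacyMimetypes] : [];
}
```

Letting an explicit field override the legacy flag means existing configs keep working unchanged, while a model can opt out of a type by listing only what it wants.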
Comparison with other projects:
- LibreChat has a multi-stage file processor pipeline with per-endpoint config
- Open WebUI has pluggable storage backends and document RAG workflows
- This proposal is lighter: trust the model/provider to handle what it declares it supports
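The "trust what the model declares" rule can be sketched as a single routing decision in the endpoint adapter: deliver natively when the type is declared, otherwise fall back to extracted text. Everything named here (`UploadedFile`, `toContentPart`, the `extractText` callback) is hypothetical, and the part shapes are modeled on my understanding of OpenAI's Chat Completions content parts; they should be checked against the current API reference before use.

```typescript
// Content part shapes assumed for an OpenAI-style chat API (verify before use).
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }
  | { type: "file"; file: { filename: string; file_data: string } };

// Hypothetical representation of an uploaded file.
interface UploadedFile {
  name: string;
  mime: string;
  base64: string; // file bytes, base64-encoded
}

// Same wildcard rule as assumed for the upload UI: "image/*" matches any image subtype.
function matches(pattern: string, mime: string): boolean {
  const [pType, pSub] = pattern.toLowerCase().split("/");
  const [mType, mSub] = mime.toLowerCase().split("/");
  if (!pType || !pSub || !mType || !mSub) return false;
  return (pType === "*" || pType === mType) && (pSub === "*" || pSub === mSub);
}

// Picks the delivery format for one file.
// `accepted` is the model's declared list; `extractText` stands in for the
// existing text extraction fallback mentioned in the proposal.
function toContentPart(
  file: UploadedFile,
  accepted: string[],
  extractText: (file: UploadedFile) => string,
): ContentPart {
  const declared = accepted.some((pattern) => matches(pattern, file.mime));
  if (!declared) {
    // Not declared by the model: fall back to extracted text.
    return { type: "text", text: extractText(file) };
  }
  const dataUrl = `data:${file.mime};base64,${file.base64}`;
  if (file.mime.toLowerCase().startsWith("image/")) {
    return { type: "image_url", image_url: { url: dataUrl } };
  }
  return { type: "file", file: { filename: file.name, file_data: dataUrl } };
}
```

Other providers would need their own mapping from the same decision; only the "declared or fallback" branch is meant to be provider-independent.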
I'm preparing PRs for this. Would love feedback from maintainers on the approach before finalizing.
🤖 Generated with Claude Code