Skip to content

RFC: Let models declare accepted file MIME types for native file handling #2188

@tessaherself

Description

@tessaherself

Summary

Models should be able to declare which file MIME types they accept (e.g. application/pdf, image/*) so the frontend can adapt the upload UI and the backend can deliver files in the provider-native format.

Related issues

Current state

  • multimodal: true + multimodalAcceptedMimetypes works well for images
  • There's no equivalent for documents (PDF, DOCX, etc.)
  • Binary files currently get base64-wrapped in XML tags, which most models can't process

Proposal

Add an acceptedFileMimetypes field to the model config:

{
  "name": "gpt-4o",
  "multimodal": true,
  "acceptedFileMimetypes": ["image/*", "application/pdf"]
}

How it works:

  1. Each model declares which file MIME types it accepts
  2. The frontend merges acceptedFileMimetypes with multimodalAcceptedMimetypes to determine which upload options to show
  3. The endpoint adapter delivers files in the provider-native format (e.g., OpenAI's file content part for PDFs, image_url for images)
  4. For models/providers that don't natively handle a file type, the existing text extraction fallback still works

Why this approach:

  • Works for any provider — OpenAI, Anthropic, self-hosted via vLLM/Ollama, HF Inference API
  • Models that natively support PDFs (GPT-4o, Claude, Gemini) get native handling
  • Self-hosted models can still receive extracted text as fallback
  • No heavy dependencies (no LibreOffice, no server-side PDF parsing required in core)
  • Backward compatible — supportsBinaryDocs: true can be mapped to acceptedFileMimetypes: [...]

Comparison with other projects:

  • LibreChat has a multi-stage file processor pipeline with per-endpoint config
  • Open WebUI has pluggable storage backends and document RAG workflows
  • This proposal is lighter: trust the model/provider to handle what it declares it supports

I'm preparing PRs for this. Would love feedback from maintainers on the approach before finalizing.

🤖 Generated with Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions