feat(webui): upload images/PDFs from the chat composer by yaozheng-fang · Pull Request #579 · volcengine/veadk-python

yaozheng-fang · 2026-06-03T09:17:17Z

What

The web UI composer's + button was inert. It now opens a small upload menu
(上传图片 / 上传文件 (PDF), ChatGPT-style rounded card) and lets users
attach images and PDFs to a chat turn, aligned with the existing
/run_sse endpoint.

How

Frontend (frontend/src/)

+ toggles a popover with two upload actions backed by hidden
<input type="file"> (image/* and application/pdf).
Files are read to base64 and sent as inline_data parts on
new_message — runSSE now prepends attachment parts before the text part.
Pending attachments render as rounded image thumbnails / PDF chips (with an ×
to remove); sent turns render the same, and history is reconstructed from
inline_data parts so reloaded sessions show them.
Per-file cap (~20 MB) with an error message.
Rebuilt veadk/webui (committed).

Backend

Images already reach the model via ADK's LiteLlm image_url data-URI path
— no change needed.
New veadk/utils/pdf_to_images.py: a before_model_callback that renders
each application/pdf part to one image/png part per page via pypdfium2,
so a vision-capable model reads it (effectively OCR; scanned PDFs included).
Page count is capped (default 10) to bound token cost; pypdfium2/pillow are
lazy-imported with a clear "install veadk-python[pdf]" error.
Added a [pdf] extra (pypdfium2 + pillow, both permissive licenses — not
PyMuPDF/AGPL).
Wired onto the a2ui_agent and basic-app demo agents; basic-app installs
[a2ui,pdf] and its README documents the attachments feature + vision-model
requirement.

Verification

tests/test_pdf_to_images.py (3 tests): PDF part replaced by one image/png
per page, original text preserved; max_pages respected; non-PDF requests
untouched. All pass.
Pyright + Ruff clean on the new Python; npm run build (tsc + vite) succeeds;
markdownlint clean on the edited READMEs.

Notes

The agent model must be vision-capable (default doubao-seed-1.6 is);
non-vision models can't consume images or PDF-as-images.
Out of scope (follow-up): literal PDF→text extraction, drag-&-drop,
paste-to-upload.

The composer "+" button now opens an upload menu (上传图片 / 上传文件 PDF). Selected files are read to base64 and sent as inline_data parts on the existing /run_sse new_message, shown as rounded thumbnails / PDF chips both while pending and on the sent user turn (also reconstructed from history). Images already reach the model via ADK's LiteLlm image_url path. PDFs are handled by a new before_model_callback (veadk.utils.pdf_to_images) that renders each page to an image/png part with pypdfium2 so a vision-capable model can read them (effectively OCR, scanned PDFs included). Page count is capped (default 10) to bound token cost. Wired onto the a2ui_agent and basic-app demo agents; added a [pdf] extra (pypdfium2 + pillow).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(webui): upload images/PDFs from the chat composer#579

feat(webui): upload images/PDFs from the chat composer#579
yaozheng-fang wants to merge 1 commit into
mainfrom
feat/composer-file-upload

yaozheng-fang commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yaozheng-fang commented Jun 3, 2026

What

How

Verification

Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant