feat(webui): upload images/PDFs from the chat composer#579
Open
yaozheng-fang wants to merge 1 commit into
Open
feat(webui): upload images/PDFs from the chat composer#579yaozheng-fang wants to merge 1 commit into
yaozheng-fang wants to merge 1 commit into
Conversation
The composer "+" button now opens an upload menu (上传图片 / 上传文件 PDF). Selected files are read to base64 and sent as inline_data parts on the existing /run_sse new_message, shown as rounded thumbnails / PDF chips both while pending and on the sent user turn (also reconstructed from history). Images already reach the model via ADK's LiteLlm image_url path. PDFs are handled by a new before_model_callback (veadk.utils.pdf_to_images) that renders each page to an image/png part with pypdfium2 so a vision-capable model can read them (effectively OCR, scanned PDFs included). Page count is capped (default 10) to bound token cost. Wired onto the a2ui_agent and basic-app demo agents; added a [pdf] extra (pypdfium2 + pillow).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
The web UI composer's + button was inert. It now opens a small upload menu
(上传图片 / 上传文件 (PDF), ChatGPT-style rounded card) and lets users
attach images and PDFs to a chat turn, aligned with the existing
/run_sseendpoint.How
Frontend (
frontend/src/)+toggles a popover with two upload actions backed by hidden<input type="file">(image/* and application/pdf).inline_dataparts onnew_message—runSSEnow prepends attachment parts before the text part.to remove); sent turns render the same, and history is reconstructed from
inline_dataparts so reloaded sessions show them.veadk/webui(committed).Backend
LiteLlmimage_urldata-URI path— no change needed.
veadk/utils/pdf_to_images.py: abefore_model_callbackthat renderseach
application/pdfpart to oneimage/pngpart per page via pypdfium2,so a vision-capable model reads it (effectively OCR; scanned PDFs included).
Page count is capped (default 10) to bound token cost; pypdfium2/pillow are
lazy-imported with a clear "install
veadk-python[pdf]" error.[pdf]extra (pypdfium2+pillow, both permissive licenses — notPyMuPDF/AGPL).
a2ui_agentandbasic-appdemo agents; basic-app installs[a2ui,pdf]and its README documents the attachments feature + vision-modelrequirement.
Verification
tests/test_pdf_to_images.py(3 tests): PDF part replaced by oneimage/pngper page, original text preserved;
max_pagesrespected; non-PDF requestsuntouched. All pass.
npm run build(tsc + vite) succeeds;markdownlint clean on the edited READMEs.
Notes
doubao-seed-1.6is);non-vision models can't consume images or PDF-as-images.
paste-to-upload.