feat: add image upload support with compression and increase body limit #39

Open
Hieuslecong wants to merge 3 commits into ntthanh2603:main from Hieuslecong:feature/gemini-vision-support

@Hieuslecong
Description

This PR introduces full support for Vision (Image Upload) capabilities. Users can now send image data along with text prompts to the Gemini model through the API.

Key Changes

  • Core Vision Logic: Implemented the file upload handshake with Google's content-push service, enabling the use of image identifiers in generation requests.
  • Improved Data Handling: Updated the service layer to decode and process base64 image data from API requests.
  • Server Optimization: Adjusted the maximum request body limit to support larger media payloads.
  • Enhanced Client Tools:
    • Added new demonstration scripts for image-based chat.
    • Integrated automatic image compression and resizing in client scripts to optimize bandwidth and prevent payload size errors.
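
The client-side compression and resizing step could look roughly like the sketch below. It assumes Pillow (which the changelog names) and re-encodes to JPEG; the `max_side` and `quality` defaults and the returned `(mime_type, base64_text)` pair are illustrative assumptions, not values taken from the PR's `encode_image`.

```python
import base64
import io

from PIL import Image  # Pillow, third-party: pip install Pillow


def encode_image(source, max_side=1024, quality=80):
    """Downscale an image, re-encode it as JPEG, and return (mime_type, base64_text).

    `source` is a file path or a binary file-like object.
    `max_side` and `quality` are illustrative defaults, not values from the PR.
    """
    with Image.open(source) as img:
        # JPEG has no alpha channel, so normalise to RGB first.
        img = img.convert("RGB")
        # thumbnail() shrinks in place, keeps the aspect ratio, and never upscales.
        img.thumbnail((max_side, max_side))
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality, optimize=True)
    return "image/jpeg", base64.b64encode(buf.getvalue()).decode("ascii")
```

Returning the MIME type alongside the data keeps the declared type in step with the bytes actually produced, which matters once the image has been re-encoded to a different format than the original file.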

Testing

  • Verified successful image recognition and description by the AI model.
  • Confirmed that large, high-resolution images are correctly handled via client-side processing before transmission.

Summary: Vision support, payload optimization, and client-side image processing.

@gemini-code-assist
Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This update introduces full Vision (image upload) support, allowing users to send image data along with text prompts to the Gemini model through the API. It includes server-side changes to process images and upload them to Google's service, as well as client-side improvements that compress and optimize images before sending.

Highlights

  • Image Upload (Vision) Support: Implemented the core logic for image uploads, including the handshake with Google's content-push service, and enabled the use of image identifiers in content generation requests.
  • Image Data Handling: Updated the service layer to decode and process base64 image data from API requests.
  • Server Optimization: Raised the server's maximum request body size limit to 20MB to support larger media payloads.
  • Enhanced Client Tools: Added new demo scripts for image-based chat, with automatic client-side image compression and resizing to optimize bandwidth and avoid payload size errors.
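
Because images travel as base64 text inside the request body, the 20MB limit covers the encoded size, not the raw file size. The arithmetic below is a small sketch of that relationship; the helper names are hypothetical, and the `overhead` parameter stands in for the prompt text and JSON wrapper, whose real size depends on the request.

```python
BODY_LIMIT_BYTES = 20 * 1024 * 1024  # 20 MB, the limit stated in this PR


def base64_len(raw_len: int) -> int:
    """Length of standard padded base64 text for raw_len input bytes."""
    return 4 * ((raw_len + 2) // 3)


def max_raw_bytes(limit: int, overhead: int = 0) -> int:
    """Largest raw payload whose base64 form plus `overhead` fits within `limit`."""
    return 3 * ((limit - overhead) // 4)
```

With no overhead, `max_raw_bytes(BODY_LIMIT_BYTES)` works out to 15 MiB of raw image data, since base64 turns every 3 input bytes into 4 output characters.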
Changelog
  • demo_ask_image.py
    • Added a new Python script demo_ask_image.py to demonstrate image-based chat with the Gemini API.
    • Included encode_image function for image compression and Base64 encoding using Pillow.
    • Implemented logic to construct and send API requests with both text prompts and inline image data.
  • demo_upload.py
    • Added a new Python script demo_upload.py for demonstrating image upload and description using the Gemini API.
    • Incorporated the encode_image function for client-side image processing.
    • Structured the payload to send an image with a default descriptive prompt.
  • internal/modules/gemini/gemini_service.go
    • Imported encoding/base64 package.
    • Modified GenerateContent to parse InlineData from request parts, decode base64 image data, and collect FileData.
    • Updated the content generation logic to pass collected FileData to the provider client.
    • Adjusted the empty content check to also consider the presence of files.
  • internal/modules/providers/gemini_service.go
    • Imported bytes package.
    • Added EndpointUpload constant for Google's content-push service.
    • Implemented UploadFile function to handle uploading FileData to the content-push endpoint and return the file identifier.
    • Modified GenerateContent to iterate through config.Files, upload each file using UploadFile, and construct the request payload with the returned file URLs.
    • Adjusted the messageContent structure to accommodate image data in the request.
  • internal/modules/providers/provider_interface.go
    • Defined a new struct FileData with fields MimeType, Data (byte slice), and FileName.
    • Changed the Files field in GenerateConfig from []string to []FileData.
    • Updated the WithFiles option function to accept []FileData.
  • internal/server/server.go
    • Increased the BodyLimit for the Fiber application to 20 * 1024 * 1024 bytes (20 MB) to support larger request payloads, specifically for image uploads.
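
The service-layer flow described in the changelog (parse inline data from the request parts, decode the base64 payload, collect file data, and treat a request as non-empty if it has text or files) can be sketched as follows. This is a Python mirror for illustration only: the actual implementation is in Go, and the JSON key names (`text`, `inlineData`, `mimeType`, `data`) and the `upload_<index>` file names are assumptions rather than details confirmed by the PR.

```python
import base64
import binascii
from dataclasses import dataclass


@dataclass
class FileData:
    """Python mirror of the Go FileData struct described above."""
    mime_type: str
    data: bytes
    file_name: str


def collect_parts(parts):
    """Split request parts into prompt text and decoded files.

    Each part is assumed to be a dict with either a "text" key or an
    "inlineData" key holding {"mimeType": ..., "data": <base64 text>}.
    These JSON key names are assumptions, not taken from the PR.
    """
    texts, files = [], []
    for index, part in enumerate(parts):
        if part.get("text"):
            texts.append(part["text"])
        inline = part.get("inlineData")
        if inline:
            try:
                raw = base64.b64decode(inline["data"], validate=True)
            except (binascii.Error, ValueError) as exc:
                raise ValueError(f"part {index}: invalid base64 image data") from exc
            # Hypothetical placeholder name; the PR's naming scheme is not shown.
            files.append(FileData(inline["mimeType"], raw, f"upload_{index}"))
    # Mirrors the adjusted empty-content check: text or files must be present.
    if not texts and not files:
        raise ValueError("request has neither text nor files")
    return "\n".join(texts), files
```

Decoding with `validate=True` rejects malformed base64 up front, so a bad payload fails at the API boundary instead of during the upload to the provider.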
Activity
  • No human activity has occurred on this pull request yet.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for image uploads (Vision) to the Gemini service. The changes are comprehensive, including updates to the service layer to handle file data, a new provider-level file upload mechanism, and an increased server body limit. Two new Python demo scripts are also included to showcase the new functionality. My review has identified a few issues: a bug in demo_upload.py where an incorrect MIME type is used, a more significant bug in internal/modules/providers/gemini_service.go related to duplicated and inconsistent filename generation for uploads, and a minor maintainability issue with a magic string in the same file. Overall, the implementation is solid, but the identified issues, especially the filename generation bug, should be addressed to ensure robustness.
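
The review does not spell out the fixes, so the following is only a sketch of one way to avoid the three classes of issue it names: derive the file name in a single helper and reuse the result, tie the extension to the MIME type actually being sent, and replace a repeated string literal with a named constant. Every name here (`upload_name`, `DEFAULT_UPLOAD_PREFIX`, the extension table) is hypothetical, and the real code under review is Go, not Python.

```python
import uuid

# Illustrative mapping; extend as needed.
_EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/webp": ".webp"}
DEFAULT_UPLOAD_PREFIX = "upload"  # named constant instead of a repeated string literal


def upload_name(mime_type: str, prefix: str = DEFAULT_UPLOAD_PREFIX) -> str:
    """Build a unique file name whose extension matches the MIME type."""
    try:
        ext = _EXTENSIONS[mime_type]
    except KeyError:
        raise ValueError(f"unsupported MIME type: {mime_type}") from None
    return f"{prefix}_{uuid.uuid4().hex}{ext}"
```

Calling the helper once per file and passing the returned name to every step that needs it keeps the upload request and any later reference to the file consistent.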

Hieuslecong and others added 2 commits March 14, 2026 00:01
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ntthanh2603 ntthanh2603 self-requested a review March 14, 2026 07:21