Conversation

@YoungHypo
Contributor

PR description

  • Added a demo for multimodal analysis, with support for image, video, audio, and PDF inputs.
  • Enabled file uploads from the local gallery, the file system, and internet URLs.
  • Introduced an attachment preview card.
  • Added an independent example module for grounding.
  • Removed obsolete files: InputField.swift and MultimodalInput.swift.
  • Added an "all" pillar entry to the home screen.

Related issue

#1729.

Demos

Note: the app's functionality is now aligned with quickstart-android.

[Simulator screenshots of the demos]


@gemini-code-assist bot left a comment


Summary of Changes

Hello @YoungHypo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Firebase AI example application by integrating robust multimodal analysis features. It introduces the ability to handle various media types like images, videos, audio, and PDFs, alongside flexible file upload options from local sources and URLs. A new attachment preview card improves user interaction, and a dedicated grounding example module demonstrates how AI responses can be anchored to external information. The changes also involve substantial refactoring and cleanup of the codebase, including the removal of obsolete components and the addition of a comprehensive 'all' entry on the home screen for better sample discoverability. These updates aim to provide a more versatile and user-friendly platform for showcasing Firebase AI's capabilities.

Highlights

  • Enhanced Multimodal Analysis: This PR introduces comprehensive multimodal analysis capabilities, allowing the application to process and interact with various media types including images, videos, audio, and PDF documents. This significantly expands the AI model's understanding and response generation based on diverse inputs.
  • Flexible File Uploads: Users can now upload files from multiple sources: their local photo gallery, the device's file system, and directly from internet URLs. This provides great flexibility in how users provide input to the AI model.
  • New Attachment Preview Card: A new attachment preview card has been implemented, offering a visual representation of the files attached to a message before they are sent. This improves the user experience by providing immediate feedback on selected media.
  • Dedicated Grounding Example Module: An independent example module for 'grounding' has been added. Grounding allows the AI model to base its responses on specific, up-to-date information, often from external sources like Google Search, enhancing the accuracy and relevance of generated content. A minimal sketch follows this list.
  • Codebase Cleanup and Refactoring: The project structure has been streamlined by removing obsolete files such as InputField.swift and MultimodalInput.swift, which were replaced by more generic and flexible input handling within the ConversationKit framework. This reduces technical debt and simplifies the codebase.
  • Consolidated Home Screen Experience: A new 'all' pillar entry has been added to the home screen, providing a consolidated view of all available AI samples and demos. This improves navigation and discoverability for users exploring the application's capabilities.
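
As a quick illustration of the grounding flow (a minimal sketch using the FirebaseAI SDK; the model name and prompt are assumptions, not necessarily what the sample uses):

    import FirebaseAI

    // Configure a model that can ground its answers in Google Search results.
    let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
      modelName: "gemini-2.5-flash",
      tools: [.googleSearch()]
    )

    // Inside an async throwing context:
    let response = try await model.generateContent("Who won the most recent F1 race?")
    print(response.text ?? "No answer")

    // Grounded responses include metadata about the supporting sources.
    if let metadata = response.candidates.first?.groundingMetadata {
      print(metadata.webSearchQueries)
    }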


@gemini-code-assist bot left a comment


Code Review

This is a substantial and well-executed pull request that significantly enhances the example app with multimodal capabilities. The introduction of MultimodalAttachment and the associated views and view models is a great addition. The code is generally clean and follows good practices. I have two pieces of feedback: one critical issue regarding Hashable conformance that could lead to bugs, and a minor typo in a filename. Additionally, to improve long-term maintainability, you might consider refactoring the various ViewModel classes (like ChatViewModel, MultimodalViewModel, GroundingViewModel) to inherit from a common base class, as they share a lot of boilerplate code.
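
For example, the shared state and plumbing could live in a common superclass (a sketch; ChatMessage and the property names are assumptions):

    @MainActor
    class BaseChatViewModel: ObservableObject {
      @Published var messages = [ChatMessage]()
      @Published var inProgress = false
      @Published var error: Error?
      // Shared send/stream plumbing would live here.
    }

    @MainActor
    final class GroundingViewModel: BaseChatViewModel {
      // Only grounding-specific configuration remains in the subclass.
    }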


@gemini-code-assist bot left a comment


Code Review

This pull request significantly enhances the FirebaseAI example app by adding comprehensive multimodal analysis demos, including support for images, videos, audio, and PDFs. The changes are well-structured, introducing new screens, view models, and data models to support the new functionality, while also refactoring existing code. My review focuses on a few key areas: a critical correctness issue in Hashable conformance, and several opportunities to improve maintainability by reducing code duplication, enhancing error handling, and increasing code readability. Overall, this is a great addition to the project.

Comment on lines +112 to 118
if let inlineDataPart = chunk.inlineDataParts.first {
if let uiImage = UIImage(data: inlineDataPart.data) {
messages[messages.count - 1].image = uiImage
} else {
print("Failed to convert inline data to UIImage")
}
}


Severity: medium

This block of code for handling an inlineDataPart is duplicated in internalSendMessage on lines 159-165. To improve maintainability and reduce redundancy, consider extracting this logic into a private helper method. This would make the code cleaner and easier to manage.
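
For example (a sketch; the method name is illustrative):

    private func appendInlineImage(from chunk: GenerateContentResponse) {
      // Attach the chunk's first inline-data part, if any, to the latest message.
      guard let inlineDataPart = chunk.inlineDataParts.first else { return }
      if let uiImage = UIImage(data: inlineDataPart.data) {
        messages[messages.count - 1].image = uiImage
      } else {
        print("Failed to convert inline data to UIImage")
      }
    }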

@YoungHypo
Contributor Author

YoungHypo commented Aug 18, 2025

New Updates

Move the initialization of FirebaseService into the ViewModels

FirebaseAIExample/ContentView only needs to pass the enum values for Vertex AI and the Gemini API, while each ViewModel file contains the full initialization flow. This eliminates the need to jump across multiple files to understand how the service is created, making each functional module easier to grasp and simpler to reuse.
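
Concretely, each initializer now follows this pattern (a sketch assuming a BackendOption enum; the actual type names in the PR may differ):

    @MainActor
    class MultimodalViewModel: ObservableObject {
      private let model: GenerativeModel

      init(backendType: BackendOption) {
        // The view only passes the backend choice; the ViewModel owns the
        // full FirebaseAI service and model setup.
        let firebaseService = backendType == .googleAI
          ? FirebaseAI.firebaseAI(backend: .googleAI())
          : FirebaseAI.firebaseAI(backend: .vertexAI())
        model = firebaseService.generativeModel(modelName: "gemini-2.5-flash")
      }
    }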

A new data member, fileDataParts, was added for Cloud Storage file URLs.

In sample.swift, strictly speaking, this is intended only for Vertex AI. However, since the URLs are public and include file-type suffixes, they can also be handled by the Gemini API by downloading the files with Data(contentsOf:). Both AI backends can therefore handle the Cloud Storage URLs, just in different ways. Ideally, an alert should be added to prevent users from navigating and require them to switch manually. For now, this change needs further discussion, as it would mean refactoring the ContentView UI from NavigationLink to navigationDestination, and the PR already involves too many changes.
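
As a sketch of the two paths (FileDataPart and InlineDataPart are FirebaseAI types; the attachment fields are assumptions):

    // Vertex AI can reference the Cloud Storage file directly by its URL.
    let vertexPart = FileDataPart(uri: attachment.url.absoluteString,
                                  mimeType: attachment.mimeType)

    // The Gemini API cannot resolve the URL itself, but since the files are
    // public they can be downloaded and sent inline instead.
    let data = try Data(contentsOf: attachment.url)
    let geminiPart = InlineDataPart(data: data, mimeType: attachment.mimeType)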

Keep only the Assets in the FirebaseAIExample folder and remove all other Assets.

Although it looks like many files were changed, about 20 of them were simply deleted or relocated.

Contributor

@peterfriese left a comment


Great work!

I left a few comments, nothing major.

}
.disableAttachments()
.onSendMessage { message in
Task {
Contributor


onSendMessage is now async. You can remove the Task { } here.
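
For illustration (a sketch; viewModel.sendMessage stands in for the actual handler):

    // Before: wrapping the async call in a Task
    .onSendMessage { message in
      Task {
        await viewModel.sendMessage(message.content)
      }
    }

    // After: the closure itself is async, so the wrapper can go
    .onSendMessage { message in
      await viewModel.sendMessage(message.content)
    }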

Contributor Author


Thanks. Removed.

}
.attachmentPreview { attachmentPreviewScrollView }
.onSendMessage { message in
Task {
Contributor


onSendMessage is now async. You can remove the Task { } here.

Contributor


BTW, this applies to all other ConversationView instances as well, also the ones not part of this PR.

Contributor Author


All relevant code has been removed.


public static func fromPhotosPickerItem(_ item: PhotosPickerItem) async -> MultimodalAttachment? {
do {
guard let data = try await item.loadTransferable(type: Data.self) else {
Contributor


When trying to attach photos, I often get Failed to create attachment from PhotosPickerItem: [CoreTransferable] Given Transferable item does not support import - not sure if this is caused by Live Photos.

Can you try to find a solution for this?

Contributor Author

@YoungHypo commented Aug 28, 2025


The current logic first runs a file type check with validatePhotoType, and then loads the data via loadTransferable. HEIC images are converted to JPG, and on my iPhone running iOS 26, Gemini was able to analyze them successfully. For unsupported TIFF images, the app now directly shows an error alert.
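
A sketch of that flow (validatePhotoType and the MultimodalAttachment initializer are assumptions about the PR's code):

    public static func fromPhotosPickerItem(_ item: PhotosPickerItem) async -> MultimodalAttachment? {
      // Reject formats Gemini cannot analyze (e.g. TIFF) before loading any data.
      guard let type = item.supportedContentTypes.first,
            validatePhotoType(type) else {
        return nil
      }
      do {
        guard let data = try await item.loadTransferable(type: Data.self) else {
          return nil
        }
        let mimeType = type.preferredMIMEType ?? "application/octet-stream"
        return MultimodalAttachment(data: data, mimeType: mimeType)
      } catch {
        print("Failed to create attachment from PhotosPickerItem: \(error)")
        return nil
      }
    }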

self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
Contributor


I think a ternary might be more compact:

    let firebaseService = backendType == .googleAI
      ? FirebaseAI.firebaseAI(backend: .googleAI())
      : FirebaseAI.firebaseAI(backend: .vertexAI())

Contributor Author


Good idea. Changed.

self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
Contributor


Same here:

    let firebaseService = backendType == .googleAI
      ? FirebaseAI.firebaseAI(backend: .googleAI())
      : FirebaseAI.firebaseAI(backend: .vertexAI())

self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
Contributor


    let firebaseService = backendType == .googleAI
      ? FirebaseAI.firebaseAI(backend: .googleAI())
      : FirebaseAI.firebaseAI(backend: .vertexAI())

self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
Contributor


    let firebaseService = backendType == .googleAI
      ? FirebaseAI.firebaseAI(backend: .googleAI())
      : FirebaseAI.firebaseAI(backend: .vertexAI())

@YoungHypo
Contributor Author

/gemini review


@gemini-code-assist bot left a comment


Code Review

This pull request significantly enhances the sample app by introducing comprehensive multimodal capabilities, including support for images, videos, audio, and PDFs from various sources. The refactoring of examples into dedicated modules, such as the new GroundingExample, improves the project's structure and clarity.

I've identified a critical issue in the function call streaming logic and a couple of high-severity issues related to Equatable and Hashable conformance that could lead to UI bugs. My review comments provide specific suggestions to address these points.

Overall, this is a great update that adds valuable features and improves the codebase. The fix for the function call streaming logic is particularly important.

@YoungHypo
Contributor Author

YoungHypo commented Aug 28, 2025

Hi @peterfriese,

I’ve made the changes and added file type checks and error handling for the multimodal example to make quickstart-ios more robust. If a file type is not supported by Gemini, the app now shows an error alert.

In my testing, Live Photos in HEIC format are automatically converted to JPG using item.supportedContentTypes.first?.preferredFilenameExtension, and this runs fine on my device. If it still doesn’t work on your side, please let me know.

I also added comments about the Cloud Storage scenario and updated the logic so that when a related link is deleted, the attachment card and its fileDataPart are removed as well.

Lastly, I fixed the function calling bug in Tuesday’s demo: the second API call cannot be made inside the first streaming loop. My new approach collects all functionCalls during the first loop, exits, and then performs the second API call.
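
Roughly, the new flow looks like this (a sketch; executeFunction stands in for the app's tool handling):

    var functionCalls = [FunctionCallPart]()

    // First loop: consume the stream fully, collecting function calls
    // instead of issuing a second request mid-stream.
    for try await chunk in try chat.sendMessageStream(userInput) {
      if let text = chunk.text {
        messages[messages.count - 1].content += text
      }
      functionCalls.append(contentsOf: chunk.functionCalls)
    }

    // Second API call: made only after the first stream has finished.
    if !functionCalls.isEmpty {
      let responseParts = try functionCalls.map { call in
        FunctionResponsePart(name: call.name, response: try executeFunction(call))
      }
      let followUp = try chat.sendMessageStream([ModelContent(role: "function", parts: responseParts)])
      for try await chunk in followUp {
        if let text = chunk.text {
          messages[messages.count - 1].content += text
        }
      }
    }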

Thanks again for your review and guidance.

Contributor

@peterfriese left a comment


LGTM, thanks for the great work, @YoungHypo!

@YoungHypo
Contributor Author

> LGTM, thanks for the great work, @YoungHypo!

Thanks @peterfriese. I’ll update the ConversationKit link and fix the CI issues right now.

@YoungHypo force-pushed the firebase-multimodal branch 2 times, most recently from 69b5d29 to b16f96e on September 1, 2025 at 17:27.
@YoungHypo
Contributor Author

@peterfriese I’ve updated the ConversationKit link and the UI looks fine.

Previously I was using Xcode_26_beta_5 for CI, which is now deprecated, so I switched to Xcode_26_beta_6. However, the build is still failing. Would you mind helping me? Details are here: https://github.com/actions/runner-images/blob/main/images/macos/macos-15-Readme.md

@YoungHypo
Contributor Author

@peterfriese The CI has passed. I added the following condition to scripts/test.sh:

    if [[ "${SAMPLE:-}" == "FirebaseAI" && -d "/Applications/Xcode_26.0.app" ]]

This ensures that the Xcode 26 configuration only applies to FirebaseAI and should not affect other CI checks.

@peterfriese peterfriese merged commit 5970580 into firebase:peterfriese/firebase-ai-quickstart-refresh Sep 2, 2025
3 checks passed