
Conversation

@MayuriKhinvasara (Contributor) commented Aug 11, 2025

Add Gemini Video Metadata Creation Sample

A new sample demonstrating how to use the Gemini API with Firebase and Media3 to extract metadata from videos.

The sample includes:

  • UI for selecting a video from a predefined list or a custom URL.
  • A video player using ExoPlayer to display the selected video.
  • Buttons to trigger different metadata extraction tasks: Thumbnails, Description, Hashtags, Account Tags, Chapters, and Links.
  • Display of the generated text metadata and the extracted thumbnail images.
  • HDR thumbnail extraction from the video using media3-transformer.
  • Utility functions for timestamp conversion and for managing the video list.
  • A ViewModel that handles the interaction with the Gemini API and manages UI state (sketched below).
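
As a rough illustration of that last point, the call from the ViewModel into Gemini might look like the sketch below. The helper name `generateMetadata` and the prompt parameter are illustrative rather than the sample's actual code; the Firebase AI calls mirror the snippet reviewed later in this thread, and the exact import paths and `fileData` argument order should be verified against the `firebase-ai` SDK version in use.

```kotlin
import com.google.firebase.Firebase
import com.google.firebase.ai.ai
import com.google.firebase.ai.type.GenerativeBackend
import com.google.firebase.ai.type.content

// Illustrative helper, not the sample's actual code: sends one prompt plus a
// Cloud Storage video reference to Gemini and returns the generated text.
suspend fun generateMetadata(videoGsUri: String, prompt: String): String {
    val model = Firebase.ai(backend = GenerativeBackend.vertexAI())
        .generativeModel("gemini-2.0-flash") // the review below suggests moving to Gemini 2.5 Flash

    val response = model.generateContent(
        content {
            fileData(videoGsUri, "video/mp4") // assumed argument order: file URI, then MIME type
            text(prompt)
        },
    )
    return response.text.orEmpty()
}
```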

@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @MayuriKhinvasara, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new Android AI sample application named "Gemini Video Metadata Creation." This sample showcases how to use the Gemini API, integrated with Firebase, to generate various types of metadata for videos, such as descriptions, hashtags, account tags, chapters, and links. It also demonstrates the extraction of HDR quality thumbnails from videos based on timestamps provided by the Gemini model, utilizing Media3's experimental frame extractor. The changes encompass all necessary UI components, ViewModel logic, and utility functions to provide a comprehensive example of AI-powered video content analysis on Android.

Highlights

  • New Sample Application: A new sample application, "Gemini Video Metadata Creation," has been added to demonstrate AI-powered video metadata generation.
  • Gemini API Integration: The sample integrates with the Gemini 2.0 Flash model via Firebase AI to generate various types of metadata, including descriptions, hashtags, account tags, chapters, and links, directly from video content.
  • Video Playback and Selection: The application now includes video playback functionality using ExoPlayer and a user-friendly dropdown menu for selecting different sample videos.
  • Dynamic Metadata Generation: Users can dynamically choose the type of metadata they wish to generate, with the application sending corresponding prompts to the Gemini model.
  • Thumbnail Extraction: A key feature is the ability to extract high-dynamic-range (HDR) quality video thumbnails based on timestamps provided by the Gemini model, leveraging Media3's experimental frame extractor.
  • New UI Components: Dedicated Compose UI components have been introduced for video selection, playback, metadata type selection (via a button grid), and displaying the generated text and thumbnails.
  • Dependency Updates: The Media3 library has been updated to version 1.8.0, and the media3-transformer dependency has been added to support advanced video processing capabilities.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
---------------------|---------------------|-----------------------------------------------------------------------------------------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new sample application for video metadata creation using Gemini. The implementation is generally well-structured, utilizing modern Android development practices like Jetpack Compose, Hilt for dependency injection, and StateFlow for UI state management, which aligns with the provided architectural guidelines. However, I've identified several areas for improvement. There is a critical resource leak in the thumbnail extraction logic that needs to be addressed, a misconfiguration in the build script's namespace, and a few violations of type safety and the project's style guide. Additionally, there are opportunities to remove unused code. Addressing these points will significantly enhance the quality and robustness of the sample.

* [StateFlow].
*/
@HiltViewModel
class VideoMetadataCreationViewModel @Inject constructor(private val application: Application) : ViewModel() {
Contributor

medium

Injecting Application directly into the ViewModel violates the architectural principle of keeping ViewModels framework-agnostic.[1][2] This makes the ViewModel harder to test and can lead to issues if not handled carefully. The Context dependency required for extractListOfThumbnails should be provided through a lower layer, such as a repository or a use case, which can be injected into the ViewModel.

Style Guide References

Footnotes

  1. The style guide strongly recommends that ViewModels should be agnostic of the Android lifecycle and should not hold references to Context or other lifecycle-related types. (link)

  2. The style guide recommends against using AndroidViewModel and avoiding the Application class in ViewModels, suggesting that such dependencies should be moved to the UI or data layer. (link)

Contributor Author

I agree, the context should not be passed. I'm figuring out a way to still provide a context to the ExperimentalFrameExtractor so it can extract thumbnails asynchronously.

Member

I don't think the style guide is correct there. You can use AndroidViewModel to get an application context in your ViewModel.

In your situation, a real app would probably create a class called something like "FrameExtractorService" that takes an application context in its constructor. You would then inject that service into your ViewModel, and your ViewModel wouldn't have to deal with contexts at all.
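
A minimal sketch of that idea, assuming Hilt and its `@ApplicationContext` qualifier; the class and method names (`FrameExtractorService`, `extractThumbnails`) are hypothetical, not the sample's actual code:

```kotlin
import android.content.Context
import android.graphics.Bitmap
import androidx.lifecycle.ViewModel
import dagger.hilt.android.lifecycle.HiltViewModel
import dagger.hilt.android.qualifiers.ApplicationContext
import javax.inject.Inject
import javax.inject.Singleton

// Hypothetical service that owns the application context, so the ViewModel doesn't have to.
@Singleton
class FrameExtractorService @Inject constructor(
    @ApplicationContext private val context: Context,
) {
    suspend fun extractThumbnails(videoUri: String, timestampsMs: List<Long>): List<Bitmap> {
        // The Media3 frame-extraction work would live here, using `context`.
        TODO("Wrap the Media3 frame extractor; see the thumbnail discussion below.")
    }
}

@HiltViewModel
class VideoMetadataCreationViewModel @Inject constructor(
    private val frameExtractorService: FrameExtractorService,
) : ViewModel() {
    // The ViewModel stays free of Context/Application references.
}
```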

Collaborator

+1 I think for this sample using an AndroidViewModel is fine.

@MayuriKhinvasara marked this pull request as ready for review August 11, 2025 20:46
try {
    val generativeModel =
        Firebase.ai(backend = GenerativeBackend.vertexAI())
            .generativeModel("gemini-2.0-flash")
Collaborator

Can we use Gemini 2.5 Flash instead?

Contributor Author

Agreed. Will update

import com.android.ai.samples.geminivideosummary.VideoSummarizationScreen
import com.android.ai.samples.genai_image_description.GenAIImageDescriptionScreen
import com.android.ai.samples.genai_summarization.GenAISummarizationScreen
import com.android.ai.samples.genai_writing_assistance.GenAIWritingAssistanceScreen
import com.android.ai.samples.imagen.ui.ImagenScreen
import com.android.ai.samples.magicselfie.ui.MagicSelfieScreen

@SuppressLint("UnsafeOptInUsageError", "NewApi")
Collaborator

Instead of suppressing these (valid) lint errors here at the top level, wdyt about checking for API level inside the VideoMetadataCreationScreen? That way you can show a nice "not supported" message on that screen instead of it crashing on lower API levels.

Contributor Author

This is a very valid point. Updated accordingly
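
One way to implement that check, as a minimal sketch: gate the screen content on API 34 (Android 14, which the later commit message ties to HDR extraction) and show a message otherwise. The message text and the screen's signature are placeholders rather than the sample's actual code.

```kotlin
import android.os.Build
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable

@Composable
fun VideoMetadataCreationScreen() {
    // HDR frame extraction is only available on Android 14 (API 34) and above.
    if (Build.VERSION.SDK_INT < Build.VERSION_CODES.UPSIDE_DOWN_CAKE) {
        // Placeholder message; the real sample would use a string resource.
        Text(text = "Video metadata creation requires Android 14 or higher.")
        return
    }
    // ... rest of the screen (player, metadata buttons, generated output) ...
}
```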


return try {
    withContext(Dispatchers.IO) {
        // Enable HDR frames fi=or better image quality
Collaborator

typo

Contributor Author

Fixed
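
For context, the HDR toggle being commented on lives in Media3's experimental frame extractor configuration. The sketch below shows roughly how the extraction might be wired; it is based on my reading of the `media3-transformer` 1.8 API (`ExperimentalFrameExtractor`, its `Configuration.Builder`, and `setExtractHdrFrames`), so the exact class names and signatures should be double-checked against the release docs before relying on it.

```kotlin
import android.content.Context
import android.graphics.Bitmap
import androidx.annotation.RequiresApi
import androidx.media3.common.MediaItem
import androidx.media3.transformer.ExperimentalFrameExtractor
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.guava.await
import kotlinx.coroutines.withContext

// Sketch only: extract one frame per timestamp, with HDR frames enabled (API 34+).
@RequiresApi(34)
suspend fun extractListOfThumbnails(context: Context, videoUri: String, timestampsMs: List<Long>): List<Bitmap> =
    withContext(Dispatchers.IO) {
        // Enable HDR frames for better image quality.
        val extractor = ExperimentalFrameExtractor(
            context,
            ExperimentalFrameExtractor.Configuration.Builder()
                .setExtractHdrFrames(true)
                .build(),
        )
        try {
            extractor.setMediaItem(MediaItem.fromUri(videoUri), /* effects = */ emptyList())
            timestampsMs.map { positionMs -> extractor.getFrame(positionMs).await().bitmap }
        } finally {
            extractor.release()
        }
    }
```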

<string name="create_metadata_button">Create Metadata</string>
<string name="video_metadata_creation_title">Video Metadata Creation</string>
<string name="output_text_combined">%s%s</string>
<string name="output_text_generated_placeholder">"Text generated with Gemini : "</string>
Collaborator

nit; remove the space between "Gemini" and ":"

Contributor Author

Fixed

<string name="select_video_placeholder">Select Video</string>
<string name="create_metadata_button">Create Metadata</string>
<string name="video_metadata_creation_title">Video Metadata Creation</string>
<string name="output_text_combined">%s%s</string>
Collaborator

This is a bit weird - why not "Text generated with Gemini: %s" and then just replace the dynamic part of the string? Or is there a case where the first %s would resolve to something else?

Contributor Author

Good catch. This was a typo.

<!--Video titles for list of sample videos-->
<string name="video_title_big_buck_bunny">Big Buck Bunny</string>
<string name="video_title_android_spotlight_shorts">Android Spotlight Week (Shorts video)</string>
<string name="video_title_rio_de_janeiro">Rio De Janerio</string>
Collaborator

nit; typo

Contributor Author

Fixed


"Provide a compelling and concise description for this video, suitable for a YouTube video description in about 7-8 lines." +
"The description should be engaging and accurately reflect the video\'s content. Don't assume if you don't know"
MetadataType.THUMBNAILS ->
"Get three thumbnails for this video. Return only a comma separated list of timestamps in format \"hh:mm:ss\". Don\'t return any other text."
Collaborator

Just for my understanding - is there any logic to which timestamps are returned here? Does Gemini look for "good" thumbnails or just pick randomly?

Contributor Author

Good catch. The original prompt got rewritten somehow. Updated.
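
Since the model's response is just a comma-separated list of timestamps, turning it into seek positions is straightforward. A minimal sketch supporting both `hh:mm:ss` and `mm:ss` (the two formats a later commit message mentions); the function names are illustrative, not the sample's actual utilities:

```kotlin
// Illustrative parsing helpers; the sample's own utilities may differ.
fun parseTimestampsMs(response: String): List<Long> =
    response.split(',')
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .mapNotNull { timestampToMs(it) }

// Accepts "hh:mm:ss" or "mm:ss"; returns null for anything it can't parse.
fun timestampToMs(timestamp: String): Long? {
    val parts = timestamp.split(':').map { it.toLongOrNull() ?: return null }
    return when (parts.size) {
        3 -> ((parts[0] * 60 + parts[1]) * 60 + parts[2]) * 1000
        2 -> (parts[0] * 60 + parts[1]) * 1000
        else -> null
    }
}
```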

    ),
    VideoItem(
        R.string.video_title_rio_de_janeiro,
        "gs://cloud-samples-data/generative-ai/video/rio_de_janeiro_beyond_the_map_rio.mp4".toUri(),
Collaborator

This doesn't load for me

Contributor Author

There was some error on the GCP side. Removed the ones which don't load.

This change also refactors the prompts into a dedicated file, improves timestamp parsing to support both `hh:mm:ss` and `mm:ss` formats, and makes HDR thumbnail extraction conditional on Android 14 and above. Additionally, unused video samples and annotations have been removed.
This commit migrates the video player from an `AndroidView`-wrapped `PlayerView` to the new `PlayerSurface` composable from the `media3-ui-compose` library.

The screen layout is also updated with weights to better manage the space between the player and the generated metadata.
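
For reference, the migration described above trades the `AndroidView` interop wrapper for a Compose-native surface. A minimal sketch, assuming the `PlayerSurface` composable from `media3-ui-compose`; its exact parameters may differ between versions, so treat this as an approximation rather than the sample's actual code:

```kotlin
import androidx.compose.foundation.layout.aspectRatio
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.media3.common.Player
import androidx.media3.ui.compose.PlayerSurface

@Composable
fun VideoPlayer(player: Player, modifier: Modifier = Modifier) {
    // Compose-native video surface; replaces AndroidView { PlayerView(context) }.
    PlayerSurface(
        player = player,
        modifier = modifier.aspectRatio(16f / 9f),
    )
}
```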
val promptList = listOf(
    Prompt(
        metadataType = MetadataType.DESCRIPTION,
        text = "Provide a compelling and concise description for this video, suitable for a YouTube video description in about 7-8 lines." +
Collaborator

nit: I would add `Return just the description, nothing else.` to the prompt.

Contributor Author

Ack. Fixed
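
The prompt list above pairs each metadata type with its prompt text. A sketch of the surrounding types, inferred from the snippets shown in this thread (the enum values mirror the buttons listed in the PR description, and the description prompt includes the reviewer's suggested sentence); the exact shape in the sample may differ:

```kotlin
// Hypothetical shape of the prompt definitions, inferred from the snippets above.
enum class MetadataType { THUMBNAILS, DESCRIPTION, HASHTAGS, ACCOUNT_TAGS, CHAPTERS, LINKS }

data class Prompt(val metadataType: MetadataType, val text: String)

val promptList = listOf(
    Prompt(
        metadataType = MetadataType.DESCRIPTION,
        text = "Provide a compelling and concise description for this video, suitable for a YouTube " +
            "video description in about 7-8 lines. The description should be engaging and accurately " +
            "reflect the video's content. Don't assume if you don't know. Return just the description, nothing else.",
    ),
    Prompt(
        metadataType = MetadataType.THUMBNAILS,
        text = "Get three thumbnails for this video. Return only a comma separated list of timestamps " +
            "in format \"hh:mm:ss\". Don't return any other text.",
    ),
    // ... prompts for hashtags, account tags, chapters, and links ...
)

// Looks up the prompt text for the metadata type the user tapped.
fun promptFor(type: MetadataType): String =
    promptList.first { it.metadataType == type }.text
```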

    selectedMetadataType = uiState.selectedMetadataType,
    onMetadataCreationClicked = onMetadataTypeClicked,
)

Collaborator

Can we reset the content of OutputTextDisplay when the user selects a different video?

Contributor Author (@MayuriKhinvasara), Aug 14, 2025

Ack. Fixed
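
A sketch of that fix, assuming the ViewModel exposes the generated output in its UI state; names like `onVideoSelected`, `outputText`, and the stand-in state class are illustrative, not the sample's actual code:

```kotlin
import android.graphics.Bitmap
import android.net.Uri
import androidx.lifecycle.ViewModel
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.flow.update

// Minimal stand-ins inferred from snippets earlier in the thread.
data class VideoItem(val titleRes: Int, val uri: Uri)

data class VideoMetadataUiState(
    val selectedVideo: VideoItem? = null,
    val outputText: String = "",
    val thumbnails: List<Bitmap> = emptyList(),
)

class VideoMetadataCreationViewModelSketch : ViewModel() {
    private val _uiState = MutableStateFlow(VideoMetadataUiState())
    val uiState: StateFlow<VideoMetadataUiState> = _uiState.asStateFlow()

    fun onVideoSelected(video: VideoItem) {
        // Clear previously generated metadata so OutputTextDisplay resets for the new video.
        _uiState.update { it.copy(selectedVideo = video, outputText = "", thumbnails = emptyList()) }
    }
}
```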

@MayuriKhinvasara changed the title from "video-metadata-creation" to "Add Video metadata creation sample with Gemini and Media3" on Aug 14, 2025
@MayuriKhinvasara merged commit 9c476d1 into main on Aug 14, 2025
1 check passed