Add Video metadata creation sample with Gemini and Media3 #71
…deo-metadata-creation
Summary of Changes
Hello @MayuriKhinvasara, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new Android AI sample application named "Gemini Video Metadata Creation." This sample showcases how to use the Gemini API, integrated with Firebase, to generate various types of metadata for videos, such as descriptions, hashtags, account tags, chapters, and links. It also demonstrates the extraction of HDR quality thumbnails from videos based on timestamps provided by the Gemini model, utilizing Media3's experimental frame extractor. The changes encompass all necessary UI components, ViewModel logic, and utility functions to provide a comprehensive example of AI-powered video content analysis on Android.
Highlights
- New Sample Application: A new sample application, "Gemini Video Metadata Creation," has been added to demonstrate AI-powered video metadata generation.
- Gemini API Integration: The sample integrates with the Gemini 2.0 Flash model via Firebase AI to generate various types of metadata, including descriptions, hashtags, account tags, chapters, and links, directly from video content.
- Video Playback and Selection: The application now includes video playback functionality using ExoPlayer and a user-friendly dropdown menu for selecting different sample videos.
- Dynamic Metadata Generation: Users can dynamically choose the type of metadata they wish to generate, with the application sending corresponding prompts to the Gemini model.
- Thumbnail Extraction: A key feature is the ability to extract high-dynamic-range (HDR) quality video thumbnails based on timestamps provided by the Gemini model, leveraging Media3's experimental frame extractor.
- New UI Components: Dedicated Compose UI components have been introduced for video selection, playback, metadata type selection (via a button grid), and displaying the generated text and thumbnails.
- Dependency Updates: The Media3 library has been updated to version 1.8.0, and the `media3-transformer` dependency has been added to support advanced video processing capabilities.
Code Review
This pull request introduces a new sample application for video metadata creation using Gemini. The implementation is generally well-structured, utilizing modern Android development practices like Jetpack Compose, Hilt for dependency injection, and StateFlow for UI state management, which aligns with the provided architectural guidelines. However, I've identified several areas for improvement. There is a critical resource leak in the thumbnail extraction logic that needs to be addressed, a misconfiguration in the build script's namespace, and a few violations of type safety and the project's style guide. Additionally, there are opportunities to remove unused code. Addressing these points will significantly enhance the quality and robustness of the sample.
.../main/java/com/android/ai/samples/geminivideometadatacreation/player/ExtractHDRThumbnails.kt
ai-catalog/samples/gemini-video-metadata-creation/build.gradle.kts
...ation/src/main/java/com/android/ai/samples/geminivideometadatacreation/ui/ThumbnailScreen.kt
...ation/src/main/java/com/android/ai/samples/geminivideometadatacreation/ui/ThumbnailScreen.kt
 * [StateFlow].
 */
@HiltViewModel
class VideoMetadataCreationViewModel @Inject constructor(private val application: Application) : ViewModel() {
Injecting `Application` directly into the `ViewModel` violates the architectural principle of keeping ViewModels framework-agnostic [1][2]. This makes the ViewModel harder to test and can lead to issues if not handled carefully. The `Context` dependency required for `extractListOfThumbnails` should be provided through a lower layer, such as a repository or a use case, which can be injected into the ViewModel.
Style Guide References
Footnotes
1. The style guide strongly recommends that ViewModels be agnostic of the Android lifecycle and not hold references to `Context` or other lifecycle-related types.
2. The style guide recommends against using `AndroidViewModel` and the `Application` class in ViewModels, suggesting that such dependencies be moved to the UI or data layer.
I agree, context should not be passed. I'm figuring out a way to still share the context with the ExperimentalFrameExtractor to extract thumbnails asynchronously.
I don't think the style guide is correct there. You can use AndroidViewModel to get an application context in your viewmodel.
In your situation, a real app would probably create a class called something like "FrameExtractorService" that took an application context in its constructor. You would then inject that service into your viewmodel and then your viewmodel wouldn't have to deal with contexts at all.
+1 I think for this sample using an AndroidViewModel is fine.
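For reference, the "FrameExtractorService" idea from this thread could be sketched roughly as below. This is not code from the PR: `AppContext` is a stand-in for `android.content.Context` so the snippet compiles without the Android SDK, and `FrameExtractorService`, `ContextBackedFrameExtractorService`, and `MetadataViewModelSketch` are hypothetical names. The extraction itself is replaced by a placeholder that returns labels.

```kotlin
// Stand-in for android.content.Context so the sketch compiles off-device.
class AppContext(val packageName: String)

// Hypothetical abstraction: the ViewModel sees this interface, never a Context.
interface FrameExtractorService {
    fun extractThumbnails(videoUri: String, timestampsMs: List<Long>): List<String>
}

// A real implementation would keep the application context and hand it to
// Media3's frame extractor. This placeholder only returns descriptive labels.
class ContextBackedFrameExtractorService(
    private val appContext: AppContext
) : FrameExtractorService {
    override fun extractThumbnails(videoUri: String, timestampsMs: List<Long>): List<String> =
        timestampsMs.map { "${appContext.packageName}: frame of $videoUri at $it ms" }
}

// Simplified ViewModel: it depends on the service and holds no Context.
class MetadataViewModelSketch(private val frameExtractor: FrameExtractorService) {
    fun loadThumbnails(videoUri: String, timestampsMs: List<Long>): List<String> =
        frameExtractor.extractThumbnails(videoUri, timestampsMs)
}

fun main() {
    val service = ContextBackedFrameExtractorService(AppContext("com.example.sample"))
    val viewModel = MetadataViewModelSketch(service)
    println(viewModel.loadThumbnails("video.mp4", listOf(1_000L, 65_000L)))
}
```

With this shape, a test can pass a fake `FrameExtractorService` to the ViewModel, which is the testability benefit the review comment was after.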
...m/android/ai/samples/geminivideometadatacreation/viewmodel/VideoMetadataCreationViewModel.kt
ai-catalog/samples/gemini-video-metadata-creation/src/main/res/values/strings.xml
try {
    val generativeModel =
        Firebase.ai(backend = GenerativeBackend.vertexAI())
            .generativeModel("gemini-2.0-flash")
Can we use Gemini 2.5 Flash instead?
Agreed. Will update
import com.android.ai.samples.geminivideosummary.VideoSummarizationScreen
import com.android.ai.samples.genai_image_description.GenAIImageDescriptionScreen
import com.android.ai.samples.genai_summarization.GenAISummarizationScreen
import com.android.ai.samples.genai_writing_assistance.GenAIWritingAssistanceScreen
import com.android.ai.samples.imagen.ui.ImagenScreen
import com.android.ai.samples.magicselfie.ui.MagicSelfieScreen

@SuppressLint("UnsafeOptInUsageError", "NewApi")
Instead of suppressing these (valid) lint errors here at the top level, wdyt about checking for API level inside the `VideoMetadataCreationScreen`? That way you can show a nice "not supported" message on that screen instead of it crashing on lower API levels.
This is a very valid point. Updated accordingly
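A rough sketch of the API-level check discussed in this thread, written as plain Kotlin so it runs without the Android SDK. It is not the code from the PR. The threshold of API level 34 (Android 14) is an assumption, and `ScreenContent`, `resolveScreenContent`, and `MIN_SUPPORTED_SDK` are hypothetical names.

```kotlin
// Assumption: the sample needs Android 14, which is API level 34
// (Build.VERSION_CODES.UPSIDE_DOWN_CAKE in the Android SDK).
const val MIN_SUPPORTED_SDK = 34

// Hypothetical model of what the screen decides to render.
sealed interface ScreenContent {
    object MetadataCreation : ScreenContent
    data class NotSupported(val message: String) : ScreenContent
}

// In the app, sdkInt would come from Build.VERSION.SDK_INT.
fun resolveScreenContent(sdkInt: Int): ScreenContent =
    if (sdkInt >= MIN_SUPPORTED_SDK) {
        ScreenContent.MetadataCreation
    } else {
        ScreenContent.NotSupported(
            "This sample requires API level $MIN_SUPPORTED_SDK or higher (device is on $sdkInt)."
        )
    }

fun main() {
    println(resolveScreenContent(35) is ScreenContent.MetadataCreation) // true
    println(resolveScreenContent(30) is ScreenContent.NotSupported)     // true
}
```

Keeping the decision in a small pure function means the composable only has to branch on the result, and the lint suppression can be dropped from the top-level entry point.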
return try {
    withContext(Dispatchers.IO) {
        // Enable HDR frames fi=or better image quality
typo
Fixed
<string name="create_metadata_button">Create Metadata</string>
<string name="video_metadata_creation_title">Video Metadata Creation</string>
<string name="output_text_combined">%s%s</string>
<string name="output_text_generated_placeholder">"Text generated with Gemini : "</string>
nit; remove the space between "Gemini" and ":"
Fixed
<string name="select_video_placeholder">Select Video</string>
<string name="create_metadata_button">Create Metadata</string>
<string name="video_metadata_creation_title">Video Metadata Creation</string>
<string name="output_text_combined">%s%s</string>
This is a bit weird - why not "Text generated with Gemini: %s" and then just replace the dynamic part of the string? Or is there a case where the first %s would resolve to something else?
Good catch. This was a typo.
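To illustrate the single-placeholder suggestion from this thread, here is a minimal plain-Kotlin sketch. The constant stands in for an Android string resource, and the resource name in the comment is hypothetical, not taken from the PR.

```kotlin
// Plain-Kotlin stand-in for a string resource such as
// <string name="output_text_generated">Text generated with Gemini: %s</string>
// (the resource name is hypothetical).
const val OUTPUT_TEXT_TEMPLATE = "Text generated with Gemini: %s"

// On Android the same substitution is done by getString(resId, generatedText)
// or by stringResource(resId, generatedText) in Compose.
fun formatOutputText(generatedText: String): String =
    OUTPUT_TEXT_TEMPLATE.format(generatedText)

fun main() {
    println(formatOutputText("A short film about a rabbit."))
    // prints: Text generated with Gemini: A short film about a rabbit.
}
```

A single template keeps the fixed label and the dynamic text in one translatable string, instead of concatenating two `%s` arguments.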
<!--Video titles for list of sample videos-->
<string name="video_title_big_buck_bunny">Big Buck Bunny</string>
<string name="video_title_android_spotlight_shorts">Android Spotlight Week (Shorts video)</string>
<string name="video_title_rio_de_janeiro">Rio De Janerio</string>
nit; typo
Fixed
...m/android/ai/samples/geminivideometadatacreation/viewmodel/VideoMetadataCreationViewModel.kt
"Provide a compelling and concise description for this video, suitable for a YouTube video description in about 7-8 lines." +
    "The description should be engaging and accurately reflect the video\'s content. Don't assume if you don't know"
MetadataType.THUMBNAILS ->
    "Get three thumbnails for this video. Return only a comma separated list of timestamps in format \"hh:mm:ss\". Don\'t return any other text."
Just for my understanding - is there any logic behind which timestamps are returned here? Does Gemini look for "good" thumbnails or just pick randomly?
Good catch. The original prompt got rewritten somehow. Updated.
),
VideoItem(
    R.string.video_title_rio_de_janeiro,
    "gs://cloud-samples-data/generative-ai/video/rio_de_janeiro_beyond_the_map_rio.mp4".toUri(),
This doesn't load for me
There is some error on the GCP side. Removed the ones that don't load.
This change also refactors the prompts into a dedicated file, improves timestamp parsing to support both `hh:mm:ss` and `mm:ss` formats, and makes HDR thumbnail extraction conditional on Android 14 and above. Additionally, unused video samples and annotations have been removed.
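As an illustration of parsing both `hh:mm:ss` and `mm:ss`, the sketch below converts a comma separated model reply into millisecond positions. It is not the parser from this PR. `parseTimestampToMs` and `parseTimestampList` are hypothetical names, and rejecting minute or second values above 59 is an assumption about the desired strictness.

```kotlin
// Hypothetical helper: converts "hh:mm:ss" or "mm:ss" into milliseconds.
// Returns null for anything that does not match either shape.
fun parseTimestampToMs(timestamp: String): Long? {
    val parts = timestamp.trim().split(":")
    if (parts.size !in 2..3) return null
    val numbers = parts.map { it.toLongOrNull() ?: return null }
    if (numbers.any { it < 0 }) return null
    // Pad "mm:ss" with a zero hour so both formats share one code path.
    val (hours, minutes, seconds) = if (numbers.size == 3) numbers else listOf(0L) + numbers
    if (minutes > 59 || seconds > 59) return null
    return ((hours * 60 + minutes) * 60 + seconds) * 1000
}

// Splits a comma separated model reply and keeps only the parsable entries.
fun parseTimestampList(response: String): List<Long> =
    response.split(",").mapNotNull { parseTimestampToMs(it) }

fun main() {
    println(parseTimestampList("00:01:05, 02:30, not-a-time")) // [65000, 150000]
}
```

Dropping unparsable entries instead of throwing keeps the thumbnail flow working when the model adds stray text to its reply, at the cost of silently returning fewer thumbnails.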
This commit migrates the video player from an `AndroidView`-wrapped `PlayerView` to the new `PlayerSurface` composable from the `media3-ui-compose` library. The screen layout is also updated with weights to better manage the space between the player and the generated metadata.
val promptList = listOf(
    Prompt(
        metadataType = MetadataType.DESCRIPTION,
        text = "Provide a compelling and concise description for this video, suitable for a YouTube video description in about 7-8 lines." +
nit: I would add `Return just the description, nothing else.` to the prompt.
Ack. Fixed
    selectedMetadataType = uiState.selectedMetadataType,
    onMetadataCreationClicked = onMetadataTypeClicked,
)
Can we reset the content of `OutputTextDisplay` when the user selects a different video?
Ack. Fixed
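One way the reset requested in this thread could be modelled is by clearing the generated output whenever the selected video changes. The sketch below is an assumption about the state shape, not the PR's actual code. `MetadataUiState`, `withSelectedVideo`, and all fields other than `selectedMetadataType` are hypothetical.

```kotlin
// Hypothetical UI state. Only selectedMetadataType appears in the reviewed diff.
data class MetadataUiState(
    val selectedVideoUri: String? = null,
    val selectedMetadataType: String? = null,
    val generatedText: String = "",
    val thumbnailUris: List<String> = emptyList()
)

// Returns a state for the newly selected video with all generated output cleared.
// Selecting the video that is already selected leaves the state untouched.
fun MetadataUiState.withSelectedVideo(videoUri: String): MetadataUiState =
    if (videoUri == selectedVideoUri) {
        this
    } else {
        copy(
            selectedVideoUri = videoUri,
            selectedMetadataType = null,
            generatedText = "",
            thumbnailUris = emptyList()
        )
    }

fun main() {
    val before = MetadataUiState(
        selectedVideoUri = "video_a.mp4",
        selectedMetadataType = "DESCRIPTION",
        generatedText = "A rabbit wakes up in a meadow."
    )
    println(before.withSelectedVideo("video_b.mp4").generatedText.isEmpty()) // true
}
```

In a ViewModel that exposes a `StateFlow`, the same transformation would typically be applied when the video selection callback fires.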
Add Gemini Video Metadata Creation Sample
A new sample demonstrating how to use the Gemini API with Firebase and Media3 to extract metadata from videos.
The sample includes:
- `media3-transformer` to extract HDR thumbnails from the video.