Based on the latest documentation and code, the @langchain/google-genai library currently supports text and image (multimodal) inputs via Gemini models, such as gemini-pro and gemini-pro-vision. You can provide images as part of a single human message, and there are examples for this in both the docs and the README.
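
As a rough illustration of that pattern, here is a minimal sketch, assuming a recent `@langchain/google-genai` release where the constructor accepts `model` (older releases used `modelName`); the file path and prompt are placeholders:

```typescript
import * as fs from "node:fs";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { HumanMessage } from "@langchain/core/messages";

// Requires GOOGLE_API_KEY to be set in the environment.
const model = new ChatGoogleGenerativeAI({ model: "gemini-pro-vision" });

// Encode a local image as a base64 data URL and attach it to a single
// human message alongside the text prompt.
const imageBase64 = fs.readFileSync("./example.png").toString("base64");
const message = new HumanMessage({
  content: [
    { type: "text", text: "Describe this image." },
    { type: "image_url", image_url: `data:image/png;base64,${imageBase64}` },
  ],
});

const res = await model.invoke([message]);
console.log(res.content);
```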

For video understanding, there are examples in the LangChain JS Docs showing how to process video files (e.g., MP4) by converting them to base64 and sending them to the Gemini API for structured output. However, these examples use the @langchain/google-vertexai package, not @langchain/google-genai directly. See the media use case documentation for details.
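
For reference, here is a hedged sketch of that Vertex AI video flow: the model name, file path, and prompt are illustrative, and the `media` content block follows the shape shown in the media use case docs:

```typescript
import * as fs from "node:fs";
import { ChatVertexAI } from "@langchain/google-vertexai";
import { HumanMessage } from "@langchain/core/messages";

// Assumes Google Cloud credentials are configured (e.g. via
// GOOGLE_APPLICATION_CREDENTIALS). The model name is illustrative.
const model = new ChatVertexAI({ model: "gemini-1.5-pro" });

// Read an MP4 file and send it as a base64 "media" content block
// together with the text instruction.
const videoBase64 = fs.readFileSync("./clip.mp4").toString("base64");
const message = new HumanMessage({
  content: [
    { type: "text", text: "Summarize this video in three bullet points." },
    { type: "media", mimeType: "video/mp4", data: videoBase64 },
  ],
});

const res = await model.invoke([message]);
console.log(res.content);
```

The docs example additionally pipes the model through `withStructuredOutput` with a schema to get typed results; the plain `invoke` call above keeps the sketch self-contained.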

Regarding the Files…
