Checked other resources
I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
Commit to Help
I commit to help with one of those options 👆
Example Code
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_google_genai import HarmCategory, HarmBlockThreshold, ChatGoogleGenerativeAI


# Define the schema for the expected output
class Highlight(BaseModel):
    startTime: str = Field(description="Start time of the highlight in the format (hh:mm:ss)")
    endTime: str = Field(description="End time of the highlight in the format (hh:mm:ss)")
    caption: str = Field(description="A short caption.")
    scene_type: str = Field(description="Type of scene.")
    scene_rank: str = Field(description="Importance ranking.")
    scene_duration: int = Field(description="Duration in seconds.")


class Highlights(BaseModel):
    highlights: list[Highlight]


# Create the output parser
parser = PydanticOutputParser(pydantic_object=Highlights)

# Configure the chain with the LLM (Gemini 1.5 Flash) and safety settings
chain = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash-002",
    temperature=0.4,
    max_tokens=8192,
    top_p=0.95,
    safety_settings={
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    },
) | parser

# Define the media input
media_message = {
    "type": "media_input",
    "video_url": {
        "url": "https://storage.googleapis.com/sample-video/video.mp4",
    },
}

# Define the human message (input video length, etc.) without exposing prompt details
text_message = """Please analyze the video clip and extract key scenes with significant dialogues and interactions."""

# Invoke the chain with the input
result = await chain.ainvoke({
    "language": "English",
    "media_input": media_message,
    "human_message": text_message,
})

# Output the results
print(result.json(indent=2))
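For context, the snippet above omits the prompt-assembly step. Below is a rough sketch of how the prompt-to-parser wiring looks; the helper name, the template text, and especially the way the media block is attached to the HumanMessage are illustrative assumptions rather than the exact production code, and the content-block format accepted for video input should be verified against the installed langchain-google-genai version.

from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnableLambda

# Embed the Pydantic schema into the prompt so the model returns parseable JSON.
format_instructions = parser.get_format_instructions()

def build_messages(inputs: dict) -> list[HumanMessage]:
    # Combine the text instructions and the media block into a single human message.
    # NOTE: the exact content-block keys ChatGoogleGenerativeAI expects for video
    # input may differ between releases; double-check against the library docs.
    return [
        HumanMessage(
            content=[
                {"type": "text", "text": f"{inputs['human_message']}\n\n{format_instructions}"},
                inputs["media_input"],
            ]
        )
    ]

# llm refers to the ChatGoogleGenerativeAI instance configured above, here assumed
# to be bound to its own name before being piped into the parser.
chain = RunnableLambda(build_messages) | llm | parser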
Description
I've developed a GenAI-based application that extracts insights from video clips uploaded to Google Cloud Storage. The pipeline retrieves video files via gsutil and processes them with the Gemini 1.5 Flash LLM inside a LangChain prompt-to-parser chain. This setup had worked perfectly since early September.
However, for the past two days the application's responses have not been relevant to the actual content of the video files. I suspect the issue lies in my code or in LangChain's built-in functions; specifically, it seems the video files aren't being passed to the LLM correctly, which results in hallucinated outputs.
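To check whether the media block actually reaches the model, a minimal debugging step (a sketch reusing the chain and inputs from the example code above) is to enable LangChain's global debug logging and inspect the serialized request:

from langchain_core.globals import set_debug

# Dump every serialized request/response flowing through the chain so the
# outgoing message content (including the media part) can be inspected.
set_debug(True)

result = await chain.ainvoke({
    "language": "English",
    "media_input": media_message,
    "human_message": text_message,
})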
Additional Information:
I'm using LangChain to structure the entire process.
The LLM in use is Gemini 1.5 Flash (gemini-1.5-flash-002).
Safety settings are set to BLOCK_NONE for all harm categories.
The issue started suddenly without significant changes to the core logic.
System Info
platform: linux
python version: 3.9.5
langchain: latest version