Fix KeyError when parsing video metadata without audio track in Google models #2507

jerry-heygen

This PR fixes #2489.

Problem

When video content that lacks an audio track is processed through Google's Gemini models, metadata parsing fails with a KeyError. The API response still includes an AUDIO modality entry in the metadata details, but that entry has no token_count field, so the direct lookup detail['token_count'] raises.

This can be reproduced by adding the debug line marked below in pydantic_ai_slim/pydantic_ai/models/google.py#L606:

for key, metadata_details in metadata.items():
    if key.endswith('_details') and metadata_details:
        suffix = key.removesuffix('_details')
        for detail in metadata_details:
            # Print before the lookup so the entry is shown even if the lookup raises.
            print("DEBUG: detail: ", detail)  # <-- Add this debug line
            details[f'{detail["modality"].lower()}_{suffix}'] = detail.get('token_count', 0)

Then process a video file without an audio track. The debug output shows an AUDIO entry with no token_count key:

DEBUG: detail:  {'modality': <MediaModality.TEXT: 'TEXT'>, 'token_count': 1201}
DEBUG: detail:  {'modality': <MediaModality.VIDEO: 'VIDEO'>, 'token_count': 2064}
DEBUG: detail:  {'modality': <MediaModality.TEXT: 'TEXT'>, 'token_count': 1717}
DEBUG: detail:  {'modality': <MediaModality.AUDIO: 'AUDIO'>}
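The failure can be isolated from the model code entirely. The snippet below is a minimal sketch, assuming a plain dict shaped like the AUDIO entry in the debug output above (the real entry holds a MediaModality enum value rather than a string); `audio_detail` and `outcome` are illustrative names, not identifiers from the codebase.

```python
# Hypothetical detail entry shaped like the AUDIO line in the debug output:
# it has a modality but no 'token_count' key.
audio_detail = {'modality': 'AUDIO'}

try:
    # Direct key access, as in the code before this PR.
    audio_detail['token_count']
    outcome = 'ok'
except KeyError as exc:
    # str() of a KeyError is the repr of the missing key.
    outcome = f'KeyError: {exc}'

print(outcome)  # KeyError: 'token_count'
```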

Solution

Use .get('token_count', 0) instead of direct key access to safely handle missing token_count fields:

# Before (causes KeyError):
details[f'{detail["modality"].lower()}_{suffix}'] = detail['token_count']

# After (safe with default):
details[f'{detail["modality"].lower()}_{suffix}'] = detail.get('token_count', 0)

This ensures that a modality entry without a token count (such as the AUDIO entry returned for a video with no audio track) is recorded as 0 tokens instead of crashing the application.
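For reviewers who want to try the patched loop in isolation, here is a self-contained sketch. It assumes a stand-in `Modality` enum in place of the SDK's MediaModality, a hypothetical wrapper function `collect_token_details`, and a made-up `sample_metadata` dict whose counts are borrowed from the debug output; the key name `prompt_tokens_details` is also an assumption about the response shape.

```python
from enum import Enum
from typing import Any


class Modality(str, Enum):
    """Hypothetical stand-in for the SDK's MediaModality enum."""

    TEXT = 'TEXT'
    VIDEO = 'VIDEO'
    AUDIO = 'AUDIO'


def collect_token_details(metadata: dict[str, Any]) -> dict[str, int]:
    """Flatten '*_details' lists into '<modality>_<suffix>' token counts."""
    details: dict[str, int] = {}
    for key, metadata_details in metadata.items():
        if key.endswith('_details') and metadata_details:
            suffix = key.removesuffix('_details')
            for detail in metadata_details:
                # .get() with a default of 0 tolerates entries that lack 'token_count'.
                details[f'{detail["modality"].lower()}_{suffix}'] = detail.get('token_count', 0)
    return details


# Made-up metadata: the AUDIO entry deliberately has no 'token_count'.
sample_metadata = {
    'prompt_token_count': 3265,
    'prompt_tokens_details': [
        {'modality': Modality.TEXT, 'token_count': 1201},
        {'modality': Modality.VIDEO, 'token_count': 2064},
        {'modality': Modality.AUDIO},
    ],
}

result = collect_token_details(sample_metadata)
print(result)
```

With the default in place, the AUDIO entry contributes `audio_prompt_tokens` with a value of 0 rather than raising. Whether a missing count should be reported as 0 or omitted from the details altogether is a design choice; this PR opts for 0.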

Testing

  • Verified the fix resolves the issue with video files lacking audio tracks
  • All existing Google model tests pass without regression

Successfully merging this pull request may close these issues.

KeyError: 'token_count' when using Gemini with VideoUrl