
high memory footprint: image encoding / decoding done twice in pydantic-ai + google-genai? #2958

@charlesmindee


Question

Hello, when calling Gemini through GoogleModel on large documents with many pages, memory usage diverges.
After profiling the memory, I found that the operation taking the most memory is the encoding / decoding of images (screenshot of the profiling report attached below). The first two lines of the report have exactly the same memory usage, and when looking into the source code, both lines are doing (almost) the same thing:

pydantic_ai/models/google.py, l. 524, in _map_user_prompt:

elif isinstance(item, BinaryContent):
    # NOTE: The type from Google GenAI is incorrect, it should be `str`, not `bytes`.
    base64_encoded = base64.b64encode(item.data).decode('utf-8')

google/genai/_common.py, l. 520, in encode_unserializable_types:

if isinstance(value, bytes):
    processed_data[key] = base64.urlsafe_b64encode(value).decode('ascii')
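To make the cost concrete, here is a minimal sketch of the allocation behaviour of a single encode pass (my own reproduction, not library code): base64 output is 4/3 the size of the input, and each pass allocates fresh buffers, so encoding the same payload twice briefly holds several copies of every image in memory at once.

```python
import base64

# 3 MiB of dummy "image" bytes (length divisible by 3, so no base64 padding)
data = b"\x00" * (3 * 1024 * 1024)

# The step done per request: bytes -> base64 bytes -> UTF-8 str.
# b64encode allocates a new ~4 MiB bytes object, and .decode() allocates
# another str of the same size on top of it.
encoded = base64.b64encode(data).decode("utf-8")

assert len(encoded) == len(data) * 4 // 3  # 4 MiB of base64 text from 3 MiB of bytes
```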

Do you think there is a redundancy here?
Do you have any suggestions for reducing memory usage? I am sending the same images in the user prompt of many inferences, but I couldn't find a way to share the encoding / decoding of images across calls, since this operation is done on each inference.
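For reference, the workaround I am considering on my side is to memoize the expensive step myself (a sketch only; `encode_image` is my own hypothetical helper, not a pydantic-ai or google-genai API, and it does not remove the second encode inside the SDK):

```python
import base64
from functools import lru_cache

@lru_cache(maxsize=32)
def encode_image(data: bytes) -> str:
    # Hypothetical helper: encode each distinct payload once; repeated
    # calls with the same bytes return the cached string (bytes are hashable).
    return base64.b64encode(data).decode("utf-8")

img = b"fake-image-bytes"
first = encode_image(img)
second = encode_image(img)  # served from the cache, no re-encoding

assert first is second
assert encode_image.cache_info().hits == 1
```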

Thank you !

[Image: memory profiling report screenshot]


Metadata


Labels

bug (Something isn't working)
