Description
Question
Hello, when calling Gemini through `GoogleModel` on large documents with many pages, memory usage keeps growing. After profiling the memory, I found that the operation consuming the most memory is the encoding/decoding of images (screenshot: profiling report). The first two lines of the report show exactly the same memory usage, and when looking into the source code, both lines are doing (almost) the same thing:
`pydantic_ai/models/google.py`, l. 524, in `_map_user_prompt`:

```python
elif isinstance(item, BinaryContent):
    # NOTE: The type from Google GenAI is incorrect, it should be `str`, not `bytes`.
    base64_encoded = base64.b64encode(item.data).decode('utf-8')
```
`google/genai/_common.py`, l. 520, in `encode_unserializable_types`:

```python
if isinstance(value, bytes):
    processed_data[key] = base64.urlsafe_b64encode(value).decode('ascii')
```
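For context on why this shows up so prominently in the profile, here is a minimal sketch of the allocation cost of one encode pass (the sizes are illustrative, not taken from my documents): `b64encode` on an N-byte payload allocates a new bytes object of roughly 4N/3 bytes, and the subsequent `.decode()` allocates a second object of the same length, so each pass over an image temporarily holds several copies of it in memory.

```python
import base64

payload = b'\x00' * 3_000_000  # stand-in for ~3 MB of raw image bytes

encoded = base64.b64encode(payload)   # new bytes object, ~4 MB
as_text = encoded.decode('ascii')     # new str object of the same length

# base64 expands data by a factor of 4/3 (with padding)
assert len(encoded) == 4 * ((len(payload) + 2) // 3)
print(len(payload), len(encoded), len(as_text))
```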
Do you think there is redundancy here?
Do you have any suggestions to reduce memory usage? I am sending the same images in the user prompt of many inferences, but I couldn't find a way to share the encoded/decoded images across calls, since this operation is redone on each inference.
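In case it helps frame the question, this is the kind of sharing I was hoping for (a rough sketch on my side, not pydantic-ai API; `encode_image` is a hypothetical helper): caching the base64 string keyed on the raw bytes, so repeated inferences with the same image reuse one encoded copy instead of re-encoding it every call.

```python
import base64
from functools import lru_cache


@lru_cache(maxsize=32)
def encode_image(data: bytes) -> str:
    # bytes are hashable, so calls with the same image content
    # return the cached base64 string instead of re-encoding
    return base64.b64encode(data).decode('utf-8')


image = b'\x89PNG fake image bytes'  # placeholder payload

first = encode_image(image)
second = encode_image(image)
assert first is second  # second call hit the cache: same str object
```

Something like this works for my own code, but the encoding above happens inside the model classes, so I couldn't find a hook to apply it there.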
Thank you!
Additional Context
No response