add identifier field support to FileUrl and subclasses #2636

kyuam32 · 2025-08-21T12:51:29Z

This PR implements the feature I requested in issue #2635: adding an optional identifier field to FileUrl and its subclasses to enable tracking file URLs across conversations.

Changes Made

Updated messages.py

Added an identifier field to FileUrl
Updated constructors for all subclasses (ImageUrl, VideoUrl, AudioUrl, DocumentUrl)

Updated _agent_graph.py

Modified multimodal content processing to use custom identifier when available

Added test coverage

Added test_tool_returning_file_url_with_identifier to verify the feature works correctly for all FileUrl subclasses

Testing

All existing tests and the newly added test were checked via the Makefile.

Note: I've implemented this as a proposal for the issue I opened. I understand the feature request hasn't been approved yet, so please feel free to close this PR if the approach doesn't align with the project's direction.
I'm happy to adjust the implementation based on your feedback or explore alternative solutions.
This is my first contribution to open source. I've tried to follow the existing patterns closely, similar to the implementation in #2231.
Thank you for your time and consideration!

DouweM · 2025-08-25T21:55:44Z

pydantic_ai_slim/pydantic_ai/messages.py

        self,
        url: str,
        force_download: bool = False,
+        identifier: str | None = None,


For this not to be a breaking change, this needs to be added at the end so that previous initializations that didn't use keyword arguments still work.
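To illustrate the concern, here is a minimal sketch using plain functions as hypothetical stand-ins for the constructor. The parameter that follows force_download (vendor_metadata here) is an assumption for illustration, not taken from the diff above.

```python
# Hypothetical stand-ins for the constructor signature; not the actual
# pydantic_ai code. `vendor_metadata` as the third parameter is an assumption.

def inserted_in_middle(url, force_download=False, identifier=None, vendor_metadata=None):
    return {'force_download': force_download, 'identifier': identifier,
            'vendor_metadata': vendor_metadata}


def appended_at_end(url, force_download=False, vendor_metadata=None, identifier=None):
    return {'force_download': force_download, 'identifier': identifier,
            'vendor_metadata': vendor_metadata}


# An existing caller that passes every argument positionally.
args = ('https://example.com/a.png', True, {'detail': 'high'})

# The third positional argument silently lands in `identifier`.
shifted = inserted_in_middle(*args)

# With the new parameter last, the old call keeps its meaning.
safe = appended_at_end(*args)
```

In the first variant the metadata dict ends up as the identifier without any error, which is the kind of silent breakage the comment is trying to avoid.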

DouweM · 2025-08-25T21:58:11Z

pydantic_ai_slim/pydantic_ai/_agent_graph.py

                    identifier = content.identifier or multi_modal_content_identifier(content.data)
                else:
-                    identifier = multi_modal_content_identifier(content.url)
+                    identifier = content.identifier or multi_modal_content_identifier(content.url)


We can move this to an if content.identifier: branch before the current 2, so we can assume we need to generate it here

DouweM · 2025-08-25T22:00:58Z

pydantic_ai_slim/pydantic_ai/messages.py

        self.url = url
-        self.vendor_metadata = vendor_metadata
        self.force_download = force_download
+        self.identifier = identifier


Can you see if we can move the multi_modal_content_identifier function that's used in _agent_graph.py to here, so we can use it if no identifier was explicitly provided? That way we may be able to define the field as always being str (so no None) and always use it without having to check if it's set, similar to media_type.

@DouweM I've updated the implementation based on your feedback. The identifier property now always
returns a str for FileUrl classes. Please see my detailed response below.

kyuam32 · 2025-08-26T15:35:51Z

We can move this to an if content.identifier: branch before the current 2, so we can assume we need to generate it here

Thank you for review @DouweM,

I've implemented your suggestion to use an identifier property for FileUrl (similar to media_type)
that always returns a str. I think it's a good solution! However, I've encountered type checking issues.

The problem occurs when trying to implement the branch from the quoted suggestion:

  if content.identifier:
      identifier = content.identifier
  else:
      identifier = multi_modal_content_identifier(content.data)

The type checker cannot infer from the identifier property of FileUrl which type it is dealing with, so it still checks the data attribute access.
Since only BinaryContent has a data attribute (while FileUrl subclasses don't), pyright reports errors.

If we refactor BinaryContent to also use a constructor with a property (or maybe a post-init) that always returns a str for
identifier, we could:

Eliminate the branching entirely (since identifier would always have a str value)
Move the multi_modal_content_identifier() logic into BinaryContent's property

But I'm not sure whether this is the right way.

For now, I've kept the existing branching logic and only moved the FileUrl.identifier generation logic
from multi_modal_content_identifier() into the FileUrl property.

if isinstance(content, _messages.BinaryContent):
  identifier = content.identifier or multi_modal_content_identifier(content.data)
else:
  identifier = content.identifier

What do you think about this approach? Would you prefer to keep it this way, or should we consider making
BinaryContent.identifier also always return a str for consistency?

DouweM · 2025-08-26T17:20:14Z

should we consider making
BinaryContent.identifier also always return a str for consistency?

@kyuam32 Yep, let's give that a try. The function can live as a private function in that same file and be used by both classes

kyuam32 · 2025-08-27T12:15:26Z

@DouweM Thank you for your patience and detailed feedback throughout this PR!

Identifier support for MultiModalContentTypes

New parameter (identifier) placed at the end of constructors (for backward compatibility)
identifier always returns a str (auto-generated if not provided)
Moved the identifier generation logic multi_modal_content_identifier() to messages.py
Simplified the branch that processes tool call contents in _agent_graph.py

Design Considerations

Initially tried using private _identifier field with @property for FileUrl (similar to media_type)
However, BinaryContent already had identifier as a public field in the existing codebase
This caused serialization inconsistency:
- BinaryContent serialized as {"identifier": "..."}
- FileUrl serialized as {"_identifier": "..."}
Aligned FileUrl implementation with BinaryContent for consistency

class FileUrl:
    identifier: str
    def __init__(self, url: str, ..., identifier: str | None = None):
        ...
        self.identifier = identifier or _multi_modal_content_identifier(url)
class BinaryContent:
    identifier: str
    def __init__(self, data: bytes, ..., identifier: str | None = None):
        ...
        self.identifier = identifier or _multi_modal_content_identifier(data)

Testing Updates

Modified the FileUrl and BinaryContent serialization tests to focus on verifying auto-generated identifiers
Modified the FileUrl and BinaryContent identifier tests to focus on verifying custom identifier handling

kyuam32 · 2025-08-28T00:46:26Z

@DouweM
Fixed the test failure after merging the main branch.

kyuam32 · 2025-09-01T14:17:11Z

Hi @DouweM.
Would appreciate a review on this PR when convenient.
Happy to address any feedback.
Thanks for your time!

DouweM · 2025-09-01T20:03:42Z

pydantic_ai_slim/pydantic_ai/messages.py

    url: str
    """The URL of the file."""

+    identifier: str


Please make this the final field, like it is on the constructor

@DouweM Thank you for the review!
Moved identifier field to be the final field in the dataclass definition

DouweM · 2025-09-01T20:04:55Z

pydantic_ai_slim/pydantic_ai/messages.py

        media_type: str | None = None,
        kind: Literal['video-url'] = 'video-url',
+        identifier: str | None = None,
        *,


Please move the * ahead of force_download so all arguments other than url need to be keyword arguments -- same for the other subclasses

Refactored the other MultiModalContent types with the same rules.

DouweM · 2025-09-01T20:05:57Z

pydantic_ai_slim/pydantic_ai/messages.py

+    e.g. "This is file <identifier>:" preceding the `BinaryContent`.
    """

+    _: KW_ONLY


Please move this ahead of media_type

Moved _: KW_ONLY ahead of media_type, after the data field

DouweM · 2025-09-02T18:03:50Z

@kyuam32 Thank you!

kousun12 · 2025-10-06T20:00:43Z

pydantic_ai_slim/pydantic_ai/messages.py

+    """The media type of the binary data."""

-    This identifier can be provided to the model in a message to allow it to refer to this file in a tool call argument, and the tool can look up the file in question by iterating over the message history and finding the matching `BinaryContent`.
+    identifier: str


@DouweM @kyuam32 -- was this intended to change to str rather than keeping str | None? We store old JSON to rehydrate later into chats, and now ModelMessageAdapter.validate_json throws because it expects identifier to have a value for those old dicts.

@kousun12 Ay that was unintentional, I was relying on the fact that we always set an identifier in __init__, but that wouldn't work for validation. Can you create a new issue please?

Related issue has been filed: #3103

add identifier field support to FileUrl and subclasses

02ad24b

kyuam32 mentioned this pull request Aug 21, 2025

Add identifier field to FileUrl and subclasses #2635

Closed

DouweM requested changes Aug 25, 2025

DouweM self-assigned this Aug 25, 2025

DouweM added the awaiting author revision label Aug 25, 2025

kyuam32 added 4 commits August 26, 2025 22:46

generate identifier dynamically when not explicitly set

39a402b

remove identifier property from serialization test

ad500b0

update multi_modal_content_identifier to only accept bytes input fr…

3211a2b

…om BinaryContent

remove dynamic generation of identifier for FileUrl content

1427dbf

kyuam32 requested a review from DouweM August 26, 2025 15:53

kyuam32 added 3 commits August 27, 2025 21:03

refactor identifier handling for BinaryContent and FileUrl

b80788a

reorder arguments in Message initializer for consistency

ea2f049

reorder arguments in Message initializer for consistency

f61233a

kyuam32 added 5 commits August 27, 2025 21:15

reorder arguments in Message initializer for consistency

f2686de

Merge branch 'main' into feat/add-identifier-field-support-to-FileUrl…

a4f4f07

…-and-subclasses

remove unused HandleResponseEvent import

0905d20

add identifier field to updated test case

737b3bc

trim whitespace in BinaryContent docstring

31f2035

kyuam32 added 3 commits September 1, 2025 22:45

Merge branch 'main' into feat/add-identifier-field-support-to-FileUrl…

0d0e8a4

…-and-subclasses

trim whitespace in messages.py docstring

5e9b713

fix update identifier test cases following UserPromptPart refactor

232c085

DouweM requested changes Sep 1, 2025

add * to enforce keywords arguments for MultiModalContent types init

92a2645

kyuam32 requested a review from DouweM September 2, 2025 00:26

DouweM merged commit 46ba28f into pydantic:main Sep 2, 2025
41 checks passed

kousun12 reviewed Oct 6, 2025
