-
Notifications
You must be signed in to change notification settings - Fork 1.5k
add identifier field support to FileUrl and subclasses #2636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add identifier field support to FileUrl and subclasses #2636
Conversation
| self, | ||
| url: str, | ||
| force_download: bool = False, | ||
| identifier: str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this not to be a breaking change, this needs to added at the end so that previous initializations that didn't use keyword arguments still work.
| identifier = content.identifier or multi_modal_content_identifier(content.data) | ||
| else: | ||
| identifier = multi_modal_content_identifier(content.url) | ||
| identifier = content.identifier or multi_modal_content_identifier(content.url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move this to an if content.identifier: branch before the current 2, so we can assume we need to generate it here
| self.url = url | ||
| self.vendor_metadata = vendor_metadata | ||
| self.force_download = force_download | ||
| self.identifier = identifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you see if we can move the multi_modal_content_identifier function that's used in _agent_graph.py to here, so we can use it if no identifier was explicitly provided? That way we may be able to define the field as always being str (so no None) and always use it without having to check if it's set, similar to media_type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DouweM I've updated the implementation based on your feedback. The identifier property now always
returns a str for FileUrl classes. Please see my detailed response below.
Thank you for review @DouweM, I've implement your suggestion to use The problem occurs trying to implement mentioned quote style branch: The type checker cannot infer the If refactor BinaryContent to also use a constructor with a property (or maybe post init) that always returns
But i'm not sure is this the right way. For now, I've kept the existing branching logic and only moved the What do you think about this approach? Would you prefer to keep it this way, or should we consider making |
@kyuam32 Yep, let's give that a try. The function can live as a private function in that same file and be used by both classes |
|
@DouweM Thank you for your patience and detailed feedback throughout this PR! Identifier support for
Design Considerations
Testing Updates
|
|
@DouweM |
|
Hi @DouweM. |
| url: str | ||
| """The URL of the file.""" | ||
|
|
||
| identifier: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make this the final field, like it is on the constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DouweM Thank you for the review!
Moved identifier field to be the final field in the dataclass definition
| media_type: str | None = None, | ||
| kind: Literal['video-url'] = 'video-url', | ||
| identifier: str | None = None, | ||
| *, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the * ahead of force_download so all arguments other than url need to be keywords arguments -- same for the other subclasses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored with same rules for other MultiModalContent types
| e.g. "This is file <identifier>:" preceding the `BinaryContent`. | ||
| """ | ||
|
|
||
| _: KW_ONLY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this ahead of media_type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved _: KW_ONLY ahead of media_type, after data field
|
@kyuam32 Thank you! |
| """The media type of the binary data.""" | ||
|
|
||
| This identifier can be provided to the model in a message to allow it to refer to this file in a tool call argument, and the tool can look up the file in question by iterating over the message history and finding the matching `BinaryContent`. | ||
| identifier: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kousun12 Ay that was unintentional, I was relying on the fact that we always set an identifier in __init__, but that wouldn't work for validation. Can you create a new issue please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related issue has been filed: #3103
This PR implements the feature requested in issue #2635 myself - adding an optional identifier field to FileUrl and its subclasses to enable file url tracking across conversations.
Changes Made
messages.pyFileUrlImageUrl,VideoUrl,AudioUrl,DocumentUrl)_agent_graph.pytest_tool_returning_file_url_with_identifierto verify the feature works correctly for allFileUrlsubclassesTesting
Note: I've implemented this as a proposal from what i requested as a issue. I understand feature request hasn't been approved yet, so please feel free to close this PR if the approach doesn't align with the project's direction.
I'm happy to adjust the implementation based on your feedback or explore alternative solutions.
This is my first contribution to opensource. I've tried to follow the existing patterns closely, similar implementation #2231.
Thank you for your time and consideration!