Replies: 1 comment
Hi there! That's a good point, and we have been working on it for a few days. However, we have essentially completed the development of this part, so this PR is no longer necessary. We appreciate the suggestions you've raised above, and we warmly welcome you to submit PRs or raise suggestions directly in the future. Our developers will be happy to review them and provide feedback on merging.
-
In the original code, the framework hands the user-uploaded image to the vision model to generate a description, and then passes the user's question together with that description to the large language model for resolution.
However, this prevents the large language model from using tools to manipulate the image, because it never has access to the image itself.
In fact, the earlier preprocessing step already stores the file's URL in the file information. If we pass both the URL and the description to the large language model, it can then call the configured MCP tool: the image URL is handed to the MCP tool, which processes the image.
Something like this:
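The original snippet did not survive here, so below is a minimal hypothetical sketch of the idea: include both the stored image URL and the vision model's description in the prompt, so the LLM can pass the URL to an MCP tool. The function names (`build_prompt`) and field layout are illustrative assumptions, not Nexent's actual API.

```python
# Hypothetical sketch of the proposed change. The prompt layout and
# function name are assumptions for illustration only.

def build_prompt(question: str, image_url: str, description: str) -> str:
    """Combine the user's question with both the image URL and the
    VLM-generated description, so the LLM can forward the URL to an
    MCP tool instead of relying on the text description alone."""
    return (
        f"User question: {question}\n"
        f"Attached image URL: {image_url}\n"
        f"Image description (from the vision model): {description}\n"
        "If answering requires manipulating the image, call the "
        "configured MCP image tool with the URL above."
    )


# Example usage with placeholder values.
prompt = build_prompt(
    question="Crop this picture to a square.",
    image_url="https://example.com/uploads/cat.png",
    description="A photo of a cat sitting on a windowsill.",
)
print(prompt)
```

The key point is only that the URL now reaches the language model at all; how it is formatted into the prompt is a design choice.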
I have confirmed that the changes above achieve the desired effect.
I'm not sure whether this aligns with nexent's development strategy. If I submit a pull request, will it be accepted? Or is there a better approach that I'm unaware of?