Support MCP resources, multimodal, unstructured-* with integration tests#2918
Support MCP resources, multimodal, unstructured-* with integration tests#2918
Conversation
| feature = "full", | ||
| derive(IntoValue, FromValue, desert_rust::BinaryCodec) | ||
| )] | ||
| #[cfg_attr(feature = "full", derive(desert_rust::BinaryCodec))] |
There was a problem hiding this comment.
Why is this changed to manual implementation? Revert if possible
There was a problem hiding this comment.
Automatic didn't work for functions that returned multimodal if I remember correctly
There was a problem hiding this comment.
So just re-tested and yes I need this.
There was a problem hiding this comment.
Please explain "did not work". We can fix the deriver.
There was a problem hiding this comment.
So, WIT defines multimodal as list<tuple<string, element-value>>, and the actual runtime value is Value::Tuple. Not Value::Record.
What's in here is UntypedNamedElementValue {name, value} and the derive macro for FromVlaue will produce something that expects Value::Record.
So, ended up having Expected Record value with 2 fields, got Tuple([String("Text"), Variant { ... }]).
I think, its not a bug in the deriver to fix.
There was a problem hiding this comment.
One thing is untyped-named-element-value is used only in multimodal..No other wit types ends up being untyped-named-element-value.. so may be a rename will solve it, but i don't want to do that.
Also note that this is nothing related to MCP. Invoking a function that returns multimodal didn't work at all
* implement prompt hints * reformat code * fix integration tests * fix a bug related to validation of output schema in mcp inspector (#2969)
|
Also merged #2980 into this. |
* implement prompt hints * reformat code * fix integration tests * fix a bug related to validation of output schema in mcp inspector * fix multimodal images * make sure to render images * Reformat code * fix compilation errors and cleanup * start fixing tests * start fixing tests * reformat code * reformat code * fix all unit tests
Fixes #2795
Fixes #2863
Fixes #2794
Fixes #2822 (See #2961)
Also updated with #2980 (this required more testing and that took some time)
Note that, I am not doing any kebab case conversions to make things "look" good. I will do that separately, because it will hurt ongoing changes, and need a separate task for it. Things work without that now.
The whole MCP changes right now covers all features except prompt. It took a lot of re-iterations to make things work with MCP clients. OpenAI playground worked with tools but not with resources (because it didn't support).
Claude Desktop and MCP inspector are the new one I tested. Now I can see
golemwithin Claude Desktop successfully. Claude Desktop actually helped with some real bugs.Here is a screenshot of Claude configured with Golem (thought it was impossible until I came to know about https://www.npmjs.com/package/mcp-remote). Pasting the config here (might be helpful for docs)
In
claude_desktop_config.jsonand restart desktop:{ "mcpServers": { "golem": { "command" : "npx", "args" : [ "mcp-remote", "http://localhost:9007/mcp", "--header", "Host: localhost:9007", "--allow-http" ] } } }These were not running for a major time of this PR in the draft state.
MCP inspector also works similar to Claude Desktop. Open AI doesn't because, they don't show options for resources.
I also tested resources, and MCP inspector is the best test that I could do, and it understood both resource template and concrete resource as mentioned in the MCP protocol.
Note
Also note that, everything in Golem doesn't have exact one to one with MCP as mentioned in the spec written by John. We are making the best approximations here. And whatever you see in this PR, has been manually tested with popular clients and making changes will need retesting.
I would also like to test more but only as part of release tests that will be done later. And as part of bug fixes.
Static Resource
It implies the resource is part of a cluster singleton
Template Resource
It implies resource depends on the identity of your agent
These are tested with official MCP Inspector (the best client for testing), and invoked these tools. Before this PR, those images never existed
Multimodal - A weather report + weather image together
UnstructuredBinary: A snow fall image, given a city
Now it always render images, instead of at-times base64, or at times MCP client being smart and end up doing erroring out.
Unstrucured-Text report (there is nothing much here)