Commit 1f4f1e5

[WebModelContext] Update API spec recommendation (#1107)
* [WebModelContext] Update API spec recommendation * Edit.
1 parent 692feea commit 1f4f1e5

1 file changed

+73
-64
lines changed

WebModelContext/explainer.md

Lines changed: 73 additions & 64 deletions
@@ -84,15 +84,9 @@ Handling tool calls in the main thread with the option of workers serves a few p
8484
- **Complexity overhead**: In cases where the site UI is very complex, developers will likely need to do some refactoring or add JavaScript that handles app and UI state with appropriate outputs.
8585
- **Tool discoverability**: There is no built-in mechanism for client applications to discover which sites provide callable tools without visiting or querying them directly. Search engines, or directories of some kind may play a role in helping client applications determine whether a site has relevant tools for the task it is trying to perform.
8686

87-
### API Options
87+
### API
8888

89-
The `window.agent` interface is introduced to represent an abstract AI agent that is connected to the page and uses the page's context. Below are the options being considered for the exact interface:
90-
91-
#### Option 1: Combined tool definition and implementation
92-
93-
The simplest approach, and one that aligns closely with libraries like the MCP SDK is to have a single API that lets the web developer declare tools and provide their implementations in a single call:
94-
95-
**Example:**
89+
The `window.agent` interface is introduced to represent an abstract AI agent that is connected to the page and uses the page's context. The `agent` object has a single method, `provideContext`, that updates the context (currently just tools) available to the agent. The method takes an object with a `tools` property, which is a list of tool descriptors. Tool descriptors take the form shown in the example below, which aligns with the Prompt API's [tool use](https://github.com/webmachinelearning/prompt-api#tool-use) specification and with other libraries such as the MCP SDK:
9690

9791
```js
9892
// Declare tool schema and implementation functions.
@@ -101,55 +95,64 @@ window.agent.provideContext({
10195
{
10296
name: "add-todo",
10397
description: "Add a new todo item to the list",
104-
params: {
98+
inputSchema: {
10599
type: "object",
106100
properties: {
107101
text: { type: "string", description: "The text of the todo item" }
108102
},
109103
required: ["text"]
110104
},
111-
async run(params) => {
105+
async execute({ text }) {
112106
// Add todo item and update UI.
113107
return /* structured content response */
114108
}
115109
}
116110
]
117111
});
118112
```
113+
114+
The `provideContext` method can be called multiple times. Subsequent calls clear any pre-existing tools and other context before registering the new ones. This is useful for single-page web apps that frequently change UI state and could benefit from presenting different tools depending on which state the UI is currently in.
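
The replace-on-call behavior described above can be illustrated with a small sketch. `MockAgent` and its `toolNames()` helper are hypothetical stand-ins written for this illustration so that the snippet runs outside a browser; they are not part of the proposed API, and the `edit-todo` tool is an assumed example.

```js
// Hypothetical stand-in for window.agent (an assumption, not the real API).
class MockAgent {
  constructor() {
    this.tools = new Map();
  }

  // Mirrors the described semantics: each call replaces the previous tool set.
  provideContext({ tools = [] } = {}) {
    this.tools = new Map(tools.map((tool) => [tool.name, tool]));
  }

  // Illustration-only helper; the explainer defines no such method.
  toolNames() {
    return [...this.tools.keys()];
  }
}

const agent = new MockAgent();

// UI state 1: the list view offers a tool for adding items.
agent.provideContext({
  tools: [
    {
      name: "add-todo",
      description: "Add a new todo item to the list",
      inputSchema: {
        type: "object",
        properties: { text: { type: "string" } },
        required: ["text"]
      },
      async execute({ text }) {
        return `Added "${text}"`;
      }
    }
  ]
});
const listViewTools = agent.toolNames();

// UI state 2: a detail view registers a different tool (assumed example).
// The earlier "add-todo" tool is cleared by this call.
agent.provideContext({
  tools: [
    {
      name: "edit-todo",
      description: "Edit the text of the open todo item",
      inputSchema: {
        type: "object",
        properties: { text: { type: "string" } },
        required: ["text"]
      },
      async execute({ text }) {
        return `Updated to "${text}"`;
      }
    }
  ]
});
const detailViewTools = agent.toolNames();

console.log(listViewTools, detailViewTools);
```

A single-page app could call `provideContext` from its routing or view-switching code so that the agent only ever sees tools that make sense for the current screen.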
115+
119116
**Advantages:**
120117

118+
- Aligns with existing APIs.
121119
- Simple for web developers to use.
122120
- Enforces a single function per tool.
123121

124122
**Disadvantages:**
125123

126-
- Must navigate to the page and run JavaScript for tools to be defined.
124+
- Must navigate to the page and run JavaScript for an agent to discover tools.
127125

128-
#### Option 2: Separate tool definition and implementation (**Recommended**)
126+
If Web Model Context gains traction in the web developer community, it will become important for agents to have a way to discover which sites have tools that are relevant to a user's request. Discovery is a topic that may warrant its own explainer, but suffice it to say, it may be beneficial for agents to know what capabilities a page offers without having to navigate to the web site first. As an example, a future iteration of this feature could introduce declarative tool definitions that are placed in an app manifest so that agents would only need to fetch the manifest with a simple HTTP GET request. Agents will of course still need to navigate to the site to actually use its tools, but a manifest makes it far less costly to discover these tools and reason about their relevance to the user's task.
129127

130-
Defining and implementing the tools separately opens the possibility of declaring tools outside of JavaScript. A future iteration of this feature could for example introduce tools that are defined declaratively in an app manifest so that agents can discover these without needing to visit the web site first. Agents will of course still need to navigate to the site to actually use its tools, but a manifest makes it far less costly to discover these tools and reason about their relevance to the user's task.
128+
To make such a scenario easier, it would be beneficial to support an alternate means of tool call execution, one that separates the tool definition and schema (which may exist in an external manifest file) from the implementation function.
131129

132-
**Example:**
130+
One way to do this is to handle tool calls as events, as shown below:
133131

134-
```js
135-
// 1. Declare tool schema to agent.
136-
window.agent.provideContext({
137-
tools: [
132+
```json
133+
// 1. manifest.json: Define tools declaratively. Exact syntax TBD.
134+
135+
{
136+
// .. other manifest fields ..
137+
"tools": [
138138
{
139-
name: "add-todo",
140-
description: "Add a new todo item to the list",
141-
params: {
142-
type: "object",
143-
properties: {
144-
text: { type: "string", description: "The text of the todo item" }
139+
"name": "add-todo",
140+
"description": "Add a new todo item to the list",
141+
"inputSchema": {
142+
"type": "object",
143+
"properties": {
144+
"text": { "type": "string", "description": "The text of the todo item" }
145145
},
146-
required: ["text"]
146+
"required": ["text"]
147147
},
148148
}
149149
]
150-
});
150+
}
151+
```
152+
153+
```js
154+
// 2. script.js: Handle tool calls as events.
151155

152-
// 2. Handle tool calls as events.
153156
window.agent.addEventListener('toolcall', async e => {
154157
if (e.name === "add-todo") {
155158
// Add todo item and update UI.
@@ -158,17 +161,21 @@ window.agent.addEventListener('toolcall', async e => {
158161
} // etc...
159162
});
160163
```
164+
161165
Tool calls are handled as events. Since event handler functions can't respond to the agent by returning a value directly, the `'toolcall'` event object has a `respondWith()` method that needs to be called to signal completion and respond to the agent. This is based on the existing service worker `'fetch'` event.
162166

163167
**Advantages:**
164168

165-
- Allows multiple different discovery mechanisms.
169+
- Allows different discovery mechanisms that do not require rendering a page.
166170

167171
**Disadvantages:**
168172

169173
- Slightly harder to keep definition and implementation in sync.
174+
- Potentially large switch-case in event handler.
175+
176+
#### Recommendation
170177

171-
Although this API is slightly more complex than the former, support for declaring tools outside of JavaScript will likely be important to support agents' ability to discover tools without needing to navigate to a page. It is also simple to write a wrapper around the Option 2 API that makes it look like Option 1, which could be useful for sites that don't want to take advantage of the declarative approach.
178+
A **hybrid** approach combining both of the examples above is recommended, as this would make it easy for web developers to get started adding tools to their page while leaving open the possibility of manifest-based approaches in the future. To implement this hybrid approach, a `"toolcall"` event is dispatched on every incoming tool call _before_ executing the tool's `execute` function. The event handler can handle the tool call by calling the event's `preventDefault()` method, and then responding to the agent with `respondWith()` as shown above. If the event handler does not call `preventDefault()`, then the browser's default behavior for tool calls occurs: the `execute` function for the requested tool is called. If a tool with the requested name does not exist, then the browser responds to the agent with an error.
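
The dispatch order of the hybrid approach can be sketched roughly as follows. This is a simulation under stated assumptions, not browser code: `MockAgent`, `ToolCallEvent`, and `callTool()` are hypothetical names invented for the illustration, the `list-todos` tool stands in for a tool declared only in a manifest, and the error is modeled as a rejected promise because the explainer does not specify its shape.

```js
// Hypothetical event type. Only `name`, `preventDefault()` and `respondWith()`
// are taken from the explainer text; everything else is an assumption.
class ToolCallEvent extends Event {
  constructor(name, input) {
    super("toolcall", { cancelable: true });
    this.name = name;
    this.input = input;
    this.response = undefined;
  }

  respondWith(result) {
    // Accepts a value or a promise, similar in spirit to the 'fetch' event.
    this.response = Promise.resolve(result);
  }
}

// Hypothetical stand-in for window.agent plus the browser's dispatch logic.
class MockAgent extends EventTarget {
  constructor() {
    super();
    this.tools = new Map();
  }

  provideContext({ tools = [] } = {}) {
    this.tools = new Map(tools.map((tool) => [tool.name, tool]));
  }

  // Simulates an incoming tool call from a connected agent.
  async callTool(name, input) {
    const event = new ToolCallEvent(name, input);
    this.dispatchEvent(event); // 1. The event fires before any execute().
    if (event.defaultPrevented) {
      return event.response; // 2. The page handled the call itself.
    }
    const tool = this.tools.get(name);
    if (!tool) {
      throw new Error(`Unknown tool: ${name}`); // 4. No such tool: error.
    }
    return tool.execute(input); // 3. Default behavior: run execute().
  }
}

async function demo() {
  const agent = new MockAgent();
  agent.provideContext({
    tools: [
      {
        name: "add-todo",
        description: "Add a new todo item to the list",
        inputSchema: { type: "object", properties: { text: { type: "string" } } },
        async execute({ text }) {
          return `default: added "${text}"`;
        }
      }
    ]
  });

  // Handles a tool that has no execute() function (e.g. manifest-declared).
  agent.addEventListener("toolcall", (e) => {
    if (e.name === "list-todos") {
      e.preventDefault();
      e.respondWith("handler: no todos yet");
    }
  });

  const results = [];
  results.push(await agent.callTool("add-todo", { text: "Buy milk" }));
  results.push(await agent.callTool("list-todos", {}));
  try {
    await agent.callTool("missing-tool", {});
  } catch (err) {
    results.push(err.message);
  }
  return results;
}

demo().then((results) => console.log(results));
```

In this sketch the handler must call `preventDefault()` synchronously during dispatch, which mirrors how `respondWith()` works for the service worker `'fetch'` event; whether the final API imposes the same constraint is not stated in the text above.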
172179

173180
#### Other API alternatives considered
174181

@@ -201,7 +208,7 @@ The page shows the stamps currently in the database and has a form to add a new
201208

202209
Using the Web Model Context API, the author can add just a few simple tools to the page for adding, updating, and retrieving stamps. With these relatively simple tools, an AI agent would have the ability to perform complex tasks like the ones illustrated above on behalf of the user.
203210

204-
The example below walks through adding one such tool, the "add-stamp" tool, using Option #2 of the Web Model Context API, so that AI agents can update the stamp collection.
211+
The example below walks through adding one such tool, the "add-stamp" tool, using the Web Model Context API, so that AI agents can update the stamp collection.
205212

206213
The webpage today is designed with a visual UX in mind. It uses simple JavaScript with a `'submit'` event handler that reads the form fields, adds the new record, and refreshes the UI:
207214

@@ -236,47 +243,49 @@ function addStamp(stampName, stampDescription, stampYear, stampImageUrl) {
236243
}
237244
```
238245

239-
To let AI agents use this functionality, the author first defines the available tools. The `agent` property on the `Window` is checked to ensure the browser supports Web Model Context. If supported, the `provideContext()` method is called, passing in an array of tools with a single item, a definition for the new "Add Stamp" tool. The tool accepts as parameters the same set of fields that are present in the HTML form, since this tool and the form should be functionally equivalent.
246+
To let AI agents use this functionality, the author defines the available tools. The `agent` property on the `Window` is checked to ensure the browser supports Web Model Context. If supported, the `provideContext()` method is called, passing in an array of tools with a single item, a definition for the new "Add Stamp" tool. The tool accepts as parameters the same set of fields that are present in the HTML form, since this tool and the form should be functionally equivalent.
240247

241248
```js
242-
window.agent.provideContext({
243-
tools: [
244-
{
245-
name: "add-stamp",
246-
description: "Add a new stamp to the collection",
247-
params: {
248-
type: "object",
249-
properties: {
250-
name: { type: "string", description: "The name of the stamp" },
251-
description: { type: "string", description: "A brief description of the stamp" },
252-
year: { type: "number", description: "The year the stamp was issued" },
253-
imageUrl: { type: "string", description: "An optional image URL for the stamp" }
249+
if ("agent" in window) {
250+
window.agent.provideContext({
251+
tools: [
252+
{
253+
name: "add-stamp",
254+
description: "Add a new stamp to the collection",
255+
inputSchema: {
256+
type: "object",
257+
properties: {
258+
name: { type: "string", description: "The name of the stamp" },
259+
description: { type: "string", description: "A brief description of the stamp" },
260+
year: { type: "number", description: "The year the stamp was issued" },
261+
imageUrl: { type: "string", description: "An optional image URL for the stamp" }
262+
},
263+
required: ["name", "description", "year"]
254264
},
255-
required: ["name", "description", "year"]
256-
},
257-
}
258-
]
259-
});
265+
async execute({ name, description, year, imageUrl }) {
266+
// TODO
267+
}
268+
}
269+
]
270+
});
271+
}
260272
```
261273

262-
Now the author needs to handle tool calls coming from connected agents. After defining the "Add Stamp" tool above, the author handles the `'toolcall'` event and implements the tools operations. The tool needs to update the stamp database, and refresh the UI to reflect the change to the database. Since the code to do this is already available in the `addStamp()` function written earlier, the event handler is very simple and just needs to call this helper when an "add-stamp" tool call is received. After calling the helper, the event handler needs to signal completion and should also provide some sort of feedback to the client application that requested the tool call. It calls `e.respondWith()` with a text message indicating the stamp was added:
274+
Now the author needs to implement the tool. The tool needs to update the stamp database, and refresh the UI to reflect the change to the database. Since the code to do this is already available in the `addStamp()` function written earlier, the tool implementation is very simple and just needs to call this helper when an "add-stamp" tool call is received. After calling the helper, the tool needs to signal completion and should also provide some sort of feedback to the client application that requested the tool call. It returns a text message indicating the stamp was added:
263275

264276
```js
265-
window.addEventListener('toolcall', async (e) => {
266-
if (e.name === 'add-stamp') {
267-
const { name, description, year, imageUrl } = e.input;
268-
addStamp(name, description, year, imageUrl);
269-
270-
return e.respondWith({
271-
content: [
272-
{
273-
type: "text",
274-
text: `Stamp "${name}" added successfully! The collection now contains ${stamps.length} stamps.`,
275-
},
276-
]
277-
});
278-
}
279-
});
277+
async execute({ name, description, year, imageUrl }) {
278+
addStamp(name, description, year, imageUrl);
279+
280+
return {
281+
content: [
282+
{
283+
type: "text",
284+
text: `Stamp "${name}" added successfully! The collection now contains ${stamps.length} stamps.`,
285+
},
286+
]
287+
};
288+
}
280289
```
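
Putting the registration and the implementation together, a minimal end-to-end sketch might look like the following. So that it runs outside a browser, it swaps in a hypothetical `mockAgent` object for `window.agent` (and therefore omits the `"agent" in window` check) and uses a simplified `addStamp()` that only appends to an in-memory `stamps` array, whereas the page's real helper also refreshes the UI. The `simulateToolCall()` function and the sample input are likewise assumptions made for illustration.

```js
// Simplified stand-ins for the page's data and helper (assumptions).
const stamps = [];
function addStamp(stampName, stampDescription, stampYear, stampImageUrl) {
  stamps.push({
    name: stampName,
    description: stampDescription,
    year: stampYear,
    imageUrl: stampImageUrl
  });
}

// Hypothetical replacement for window.agent that records registered tools.
const mockAgent = {
  tools: [],
  provideContext({ tools = [] } = {}) {
    this.tools = tools;
  }
};

mockAgent.provideContext({
  tools: [
    {
      name: "add-stamp",
      description: "Add a new stamp to the collection",
      inputSchema: {
        type: "object",
        properties: {
          name: { type: "string", description: "The name of the stamp" },
          description: { type: "string", description: "A brief description of the stamp" },
          year: { type: "number", description: "The year the stamp was issued" },
          imageUrl: { type: "string", description: "An optional image URL for the stamp" }
        },
        required: ["name", "description", "year"]
      },
      async execute({ name, description, year, imageUrl }) {
        // Reuse the same helper the form's 'submit' handler calls.
        addStamp(name, description, year, imageUrl);
        return {
          content: [
            {
              type: "text",
              text: `Stamp "${name}" added successfully! The collection now contains ${stamps.length} stamps.`
            }
          ]
        };
      }
    }
  ]
});

// Simulates an agent invoking the registered tool with sample input.
async function simulateToolCall() {
  const tool = mockAgent.tools.find((t) => t.name === "add-stamp");
  return tool.execute({
    name: "Penny Black",
    description: "First adhesive postage stamp",
    year: 1840
  });
}

simulateToolCall().then((response) => console.log(response.content[0].text));
```

Because the tool and the form share `addStamp()`, any validation or UI refresh logic added to the helper later applies to both entry points.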
281290
### Future improvements to this example
282291
