Because of their special behavior of being preserved on context window overflow, system prompts cannot be provided this way.
### Multimodal inputs
All of the above examples have been of text prompts. Some language models also support other inputs. Our design initially includes the potential to support images and audio clips as inputs. This is done by using objects in the form `{ type: "image", value }` and `{ type: "audio", value }` instead of strings. The `value` can be any of the following:
Future extensions may include more ambitious multimodal inputs, such as video clips, or realtime audio or video. (Realtime might require a different API design, more based around events or streams instead of messages.)
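As a concrete sketch of the message shape (not an authoritative example: `imageBytes` is a stand-in for real encoded image data, and the session calls are shown in comments since the API is only available in supporting browsers):

```javascript
// Hypothetical sketch of a multimodal prompt. The session-related calls are
// commented out because the Prompt API only exists in supporting browsers:
//
//   const session = await LanguageModel.create({
//     expectedInputs: [{ type: "image" }]
//   });

// Stand-in for real encoded image data (a Blob or ImageBitmapSource in practice).
const imageBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);

const messages = [
  {
    role: "user",
    content: [
      { type: "text", value: "Describe what is happening in this photo." },
      { type: "image", value: imageBytes }
    ]
  }
];

// const result = await session.prompt(messages);
```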
### Tool use
The Prompt API supports **tool use** via the `tools` option, allowing you to define external capabilities that a language model can invoke in a model-agnostic way. Each tool is represented by an object that includes an `execute` member that specifies the JavaScript function to be called. When the language model initiates a tool use request, the user agent calls the corresponding `execute` function and sends the result back to the model.
Here’s an example of how to use the `tools` option:
235
+
236
+
```js
237
+
constsession=awaitLanguageModel.create({
238
+
initialPrompts: [
239
+
{
240
+
role:"system",
241
+
content:`You are a helpful assistant. You can use tools to help the user.`
242
+
}
243
+
],
244
+
tools: [
245
+
{
246
+
name:"getWeather",
247
+
description:"Get the weather in a location.",
248
+
inputSchema: {
249
+
type:"object",
250
+
properties: {
251
+
location: {
252
+
type:"string",
253
+
description:"The city to check for the weather condition.",
constresult=awaitsession.prompt("What is the weather in Seattle?");
268
+
```
269
+
270
+
In this example, the `tools` array defines a `getWeather` tool, specifying its name, description, input schema, and `execute` implementation. When the language model determines that a tool call is needed, the user agent invokes the `getWeather` tool's `execute()` function with the provided arguments and returns the result to the model, which can then incorporate it into its response.
#### Concurrent tool use
Developers should be aware that the model might call their tool multiple times, concurrently. For example, code such as
```js
const result = await session.prompt("Which of these locations currently has the highest temperature? Seattle, Tokyo, Berlin");
```
might call the above `"getWeather"` tool's `execute()` function three times. The model would wait for all tool call results to return, using the equivalent of `Promise.all()` internally, before it composes its final response.
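This gathering step can be sketched in plain JavaScript. The following is a simulation of the user agent's behavior, not part of the API; `simulateModelToolCalls` and the fake weather data are invented for illustration:

```javascript
// A tool like the getWeather example above. Note that execute() may be
// invoked several times concurrently, so it must tolerate overlapping calls.
const getWeather = {
  name: "getWeather",
  async execute({ location }) {
    // Stand-in for a real network fetch; returns stringified JSON like a real tool.
    return JSON.stringify({ location, temperatureC: 10 + location.length });
  }
};

// Rough simulation of the user agent's internal behavior: fire off all
// requested tool calls, then wait for every result before resuming the model.
async function simulateModelToolCalls(tool, argsList) {
  const results = await Promise.all(argsList.map(args => tool.execute(args)));
  return results.map(r => JSON.parse(r));
}

simulateModelToolCalls(getWeather, [
  { location: "Seattle" },
  { location: "Tokyo" },
  { location: "Berlin" }
]).then(results => {
  // All three results arrive together, as with Promise.all(), in request order.
  console.log(results.map(r => r.location).join(", "));
});
```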
Similarly, the model might call multiple different tools if it believes they are all relevant when responding to the given prompt.
283
+
284
+
#### Tool return values
The above example shows tools returning a string (in fact, stringified JSON). Models which support [multimodal inputs](#multimodal-inputs) might also support interpreting image or audio results from tool calls.
Just like the `content` option to a `prompt()` call can accept either a string or an array of `{ type, value }` objects, web developer-provided tools can return either a string or such an array. Here's an example:
```js
let mutex, resolveMutex;

const session = await LanguageModel.create({
  tools: [
    {
      name: "grabKeyframe",
      description: "Grab a keyframe from the video we're analyzing at the given time",
      inputSchema: {
        type: "number",
        minimum: 0,
        exclusiveMaximum: videoEl.duration
      },
      expectedOutputs: {
        types: ["image"]
      },
      async execute(timestamp) {
        if (mutex) {
          // Since we're seeking a single video element, guard against concurrent calls.
          await mutex;
        }
        try {
          mutex = new Promise(r => resolveMutex = r);

          if (Math.abs(videoEl.currentTime - timestamp) > 0.001) {
            // Seek to the requested timestamp and wait for the frame to be ready.
            videoEl.currentTime = timestamp;
            await new Promise(r => videoEl.addEventListener("seeked", r, { once: true }));
          }

          // Capture the current frame and return it as an image tool result.
          const frame = await createImageBitmap(videoEl);
          return [{ type: "image", value: frame }];
        } finally {
          resolveMutex();
          mutex = null;
        }
      }
    }
  ]
});
```
Note how the output types need to be specified in the tool definition, so that session creation can fail early if the model doesn't support processing multimodal tool outputs. If the return value contains non-text components that are not declared in the tool specification, the tool call will fail at prompting time, even if the model could support them.
Similarly, expected output languages can be provided (via `expectedOutputs: { languages: ["ja"] }` or similar), to get an early failure if the model doesn't support processing tool outputs in those languages. However, unlike modalities, there is no prompt-time checking of the tool call result's languages.
The above example shows a single-item array, but just like with prompt inputs, it's allowed to include multiple tool outputs. The same rules are followed as for inputs, e.g., adjacent text chunks are concatenated (with no separator added).
### Structured output with JSON schema or RegExp constraints
To help with programmatic processing of language model responses, the Prompt API supports constraining the response with either a JSON schema object or a `RegExp` passed as the `responseConstraint` option:
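Here is a hedged sketch: the schema contents are invented for illustration, and the actual `session.prompt()` call is shown in a comment since it requires a supporting browser.

```javascript
// A JSON schema constraining the model's reply to a small rating object.
// (Schema contents invented for illustration.)
const ratingSchema = {
  type: "object",
  properties: {
    rating: { type: "number", minimum: 0, maximum: 5 }
  },
  required: ["rating"]
};

// In a supporting browser, the constraint is passed alongside the prompt:
//
//   const result = await session.prompt(
//     "Summarize this feedback into a rating: " + feedbackText,
//     { responseConstraint: ratingSchema }
//   );

// Because the response is guaranteed to conform to the schema, it can be
// parsed directly. A conforming response might look like:
const exampleResponse = '{"rating": 4}';
const { rating } = JSON.parse(exampleResponse);
```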
dictionary LanguageModelMessage {
  required LanguageModelMessageRole role;
  required LanguageModelMessageContent content;

  boolean prefix = false;
};

typedef (
  sequence<LanguageModelMessageContentChunk>
  // Shorthand for `[{ type: "text", value: providedValue }]`
  or DOMString
) LanguageModelMessageContent;

dictionary LanguageModelMessageContentChunk {
  required LanguageModelMessageType type;
  required LanguageModelMessageValue value;
};
<p class="note">This will be incorporated into a proper part of the specification later. For now, we're just writing out this algorithm as a full spec, since it's complicated.</p>

<div algorithm>
<!-- TODO remove noexport once there are actual references to this algorithm in the spec. It is only being used now to silence a build warning. -->
To <dfn noexport>validate and canonicalize a prompt</dfn> given a {{LanguageModelPrompt}} |input|, a [=list=] of {{LanguageModelMessageType}}s |expectedTypes|, and a boolean |isInitial|, perform the following steps. The return value will be a non-empty [=list=] of {{LanguageModelMessage}}s in their "longhand" form.
1. If |message|["{{LanguageModelMessage/role}}"] is not "{{LanguageModelMessageRole/system}}", then set |seenNonSystemRole| to true.
1. If |message|["{{LanguageModelMessage/role}}"] is "{{LanguageModelMessageRole/assistant}}" and |content|["{{LanguageModelMessageContentChunk/type}}"] is not "{{LanguageModelMessageType/text}}", then throw a "{{NotSupportedError}}" {{DOMException}}.
1. If |content|["{{LanguageModelMessageContentChunk/type}}"] is "{{LanguageModelMessageType/text}}" and |content|["{{LanguageModelMessageContentChunk/value}}"] is not a [=string=], then throw a {{TypeError}}.
1. If |content|["{{LanguageModelMessageContentChunk/type}}"] is "{{LanguageModelMessageType/image}}", then:
    1. If |expectedTypes| does not [=list/contain=] "{{LanguageModelMessageType/image}}", then throw a "{{NotSupportedError}}" {{DOMException}}.
    1. If |content|["{{LanguageModelMessageContentChunk/value}}"] is not an {{ImageBitmapSource}} or {{BufferSource}}, then throw a {{TypeError}}.
1. If |content|["{{LanguageModelMessageContentChunk/type}}"] is "{{LanguageModelMessageType/audio}}", then:
    1. If |expectedTypes| does not [=list/contain=] "{{LanguageModelMessageType/audio}}", then throw a "{{NotSupportedError}}" {{DOMException}}.
    1. If |content|["{{LanguageModelMessageContentChunk/value}}"] is not an {{AudioBuffer}}, {{BufferSource}}, or {{Blob}}, then throw a {{TypeError}}.
1. Let |contentWithContiguousTextCollapsed| be an empty [=list=] of {{LanguageModelMessageContentChunk}}s.
1. Let |lastTextContent| be null.
1. [=list/For each=] |content| of |message|["{{LanguageModelMessage/content}}"]:
    1. If |content|["{{LanguageModelMessageContentChunk/type}}"] is "{{LanguageModelMessageType/text}}":
        1. If |lastTextContent| is null:
            1. [=list/Append=] |content| to |contentWithContiguousTextCollapsed|.
            1. Set |lastTextContent| to |content|.
        1. Otherwise, set |lastTextContent|["{{LanguageModelMessageContentChunk/value}}"] to the concatenation of |lastTextContent|["{{LanguageModelMessageContentChunk/value}}"] and |content|["{{LanguageModelMessageContentChunk/value}}"].

            <p class="note">No space or other character is added. Thus, « «[ "{{LanguageModelMessageContentChunk/type}}" → "{{LanguageModelMessageType/text}}", "{{LanguageModelMessageContentChunk/value}}" → "`foo`" ]», «[ "{{LanguageModelMessageContentChunk/type}}" → "{{LanguageModelMessageType/text}}", "{{LanguageModelMessageContentChunk/value}}" → "`bar`" ]» » is canonicalized to « «[ "{{LanguageModelMessageContentChunk/type}}" → "{{LanguageModelMessageType/text}}", "{{LanguageModelMessageContentChunk/value}}" → "`foobar`" ]» ».</p>