diff --git a/src/content/docs/workers-ai/configuration/bindings.mdx b/src/content/docs/workers-ai/configuration/bindings.mdx
index bcc052d223d4655..efdd520e12bcb06 100644
--- a/src/content/docs/workers-ai/configuration/bindings.mdx
+++ b/src/content/docs/workers-ai/configuration/bindings.mdx
@@ -3,7 +3,6 @@ pcx_content_type: configuration
 title: Workers Bindings
 sidebar:
   order: 1
-
 ---
 
 import { Type, MetaInfo } from "~/components";
@@ -40,33 +39,75 @@ To configure a Workers AI binding in your Pages Function, you must use the Cloud
 
 `async env.AI.run()` runs a model. Takes a model as the first parameter, and an object as the second parameter.
 
 ```javascript
-const answer = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
-  prompt: "What is the origin of the phrase 'Hello, World'"
+const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
+  prompt: "What is the origin of the phrase 'Hello, World'",
 });
 ```
 
-**Parameters**
+```javascript
+const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
+  prompt: "What is the origin of the phrase 'Hello, World'",
+  stream: true,
+});
+return new Response(answer, {
+  headers: { "content-type": "text/event-stream" },
+});
+```
+
+**Parameters**
 
-* `model`
+- `model`
 
-  * The model to run.
+  - The model to run.
 
 **Supported options**
 
-  * `stream`
-  * Returns a stream of results as they are available.
-
-
-
-```javascript
-const answer = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
-  prompt: "What is the origin of the phrase 'Hello, World'",
-  stream: true
-});
-
-return new Response(answer, {
-  headers: { "content-type": "text/event-stream" }
-});
-```
+  - `prompt`
+    - Text prompt for text generation (maxLength: 131072, minLength: 1).
+  - `raw`
+    - If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
+  - `stream`
+    - If true, the response will be streamed back incrementally using server-sent events (SSE).
+  - `max_tokens`
+    - The maximum number of tokens to generate in the response.
+  - `temperature`
+    - Controls the randomness of the output; higher values produce more random results (maximum: 5, minimum: 0).
+  - `top_p`
+    - Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses (maximum: 2, minimum: 0).
+  - `top_k`
+    - Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises (maximum: 50, minimum: 1).
+  - `seed`
+    - Random seed for reproducibility of the generation (maximum: 9999999999, minimum: 1).
+  - `repetition_penalty`
+    - Penalty for repeated tokens; higher values discourage repetition (maximum: 2, minimum: 0).
+  - `frequency_penalty`
+    - Decreases the likelihood of the model repeating the same lines verbatim (maximum: 2, minimum: 0).
+  - `presence_penalty`
+    - Increases the likelihood of the model introducing new topics (maximum: 2, minimum: 0).
+  - `messages`
+    - An array of message objects representing the conversation history.
+  - `tools`
+    - A list of tools available for the assistant to use.
+  - `functions`
+    - A list of functions available for the assistant to use.
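
The `messages` option documented in this hunk accepts chat-style input in place of `prompt`. As a minimal sketch of such a request object (the `role`/`content` message shape follows the common chat-completion convention, and the `env.AI.run()` call is left commented out because the `AI` binding only exists inside a Worker):

```javascript
// Sketch: a chat-style request object combining the `messages`,
// `max_tokens`, and `temperature` options described above.
const chatRequest = {
  // Conversation history as an array of message objects.
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "What is the origin of the phrase 'Hello, World'?" },
  ],
  max_tokens: 256, // cap on the number of tokens generated
  temperature: 0.7, // within the documented 0-5 range
};

// Inside a Worker handler, this object is the second argument:
// const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", chatRequest);
```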