Commit 0a48ff0

Markdown Conversion documentation (#20831)
* toMarkdown
* Markdown conversion
* no D1 here
* fixes path
* better wording
* toMarkdown changelog updates catalog
* changes date
* better wording

---------

Co-authored-by: Celso Martinho <[email protected]>
1 parent 621eac0 commit 0a48ff0

44 files changed: +2064 -370 lines changed
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
1+
---
2+
title: Markdown conversion in Workers AI
3+
description: You can now convert documents in multiple formats to Markdown using the toMarkdown utility method in Workers AI.
4+
date: 2025-03-20T18:00:00Z
5+
---
6+
7+
Document conversion plays an important role when designing and developing AI applications and agents. Workers AI now provides the `toMarkdown` utility method, which developers can use for quick, easy, and convenient conversion and summarization of documents in multiple formats to Markdown.
8+
9+
You can call this new tool through a binding with `env.AI.toMarkdown()` or through the [REST API](/api/resources/ai/) endpoint.
10+
11+
In this example, we fetch a PDF document and an image from R2 and feed them both to `env.AI.toMarkdown()`. The result is a list of converted documents. Workers AI models are used automatically to detect and summarize the image.
12+
13+
```typescript
14+
import { Env } from "./env";
15+
16+
export default {
17+
async fetch(request: Request, env: Env, ctx: ExecutionContext) {
18+
19+
// https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/somatosensory.pdf
20+
const pdf = await env.R2.get('somatosensory.pdf');
21+
22+
// https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/cat.jpeg
23+
const cat = await env.R2.get('cat.jpeg');
24+
25+
return Response.json(
26+
await env.AI.toMarkdown([
27+
{
28+
name: "somatosensory.pdf",
29+
blob: new Blob([await pdf.arrayBuffer()], { type: "application/octet-stream" }),
30+
},
31+
{
32+
name: "cat.jpeg",
33+
blob: new Blob([await cat.arrayBuffer()], { type: "application/octet-stream" }),
34+
},
35+
]),
36+
);
37+
},
38+
};
39+
```
40+
41+
This is the result:
42+
43+
```json
44+
[
45+
{
46+
"name": "somatosensory.pdf",
47+
"mimeType": "application/pdf",
48+
"format": "markdown",
49+
"tokens": 0,
50+
"data": "# somatosensory.pdf\n## Metadata\n- PDFFormatVersion=1.4\n- IsLinearized=false\n- IsAcroFormPresent=false\n- IsXFAPresent=false\n- IsCollectionPresent=false\n- IsSignaturesPresent=false\n- Producer=Prince 20150210 (www.princexml.com)\n- Title=Anatomy of the Somatosensory System\n\n## Contents\n### Page 1\nThis is a sample document to showcase..."
51+
},
52+
{
53+
"name": "cat.jpeg",
54+
"mimeType": "image/jpeg",
55+
"format": "markdown",
56+
"tokens": 0,
57+
"data": "The image is a close-up photograph of Grumpy Cat, a cat with a distinctive grumpy expression and piercing blue eyes. The cat has a brown face with a white stripe down its nose, and its ears are pointed upright. Its fur is light brown and darker around the face, with a pink nose and mouth. The cat's eyes are blue and slanted downward, giving it a perpetually grumpy appearance. The background is blurred, but it appears to be a dark brown color. Overall, the image is a humorous and iconic representation of the popular internet meme character, Grumpy Cat. The cat's facial expression and posture convey a sense of displeasure or annoyance, making it a relatable and entertaining image for many people."
58+
}
59+
]
60+
```
61+
62+
See [Markdown Conversion](/workers-ai/markdown-conversion/) for more information on supported formats, REST API and pricing.
Lines changed: 234 additions & 0 deletions
@@ -0,0 +1,234 @@
1+
---
2+
title: Markdown Conversion
3+
pcx_content_type: how-to
4+
sidebar:
5+
order: 5
6+
badge:
7+
text: Beta
8+
---
9+
10+
import { Code, Type, MetaInfo, Details } from "~/components";
11+
12+
[Markdown](https://en.wikipedia.org/wiki/Markdown) is essential for text generation and large language models (LLMs), in both training and inference, because it provides structured, semantic input that is readable by humans and machines alike. Markdown also makes it easier to chunk and structure input data for retrieval and synthesis in retrieval-augmented generation (RAG), and its simplicity and ease of parsing and rendering make it ideal for AI agents.
13+
14+
For these reasons, document conversion plays an important role when designing and developing AI applications. Workers AI provides the `toMarkdown` utility method, which developers can call from the [`env.AI`](/workers-ai/configuration/bindings/) binding or the REST API for quick, easy, and convenient conversion and summarization of documents in multiple formats to Markdown.
15+
16+
## Methods and definitions
17+
18+
### async env.AI.toMarkdown()
19+
20+
Takes a list of documents in different formats and converts them to Markdown.
21+
22+
#### Parameter
23+
24+
- <code>documents</code>: <Type text="array"/>
25+
- An array of `toMarkdownDocument`s.
26+
27+
#### Return values
28+
29+
- <code>results</code>: <Type text="array"/>
30+
- An array of `toMarkdownDocumentResult`s.
31+
32+
### `toMarkdownDocument` definition
33+
34+
- `name` <Type text="string" />
35+
36+
- Name of the document to convert.
37+
38+
- `blob` <Type text="Blob" />
39+
40+
- A new [Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob) object with the document content.
41+
42+
### `toMarkdownDocumentResult` definition
43+
44+
- `name` <Type text="string" />
45+
46+
- Name of the converted document. Matches the input name.
47+
48+
- `mimeType` <Type text="string" />
49+
50+
- The detected [MIME type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types/Common_types) of the document.
51+
52+
- `tokens` <Type text="number" />
53+
54+
- The estimated number of tokens of the converted document.
55+
56+
- `data` <Type text="string" />
57+
58+
- The content of the converted document in Markdown format.
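
The two definitions above can be written down as TypeScript types. This is only a sketch: the interface names `ToMarkdownDocument` and `ToMarkdownDocumentResult` and the `isConversionResult` guard are hypothetical, the `mimeType` spelling and the `format` field are taken from the example output on this page rather than from the definition list, and the actual Workers AI typings may differ.

```typescript
// Hypothetical type names; field names follow the definitions and example output on this page.
interface ToMarkdownDocument {
  name: string; // name of the document to convert
  blob: Blob; // document content
}

interface ToMarkdownDocumentResult {
  name: string; // matches the input name
  mimeType: string; // detected MIME type
  format?: string; // "markdown" in the example output; assumed optional here
  tokens: number; // estimated token count
  data: string; // converted Markdown
}

// Runtime guard for values of unknown shape, such as parsed JSON.
function isConversionResult(value: unknown): value is ToMarkdownDocumentResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.mimeType === "string" &&
    typeof v.tokens === "number" &&
    typeof v.data === "string"
  );
}
```

A guard like this is mainly useful on the REST path, where the response arrives as untyped JSON.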
59+
60+
## Supported formats
61+
62+
This is the list of supported formats. We are constantly adding new formats and updating this table.
63+
64+
<table>
65+
<tbody>
66+
<th colspan="5" rowspan="1" style="width:160px">
67+
Format
68+
</th>
69+
<th colspan="5" rowspan="1">
70+
File extensions
71+
</th>
72+
<th colspan="5" rowspan="1">
73+
MIME types
74+
</th>
75+
<tr>
76+
<td colspan="5" rowspan="1">
77+
PDF Documents
78+
</td>
79+
<td colspan="5" rowspan="1">
80+
`.pdf`
81+
</td>
82+
<td colspan="5" rowspan="1">
83+
`application/pdf`
84+
</td>
85+
</tr>
86+
<tr>
87+
<td colspan="5" rowspan="1">
88+
Images <sup>1</sup>
89+
</td>
90+
<td colspan="5" rowspan="1">
91+
`.jpeg`, `.jpg`, `.png`, `.webp`, `.svg`
92+
</td>
93+
<td colspan="5" rowspan="1">
94+
`image/jpeg`, `image/png`, `image/webp`, `image/svg+xml`
95+
</td>
96+
</tr>
97+
<tr>
98+
<td colspan="5" rowspan="1">
99+
HTML Documents
100+
</td>
101+
<td colspan="5" rowspan="1">
102+
`.html`
103+
</td>
104+
<td colspan="5" rowspan="1">
105+
`text/html`
106+
</td>
107+
</tr>
108+
<tr>
109+
<td colspan="5" rowspan="1">
110+
XML Documents
111+
</td>
112+
<td colspan="5" rowspan="1">
113+
`.xml`
114+
</td>
115+
<td colspan="5" rowspan="1">
116+
`application/xml`
117+
</td>
118+
</tr>
119+
<tr>
120+
<td colspan="5" rowspan="1">
121+
Microsoft Office Documents
122+
</td>
123+
<td colspan="5" rowspan="1">
124+
`.xlsx`, `.xlsm`, `.xlsb`, `.xls`, `.et`
125+
</td>
126+
<td colspan="5" rowspan="1">
127+
`application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`, `application/vnd.ms-excel.sheet.macroenabled.12`, `application/vnd.ms-excel.sheet.binary.macroenabled.12`, `application/vnd.ms-excel`, `application/vnd.ms-excel`
128+
</td>
129+
</tr>
130+
<tr>
131+
<td colspan="5" rowspan="1">
132+
Open Document Format
133+
</td>
134+
<td colspan="5" rowspan="1">
135+
`.ods`
136+
</td>
137+
<td colspan="5" rowspan="1">
138+
`application/vnd.oasis.opendocument.spreadsheet`
139+
</td>
140+
</tr>
141+
<tr>
142+
<td colspan="5" rowspan="1">
143+
CSV
144+
</td>
145+
<td colspan="5" rowspan="1">
146+
`.csv`
147+
</td>
148+
<td colspan="5" rowspan="1">
149+
`text/csv`
150+
</td>
151+
</tr>
152+
<tr>
153+
<td colspan="5" rowspan="1">
154+
Apple Documents
155+
</td>
156+
<td colspan="5" rowspan="1">
157+
`.numbers`
158+
</td>
159+
<td colspan="5" rowspan="1">
160+
`application/vnd.apple.numbers`
161+
</td>
162+
</tr>
163+
</tbody>
164+
</table>
165+
166+
<sup>1</sup> Image conversion uses two Workers AI models for object detection and summarization. See [pricing](/workers-ai/markdown-conversion/#pricing) for more details.
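
The table above can be turned into a client-side lookup, for example to skip files that are not in the list before uploading them. This is a sketch under assumptions: the names `SUPPORTED_MIME_TYPES` and `lookupMimeType` are hypothetical, and pairing each spreadsheet extension with a MIME type by its position in the table row is a guess. The service detects the MIME type itself, so a lookup like this is only a convenience.

```typescript
// Extension-to-MIME-type lookup transcribed from the supported formats table.
// Pairing spreadsheet extensions with MIME types by list position is an assumption.
const SUPPORTED_MIME_TYPES: Record<string, string> = {
  ".pdf": "application/pdf",
  ".jpeg": "image/jpeg",
  ".jpg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
  ".svg": "image/svg+xml",
  ".html": "text/html",
  ".xml": "application/xml",
  ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  ".xlsm": "application/vnd.ms-excel.sheet.macroenabled.12",
  ".xlsb": "application/vnd.ms-excel.sheet.binary.macroenabled.12",
  ".xls": "application/vnd.ms-excel",
  ".et": "application/vnd.ms-excel",
  ".ods": "application/vnd.oasis.opendocument.spreadsheet",
  ".csv": "text/csv",
  ".numbers": "application/vnd.apple.numbers",
};

// Returns the MIME type for a listed extension, or undefined when the extension is not listed.
function lookupMimeType(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  return SUPPORTED_MIME_TYPES[fileName.slice(dot).toLowerCase()];
}
```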
167+
168+
## Example
169+
170+
In this example, we fetch a PDF document and an image from R2 and feed them both to `env.AI.toMarkdown`. The result is a list of converted documents. Workers AI models are used automatically to detect and summarize the image.
171+
172+
```typescript
173+
import { Env } from "./env";
174+
175+
export default {
176+
async fetch(request: Request, env: Env, ctx: ExecutionContext) {
177+
178+
// https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/somatosensory.pdf
179+
const pdf = await env.R2.get('somatosensory.pdf');
180+
181+
// https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/cat.jpeg
182+
const cat = await env.R2.get('cat.jpeg');
183+
184+
return Response.json(
185+
await env.AI.toMarkdown([
186+
{
187+
name: "somatosensory.pdf",
188+
blob: new Blob([await pdf.arrayBuffer()], { type: "application/octet-stream" }),
189+
},
190+
{
191+
name: "cat.jpeg",
192+
blob: new Blob([await cat.arrayBuffer()], { type: "application/octet-stream" }),
193+
},
194+
]),
195+
);
196+
},
197+
};
198+
```
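
`get()` on an R2 bucket binding resolves to `null` when the key does not exist, so the example above would throw at `pdf.arrayBuffer()` if either object were missing. A small guard makes that failure explicit. The helper name `requireObject` is hypothetical and not part of any Cloudflare API.

```typescript
// Hypothetical helper: narrows a possibly-null lookup result or fails with a clear message.
function requireObject<T>(object: T | null, key: string): T {
  if (object === null) {
    throw new Error(`Object "${key}" was not found in the bucket`);
  }
  return object;
}

// Usage inside the handler above (shown as a comment because it needs the R2 binding):
// const pdf = requireObject(await env.R2.get("somatosensory.pdf"), "somatosensory.pdf");
```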
199+
200+
This is the result:
201+
202+
```json
203+
[
204+
{
205+
"name": "somatosensory.pdf",
206+
"mimeType": "application/pdf",
207+
"format": "markdown",
208+
"tokens": 0,
209+
"data": "# somatosensory.pdf\n## Metadata\n- PDFFormatVersion=1.4\n- IsLinearized=false\n- IsAcroFormPresent=false\n- IsXFAPresent=false\n- IsCollectionPresent=false\n- IsSignaturesPresent=false\n- Producer=Prince 20150210 (www.princexml.com)\n- Title=Anatomy of the Somatosensory System\n\n## Contents\n### Page 1\nThis is a sample document to showcase..."
210+
},
211+
{
212+
"name": "cat.jpeg",
213+
"mimeType": "image/jpeg",
214+
"format": "markdown",
215+
"tokens": 0,
216+
"data": "The image is a close-up photograph of Grumpy Cat, a cat with a distinctive grumpy expression and piercing blue eyes. The cat has a brown face with a white stripe down its nose, and its ears are pointed upright. Its fur is light brown and darker around the face, with a pink nose and mouth. The cat's eyes are blue and slanted downward, giving it a perpetually grumpy appearance. The background is blurred, but it appears to be a dark brown color. Overall, the image is a humorous and iconic representation of the popular internet meme character, Grumpy Cat. The cat's facial expression and posture convey a sense of displeasure or annoyance, making it a relatable and entertaining image for many people."
217+
}
218+
]
219+
```
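
Because each result carries its Markdown in `data` and an estimate in `tokens`, the list can be folded into a single prompt-ready string. The sketch below assumes only the fields shown in the output above; the names `ConvertedDocument` and `combineResults` are hypothetical, and the horizontal-rule separator is an arbitrary choice.

```typescript
// Minimal shape of one conversion result, limited to the fields used here.
interface ConvertedDocument {
  name: string;
  tokens: number;
  data: string;
}

// Joins every converted document into one Markdown string, separated by
// horizontal rules, and sums the estimated token counts.
function combineResults(results: ConvertedDocument[]): { markdown: string; tokens: number } {
  const markdown = results.map((r) => r.data.trim()).join("\n\n---\n\n");
  const tokens = results.reduce((sum, r) => sum + r.tokens, 0);
  return { markdown, tokens };
}
```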
220+
221+
## REST API
222+
223+
In addition to the Workers AI [binding](/workers-ai/configuration/bindings/), you can use the [REST API](/workers-ai/get-started/rest-api/):
224+
225+
```bash
226+
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/tomarkdown \
227+
-H 'Authorization: Bearer {API_TOKEN}' \
228+
229+
230+
```
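
The same endpoint can be called from TypeScript with `fetch`. The sketch below only prepares the request. It assumes the endpoint accepts a multipart `POST` and that the form field is named `files`; neither detail is confirmed by the command above, and the names `UploadFile`, `PreparedRequest`, and `buildToMarkdownRequest` are hypothetical.

```typescript
// Hypothetical input shape for this sketch.
interface UploadFile {
  name: string;
  blob: Blob;
}

// URL plus fetch options, kept as a plain object so it can be inspected before sending.
interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: FormData };
}

// Builds the request for the endpoint shown above without sending anything.
// Assumptions: the request is a multipart POST and the form field is named "files".
function buildToMarkdownRequest(accountId: string, apiToken: string, files: UploadFile[]): PreparedRequest {
  const form = new FormData();
  for (const file of files) {
    form.append("files", file.blob, file.name);
  }
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/tomarkdown`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: form,
    },
  };
}

// Usage (not executed here):
// const { url, init } = buildToMarkdownRequest(ACCOUNT_ID, API_TOKEN, files);
// const response = await fetch(url, init);
```

The `Content-Type` header is left unset on purpose so that `fetch` can add the multipart boundary itself.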
231+
232+
## Pricing
233+
234+
`toMarkdown` is free for most format conversions. In some cases, like image conversion, it can use Workers AI models for object detection and summarization, which may incur additional costs if it exceeds the Workers AI free allocation limits. See the [pricing page](/workers-ai/platform/pricing/) for more details.

src/content/workers-ai-models/deepseek-coder-6.7b-base-awq.json

Lines changed: 43 additions & 9 deletions
@@ -33,9 +33,26 @@
3333
"prompt": {
3434
"type": "string",
3535
"minLength": 1,
36-
"maxLength": 131072,
3736
"description": "The input text prompt for the model to generate a response."
3837
},
38+
"lora": {
39+
"type": "string",
40+
"description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model."
41+
},
42+
"response_format": {
43+
"title": "JSON Mode",
44+
"type": "object",
45+
"properties": {
46+
"type": {
47+
"type": "string",
48+
"enum": [
49+
"json_object",
50+
"json_schema"
51+
]
52+
},
53+
"json_schema": {}
54+
}
55+
},
3956
"raw": {
4057
"type": "boolean",
4158
"default": false,
@@ -93,10 +110,6 @@
93110
"minimum": 0,
94111
"maximum": 2,
95112
"description": "Increases the likelihood of the model introducing new topics."
96-
},
97-
"lora": {
98-
"type": "string",
99-
"description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model."
100113
}
101114
},
102115
"required": [
@@ -118,7 +131,6 @@
118131
},
119132
"content": {
120133
"type": "string",
121-
"maxLength": 131072,
122134
"description": "The content of the message as a string."
123135
}
124136
},
@@ -287,10 +299,29 @@
287299
]
288300
}
289301
},
302+
"response_format": {
303+
"title": "JSON Mode",
304+
"type": "object",
305+
"properties": {
306+
"type": {
307+
"type": "string",
308+
"enum": [
309+
"json_object",
310+
"json_schema"
311+
]
312+
},
313+
"json_schema": {}
314+
}
315+
},
316+
"raw": {
317+
"type": "boolean",
318+
"default": false,
319+
"description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting."
320+
},
290321
"stream": {
291322
"type": "boolean",
292323
"default": false,
293-
"description": "If true, the response will be streamed back incrementally."
324+
"description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events."
294325
},
295326
"max_tokens": {
296327
"type": "integer",
@@ -308,7 +339,7 @@
308339
"type": "number",
309340
"minimum": 0,
310341
"maximum": 2,
311-
"description": "Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses."
342+
"description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses."
312343
},
313344
"top_k": {
314345
"type": "integer",
@@ -395,7 +426,10 @@
395426
}
396427
}
397428
}
398-
}
429+
},
430+
"required": [
431+
"response"
432+
]
399433
},
400434
{
401435
"type": "string",
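
The `response_format` property added in this diff accepts a `type` of `json_object` or `json_schema`, plus an unconstrained `json_schema` value. A request body that uses it might be built as follows. The names `ResponseFormat`, `PromptInput`, and `withJsonSchema` are hypothetical, and how the model treats the supplied schema is not described in this diff.

```typescript
// Mirrors the "response_format" schema added in the diff.
interface ResponseFormat {
  type: "json_object" | "json_schema";
  json_schema?: unknown; // the schema leaves this value unconstrained
}

// Prompt-style input, limited to the fields used in this sketch.
interface PromptInput {
  prompt: string;
  response_format?: ResponseFormat;
}

// Hypothetical helper: attaches a JSON schema to a prompt-style input.
function withJsonSchema(prompt: string, schema: unknown): PromptInput {
  if (prompt.length < 1) {
    // The input schema declares "minLength": 1 for the prompt.
    throw new Error("prompt must contain at least one character");
  }
  return { prompt, response_format: { type: "json_schema", json_schema: schema } };
}
```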
