Commit c3bb82f

KianNH, kodster28 authored and committed
[Style Guide] AI consumability in How we docs (#23673)
* [Style Guide] AI consumability in How we docs
* unused imports
* ignore index.md/markdown.zip links
* links
* image & tokens comparison
* Update src/content/docs/style-guide/how-we-docs/ai-consumability.mdx

Co-authored-by: Kody Jackson <[email protected]>

---------

Co-authored-by: Kody Jackson <[email protected]>
1 parent 98f8246 commit c3bb82f

3 files changed: 96 additions, 0 deletions

astro.config.ts

Lines changed: 4 additions & 0 deletions
@@ -147,6 +147,10 @@ export default defineConfig({
       "/workers/examples/?languages=*",
       "/workers/examples/?tags=*",
       "/workers-ai/models/**",
+      "**index.md",
+      "/markdown.zip",
+      "/style-guide/index.md",
+      "/style-guide/fixtures/markdown/index.md",
     ],
   }),
 ]
49.7 KB
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
---
pcx_content_type: how-to
title: AI consumability
meta:
  title: AI consumability | How we docs
---

import { Tabs, TabItem, Width } from "~/components";

We use several approaches to make our content visible to AI and easy to consume in a plain-text format.

## AI discoverability

The primary proposal in this space is [`llms.txt`](https://llmstxt.org/), which offers a well-known path for a Markdown list of all your pages.

We have implemented `llms.txt` and `llms-full.txt`, and also created per-page Markdown links, as follows:

- [`llms.txt`](/llms.txt)
- [`llms-full.txt`](/llms-full.txt)
  - We also provide an `llms-full.txt` file on a per-product basis, for example [`/workers/llms-full.txt`](/workers/llms-full.txt).
- [`/$page/index.md`](index.md)
  - Add `/index.md` to the end of any page to get the Markdown version, for example [`/style-guide/index.md`](/style-guide/index.md).
- [`/markdown.zip`](/markdown.zip)
  - An export of all of our documentation in the aforementioned `index.md` format.

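The URL conventions in the list above can be sketched as two small helpers. This is a minimal illustration, not code from the docs site: the function names `toMarkdownUrl` and `productLlmsFullPath` are hypothetical, and the origin used in the example call is an assumption.

```typescript
// Hypothetical helper: derive the Markdown URL for a docs page by appending
// `index.md` to its path, following the `/$page/index.md` convention.
function toMarkdownUrl(pageUrl: string): string {
  const url = new URL(pageUrl);
  // Make sure the path ends with a slash before appending the file name.
  const path = url.pathname.endsWith("/") ? url.pathname : `${url.pathname}/`;
  url.pathname = `${path}index.md`;
  return url.toString();
}

// Hypothetical helper: path of the per-product `llms-full.txt` file.
function productLlmsFullPath(product: string): string {
  return `/${product}/llms-full.txt`;
}

// The origin below is assumed for illustration only.
console.log(toMarkdownUrl("https://developers.cloudflare.com/style-guide/"));
console.log(productLlmsFullPath("workers"));
```

Using the `URL` class rather than string concatenation keeps any query string or fragment in place while only the path changes.
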

In the top right of this page, you will see a `Page options` button that lets you copy the current page as Markdown, which you can then give to your LLM of choice.

<Width size="medium">
![Page options button](~/assets/images/style-guide/how-we-docs/page-options.png)
</Width>

## Textual representation of interactive elements

Although HTML is easily parsed (after all, the browser has to parse it to decide how to render the page you're reading now), it tends not to be very _portable_. This limitation is especially painful in an AI context, because all the extra presentation information consumes additional tokens.

For example, in our [`Tabs`](/style-guide/components/tabs/) component, each panel is hidden until its tab is clicked:

<Tabs>
  <TabItem label="One">One Content</TabItem>
  <TabItem label="Two">Two Content</TabItem>
</Tabs>

If we run the resulting HTML from this component through a solution like [`turndown`](https://www.npmjs.com/package/turndown), we get:

```md
- [One](#tab-panel-6)
- [Two](#tab-panel-7)

One Content

Two Content
```

The references to each panel's `id`, usually handled by JavaScript, are visible but non-functional.

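To make the problem concrete, the sketch below lists the in-page fragment links left behind in converted Markdown like the output above. It is a dependency-free illustration: `findFragmentLinks` is a hypothetical helper, not part of `turndown` or of the docs tooling, and its regular expression only handles simple inline links.

```typescript
// Collect the targets of all inline Markdown links that point at an in-page
// fragment (for example `#tab-panel-6`). Simple inline links only.
function findFragmentLinks(markdown: string): string[] {
  const pattern = /\[[^\]]*\]\((#[^)\s]+)\)/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    targets.push(match[1]);
  }
  return targets;
}

// The converted output shown above, reproduced as a string.
const convertedTabs = [
  "- [One](#tab-panel-6)",
  "- [Two](#tab-panel-7)",
  "",
  "One Content",
  "",
  "Two Content",
].join("\n");

console.log(findFragmentLinks(convertedTabs));
```

Both links survive the conversion, but the plain-text output contains no element carrying those `id` values, so nothing connects a label to its panel content.
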

### Turning our components into "Markdownable" HTML

To solve this, we created a [`rehype plugin`](https://github.com/cloudflare/cloudflare-docs/blob/d5a19deded110bce6a7c5d45e702d36527da0a4e/src/plugins/rehype/filter-elements.ts) for:

- Removing non-content tags (`script`, `style`, `link`, etc.) via a [tags allowlist](https://github.com/cloudflare/cloudflare-docs/blob/d5a19deded110bce6a7c5d45e702d36527da0a4e/src/plugins/rehype/filter-elements.ts#L19-L104)
- [Transforming custom elements](https://github.com/cloudflare/cloudflare-docs/blob/d5a19deded110bce6a7c5d45e702d36527da0a4e/src/plugins/rehype/filter-elements.ts#L189-L227) like `starlight-tabs` into standard unordered lists
- [Adapting our Expressive Code code block HTML](https://github.com/cloudflare/cloudflare-docs/blob/d5a19deded110bce6a7c5d45e702d36527da0a4e/src/plugins/rehype/filter-elements.ts#L143-L178) to the [HTML that CommonMark expects](https://spec.commonmark.org/0.31.2/#example-142)

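The allowlist idea from the first bullet can be sketched on a simplified tree. This is not the linked plugin: the node types are minimal stand-ins for the `hast` types, the allowlist is a hypothetical, heavily shortened one, and dropping a disallowed element together with its whole subtree is an assumption made for this example.

```typescript
// Minimal stand-ins for hast nodes, defined here so the sketch is self-contained.
type TextNode = { type: "text"; value: string };
type ElementNode = { type: "element"; tagName: string; children: TreeNode[] };
type TreeNode = TextNode | ElementNode;

// Hypothetical, shortened allowlist; the real one covers many more tags.
const ALLOWED_TAGS = new Set(["p", "ul", "li", "a", "pre", "code", "h2"]);

// Keep text nodes and allowlisted elements; drop everything else, including
// the subtree of a dropped element (an assumption for this sketch).
function filterElements(nodes: TreeNode[]): TreeNode[] {
  const kept: TreeNode[] = [];
  for (const node of nodes) {
    if (node.type === "text") {
      kept.push(node);
    } else if (ALLOWED_TAGS.has(node.tagName)) {
      kept.push({ ...node, children: filterElements(node.children) });
    }
  }
  return kept;
}
```

An allowlist fails closed: a tag nobody anticipated is removed by default instead of leaking presentation markup into the Markdown output.
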

Taking the `Tabs` example from the previous section and running it through our plugin now gives us a normal unordered list, with each panel's content properly associated with its list item:

```md
- One

  One Content

- Two

  Two Content
```

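The target shape can be reproduced with a few lines of string handling. The real plugin rewrites the HTML tree before any Markdown conversion happens, so this sketch only illustrates the output format; the `Tab` type and `tabsToMarkdownList` function are hypothetical names.

```typescript
// Hypothetical representation of one tab: its label and its panel content.
type Tab = { label: string; content: string };

// Render tabs as a loose Markdown list. Each label becomes a list item and the
// panel content is indented by two spaces so that it belongs to that item.
function tabsToMarkdownList(tabs: Tab[]): string {
  return tabs
    .map((tab) => {
      const indented = tab.content
        .split("\n")
        .map((line) => (line === "" ? "" : `  ${line}`))
        .join("\n");
      return `- ${tab.label}\n\n${indented}`;
    })
    .join("\n\n");
}

console.log(
  tabsToMarkdownList([
    { label: "One", content: "One Content" },
    { label: "Two", content: "Two Content" },
  ]),
);
```

The two-space indentation is what ties each paragraph to its list item in CommonMark; without it, the content would become a separate paragraph after the list.
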

To see the result, take a look at our Markdown test fixture (or append `/index.md` to the URL of any page):

- [`/style-guide/fixtures/markdown/`](/style-guide/fixtures/markdown/)
- [`/style-guide/fixtures/markdown/index.md`](/style-guide/fixtures/markdown/index.md)

### Saving on tokens

Most AI pricing is based on input and output tokens, and our approach greatly reduces the number of input tokens required.

For example, let's take a look at the number of tokens required for the [Workers Get Started](/workers/get-started/guide/) guide, as measured by [OpenAI's tokenizer](https://platform.openai.com/tokenizer):

- HTML: 15,229 tokens
- turndown: 3,401 tokens (4.48x fewer than HTML)
- index.md: 2,110 tokens (7.22x fewer than HTML)

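The ratios in the list above follow directly from the token counts. The short check below reproduces them; `reductionFactor` is a hypothetical helper name, and the counts are the ones quoted in the list.

```typescript
// Token counts quoted in the list above.
const htmlTokens = 15229;
const turndownTokens = 3401;
const indexMdTokens = 2110;

// How many times smaller a representation is than the baseline,
// rounded to two decimal places.
function reductionFactor(baseline: number, reduced: number): number {
  return Math.round((baseline / reduced) * 100) / 100;
}

console.log(reductionFactor(htmlTokens, turndownTokens)); // 4.48
console.log(reductionFactor(htmlTokens, indexMdTokens)); // 7.22
```
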

When providing our content to AI, we can see a real-world ~7x saving in input token cost.
