
feat(public-docsite-v9): add llms docs#34838

Merged
dmytrokirpa merged 27 commits into microsoft:master from dmytrokirpa:feat/llm-docs
Aug 6, 2025


@dmytrokirpa dmytrokirpa commented Jul 15, 2025

Previous Behavior

New Behavior

This PR introduces a new CLI tool that extracts documentation from Storybook builds and converts it to LLM-friendly formats following the llmstxt.org specification. The tool processes Storybook production builds to generate comprehensive documentation in plain text format optimized for Large Language Models.

Key Features

  • Component Documentation: Extracts props, descriptions, and type information from React components
  • Story Examples: Captures all story variations with complete source code
  • MDX Support: Processes MDX documentation pages and converts HTML to clean Markdown
  • Subcomponents: Handles complex components with subcomponents and their props
  • LLMs.txt Format: Generates summary files following the llmstxt.org specification
  • Static File Serving: Uses Playwright routing instead of Express for better reliability
  • Flexible Configuration: Supports CLI arguments and config files
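The static file serving idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `resolveStaticFile`, `serveStorybookFromDisk`, and the `origin` parameter are hypothetical names, and `PageLike`/`RouteLike` are structural stand-ins for the parts of Playwright's `Page` and `Route` that the sketch relies on (`page.route()`, `route.request().url()`, `route.fulfill()`, `route.continue()`).

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Structural stand-ins (assumption) for the subset of Playwright's Route/Page used here.
interface RouteLike {
  request(): { url(): string };
  fulfill(options: { path?: string; status?: number; body?: string }): Promise<void>;
  continue(): Promise<void>;
}
interface PageLike {
  route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Maps a request URL on the given origin to a file inside distPath.
// Returns null for other origins or for paths that escape distPath.
function resolveStaticFile(distPath: string, origin: string, requestUrl: string): string | null {
  const url = new URL(requestUrl);
  if (url.origin !== new URL(origin).origin) return null;
  const relative = decodeURIComponent(url.pathname).replace(/^\/+/, '') || 'index.html';
  const root = path.resolve(distPath);
  const filePath = path.resolve(root, relative);
  if (!filePath.startsWith(root + path.sep)) return null;
  return filePath;
}

// Intercepts every request and answers same-origin ones from disk,
// so no Express (or other) web server has to be started.
async function serveStorybookFromDisk(page: PageLike, distPath: string, origin: string): Promise<void> {
  await page.route('**/*', async route => {
    const filePath = resolveStaticFile(distPath, origin, route.request().url());
    if (filePath === null) {
      await route.continue(); // Not ours: let the request go out normally.
    } else if (fs.existsSync(filePath) && fs.statSync(filePath).isFile()) {
      await route.fulfill({ path: filePath }); // Serve the built file.
    } else {
      await route.fulfill({ status: 404, body: 'Not found' });
    }
  });
}
```

With a real Playwright page, one would presumably call `serveStorybookFromDisk(page, 'storybook-static', 'http://localhost')` and then navigate to `http://localhost/iframe.html`; the origin is arbitrary because nothing actually listens on it.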

Technical Implementation

  • Static File Routing: Uses Playwright's page.route() to serve Storybook files without needing a web server
  • Story Extraction: Accesses Storybook's internal story store (__STORYBOOK_PREVIEW__) for metadata
  • Content Processing: Converts HTML documentation to clean Markdown using Turndown with GitHub Flavored Markdown support
  • Storybook Compatibility: Supports both Storybook 7 (storyStore) and Storybook 8+ (storyStoreValue)
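The Storybook 7 vs. 8+ compatibility point could look something like the sketch below. Only the names `__STORYBOOK_PREVIEW__`, `storyStore`, and `storyStoreValue` come from the description above; `PreviewLike` and `getStoryStore` are hypothetical, and the store itself is left untyped because its shape is a Storybook internal.

```typescript
// Hypothetical view of the preview object exposed as window.__STORYBOOK_PREVIEW__.
interface PreviewLike {
  storyStore?: unknown; // Storybook 7
  storyStoreValue?: unknown; // Storybook 8+
}

// Returns whichever story store the running Storybook version exposes.
// The 8+ property is checked first, falling back to the 7 one.
function getStoryStore(preview: PreviewLike | undefined): unknown {
  if (!preview) {
    throw new Error('__STORYBOOK_PREVIEW__ is not available on the page');
  }
  const store = preview.storyStoreValue ?? preview.storyStore;
  if (!store) {
    throw new Error('No story store found on the Storybook preview');
  }
  return store;
}
```

In practice this lookup would have to run in the browser context (for example inside Playwright's `page.evaluate`), since the preview object only exists on the loaded Storybook page.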

Output Structure

storybook-static/
├── llms.txt                          # Main summary file (llmstxt.org format)
└── llms/
    ├── components-button.txt         # Individual component docs
    ├── components-accordion.txt
    └── concepts-introduction.txt     # MDX page docs
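To illustrate how the summary file relates to the per-page files above, here is a rough sketch of an llms.txt builder following the llmstxt.org layout (an H1 title, a blockquote summary, then H2 sections containing link lists). `buildLlmsTxt`, `DocEntry`, the `section` grouping, and the "Related" heading for refs are all assumptions made for illustration; the actual tool may structure its output differently.

```typescript
// Hypothetical input shapes; only summaryTitle, summaryDescription, baseUrl and refs
// mirror option names that appear in this PR.
interface DocEntry {
  id: string; // e.g. 'components-button', matching llms/<id>.txt
  title: string;
  section: string;
  description?: string;
}
interface Ref {
  title: string;
  url: string;
}
interface SummaryOptions {
  summaryTitle: string;
  summaryDescription: string;
  baseUrl: string;
  docs: DocEntry[];
  refs?: Ref[];
}

function buildLlmsTxt(options: SummaryOptions): string {
  const base = options.baseUrl.replace(/\/+$/, '');
  const lines: string[] = [`# ${options.summaryTitle}`, '', `> ${options.summaryDescription}`, ''];

  // Group pages by section, keeping first-seen order.
  const sections = new Map<string, DocEntry[]>();
  for (const doc of options.docs) {
    const entries = sections.get(doc.section) ?? [];
    entries.push(doc);
    sections.set(doc.section, entries);
  }

  sections.forEach((entries, section) => {
    lines.push(`## ${section}`, '');
    for (const doc of entries) {
      const note = doc.description ? `: ${doc.description}` : '';
      lines.push(`- [${doc.title}](${base}/llms/${doc.id}.txt)${note}`);
    }
    lines.push('');
  });

  // Composed Storybooks are only linked; linking to "<url>/llms.txt" is an assumption.
  if (options.refs && options.refs.length > 0) {
    lines.push('## Related', '');
    for (const ref of options.refs) {
      lines.push(`- [${ref.title}](${ref.url.replace(/\/+$/, '')}/llms.txt)`);
    }
    lines.push('');
  }

  return lines.join('\n').trimEnd() + '\n';
}
```

Keeping the builder a pure function of the extracted entries makes the output easy to snapshot and review, independently of the browser-driven extraction step.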

Usage Examples

Basic Usage:

npx storybook-llms-extractor --distPath "storybook-static" --baseUrl "https://storybook.example.com"

# or with refs

npx storybook-llms-extractor \
  --distPath "storybook-static" \
  --baseUrl "https://main.storybook.dev" \
  --refs '{"title":"Charts","url":"https://charts.storybook.dev"}'

With Configuration File:

// storybook-llms.config.js
// @ts-check

/** @type {import('@fluentui/storybook-llms-extractor').Args} */
module.exports = {
  distPath: 'storybook-static',
  baseUrl: 'https://react.fluentui.dev',
  summaryTitle: 'Fluent UI React v9',
  summaryDescription: 'Fluent UI React components documentation',
  refs: [
    { title: 'Charts v9', url: 'https://charts.fluentui.dev' }
  ]
};

Files Added

  • tools/storybook-llms-extractor/src/cli.ts - CLI entry point and argument processing
  • tools/storybook-llms-extractor/src/utils.ts - Core extraction and conversion logic
  • tools/storybook-llms-extractor/src/types.ts - TypeScript type definitions
  • tools/storybook-llms-extractor/src/index.ts - Package exports
  • tools/storybook-llms-extractor/src/utils.spec.ts - Unit tests
  • tools/storybook-llms-extractor/src/__fixtures__/ - Test fixtures
  • tools/storybook-llms-extractor/README.md - Comprehensive documentation


github-actions bot commented Jul 15, 2025

📊 Bundle size report

✅ No changes found

@github-actions github-actions bot added the CI label Jul 15, 2025
@github-actions

Pull request demo site: URL

@tudorpopams
Contributor

This is awesome! Can we apply the same pattern to composed stories as well? It would be really cool to get this done for charts and contrib as well.

@dmytrokirpa
Contributor Author

This is awesome! Can we apply the same pattern to composed stories as well? It would be really cool to get this done for charts and contrib as well.

https://fluentuipr.z22.web.core.windows.net/pull/34838/public-docsite-v9/storybook/llms.txt - v9 llms.txt
https://fluentuipr.z22.web.core.windows.net/pull/34838/chart-docsite/storybook/llms.txt - charts llms.txt

Contrib isn't ready yet since it's not in the monorepo, and I'm still figuring out the optimal way to distribute the script if we decide to go with it.

@Hotell
Contributor

Hotell commented Jul 17, 2025

distribution:

i don't see how this could possibly work as an SB addon or bundler plugin because of how this works under the hood.

it's very similar to what storywright does for obtaining screenshots, which is actually desired behaviour as it makes the tool atomic and re-usable.

While the implementation is tightly coupled to our full source addon, that addon shouldn't be a prerequisite; the tool should behave gracefully: if full source exists we process that code, otherwise the standard SB code.

  • naming of the CLI package, something like: StorybookLLMextractor feels appropriate

storybook composition:

this approach won't scale outside repo-linked SBs, so it should be the responsibility of each linked (composed) SB to generate the markdown assets as part of its production build

@dmytrokirpa
Contributor Author

dmytrokirpa commented Jul 17, 2025

Thanks for the feedback @Hotell!

distribution:

i don't see how this could possibly work as an SB addon or bundler plugin because of how this works under the hood.

it's very similar to what storywright does for obtaining screenshots, which is actually desired behaviour as it makes the tool atomic and re-usable.

While the implementation is tightly coupled to our full source addon, that addon shouldn't be a prerequisite; the tool should behave gracefully: if full source exists we process that code, otherwise the standard SB code.

That makes sense.

  • naming of the CLI package, something like: StorybookLLMextractor feels appropriate

Agree, do you think it should live in the core monorepo or as a standalone repo?

storybook composition:

this approach won't scale outside repo-linked SBs, so it should be the responsibility of each linked (composed) SB to generate the markdown assets as part of its production build

That's exactly how it works atm: we use the refs CLI arg only to include links to external (composed) Storybooks in llms.txt, and their assets are generated as part of their own production builds.

@dmytrokirpa dmytrokirpa requested a review from Hotell July 28, 2025 14:53
@dmytrokirpa dmytrokirpa marked this pull request as ready for review July 28, 2025 15:58
@dmytrokirpa dmytrokirpa requested review from a team as code owners July 28, 2025 15:58
@Hotell
Contributor

Hotell commented Jul 29, 2025

Agree, do you think it should live in the core monorepo or as a standalone repo?

let's stick with the core repo for now for logistical and distribution simplicity; in the future it might make sense to create a new fluent-storybook-addons repo or something similar

Contributor

@Hotell Hotell left a comment


looking great !

  • added some comments/actionables (mainly the SB API simplification / encapsulation)

A thing for thought:

  • with this approach the deployed output is a black box that might come as a surprise. maybe we should consider storing the generated .txt files in git and forcing a re-generation when content changes (similar to what we have for JSXIntrinsicElement in react-utilities)

@dmytrokirpa
Contributor Author

  • with this approach the deployed output is a black box that might come as a surprise. maybe we should consider storing the generated .txt files in git and forcing a re-generation when content changes (similar to what we have for JSXIntrinsicElement in react-utilities)

That's a valid point about controlling the output, but it would mean core devs need to build a full docsite locally with every component story update PR, right?

@dmytrokirpa dmytrokirpa requested a review from Hotell July 29, 2025 15:38
@Hotell
Contributor

Hotell commented Aug 6, 2025

  • with this approach the deployed output is a black box that might come as a surprise. maybe we should consider storing the generated .txt files in git and forcing a re-generation when content changes (similar to what we have for JSXIntrinsicElement in react-utilities)

That's a valid point about controlling the output, but it would mean core devs need to build a full docsite locally with every component story update PR, right?

exactly, but that is actually desirable - it's the same approach as api.md and test snapshots, for review and to guarantee we don't ship unexpected outputs.

this is not a blocker, as I mentioned - just a thing to consider

Contributor

@Hotell Hotell left a comment


left some additional comments, but nothing blocking.

lets go !

@dmytrokirpa dmytrokirpa merged commit f5381f5 into microsoft:master Aug 6, 2025
16 of 17 checks passed
@dmytrokirpa dmytrokirpa deleted the feat/llm-docs branch August 6, 2025 16:04