llms-small.txt feedback and rendering pipeline issue
What we noticed
While working on the structured llms.txt index (PR #379), we dug into the llms-small.txt and llms-full.txt feeds. Right now on the live site they're effectively identical (2,555,401 vs 2,555,535 bytes, ~637K tokens each). That's too large for most AI coding tools:
| Tool/Model | Context Window |
|---|---|
| Cursor default (Claude 4.x Sonnet) | 200K |
| GPT-4o / GPT-5 | 128-200K |
| Claude 4.x Opus/Sonnet | 200K |
| Gemini 2.5/3 Pro | 1M |
With agents also needing context for the user's code and conversation, something around 50 to 80K tokens for the small feed would be a good target.
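To make the sizing argument concrete, here is a minimal sketch that checks which context windows from the table above can hold a feed of a given size. The 100K reserve for the user's code and conversation is our own assumption for illustration, and ranges in the table are taken at their lower bound.

```typescript
// Context windows from the table above, in tokens.
// Where the table gives a range (128-200K), the lower bound is used.
const contextWindows: Record<string, number> = {
  "Cursor default (Claude 4.x Sonnet)": 200_000,
  "GPT-4o / GPT-5": 128_000,
  "Claude 4.x Opus/Sonnet": 200_000,
  "Gemini 2.5/3 Pro": 1_000_000,
};

// Assumption: tokens an agent keeps free for the user's code and conversation.
const RESERVED_TOKENS = 100_000;

// Returns the tools whose window can hold the feed plus the reserve.
function toolsThatFit(feedTokens: number, reserved: number = RESERVED_TOKENS): string[] {
  return Object.keys(contextWindows).filter(
    (tool) => feedTokens + reserved <= contextWindows[tool],
  );
}

console.log(toolsThatFit(637_000)); // current feed size
console.log(toolsThatFit(80_000)); // upper end of the proposed target
```

Under these assumptions the current ~637K feed only fits the 1M-token window, while an 80K feed fits every 200K-or-larger window in the table.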
We see the curated IDs and HTML-to-markdown pipeline work in main that's aiming to fix this. When we tested a local build though, the pipeline runs into trouble on 539 pages:
Error serving markdown for [page]: Error: Objects are not valid as a React child
(found: object with keys {astro:jsx, type, props}).
The experimental_AstroContainer in src/lib/llms.ts registers only the React renderer (@astrojs/react/server.js), but most doc pages use Astro-native components (Starlight's CardGrid, Tabs, Aside, etc.) which produce Astro JSX nodes that React can't render. The errors are caught (not thrown), so the build exits 0 but llms-small.txt and llms-full.txt don't actually get generated. The deploy falls back to the plugin's original rawContent: true output, which is why the live feeds are still identical.
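Separately from fixing the renderer registration, the silent fallback could be avoided by failing the build when pages do not render. The sketch below is not the code in src/lib/llms.ts; `PageOutcome` and `assertRenderSucceeded` are hypothetical names, and it assumes the pipeline records one outcome per page instead of only logging caught errors.

```typescript
// Hypothetical per-page result recorded by the rendering loop.
type PageSuccess = { id: string; ok: true; markdown: string };
type PageFailure = { id: string; ok: false; error: string };
type PageOutcome = PageSuccess | PageFailure;

// Throws when more than `maxFailures` pages failed, so the build exits non-zero
// instead of quietly deploying the fallback output.
function assertRenderSucceeded(outcomes: PageOutcome[], maxFailures: number = 0): void {
  const failures = outcomes.filter((o): o is PageFailure => !o.ok);
  if (failures.length > maxFailures) {
    // Show a short sample so the log stays readable with hundreds of failures.
    const sample = failures
      .slice(0, 5)
      .map((f) => `${f.id}: ${f.error}`)
      .join("\n");
    throw new Error(
      `${failures.length} of ${outcomes.length} pages failed to render:\n${sample}`,
    );
  }
}
```

With a guard like this, the 539 failing pages would have surfaced as a failed build and not as identical live feeds.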
The llms.txt index route still works because it serves the structured index from PR #379 and never renders page content.
We ran into a similar problem earlier when testing rawContent: false with the plugin directly, but that was specifically on the 12 pages using GraphQLEditor. The current pipeline hits the broader version of the same issue across all pages with Astro-native components.
Token breakdown (might help with tuning the curated set)
We measured the full feed before the pipeline work. Actual rendered sizes will differ, but the relative sizing between pages should hold:
| Category | ~Tokens | Pages | Notes |
|---|---|---|---|
| Move Book (language reference) | 54K | 74 | Essential for Move writers, should stay in full |
| Top 6 reference pages | 80K | 6 | Indexer tables, error codes, specs |
| MDX noise (imports + wrapper tags) | 14K | all | Eliminated once the pipeline works |
| Legacy/Deprecated pages | 11K | 7 | Explicitly marked outdated |
| Everything else | ~478K | ~350 | |
Full page-by-token breakdown (441 pages, sorted largest first)
~19K Indexer Table Reference
~14K Aptos Error Codes
~13K Move Security Guidelines
~13K Your First Transaction
~11K Exchange Integration Guide
~10K Specifications
~9K Aptos Blockchain Deep Dive
~8K Functions (Move Book)
~8K 1. Create a Smart Contract
~8K Confidential Asset (CA)
~7K Account Key Rotation
~7K Aptos Fungible Asset (FA) Standard
~7K Aptos Glossary
~7K Application Integration Guide
~6K Account Abstraction
~6K Your First Aptos Multisig
~6K Aptos Digital Asset Standard
~6K Local Variables and Scope (Move Book)
~6K Cryptography
~6K Choose a UI Package
~6K Use Hardware Ledger via CLI
~5K Aptos Token Standard (Legacy)
~5K Binary Canonical Serialization (BCS)
~5K Your First Move Module
~5K Computing Transaction Gas
~5K Ethereum to Aptos Migration Guide
~5K Delegation Pool Operations
~5K Staking
~4K Structs and Resources (Move Book)
~4K Gas and Storage Fees
~4K Your First NFT
~4K X-Chain Accounts
~4K Vector (Move Book)
~4K Generics (Move Book)
~4K Randomness API
~4K Confidential Asset (CA)
~4K Staking Pool Operations
~4K Migrate to Indexer SDK
~4K Maps (Move Book)
~3K Aptos Move Lint
~3K Creating objects
~3K Your First Coin
~3K TypeScript SDK Quickstart
~3K Global Storage - Operators (Move Book)
~3K Expressions (Move Book)
~3K Transaction Filtering
~3K Connect to a Network
~3K State Synchronization
~3K 5. Handle Tasks
~3K Local Simulation, Benchmarking & Gas Profiling
~3K Abilities (Move Book)
~3K Keyless Integration Guide
~3K Why Move?
~3K Accounts
~3K Your First Fungible Asset
~3K Go SDK - Building Transactions
~3K Transactions and States
... (remaining ~390 pages are 1-3K tokens each)
Some ideas
Curated set additions
LLMS_SMALL_DOC_IDS currently lists 26 pages across 7 sections (Start Here, Smart Contracts, APIs And Data, SDKs, Advanced Topics, AI Tooling, Nodes And Operations). A few that might be worth adding:
- `network/blockchain/accounts` (recently updated for AIP-115 stateless accounts, foundational for any integration)
- `network/blockchain/txns-states` (transaction lifecycle and state model, constantly referenced)
- `build/guides/exchanges` (most complete integration guide on the site right now)
- `build/get-started/ethereum-cheatsheet` (updated Nov 2025, the most common migration path)
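As a rough check on what these additions would cost, the sketch below sums their estimated sizes from the breakdown above. The mapping from doc ID to page title (for example `build/guides/exchanges` to "Exchange Integration Guide") is our assumption, and the numbers are the rounded chars/4 estimates, not rendered sizes.

```typescript
// Estimated tokens per suggested page, taken from the page-by-token breakdown.
// Assumption: each ID corresponds to the page title noted in the comment.
const suggestedAdditions: Record<string, number> = {
  "network/blockchain/accounts": 3_000, // "Accounts"
  "network/blockchain/txns-states": 3_000, // "Transactions and States"
  "build/guides/exchanges": 11_000, // "Exchange Integration Guide"
  "build/get-started/ethereum-cheatsheet": 5_000, // "Ethereum to Aptos Migration Guide"
};

// Sums the estimates for the given IDs; unknown IDs count as 0.
function addedTokens(ids: string[]): number {
  return ids.reduce((sum, id) => sum + (suggestedAdditions[id] ?? 0), 0);
}

// True when the existing curated set plus the additions stays within the target.
function fitsTarget(currentTokens: number, extraTokens: number, maxTokens: number = 80_000): boolean {
  return currentTokens + extraTokens <= maxTokens;
}

const extra = addedTokens(Object.keys(suggestedAdditions));
console.log(extra); // 22000
```

So the four pages add roughly 22K tokens, which leaves room inside an 80K target only if the existing 26 pages render to about 58K or less.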
Landing page as curation source
The aptos.dev landing page has a hand-picked set of links (keyless, sponsored transactions, orderless transactions, objects, Move Book, faucet, etc.) that represent the team's view of the most important entry points. LLM agents typically skip the landing page entirely, so surfacing those same links in the curated feeds would help agents find the highest-value pages first.
Skills callouts throughout the LLM feeds
Agent Skills should be mentioned at the top of every LLM feed (llms.txt, llms-small.txt, llms-full.txt) so that any agent ingesting the feed immediately knows they exist. Then on individual pages where a specific skill applies, mention it again so the agent can use it in context.
For example, the llms.txt top section already has a line for Agent Skills. The same callout should appear at the top of llms-small.txt and llms-full.txt. Then within the smart contracts section, mention write-contracts, generate-tests, and security-audit. Within the SDK section, mention ts-sdk-transactions, etc.
Pages with corresponding skills:
| Doc page | Skill |
|---|---|
| Smart Contracts | write-contracts, generate-tests, security-audit |
| Move deployment | deploy-contracts |
| TypeScript SDK | use-ts-sdk, ts-sdk-transactions, ts-sdk-client, etc. |
| Project setup | create-aptos-project |
| Gas optimization | analyze-gas-optimization |
| Move V1 to V2 | modernize-move |
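One way the callouts could be wired into feed generation is sketched below. The skill names come from the table above; the callout wording, the `FeedSection` shape, and the `withSkillsCallouts` helper are hypothetical and only illustrate the idea of a feed-level callout plus per-section mentions.

```typescript
// Skills per doc section, copied from the table above.
const skillsBySection: Record<string, string[]> = {
  "Smart Contracts": ["write-contracts", "generate-tests", "security-audit"],
  "Move deployment": ["deploy-contracts"],
  "TypeScript SDK": ["use-ts-sdk", "ts-sdk-transactions", "ts-sdk-client"],
  "Project setup": ["create-aptos-project"],
  "Gas optimization": ["analyze-gas-optimization"],
  "Move V1 to V2": ["modernize-move"],
};

// Hypothetical wording for the feed-level callout.
const FEED_CALLOUT = "> Agent Skills are available for common Aptos tasks.";

type FeedSection = { title: string; body: string };

// Puts the callout at the top of the feed and appends the matching skills
// to every section that has an entry in skillsBySection.
function withSkillsCallouts(sections: FeedSection[]): string {
  const parts: string[] = [FEED_CALLOUT];
  for (const section of sections) {
    const skills = skillsBySection[section.title];
    const note = skills ? `\n\nRelated Agent Skills: ${skills.join(", ")}` : "";
    parts.push(`## ${section.title}\n\n${section.body}${note}`);
  }
  return parts.join("\n\n");
}
```

Keeping the mapping in one table-like object means the same data can feed llms.txt, llms-small.txt, and llms-full.txt.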
How we measured
Token estimates use chars/4 approximation, measured 2026-03-18 against the live feed. Relative sizing between pages should hold even after the rendering pipeline is working.
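For reference, the chars/4 approximation amounts to the following. This is a sketch of the heuristic, not the exact script we ran; `estimateTokens` and `formatK` are illustrative names, and real tokenizer counts will differ by model.

```typescript
// Rough token estimate: one token per four characters, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Formats a token count the way the tables above do, e.g. "~19K".
function formatK(tokens: number): string {
  return `~${Math.round(tokens / 1000)}K`;
}

const sampleText = "a".repeat(76_000);
console.log(formatK(estimateTokens(sampleText))); // ~19K
```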
— Tippi and Claude Fifestarr