llms-small.txt is still identical to llms-full.txt (~637K tokens each) #406

@tippi-fifestarr

llms-small.txt feedback and rendering pipeline issue

What we noticed

While working on the structured llms.txt index (PR #379), we dug into the llms-small.txt and llms-full.txt feeds. Right now on the live site they're effectively identical (2,555,401 vs 2,555,535 bytes, ~637K tokens each). That's too large for most AI coding tools:

| Tool/Model | Context Window |
| --- | --- |
| Cursor default (Claude 4.x Sonnet) | 200K |
| GPT-4o / GPT-5 | 128-200K |
| Claude 4.x Opus/Sonnet | 200K |
| Gemini 2.5/3 Pro | 1M |

With agents also needing context for the user's code and conversation, something around 50 to 80K tokens for the small feed would be a good target.

We see the curated IDs and HTML-to-markdown pipeline work in main that's aiming to fix this. When we tested a local build, though, the pipeline failed on 539 pages:

Error serving markdown for [page]: Error: Objects are not valid as a React child
(found: object with keys {astro:jsx, type, props}).

The experimental_AstroContainer in src/lib/llms.ts registers only the React renderer (@astrojs/react/server.js), but most doc pages use Astro-native components (Starlight's CardGrid, Tabs, Aside, etc.) which produce Astro JSX nodes that React can't render. The errors are caught (not thrown), so the build exits 0 but llms-small.txt and llms-full.txt don't actually get generated. The deploy falls back to the plugin's original rawContent: true output, which is why the live feeds are still identical.
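
One way to keep this failure mode from passing silently is to record per-page render errors and fail the build when any occur. The sketch below is only an illustration: `renderAll`, `assertAllRendered`, and the `render` callback are hypothetical names, not code from src/lib/llms.ts, and `render` stands in for whatever turns a page ID into markdown.

```typescript
// Hypothetical sketch: none of these names come from src/lib/llms.ts.
type RenderOutcome =
  | { id: string; markdown: string }
  | { id: string; error: string };

// `render` is an assumed callback, e.g. a wrapper around the Astro container.
async function renderAll(
  ids: string[],
  render: (id: string) => Promise<string>,
): Promise<RenderOutcome[]> {
  const outcomes: RenderOutcome[] = [];
  for (const id of ids) {
    try {
      outcomes.push({ id, markdown: await render(id) });
    } catch (err) {
      // Record the failure instead of only logging it.
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ id, error: message });
    }
  }
  return outcomes;
}

// Throws if any page failed, so the caller can stop the build
// instead of falling back to the old output without notice.
function assertAllRendered(outcomes: RenderOutcome[]): void {
  const failed = outcomes.filter((o) => "error" in o);
  if (failed.length > 0) {
    const sample = failed.slice(0, 5).map((o) => o.id).join(", ");
    throw new Error(
      `${failed.length} of ${outcomes.length} pages failed to render (e.g. ${sample})`,
    );
  }
}
```

Calling `assertAllRendered(await renderAll(ids, render))` before writing the feeds would turn the 539 caught errors into a failed build rather than a silent fallback.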

The llms.txt index route still works fine since it doesn't render page content (serving the structured index from PR #379).

We ran into a similar problem earlier when testing rawContent: false with the plugin directly, but that was specifically on the 12 pages using GraphQLEditor. The current pipeline hits the broader version of the same issue across all pages with Astro-native components.

Token breakdown (might help with tuning the curated set)

We measured the full feed before the pipeline work. Actual rendered sizes will differ, but the relative sizing between pages should hold:

| Category | ~Tokens | Pages | Notes |
| --- | --- | --- | --- |
| Move Book (language reference) | 54K | 74 | Essential for Move writers, should stay in full |
| Top 6 reference pages | 80K | 6 | Indexer tables, error codes, specs |
| MDX noise (imports + wrapper tags) | 14K | all | Eliminated once the pipeline works |
| Legacy/Deprecated pages | 11K | 7 | Explicitly marked outdated |
| Everything else (~350 pages) | ~478K | | |
Full page-by-token breakdown (441 pages, sorted largest first)
~19K  Indexer Table Reference
~14K  Aptos Error Codes
~13K  Move Security Guidelines
~13K  Your First Transaction
~11K  Exchange Integration Guide
~10K  Specifications
 ~9K  Aptos Blockchain Deep Dive
 ~8K  Functions (Move Book)
 ~8K  1. Create a Smart Contract
 ~8K  Confidential Asset (CA)
 ~7K  Account Key Rotation
 ~7K  Aptos Fungible Asset (FA) Standard
 ~7K  Aptos Glossary
 ~7K  Application Integration Guide
 ~6K  Account Abstraction
 ~6K  Your First Aptos Multisig
 ~6K  Aptos Digital Asset Standard
 ~6K  Local Variables and Scope (Move Book)
 ~6K  Cryptography
 ~6K  Choose a UI Package
 ~6K  Use Hardware Ledger via CLI
 ~5K  Aptos Token Standard (Legacy)
 ~5K  Binary Canonical Serialization (BCS)
 ~5K  Your First Move Module
 ~5K  Computing Transaction Gas
 ~5K  Ethereum to Aptos Migration Guide
 ~5K  Delegation Pool Operations
 ~5K  Staking
 ~4K  Structs and Resources (Move Book)
 ~4K  Gas and Storage Fees
 ~4K  Your First NFT
 ~4K  X-Chain Accounts
 ~4K  Vector (Move Book)
 ~4K  Generics (Move Book)
 ~4K  Randomness API
 ~4K  Confidential Asset (CA)
 ~4K  Staking Pool Operations
 ~4K  Migrate to Indexer SDK
 ~4K  Maps (Move Book)
 ~3K  Aptos Move Lint
 ~3K  Creating objects
 ~3K  Your First Coin
 ~3K  TypeScript SDK Quickstart
 ~3K  Global Storage - Operators (Move Book)
 ~3K  Expressions (Move Book)
 ~3K  Transaction Filtering
 ~3K  Connect to a Network
 ~3K  State Synchronization
 ~3K  5. Handle Tasks
 ~3K  Local Simulation, Benchmarking & Gas Profiling
 ~3K  Abilities (Move Book)
 ~3K  Keyless Integration Guide
 ~3K  Why Move?
 ~3K  Accounts
 ~3K  Your First Fungible Asset
 ~3K  Go SDK - Building Transactions
 ~3K  Transactions and States
      ... (remaining ~390 pages are 1-3K tokens each)

Some ideas

Curated set additions

LLMS_SMALL_DOC_IDS currently lists 26 pages across 7 sections (Start Here, Smart Contracts, APIs And Data, SDKs, Advanced Topics, AI Tooling, Nodes And Operations). A few that might be worth adding:

  • network/blockchain/accounts (recently updated for AIP-115 stateless accounts, foundational for any integration)
  • network/blockchain/txns-states (transaction lifecycle and state model, constantly referenced)
  • build/guides/exchanges (most complete integration guide on the site right now)
  • build/get-started/ethereum-cheatsheet (updated Nov 2025, the most common migration path)
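
As a rough check on what these additions would cost, here is a small sketch that sums their sizes using the per-page estimates from the breakdown above. The mapping from doc ID to page title is our assumption, and `fitsBudget` with its `currentK` parameter is hypothetical, since the rendered size of the existing 26 pages isn't known yet.

```typescript
// Sizes in thousands of tokens, from the page-by-token breakdown.
// ASSUMPTION: each doc ID corresponds to the page title in the comment.
const PROPOSED_ADDITIONS: Record<string, number> = {
  "network/blockchain/accounts": 3, // "Accounts"
  "network/blockchain/txns-states": 3, // "Transactions and States"
  "build/guides/exchanges": 11, // "Exchange Integration Guide"
  "build/get-started/ethereum-cheatsheet": 5, // "Ethereum to Aptos Migration Guide"
};

// Sum the estimated size of a set of pages.
function totalKTokens(pages: Record<string, number>): number {
  return Object.values(pages).reduce((sum, k) => sum + k, 0);
}

// Hypothetical helper: does the curated set still fit after the additions?
function fitsBudget(currentK: number, addedK: number, budgetK: number): boolean {
  return currentK + addedK <= budgetK;
}

const addedK = totalKTokens(PROPOSED_ADDITIONS); // 3 + 3 + 11 + 5 = 22
```

Under that mapping the four pages add about 22K tokens, so whether they fit a 50 to 80K target depends on how large the current curated set turns out to be once rendering works.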

Landing page as curation source

The aptos.dev landing page has a hand-picked set of links (keyless, sponsored transactions, orderless transactions, objects, Move Book, faucet, etc.) that represent the team's view of the most important entry points. LLM agents typically skip the landing page entirely, so surfacing those same links in the curated feeds would help agents find the highest-value pages first.

Skills callouts throughout the LLM feeds

Agent Skills should be mentioned at the top of every LLM feed (llms.txt, llms-small.txt, llms-full.txt) so that any agent ingesting the feed immediately knows they exist. Then on individual pages where a specific skill applies, mention it again so the agent can use it in context.

For example, the llms.txt top section already has a line for Agent Skills. The same callout should appear at the top of llms-small.txt and llms-full.txt. Then within the smart contracts section, mention write-contracts, generate-tests, and security-audit. Within the SDK section, mention ts-sdk-transactions, etc.

Pages with corresponding skills:

| Doc page | Skill |
| --- | --- |
| Smart Contracts | write-contracts, generate-tests, security-audit |
| Move deployment | deploy-contracts |
| TypeScript SDK | use-ts-sdk, ts-sdk-transactions, ts-sdk-client, etc. |
| Project setup | create-aptos-project |
| Gas optimization | analyze-gas-optimization |
| Move V1 to V2 | modernize-move |
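
To make the per-section callouts concrete, here is a minimal sketch of how the mapping above could be turned into a line prepended to a feed section. The function names and the callout wording are hypothetical. The TypeScript SDK row ends in "etc." in the table, so only the skills named there are listed.

```typescript
// Mapping copied from the table above. The TypeScript SDK list is
// incomplete in the source ("etc."), so only the named skills appear here.
const SKILLS_BY_SECTION: Record<string, string[]> = {
  "Smart Contracts": ["write-contracts", "generate-tests", "security-audit"],
  "Move deployment": ["deploy-contracts"],
  "TypeScript SDK": ["use-ts-sdk", "ts-sdk-transactions", "ts-sdk-client"],
  "Project setup": ["create-aptos-project"],
  "Gas optimization": ["analyze-gas-optimization"],
  "Move V1 to V2": ["modernize-move"],
};

// Hypothetical callout wording; returns null when no skill applies.
function skillsCallout(section: string): string | null {
  const skills = SKILLS_BY_SECTION[section];
  if (!skills || skills.length === 0) return null;
  return `> Agent Skills for ${section}: ${skills.join(", ")}`;
}

// Prepend the callout to a section body, leaving other sections untouched.
function withCallout(section: string, body: string): string {
  const callout = skillsCallout(section);
  return callout ? `${callout}\n\n${body}` : body;
}
```

For example, `withCallout("Move deployment", body)` would put a line naming deploy-contracts ahead of that section's content, while a section with no matching skill is returned unchanged.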

How we measured

Token estimates use chars/4 approximation, measured 2026-03-18 against the live feed. Relative sizing between pages should hold even after the rendering pipeline is working.
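
For reference, the chars/4 approximation amounts to something like the sketch below. `estimateTokens` is a hypothetical name, and this is a heuristic rather than a real tokenizer, so absolute numbers will differ from what any given model counts.

```typescript
// Rough token estimate: about one token per four characters.
// Note: `text.length` counts UTF-16 code units, not bytes, so it can be
// slightly lower than the file size for content with non-ASCII characters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Format an estimate the way the tables in this issue do, e.g. "~19K".
function formatK(tokens: number): string {
  return `~${Math.round(tokens / 1000)}K`;
}
```

As a sanity check, dividing the reported 2,555,401 bytes by 4 gives roughly 639K, in the same range as the ~637K figure above.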

— Tippi and Claude Fifestarr
