llms-small.txt is still identical to llms-full.txt (~637K tokens each) #406

# llms-small.txt feedback and rendering pipeline issue

## What we noticed

While working on the structured llms.txt index (PR #379), we dug into the llms-small.txt and llms-full.txt feeds. On the live site they are currently near-identical (2,555,401 vs 2,555,535 bytes, ~637K tokens each), which is larger than the context window of most AI coding tools:

| Tool/Model | Context Window (tokens) |
|------------|---------------|
| Cursor default (Claude 4.x Sonnet) | 200K |
| GPT-4o / GPT-5 | 128-200K |
| Claude 4.x Opus/Sonnet | 200K |
| Gemini 2.5/3 Pro | 1M |

Since agents also need context for the user's code and the conversation, a small feed of roughly 50K to 80K tokens seems like a good target.
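A simple guard in CI could catch the current state automatically. A minimal sketch, assuming the chars/4 token estimate used in this issue and treating 80K as the small feed's ceiling; both constants and the 0.5 size ratio are assumptions, not existing config:

```typescript
// Assumed budget: the upper end of the 50K-80K target. Not an existing constant.
const SMALL_FEED_MAX_TOKENS = 80_000;
// Assumed ratio: the small feed should be well under half the size of the full feed.
const MAX_SMALL_TO_FULL_RATIO = 0.5;

// chars/4 heuristic (UTF-16 code units used as a rough character count).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns human-readable problems; an empty list means the feeds pass.
function checkFeeds(small: string, full: string): string[] {
  const problems: string[] = [];
  const smallTokens = estimateTokens(small);
  const fullTokens = estimateTokens(full);
  if (smallTokens > SMALL_FEED_MAX_TOKENS) {
    problems.push(
      `llms-small.txt is ~${smallTokens} tokens, over the ${SMALL_FEED_MAX_TOKENS} budget`,
    );
  }
  if (fullTokens > 0 && smallTokens / fullTokens > MAX_SMALL_TO_FULL_RATIO) {
    const pct = Math.round((100 * smallTokens) / fullTokens);
    problems.push(`llms-small.txt is ${pct}% the size of llms-full.txt`);
  }
  return problems;
}
```

A plain equality check would not fire on the live feeds, since the two files differ by 134 bytes; comparing sizes does. In a build script, a non-empty result could set a non-zero exit code.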

We see that the curated-IDs and HTML-to-markdown pipeline work in `main` is aimed at fixing this. When we tested a local build, though, the pipeline failed on 539 pages:

```
Error serving markdown for [page]: Error: Objects are not valid as a React child
(found: object with keys {astro:jsx, type, props}).
```

The `experimental_AstroContainer` in `src/lib/llms.ts` registers only the React renderer (`@astrojs/react/server.js`), but most doc pages use Astro-native components (Starlight's CardGrid, Tabs, Aside, etc.), which produce Astro JSX nodes that React can't render. These errors are caught and logged rather than rethrown, so the build exits 0, but `llms-small.txt` and `llms-full.txt` are never generated. The deploy then falls back to the plugin's original `rawContent: true` output, which is why the live feeds are still identical.
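One way to keep this from shipping silently again is to make render failures fatal. A minimal sketch, assuming a per-page render function exists; `RenderPage` and `renderAllOrFail` are hypothetical names, not code from `src/lib/llms.ts`:

```typescript
// Hypothetical signature for whatever turns one doc page into markdown.
type RenderPage = (id: string) => Promise<string>;

interface RenderFailure {
  id: string;
  message: string;
}

// Render every page, collecting failures instead of only logging them, then
// throw once at the end so the build exits non-zero rather than silently
// falling back to the rawContent output.
async function renderAllOrFail(
  ids: string[],
  renderPage: RenderPage,
): Promise<Map<string, string>> {
  const rendered = new Map<string, string>();
  const failures: RenderFailure[] = [];
  for (const id of ids) {
    try {
      rendered.set(id, await renderPage(id));
    } catch (err) {
      failures.push({ id, message: err instanceof Error ? err.message : String(err) });
    }
  }
  if (failures.length > 0) {
    // Show only a few examples so the summary line stays readable.
    const sample = failures
      .slice(0, 5)
      .map((f) => `  ${f.id}: ${f.message}`)
      .join("\n");
    throw new Error(`${failures.length} of ${ids.length} pages failed to render:\n${sample}`);
  }
  return rendered;
}
```

This doesn't fix the renderer mismatch itself; it only makes the next regression show up in CI instead of on the live site.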

The llms.txt index route still works, since it serves the structured index from PR #379 and doesn't render page content.

We ran into a similar problem earlier when testing `rawContent: false` with the plugin directly, but that was specifically on the 12 pages using `GraphQLEditor`. The current pipeline hits the broader version of the same issue across all pages with Astro-native components.

## Token breakdown (might help with tuning the curated set)

We measured the full feed before the pipeline work. Actual rendered sizes will differ, but the relative sizing between pages should hold:

| Category | ~Tokens | Pages | Notes |
|----------|---------|-------|-------|
| Move Book (language reference) | 54K | 74 | Essential for Move writers, should stay in full |
| Top 6 reference pages | 80K | 6 | Indexer tables, error codes, specs |
| MDX noise (imports + wrapper tags) | 14K | all | Eliminated once the pipeline works |
| Legacy/Deprecated pages | 11K | 7 | Explicitly marked outdated |
| Everything else | ~478K | ~350 | |

<details>
<summary>Full page-by-token breakdown (441 pages, sorted largest first)</summary>

```
~19K  Indexer Table Reference
~14K  Aptos Error Codes
~13K  Move Security Guidelines
~13K  Your First Transaction
~11K  Exchange Integration Guide
~10K  Specifications
 ~9K  Aptos Blockchain Deep Dive
 ~8K  Functions (Move Book)
 ~8K  1. Create a Smart Contract
 ~8K  Confidential Asset (CA)
 ~7K  Account Key Rotation
 ~7K  Aptos Fungible Asset (FA) Standard
 ~7K  Aptos Glossary
 ~7K  Application Integration Guide
 ~6K  Account Abstraction
 ~6K  Your First Aptos Multisig
 ~6K  Aptos Digital Asset Standard
 ~6K  Local Variables and Scope (Move Book)
 ~6K  Cryptography
 ~6K  Choose a UI Package
 ~6K  Use Hardware Ledger via CLI
 ~5K  Aptos Token Standard (Legacy)
 ~5K  Binary Canonical Serialization (BCS)
 ~5K  Your First Move Module
 ~5K  Computing Transaction Gas
 ~5K  Ethereum to Aptos Migration Guide
 ~5K  Delegation Pool Operations
 ~5K  Staking
 ~4K  Structs and Resources (Move Book)
 ~4K  Gas and Storage Fees
 ~4K  Your First NFT
 ~4K  X-Chain Accounts
 ~4K  Vector (Move Book)
 ~4K  Generics (Move Book)
 ~4K  Randomness API
 ~4K  Confidential Asset (CA)
 ~4K  Staking Pool Operations
 ~4K  Migrate to Indexer SDK
 ~4K  Maps (Move Book)
 ~3K  Aptos Move Lint
 ~3K  Creating objects
 ~3K  Your First Coin
 ~3K  TypeScript SDK Quickstart
 ~3K  Global Storage - Operators (Move Book)
 ~3K  Expressions (Move Book)
 ~3K  Transaction Filtering
 ~3K  Connect to a Network
 ~3K  State Synchronization
 ~3K  5. Handle Tasks
 ~3K  Local Simulation, Benchmarking & Gas Profiling
 ~3K  Abilities (Move Book)
 ~3K  Keyless Integration Guide
 ~3K  Why Move?
 ~3K  Accounts
 ~3K  Your First Fungible Asset
 ~3K  Go SDK - Building Transactions
 ~3K  Transactions and States
      ... (remaining ~390 pages are 1-3K tokens each)
```

</details>

## Some ideas

### Curated set additions

The `LLMS_SMALL_DOC_IDS` list has 26 pages across 7 sections (Start Here, Smart Contracts, APIs And Data, SDKs, Advanced Topics, AI Tooling, Nodes And Operations). A few that might be worth adding:

- `network/blockchain/accounts` (recently updated for AIP-115 stateless accounts, foundational for any integration)
- `network/blockchain/txns-states` (transaction lifecycle and state model, constantly referenced)
- `build/guides/exchanges` (most complete integration guide on the site right now)
- `build/get-started/ethereum-cheatsheet` (updated Nov 2025, the most common migration path)
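To get a feel for what these additions would cost, here is the arithmetic using the pre-pipeline estimates from the breakdown above. This sketch assumes the IDs map to the pages titled "Accounts", "Transactions and States", and "Exchange Integration Guide"; `build/get-started/ethereum-cheatsheet` is left out because it isn't clear which breakdown entry it corresponds to. The current curated set's size isn't measured here, so it's a parameter, and `totalTokens`/`headroom` are illustrative helpers only:

```typescript
// Pre-pipeline estimates from the token breakdown; the ID-to-title mapping is assumed.
const proposedAdditions: Record<string, number> = {
  "network/blockchain/accounts": 3_000, // "Accounts" (~3K)
  "network/blockchain/txns-states": 3_000, // "Transactions and States" (~3K)
  "build/guides/exchanges": 11_000, // "Exchange Integration Guide" (~11K)
};

function totalTokens(sizes: Record<string, number>): number {
  return Object.values(sizes).reduce((sum, n) => sum + n, 0);
}

// Tokens left under the budget (default: the 80K upper target) after adding
// the candidates to a curated set of `currentTokens` tokens.
function headroom(
  currentTokens: number,
  additions: Record<string, number>,
  budget = 80_000,
): number {
  return budget - currentTokens - totalTokens(additions);
}
```

Under these assumptions the three pages add about 17K tokens, so they would fit under an 80K ceiling only if the existing 26-page set renders to under roughly 63K.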

### Landing page as curation source

The aptos.dev landing page has a hand-picked set of links (keyless, sponsored transactions, orderless transactions, objects, Move Book, faucet, etc.) that represent the team's view of the most important entry points. LLM agents typically skip the landing page entirely, so surfacing those same links in the curated feeds would help agents find the highest-value pages first.
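Reusing the landing page links would mean mapping them back to doc IDs and merging them without disturbing the existing order. A minimal sketch, assuming (hypothetically) that a doc ID is simply the aptos.dev URL path without leading or trailing slashes; `linkToDocId` and `mergeCuratedIds` are illustrative names, not existing code:

```typescript
// Hypothetical mapping: https://aptos.dev/build/guides/exchanges -> "build/guides/exchanges".
function linkToDocId(href: string): string | null {
  const url = new URL(href, "https://aptos.dev"); // resolves relative links too
  if (url.hostname !== "aptos.dev") return null; // skip external links
  const id = url.pathname.replace(/^\/+|\/+$/g, "");
  return id.length > 0 ? id : null; // skip the site root and in-page anchors
}

// Append landing-page IDs that aren't already curated, keeping the curated order first.
function mergeCuratedIds(curated: string[], landingLinks: string[]): string[] {
  const seen = new Set(curated);
  const merged = [...curated];
  for (const href of landingLinks) {
    const id = linkToDocId(href);
    if (id !== null && !seen.has(id)) {
      seen.add(id);
      merged.push(id);
    }
  }
  return merged;
}
```

Localized paths or redirects would need extra handling that this sketch ignores.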

### Skills callouts throughout the LLM feeds

[Agent Skills](https://github.com/aptos-labs/aptos-agent-skills) should be mentioned at the top of every LLM feed (llms.txt, llms-small.txt, llms-full.txt) so that any agent ingesting the feed immediately knows they exist. Then on individual pages where a specific skill applies, mention it again so the agent can use it in context.

For example, the llms.txt top section already has a line for Agent Skills. The same callout should appear at the top of llms-small.txt and llms-full.txt. Then within the smart contracts section, mention `write-contracts`, `generate-tests`, and `security-audit`. Within the SDK section, mention `ts-sdk-transactions`, etc.

Pages with corresponding skills:

| Doc page | Skill |
|----------|-------|
| Smart Contracts | `write-contracts`, `generate-tests`, `security-audit` |
| Move deployment | `deploy-contracts` |
| TypeScript SDK | `use-ts-sdk`, `ts-sdk-transactions`, `ts-sdk-client`, etc. |
| Project setup | `create-aptos-project` |
| Gas optimization | `analyze-gas-optimization` |
| Move V1 to V2 | `modernize-move` |
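The callouts could be generated from this table rather than hand-written per feed. A minimal sketch: the repository URL and skill names come from this issue, while the section keys, callout wording, and function names are hypothetical. The TypeScript SDK row ends in "etc." in the table, so only its three named skills are included:

```typescript
const AGENT_SKILLS_URL = "https://github.com/aptos-labs/aptos-agent-skills";

// Skill names from the table above; section keys are illustrative.
const SKILLS_BY_SECTION: Record<string, string[]> = {
  "Smart Contracts": ["write-contracts", "generate-tests", "security-audit"],
  "Move deployment": ["deploy-contracts"],
  "TypeScript SDK": ["use-ts-sdk", "ts-sdk-transactions", "ts-sdk-client"],
  "Project setup": ["create-aptos-project"],
  "Gas optimization": ["analyze-gas-optimization"],
  "Move V1 to V2": ["modernize-move"],
};

// Line placed at the top of llms.txt, llms-small.txt, and llms-full.txt.
function feedHeader(): string {
  return `> Agent Skills for Aptos: ${AGENT_SKILLS_URL}`;
}

// Per-section reminder; returns an empty string when no skill applies.
function sectionCallout(section: string): string {
  const skills = SKILLS_BY_SECTION[section];
  if (!skills || skills.length === 0) return "";
  return `> Related Agent Skills: ${skills.map((s) => `\`${s}\``).join(", ")}`;
}
```

Keeping the mapping in one place means the top-of-feed line and the per-section mentions can't drift apart across the three feeds.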

## How we measured

Token estimates use the chars/4 approximation, measured on 2026-03-18 against the live feed. Relative sizing between pages should hold even after the rendering pipeline is working.
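The chars/4 rule is easy to reproduce. Dividing the byte sizes quoted at the top by 4 gives about 639K rather than ~637K; counting characters instead of bytes gives a lower figure whenever the feed contains multi-byte UTF-8 characters, though which count was used isn't stated above. A minimal sketch comparing the two (`sizeReport` is a hypothetical helper):

```typescript
// Compare UTF-8 bytes with Unicode code points for the same text, and apply
// the chars/4 heuristic to the code-point count.
function sizeReport(text: string): { bytes: number; chars: number; estTokens: number } {
  const bytes = new TextEncoder().encode(text).length; // UTF-8 byte length
  const chars = Array.from(text).length; // code points, not UTF-16 units
  return { bytes, chars, estTokens: Math.round(chars / 4) };
}
```

For pure ASCII the two counts match; each non-ASCII character adds one to three extra bytes, which is enough to explain small gaps between byte-based and character-based estimates.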

— Tippi and Claude Fifestarr

