paoloricciuti
Member

@paoloricciuti paoloricciuti commented Sep 26, 2025

This adds an endpoint to list all the available sections of documentation. We are gonna use this in the MCP to fetch the various sections to feed the LLM.

I've also included the full documentation, including the content with the ?complete query param. We could use this with the stdio MCP to load all the data at startup and store that somewhere so that it is accessible offline.
Duh, I forgot we can't access query params when prerendering... I've removed this for now. We can always fetch doc by doc in the CLI and, if we decide we need it, create another endpoint for the whole docs.

And apparently @khromov was also working on this lol #1557

I'm a bit torn which one is better...the other adds the use_case metadata but is also very specific for LLM. This is a more general endpoint that could be used for something else, too. 🤷🏼

EDIT: I've also added the use_cases metadata, since I think it could be a very good addition for llms.txt anyway. We don't have to fill in the use cases for every section, but it's nice to have the ability to do so. I've also renamed the endpoint to .json so that it's nicer to view in the browser.
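
For illustration, a listing like this could be built roughly as in the sketch below. Only `use_cases` is named in this PR; the other field names (`slug`, `title`, `url`), the `DocSource` input type, and the `listSections` helper are assumptions made for the example, not the actual schema or implementation.

```typescript
// Hypothetical shape of one entry in the sections listing.
// Only `use_cases` comes from this PR; the other fields are assumptions.
interface DocSection {
  slug: string;
  title: string;
  url: string;
  use_cases?: string; // optional: sections without use cases simply omit it
}

// Hypothetical input: one docs page with its parsed frontmatter metadata.
interface DocSource {
  slug: string;
  metadata: { title: string; use_cases?: string };
}

// Map docs pages to lightweight entries (no page content is included).
function listSections(docs: DocSource[], origin: string): DocSection[] {
  return docs.map(({ slug, metadata }) => {
    const section: DocSection = {
      slug,
      title: metadata.title,
      url: `${origin}/docs/${slug}`,
    };
    // Only surface use_cases when the page actually defines them.
    if (metadata.use_cases) section.use_cases = metadata.use_cases;
    return section;
  });
}

// Illustrative data; the slugs and use cases are made up for the example.
const example = listSections(
  [
    {
      slug: "svelte/overview",
      metadata: { title: "Overview", use_cases: "always, any svelte project" },
    },
    { slug: "svelte/faq", metadata: { title: "FAQ" } },
  ],
  "https://svelte.dev",
);
```

Keeping `use_cases` optional matches the point above: a section that has none yet just produces an entry without that key.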

Let's see which one feels better.



@paoloricciuti paoloricciuti changed the title feat: add docs/sections endpoint to list all docs sections feat: add docs/sections.json endpoint to list all docs sections Sep 28, 2025
@dominikg
Member

Let's see which one feels better.

Is there any standard for this kind of document? Reads like a "sitemap light". What other tools do you have in mind that would benefit from this?

Also what kind of traffic does this create for the json file itself and the docs sections afterwards?

@paoloricciuti
Member Author

Is there any standard for this kind of document? Reads like a "sitemap light".

Not that I know of...it's just part of our needs to be able to serve them in the MCP, which again is, at the end of the day, just a normal program.

What other tools do you have in mind that would benefit from this?

Every app that wants to link to all the documentation sections of the docs. We technically have something similar already, which search uses (and that I'm using in the Svelte Raycast extension and in Sveltelab) https://svelte.dev/content.json ...now that I think about it, we could just include the rest of the relevant info on that page, but that would lead to a loooot of wasted data going over the wire for no reason.

Also, what kind of traffic does this create for the JSON file itself and the docs sections afterwards?

This doc would be fetched when the MCP server starts up and cached as long as the server is alive...being hosted on serverless, however, it would probably be accessed fairly frequently (if we move off of serverless, we would still need to set a TTL to fetch fresh data every once in a while). As for the individual docs sections, it really depends on when the user or the LLM decides to include them in the context. But tbf it shouldn't matter that much, since both of those resources are prerendered and served from a CDN with a long TTL.
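
A minimal sketch of the "cache while alive, refresh after a TTL" idea, assuming a hypothetical `createTtlCache` helper (the name, its parameters, and the injectable clock are all made up for illustration and are not part of the MCP code):

```typescript
// Wrap a loader so its result is reused until `ttlMs` has elapsed.
// `now` is injectable so the behaviour can be checked without waiting.
function createTtlCache<T>(
  load: () => T,
  ttlMs: number,
  now: () => number = Date.now,
): () => T {
  let entry: { value: T; loadedAt: number } | undefined;

  return () => {
    // Load on first use, or again once the cached entry is too old.
    if (!entry || now() - entry.loadedAt >= ttlMs) {
      entry = { value: load(), loadedAt: now() };
    }
    return entry.value;
  };
}
```

In the MCP server the loader would presumably be something that fetches the sections listing and returns a promise. The cache then stores that promise, so a rejected fetch would stay cached until the TTL expires unless that case is handled separately.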

@khromov
Contributor

khromov commented Sep 29, 2025

This doc would be fetched when the MCP server starts up and cached as long as the server is alive..

We should actually have some sort of invalidation mechanism here, because if people are running the MCP on a long-running server they won't ever get docs updates. (I usually use the stale-while-revalidate package but there are thousands of similar ones.)
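
To make the invalidation idea concrete, here is a hand-rolled sketch of the stale-while-revalidate pattern. It does not use or mirror the API of any particular package; `staleWhileRevalidate`, `maxAgeMs`, and the injectable `now` clock are hypothetical names for this example.

```typescript
// Serve the cached value immediately; if it is older than `maxAgeMs`,
// kick off a background refresh so later callers get fresh data.
function staleWhileRevalidate<T>(
  load: () => Promise<T>,
  maxAgeMs: number,
  now: () => number = Date.now,
): () => Promise<T> {
  let value: T | undefined;
  let hasValue = false;
  let fetchedAt = 0;
  let inflight: Promise<T> | null = null;

  function refresh(): Promise<T> {
    // Share a single in-flight request between concurrent callers.
    if (!inflight) {
      inflight = load().then(
        (fresh) => {
          value = fresh;
          hasValue = true;
          fetchedAt = now();
          inflight = null;
          return fresh;
        },
        (err) => {
          inflight = null;
          throw err;
        },
      );
    }
    return inflight;
  }

  return async function get(): Promise<T> {
    // Nothing cached yet: the caller has to wait for the first load.
    if (!hasValue) return refresh();
    if (now() - fetchedAt > maxAgeMs) {
      // Stale: refresh in the background, keep serving the old value.
      // A failed refresh is ignored here, so the stale value stays in use.
      refresh().catch(() => {});
    }
    return value as T;
  };
}
```

With this shape a long-running server never blocks on the docs endpoint after the first load, and picks up docs updates within roughly one `maxAgeMs` window of the next request.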

I really don't like the idea of forcing everything into one generic endpoint like content.json, aside from wasted bytes it also makes it harder to change that endpoint because several clients rely on the specific format. From my pov we should call this new endpoint /mcp.json so that we can put whatever MCP stuff we see fit into it.

@paoloricciuti
Member Author

We should actually have some sort of invalidation mechanism here, because if people are running the MCP on a long-running server they won't ever get docs updates. (I usually use the stale-while-revalidate package but there are thousands of similar ones.)

Yeah, as I mentioned, we should probably specify a TTL in case we move to a long-running server or run it locally.

I really don't like the idea of forcing everything into one generic endpoint like content.json, aside from wasted bytes it also makes it harder to change that endpoint because several clients rely on the specific format. From my pov we should call this new endpoint /mcp.json so that we can put whatever MCP stuff we see fit into it.

I kinda agree on content.json...it's already quite chaotic, and adding other stuff would make both the search and the MCP slower for no real reason. I'm not sure if we should focus this endpoint specifically on the MCP, though...now it's its main use, but this might change in the future as the information is not specifically tied to the MCP.

@khromov
Contributor

khromov commented Sep 29, 2025

now it's its main use, but this might change in the future as the information is not specifically tied to the MCP.

What would be the future use case? Isn't this premature optimization? 😉

@paoloricciuti
Member Author

What would be the future use case? Isn't this premature optimization? 😉

Well, yes and no...does it really need to be called mcp.json for it to work? Calling it /docs/sections.json is more true to what it really is and leaves the door open in case tomorrow we want to use this on the Svelte Society website, for example, without it sounding weird. 😄

@Rich-Harris
Member

How useful is this, really? It's effectively just a list of titles. If I'm an MCP then sometimes those titles will be enough to know which documents are relevant to my current task, but it feels like it's bound to be pretty hit-or-miss.

What if the documentation lived in the package? Then the MCP server would have direct access to it, since it has a dependency on the package.

@paoloricciuti
Member Author

The list serves the purpose of giving an initial hint to the LLM but, most importantly, to the user: it's also used for the user to add resources through the MCP, which means they can manually add the docs they need with a possibly higher degree of confidence.

Other than that it's just a way to have a list the LLM can pick from. Wdym with "lived in the package"?

@khromov
Contributor

khromov commented Sep 29, 2025

@Rich-Harris one of the ideas is to surface condensed documentation "use cases" for each docs file. This would give LLMs hints as to when a specific documentation file is worth fetching, without having to fetch the entire file and eat up context window for no use.

You can see an early PoC of this in https://github.com/sveltejs/svelte/pull/16867/files that uses the use_cases frontmatter key that this PR also surfaces in the endpoint.
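
As a rough sketch of how those hints could reach the model, the listing can be flattened into a compact text block that costs far fewer tokens than the docs themselves. The `SectionHint` shape, the `formatSectionHints` helper, and the assumption that `use_cases` is a single string are all hypothetical and only for illustration.

```typescript
// Hypothetical entry shape; only `use_cases` is named in this PR.
interface SectionHint {
  title: string;
  slug: string;
  use_cases?: string;
}

// Render one short line per section so the LLM can pick what to fetch.
function formatSectionHints(sections: SectionHint[]): string {
  return sections
    .map((s) => {
      const hint = s.use_cases ? s.use_cases : "(no use cases listed)";
      return `- ${s.title} [${s.slug}]: ${hint}`;
    })
    .join("\n");
}

// Illustrative data, not taken from the real docs.
const hints = formatSectionHints([
  { title: "$state", slug: "svelte/$state", use_cases: "reactivity, state management" },
  { title: "FAQ", slug: "svelte/faq" },
]);
```

The model (or the user) then requests only the slugs whose hints match the task at hand.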

@khromov
Contributor

khromov commented Sep 29, 2025

This is also early days for MCPs and LLMs, so trying to determine a "valid" rationale for some specific feature is moot in my view. We're all here trying to make Svelte better and we will need to experiment to find the best way forward.

@Rich-Harris
Member

Wdym with "lived in the package"?

I mean that the documentation directory is moved inside packages/svelte and added to the "files" array in package.json, so that if the MCP server needs to know which documents exist, and what their titles are, it can just poke inside node_modules/svelte/documentation. Not saying this is totally ideal (it has some obvious downsides), but I'm trying to think this through.

trying to determine a "valid" rationale for some specific feature is moot in my view

I totally agree, but by the same token (hehe) we should be careful about not adding a bunch of stuff that turns out to not be useful, but which still creates a maintenance burden. Like, I don't want us to be on the hook for maintaining a bunch of use_cases metadata unless we know that it is in fact useful.

I think https://svelte.dev/docs/llms is a great example of this. Are those documents useful? Has llms.txt been sufficiently widely adopted and is it implemented in a sufficiently consistent way? I'm not super close to this stuff but my impression is that it hasn't really panned out. Maybe we would have been better off doing something like what Bun is doing. To my mind the way to square this circle is to keep the experimental stuff local to the experiment.

All of which of course is also a reason not to put documentation in the svelte package directly. But the MCP server could for example retrieve the documentation directly from GitHub.

@paoloricciuti
Member Author

Both fetching from node_modules and from GitHub could work but tbf it feels way worse than adding an endpoint on svelte.dev.

llms.txt is indeed useful for a lot of people and it will actually be used for this very purpose. And as I've said, I specifically created the endpoint this way because it's simple and generic, meaning it could be useful for something else too. It's really not that different from the content.json used for search (actually, being so simple, it's even better because there's close to 0 maintenance burden).

What Bun is doing is good and we should probably do that too...but searching the web is still too chaotic for LLMs, and having an organized list coming from the MCP is much, much better for sure.

If we really, really don't want to include this endpoint, we should at least set up a GitHub webhook and store the new docs in a Svelte MCP db, as both adding the files to the package and fetching from GitHub have, imho, way worse tradeoffs.

@dominikg
Member

e18e would like to have a word if we added the documentation to the svelte npm package itself. If it needs to be on npm, I'd rather release a buddy package, svelte-docs, that always ships with the same version as svelte.

But I like the thought of including the docs with the MCP CLI outright. Can this be made to work offline then, with local models and a local MCP installation?

@paoloricciuti
Member Author

But I like the thought of including the docs with the MCP CLI outright. Can this be made to work offline then, with local models and a local MCP installation?

The idea is to allow the user to run a command if they want to download the latest version of the docs...once you do that, you can work on the local version totally offline (with local models). But the default would be to fetch, so you can get the very latest docs.

@dominikg
Member

How would you match the docs version to the Svelte version? If the user's project is behind and the MCP uses the latest docs, it might suggest a feature that doesn't even work in the user's app. Or is it smart enough to evaluate the "since x.y.z" comments in the docs?

If it always needs to download a version of the docs first thing, I'd argue it makes even more sense to bundle them or make them a dependency "svelte-docs":"^5.0.0"

@paoloricciuti
Member Author

How would you match the docs version to the Svelte version? If the user's project is behind and the MCP uses the latest docs, it might suggest a feature that doesn't even work in the user's app. Or is it smart enough to evaluate the "since x.y.z" comments in the docs?

It should be smart enough, but we can even hint at it in the responses, saying that it should check the installed version before adding code specific to a certain version. But after all it doesn't matter too much: if the user sees a feature that doesn't fit their version, they're gonna tell the LLM themselves.
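
If the check ever had to be done in code rather than left to the LLM, a sketch could look like this. The helper names (`parseVersion`, `isAtLeast`) are hypothetical, and the comparison only looks at major.minor.patch, ignoring prerelease tags and ranges.

```typescript
// Parse "5.3.1" or "v5.3.1" into numeric parts (prerelease tags are ignored).
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version.trim().replace(/^v/, ""));
  if (!match) throw new Error(`Not a semver-like version: ${version}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True when the installed version is the same as or newer than `since`,
// i.e. a feature documented as "since x.y.z" should be available.
function isAtLeast(installed: string, since: string): boolean {
  const a = parseVersion(installed);
  const b = parseVersion(since);
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true; // identical versions
}
```

For example, `isAtLeast("5.0.0", "5.3.0")` is false, so a feature marked as available since 5.3.0 could be flagged before the model suggests it for a project still on 5.0.0.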

If it always needs to download a version of the docs first thing, I'd argue it makes even more sense to bundle them or make them a dependency "svelte-docs":"^5.0.0"

It downloads on demand unless you specifically want to download the latest version up front.

@Rich-Harris
Member

One wrinkle with publishing the docs separately is that it would be harder to evolve the docs without publishing a new version of the library: if you fix a typo in the docs, do you then need to publish a patch version of both svelte and svelte-docs, or just svelte-docs, in which case the versions no longer line up? The same is true if you include the documentation in the package, of course.

@paoloricciuti
Member Author

Agreed...honestly, there's no reason to complicate things: this endpoint is a very simple endpoint, not hard to maintain, that covers what the MCP needs, and if we ever decide it's not really useful, we can always remove it...it's not subject to semver.

All the rest of the solutions will make the process slower, more bug-prone, and more annoying to code.

@Rich-Harris
Member

if we ever decide it's not really useful, we can always remove it...it's not subject to semver

At a bare minimum the URL should reflect that — /experimental/* or similar. Otherwise people will find ways to depend on it, and get mad when we break them

@paoloricciuti
Member Author

I'm good with that...pushing the change rn 🤟🏻

@paoloricciuti paoloricciuti changed the title feat: add docs/sections.json endpoint to list all docs sections feat: add docs/experimental/sections.json endpoint to list all docs sections Sep 30, 2025