
fix(llms): curated link list, llms-full.txt from source, fix install URL #471

Open
billlevine wants to merge 14 commits into main from ai-15-llms-txt-rework

@billlevine (Contributor) commented Apr 10, 2026

Supersedes #357 — built on Michael's branch with targeted fixes.

Thanks @stahnma for the original structure — kept the Python script approach,
CI integration, and the Key Terms / Quick Reference sections.

What changed

  • Curated link list: replaced the auto-generated 88-page sitemap with
    ~31 key pages (all tutorials, all language guides, core concepts, getting
    started). Less is more for llms.txt.
  • llms-full.txt: renamed from docs-content.txt (standard convention)
    and switched from fragile HTML regex extraction to concatenating source
    Markdown files directly — simpler and more reliable.
  • Install URL fix: added explicit correction in Quick Reference:
    flox.dev/download, not install.flox.dev (which does not exist).
    Addresses AI-13.
  • URL cleanup: stripped /index suffixes from generated links.
  • ## Optional section: K8s intro and Known Issues are marked optional
    per the llms.txt spec (tools can drop these under context pressure).
  • just instead of make: replaced gnumake/Makefile with
    just/justfile as the command runner. Same recipes: dev, build,
    clean.
  • CI cleanup: removed redundant explicit copies of llms.txt and
    llms-full.txt (already covered by cp -R ./site/*).
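A minimal sketch of the two source-based steps above (concatenating Markdown files and stripping `/index` suffixes from generated URLs), assuming a `docs/` source tree. The base URL `https://flox.dev/docs`, the `Source:` header line, and the function names are hypothetical, not taken from the PR's script.

```python
from pathlib import Path

# Hypothetical base URL for published pages; the real generator may differ.
BASE_URL = "https://flox.dev/docs"


def page_url(docs_dir: Path, md_file: Path) -> str:
    """Map a source Markdown path to its published URL, dropping any /index suffix."""
    rel = md_file.relative_to(docs_dir).with_suffix("").as_posix()
    if rel == "index":
        rel = ""
    elif rel.endswith("/index"):
        rel = rel[: -len("/index")]
    return f"{BASE_URL}/{rel}" if rel else f"{BASE_URL}/"


def build_llms_full(docs_dir: Path, pages: list[Path]) -> str:
    """Concatenate source Markdown files, each preceded by the page it came from."""
    parts = []
    for md_file in pages:
        body = md_file.read_text(encoding="utf-8").strip()
        parts.append(f"Source: {page_url(docs_dir, md_file)}\n\n{body}\n")
    return "\n".join(parts)
```

Working from source Markdown avoids HTML regex extraction entirely; the tradeoff, discussed later in this thread, is that template placeholders and code-block URLs come through unprocessed.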

Why curated over auto-generated

The llms.txt spec intends a navigation index, not a sitemap dump.
Practitioners cite Firebase and MongoDB as cautionary tales of
over-generated files. The Stripe pattern (instructions first, curated
links) is what we're following here.
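For illustration, the "instructions first, curated links" layout can be rendered from a small data structure. This hypothetical sketch follows the llms.txt proposal (H1 title, `>` summary, free-form instructions, `##` link sections, with `## Optional` last so tools can drop it); the class and field names are not from the PR's script.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


def _bullet(link: Link) -> str:
    suffix = f": {link.note}" if link.note else ""
    return f"- [{link.title}]({link.url}){suffix}"


@dataclass
class LlmsTxt:
    title: str
    summary: str
    instructions: str
    sections: dict[str, list[Link]] = field(default_factory=dict)
    optional: list[Link] = field(default_factory=list)

    def render(self) -> str:
        # Title and summary first, then the curated instructions.
        lines = [f"# {self.title}", "", f"> {self.summary}", "", self.instructions.strip(), ""]
        for name, links in self.sections.items():
            lines += [f"## {name}", ""]
            lines += [_bullet(link) for link in links]
            lines.append("")
        if self.optional:
            # Secondary pages go last, under the section tools may skip.
            lines += ["## Optional", ""]
            lines += [_bullet(link) for link in self.optional]
            lines.append("")
        return "\n".join(lines)
```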

Closes #357

Co-authored-by: Michael Stahnke <stahnma@gmail.com>

github-actions bot added the documentation label Apr 10, 2026
stahnma and others added 4 commits April 10, 2026 11:32
This commit adds generation of llms.txt (more robust than what was there)
as well as a single docs-content.txt that can be fed to an agent or answer engine
to set context on how to work with Flox.

This does add a Makefile (sorry) because I needed something to run multiple
steps when building the documentation site.

The README has been updated to explain Makefile usage (and it's completely optional).

The tools directory is also new, with a small python file to generate the AI indexing files.

The robots.txt that helps the crawlers know where to work is maintained in the
flox/floxwebsite repo.
- Replace auto-generated 88-page sitemap with curated ~30 key links
- Add llms-full.txt generated from source Markdown (not HTML parsing)
- Rename docs-content.txt to llms-full.txt (standard convention)
- Fix URL generation to strip /index suffixes
- Explicitly correct install URL: flox.dev/download, not install.flox.dev

Co-authored-by: Michael Stahnke <stahnma@gmail.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Swap gnumake for just in the Flox environment manifest
- Replace Makefile with a justfile (same recipes: dev, build, clean)
- Update README to reference just commands

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
billlevine force-pushed the ai-15-llms-txt-rework branch from 4924e3e to c8c1572 April 10, 2026 15:34
billlevine and others added 3 commits April 10, 2026 11:44
cp -R ./site/* already covers llms.txt and llms-full.txt

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
These files were accidentally swept in via git add -A. They belong
in a separate commit once lint issues are resolved.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
billlevine and others added 6 commits April 10, 2026 12:11
- Use `flox activate -c` in examples (preferred form per current docs)
- Keep `flox activate --` in critical rule (both are valid non-interactive forms)
- Fix `is_daemon` -> `is-daemon` (correct manifest TOML key)
- Fix typo: "the you are" -> "you are"

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Adds an Installation section that:
- Explicitly calls out that curl|bash and install.flox.dev do not exist
- Covers macOS (Homebrew), Debian/Ubuntu (.deb), RPM (.rpm)
- Points agents to flox.dev/download for package downloads

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Raw source markdown contains {{ FLOX_VERSION }} and {{ FLOX_PUBLIC_KEY }}
placeholders that MkDocs resolves during build. When concatenated
directly, these produce invalid URLs (e.g. download links truncated
at 'flox-') that fail link checking.

Read FLOX_VERSION from environment (set by flox activate hook) and
hardcode FLOX_PUBLIC_KEY (static value from mkdocs.yml) to produce
valid resolved content in llms-full.txt.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
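A sketch of the substitution step described above, assuming placeholders always look like `{{ NAME }}`. The `FLOX_PUBLIC_KEY` value here is a stand-in (the commit copies the real one from mkdocs.yml). Raising on unresolved names is a choice made for this sketch, so that a missing `FLOX_VERSION` fails the build instead of producing a truncated `flox-` URL.

```python
import os
import re
from collections.abc import Mapping

# Matches MkDocs-style placeholders such as "{{ FLOX_VERSION }}".
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z_]+)\s*\}\}")

# Hypothetical stand-in; the real generator hardcodes the key from mkdocs.yml.
STATIC_VALUES = {"FLOX_PUBLIC_KEY": "<public key copied from mkdocs.yml>"}


def resolve_placeholders(text: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace known placeholders; fail loudly on anything left unresolved."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # FLOX_VERSION comes from the environment (set by the activate hook).
        value = env.get(name) if name == "FLOX_VERSION" else STATIC_VALUES.get(name)
        if not value:
            raise KeyError(f"unresolved placeholder: {name}")
        return value

    return PLACEHOLDER.sub(substitute, text)
```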
These are example URLs in code blocks (manifest env var examples,
k8s config examples) — not real links. Add broad exclusions so they
are skipped regardless of which file lychee scans.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- auth.flox.dev/oauth/token: POST-only OAuth endpoint, GET returns 404
  by design; appears in curl example in organizations.md
- nix.dev/manual/nix/2.17/: versioned nix docs anchor referenced in
  manifest.toml docs; surfaced by llms-full.txt plain-text scanning

These URLs are in code block examples and not meant to be navigated.
Lychee skips them in HTML but treats them as links in plain text.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- cache.flox.dev root: Nix binary cache CDN, root returns 404 but
  the service is valid; appears in nix.conf code block examples
- downloads.flox.dev/by-env/stable/rpm: yum repo URL in
  installing-from-repo.md, not a web page; directory returns 404

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
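lychee's exclusions are regular expressions matched against each URL. The filter below mimics that behaviour for the URLs named in these commits; the exact patterns are assumptions for illustration, not the ones added to the repo's lychee configuration.

```python
import re

# Illustrative patterns modelled on the exclusions described above.
EXCLUDE_PATTERNS = [
    r"^https://auth\.flox\.dev/oauth/token",  # POST-only OAuth endpoint
    r"^https://nix\.dev/manual/nix/2\.17/",  # versioned Nix manual anchor
    r"^https://cache\.flox\.dev/?$",  # binary cache root returns 404
    r"^https://downloads\.flox\.dev/by-env/stable/rpm",  # yum repo URL
    r"^https?://localhost(:\d+)?",  # local examples
]
_EXCLUDE = [re.compile(p) for p in EXCLUDE_PATTERNS]


def links_to_check(urls: list[str]) -> list[str]:
    """Drop URLs matching any exclusion pattern, preserving input order."""
    return [u for u in urls if not any(p.search(u) for p in _EXCLUDE)]
```

Anchoring the cache pattern with `/?$` excludes only the root, so real paths under the cache would still be checked.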
@billlevine (Contributor, Author) commented

Note on approach: The lychee exclusions added in this PR are a side effect of generating llms-full.txt by concatenating raw source Markdown. Lychee normally skips content inside <code> blocks when scanning HTML, but treats everything in a plain-text file as a real link — so URLs that were always present in code block examples (CDN roots, yum repo URLs, OAuth endpoints, localhost examples) surfaced as failures for the first time.
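To make the HTML-vs-plain-text difference concrete, here is a toy comparison (not lychee's actual implementation): a plain-text scan picks up every URL-shaped string, while a scan that skips fenced code blocks and inline code spans sees only prose links. The regex and function names are hypothetical.

```python
import re

# Crude URL matcher: scheme plus everything up to whitespace or a delimiter.
URL = re.compile(r"https?://[^\s)\"'<>`]+")
FENCE = "`" * 3  # Markdown code fence marker


def _clean(urls: list[str]) -> list[str]:
    # Drop trailing sentence punctuation picked up by the regex.
    return [u.rstrip(".,;:") for u in urls]


def urls_plain_text(text: str) -> list[str]:
    """Every URL-shaped string, as a plain-text scanner sees it."""
    return _clean(URL.findall(text))


def urls_outside_code(markdown: str) -> list[str]:
    """URLs in prose only: skip fenced code blocks and inline code spans."""
    found: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        prose = re.sub(r"`[^`]*`", "", line)  # remove inline code spans
        found.extend(_clean(URL.findall(prose)))
    return found
```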

The cleaner long-term fix would be to generate llms-full.txt from the built HTML using a proper HTML-to-text library rather than raw Markdown concatenation — you'd get template variables resolved, code block content naturally excluded, and cleaner output. Michael's original script took that approach; the regex-based extraction was the fragile part, not the idea.
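A minimal sketch of that HTML-to-text route using only the standard library's `html.parser`, assuming MkDocs renders code blocks as `<pre><code>`. A dedicated HTML-to-text library would handle tables, links, and whitespace better; this only shows why code-block URLs drop out naturally.

```python
from html.parser import HTMLParser


class ProseExtractor(HTMLParser):
    """Collect visible text, skipping <pre>, <code>, <script> and <style>."""

    SKIP = {"pre", "code", "script", "style"}
    BLOCK = {"p", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr", "br", "div"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.depth:
            self.chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Link targets are dropped here (only anchor text survives); a real pipeline would probably want to keep them.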

That said, this PR is intentionally pragmatic — we're likely moving to Mintlify, which has built-in llms.txt support, so this whole pipeline is short-lived. Flagging for anyone who revisits this before that migration.


Via Forge (interactive) • a3060b22

@github-actions

Both -c and -- are valid but behave differently:
- -c: subshell with full hooks/profile (preferred for agents)
- --: exec mode, no profile scripts

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
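A sketch of building either form from a Python helper (hypothetical name). It assumes `-c` takes a single shell command string the way `sh -c` does, so the arguments are quoted with `shlex.join`, while `--` passes them through as separate argv entries.

```python
import shlex


def flox_run_argv(command: list[str], use_profile: bool = True) -> list[str]:
    """Build a non-interactive `flox activate` invocation.

    use_profile=True  -> flox activate -c "<cmd>"  (subshell with hooks/profile)
    use_profile=False -> flox activate -- <cmd>    (exec mode, no profile scripts)
    """
    if use_profile:
        # Assumption: -c expects one shell command string, like `sh -c`.
        return ["flox", "activate", "-c", shlex.join(command)]
    return ["flox", "activate", "--", *command]
```

From a directory containing a Flox environment, this could be run with `subprocess.run(flox_run_argv(["python", "--version"]), check=True)`.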
@jbayer (Contributor) commented Apr 10, 2026

This looks like a pragmatic win to me!

jbayer self-requested a review April 10, 2026 17:12
@github-actions

@garbas (Contributor) left a comment

Wouldn't it be easier to use the plugin for that?

https://github.com/pawamoy/mkdocs-llmstxt

@billlevine (Contributor, Author) commented

> Wouldn't it be easier to use the plugin for that?
>
> https://github.com/pawamoy/mkdocs-llmstxt

1 - yes, if I had known about it :)
2 - there is evidence that including some actual instructions in llms.txt, beyond a plain sitemap (and contrary to the "spec"), is helpful. The Linear ticket has comments on this point.
3 - this is already done
4 - this will likely be dropped when we move to Mintlify in the near future (within a month is my rough guess, unsubstantiated).

I could use the plugin and either drop the curated instructions or post-process its output to add them back, but I'm not sure that would be much simpler than the existing approach once the tradeoffs are accounted for.
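For comparison, the post-processing option could be as small as inserting the curated block before the first `##` section. This sketch assumes the plugin's output follows the llms.txt layout (H1 title, optional `>` summary, then `##` link sections); the function name is hypothetical.

```python
def insert_instructions(generated: str, instructions: str) -> str:
    """Insert a curated instructions block before the first H2 section."""
    lines = generated.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## "):
            block = [instructions.strip(), ""]
            return "\n".join(lines[:i] + block + lines[i:]) + "\n"
    # No sections found: append the instructions at the end.
    return generated.rstrip("\n") + "\n\n" + instructions.strip() + "\n"
```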

billlevine requested a review from garbas April 10, 2026 19:00

Labels

documentation Improvements or additions to documentation


4 participants