| 1 | +--- |
| 2 | +title: "Making Apache Camel Documentation Accessible to LLMs" |
| 3 | +date: 2025-11-12 |
| 4 | +draft: false |
| 5 | +authors: [ croway ] |
| 6 | +categories: [ "Tooling" ] |
| 7 | +preview: "How we implemented llms.txt to make Apache Camel documentation easily accessible to AI coding agents and LLM training" |
| 8 | +--- |
| 9 | + |
| 10 | +The Apache Camel website now generates markdown versions of all documentation pages following the [llms.txt specification](https://llmstxt.org/). This makes our documentation easily accessible to Large Language Models (LLMs) and AI coding assistants. |
| 11 | + |
| 12 | +## What is llms.txt? |
| 13 | + |
| 14 | +The llms.txt specification is a proposed standard format that helps LLMs discover and consume website content efficiently. Similar to how `robots.txt` guides web crawlers and `sitemap.xml` helps search engines, `llms.txt` provides a structured entry point for AI systems to understand and access documentation. |
| 15 | + |
| 16 | +The Apache Camel website now exposes [/llms.txt](/llms.txt) which provides: |
| 17 | + |
| 18 | +- An overview of Apache Camel and its documentation structure |
| 19 | +- Instructions for accessing markdown versions of pages |
| 20 | +- Information about key documentation sections (components, languages, patterns, user guides) |
| 21 | +- A count of available markdown pages (5,355+ pages) |
| 22 | +- Direct links to important documentation areas |
| 23 | + |
| 24 | +This standardized file makes it easy for LLM training pipelines and AI coding assistants to discover and consume Apache Camel documentation efficiently. |
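
As a rough illustration of how a consumer might read such a file, the sketch below groups the markdown links in an llms.txt document by their `##` section. It assumes the common llms.txt layout (an H1 title, then H2 sections containing `[title](url)` links); the `extract_links` helper and the sample text are hypothetical and do not reproduce the real contents of the Camel file.

```python
import re

# Matches markdown links of the form [title](url), the link style used in llms.txt files.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def extract_links(llms_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Group the markdown links in an llms.txt document by their H2 section."""
    sections: dict[str, list[tuple[str, str]]] = {}
    current = ""  # links that appear before any H2 heading are stored under ""
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections.setdefault(current, [])
            continue
        for title, url in LINK_RE.findall(line):
            sections.setdefault(current, []).append((title, url))
    return sections


# Hypothetical sample, not the real contents of camel.apache.org/llms.txt.
SAMPLE = """\
# Apache Camel

> Integration framework documentation.

## Documentation

- [Components](https://camel.apache.org/components/next/index.md): component reference
- [Releases](https://camel.apache.org/releases/index.md)
"""

links = extract_links(SAMPLE)
for section, entries in links.items():
    for title, url in entries:
        print(f"{section}: {title} -> {url}")
```

An agent could use a mapping like this to decide which linked page to fetch next for a given question.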
| 25 | + |
| 26 | +## Why This Matters |
| 27 | + |
| 28 | +Making Apache Camel documentation LLM-friendly brings two significant benefits: |
| 29 | + |
| 30 | +### 1. Better AI Training Data |
| 31 | + |
| 32 | +When LLMs are trained or updated, the pipelines that gather their training data can discover and index Apache Camel documentation through the llms.txt entry point. This helps AI models draw on accurate, well-structured information about Apache Camel. |
| 33 | + |
| 34 | +### 2. Enhanced Developer Experience with AI Coding Agents |
| 35 | + |
| 36 | +Modern coding assistants like Claude Code, Cursor, GitHub Copilot, and Gemini CLI can now directly access Apache Camel documentation in a format optimized for their consumption. This means: |
| 37 | + |
| 38 | +- More accurate code suggestions for Apache Camel integrations |
| 39 | +- Better understanding of component configurations and patterns |
| 40 | +- Improved troubleshooting and debugging assistance |
| 41 | +- Up-to-date knowledge of the latest Camel features and versions |
| 42 | + |
| 43 | +## Using the Markdown Documentation |
| 44 | + |
| 45 | +### Example: AI Assistant Using llms.txt |
| 46 | + |
| 47 | +Here's a real example of an AI coding assistant using the llms.txt endpoint to answer Apache Camel questions: |
| 48 | + |
| 49 | +**User:** Reply to Apache Camel-related questions using the llms.txt `https://camel.apache.org/llms.txt` follow links and redirect. |
| 50 | +What is the latest Camel release? And the latest Camel Quarkus? |
| 51 | + |
| 52 | +**Assistant:** |
| 53 | +> I'll fetch the Apache Camel llms.txt file and follow any links to find the latest release information. |
| 54 | +> |
| 55 | +> Fetch(`https://camel.apache.org/llms.txt`) |
| 56 | +> |
| 57 | +> Let me follow the releases page link to get the specific version numbers: |
| 58 | +> |
| 59 | +> Fetch(`https://camel.apache.org/releases/index.md`) |
| 60 | +> |
| 61 | +> Fetch(`https://camel.apache.org/camel-quarkus/next/index.md`) |
| 62 | +> |
| 63 | +> Based on the Apache Camel documentation: |
| 64 | +> |
| 65 | +> **Latest Apache Camel Release:** 4.16.0 (released 2025-11-05) |
| 66 | +> |
| 67 | +> **Latest Camel Quarkus:** 3.27.x (compatible with Camel 4.14.2 and Quarkus 3.27.0) |
| 68 | +> |
| 69 | +> Note that the Camel Quarkus version (3.27.x) is currently tracking Camel 4.14.2, which is a few versions behind the latest Camel core release (4.16.0). This is typical as Camel Quarkus needs to maintain compatibility with both Apache Camel and Quarkus frameworks. |
| 70 | + |
| 71 | +This demonstrates how AI assistants can autonomously navigate the llms.txt file, follow links to relevant documentation, and provide accurate, up-to-date information about Apache Camel releases. |
| 72 | + |
| 73 | +### Accessing Markdown Documentation |
| 74 | + |
| 75 | +For any HTML page on the Apache Camel website, a markdown version is available by appending `.md` to the URL: |
| 76 | + |
| 77 | +- **HTML**: `https://camel.apache.org/components/4.10.x/languages/simple-language.html` |
| 78 | +- **Markdown**: `https://camel.apache.org/components/4.10.x/languages/simple-language.html.md` |
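
The URL rule above is simple enough to capture in a small helper. This is a minimal sketch, not part of the website code: `markdown_url` is a hypothetical name, and dropping the query string and fragment is an assumption made here because they are unlikely to apply to a static `.md` file.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(html_url: str) -> str:
    """Return the markdown counterpart of a page URL by appending `.md` to its path."""
    parts = urlsplit(html_url)
    # Assumption: query string and fragment are dropped, since the markdown
    # version is a static file addressed by path only.
    return urlunsplit((parts.scheme, parts.netloc, parts.path + ".md", "", ""))


html = "https://camel.apache.org/components/4.10.x/languages/simple-language.html"
print(markdown_url(html))
```

Called with the HTML URL from the example above, the helper yields the markdown URL shown in the second bullet.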
| 79 | + |
| 80 | +## Try It Yourself |
| 81 | + |
| 82 | +Visit [/llms.txt](/llms.txt) to see the implementation in action. You can access markdown versions of any documentation page by appending `.md` to the URL. |
| 83 | + |
| 84 | +AI coding assistants like Claude Code, Cursor, or GitHub Copilot can now provide more accurate and up-to-date information about Apache Camel by accessing these markdown pages directly. |
| 85 | + |
| 86 | +## Implementation Details |
| 87 | + |
| 88 | +The implementation integrates into the existing build pipeline. |
| 89 | + |
| 90 | +### Build Pipeline Integration |
| 91 | + |
| 92 | +The markdown generation runs automatically during the website build process: |
| 93 | + |
| 94 | +1. **Antora** generates HTML documentation from AsciiDoc sources |
| 95 | +2. **Hugo** builds the website structure |
| 96 | +3. **Gulp task** converts HTML to markdown |
| 97 | +4. Generated markdown files are deployed alongside HTML pages |
| 98 | + |
| 99 | +### Content Extraction and Cleaning |
| 100 | + |
| 101 | +The conversion process focuses on extracting only the essential documentation content: |
| 102 | + |
| 103 | +- Removes navigation elements, headers, and footers |
| 104 | +- Extracts the main article content using semantic HTML selectors |
| 105 | +- Converts tables, code blocks, and formatting to GitHub Flavored Markdown |
| 106 | +- Preserves document structure and hierarchy |
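
The extraction idea in the list above can be sketched in a few lines. The actual conversion runs as a Gulp task whose code is not shown here; the Python below is only an illustration of the same approach using the standard library, and it assumes the main content lives in an `<article>` element. The class name, the set of skipped tags, and the sample HTML are all hypothetical, and tables and code blocks are left out for brevity.

```python
from html.parser import HTMLParser

# Assumption: these elements are page chrome and never part of the article body.
SKIPPED = {"nav", "header", "footer", "script", "style"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCKS = set(HEADINGS) | {"p", "li"}


class ArticleToMarkdown(HTMLParser):
    """Collect text inside <article>, skip page chrome, and emit simple markdown blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.in_article = 0  # nesting depth of <article>
        self.skip_depth = 0  # nesting depth of skipped elements
        self.prefix = ""     # markdown prefix for the block being collected
        self.buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.in_article += 1
        elif tag in SKIPPED:
            self.skip_depth += 1
        elif tag in BLOCKS:
            self.flush()
            self.prefix = HEADINGS.get(tag, "- " if tag == "li" else "")

    def handle_endtag(self, tag):
        if tag == "article":
            self.flush()
            self.in_article = max(0, self.in_article - 1)
        elif tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCKS:
            self.flush()

    def handle_data(self, data):
        # Keep text only when inside the article and outside skipped elements.
        if self.in_article and not self.skip_depth:
            self.buffer.append(data)

    def flush(self):
        # Collapse whitespace and store the finished block, if it has any text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buffer, self.prefix = [], ""


# Hypothetical page, not taken from the Camel website.
SAMPLE = """
<html><body>
<nav><a href="/">Home</a></nav>
<article>
  <h1>Simple Language</h1>
  <p>The Simple language is
     built into Camel.</p>
  <ul><li>Easy to use</li></ul>
  <footer>Edit this page</footer>
</article>
<footer>Copyright</footer>
</body></html>
"""

parser = ArticleToMarkdown()
parser.feed(SAMPLE)
parser.close()
markdown = "\n\n".join(parser.blocks)
print(markdown)
```

The navigation link and both footers are dropped, while the heading, paragraph, and list item keep their order and level, which mirrors the "preserve structure, remove chrome" goal described above.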
| 107 | + |
| 108 | +## Results and Impact |
| 109 | + |
| 110 | +The implementation generates: |
| 111 | + |
| 112 | +- **5,355+ markdown pages** automatically during each build |
| 113 | +- **Coverage** of components, languages, data formats, and user guides |
| 114 | +- **Clean, structured content** optimized for LLM consumption |
| 115 | +- **No manual maintenance required** - fully automated with the build pipeline |
| 116 | + |
| 117 | +## Conclusion |
| 118 | + |
| 119 | +By implementing the llms.txt specification, we've made Apache Camel documentation more accessible to both AI training pipelines and developer-facing coding assistants. This automated solution requires no manual maintenance while providing structured, clean documentation in a format optimized for LLM consumption. |