---
title: "Making Apache Camel Documentation Accessible to LLMs"
date: 2025-11-12
draft: false
authors: [ croway ]
categories: [ "Tooling" ]
preview: "How we implemented llms.txt to make Apache Camel documentation easily accessible to AI coding agents and LLM training"
---

The Apache Camel website now generates markdown versions of all documentation pages following the [llms.txt specification](https://llmstxt.org/). This makes our documentation easily accessible to Large Language Models (LLMs) and AI coding assistants.

## What is llms.txt?

The llms.txt specification is a proposed standard that helps LLMs discover and consume website content efficiently. Just as `robots.txt` guides web crawlers and `sitemap.xml` helps search engines, `llms.txt` provides a structured entry point that AI systems can use to understand and access documentation.

The Apache Camel website now exposes [/llms.txt](/llms.txt), which provides:

- An overview of Apache Camel and its documentation structure
- Instructions for accessing markdown versions of pages
- Information about key documentation sections (components, languages, patterns, user guides)
- A count of available markdown pages (5,355+ pages)
- Direct links to important documentation areas

This single, well-known entry point makes it easy for LLM training pipelines and AI coding assistants to discover and consume Apache Camel documentation.
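
To make the format concrete, the following Python sketch reads a file laid out the way llmstxt.org describes it. That layout is an H1 title, a one-line blockquote summary, and H2 sections whose list items are markdown links with optional notes. This is an illustration only and is not part of the Camel tooling. The `parse_llms_txt` function, the `LlmsTxt` class, and the sample text are hypothetical. Camel's actual /llms.txt may also contain free-form sections, which this sketch simply skips.

```python
import re
from dataclasses import dataclass, field

# A list item such as "- [User guide](https://example.org/guide.md): Getting started"
LINK_ITEM = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?\s*$")


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # H2 section name -> list of (link text, URL, notes) tuples
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse an llms.txt-style document. Lines that are not the title, the
    summary, an H2 heading, or a link item inside a section are ignored."""
    doc = LlmsTxt()
    section = None
    for line in text.splitlines():
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and section is None and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            section = line[3:].strip()
            doc.sections.setdefault(section, [])
        elif section is not None:
            match = LINK_ITEM.match(line)
            if match:
                name, url, notes = match.groups()
                doc.sections[section].append((name, url, notes or ""))
    return doc


if __name__ == "__main__":
    # Hypothetical sample, not the real camel.apache.org/llms.txt
    sample = """# Example Project
> One-line summary of the project.

## Docs
- [User guide](https://example.org/guide.md): Getting started
- [Components](https://example.org/components.md)
"""
    doc = parse_llms_txt(sample)
    print(doc.title)
    for name, url, notes in doc.sections["Docs"]:
        print(f"{name}: {url} {notes}".rstrip())
```

A client would normally parse the index this way first, then fetch only the linked pages that are relevant to the question at hand.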

## Why This Matters

Making Apache Camel documentation LLM-friendly brings two significant benefits:

### 1. Better AI Training Data

When LLMs are trained or updated, data-collection pipelines can easily discover and ingest Apache Camel documentation through the llms.txt entry point. This gives AI models access to accurate, well-structured information about Apache Camel.

### 2. Enhanced Developer Experience with AI Coding Agents

Modern coding assistants like Claude Code, Cursor, GitHub Copilot, and Gemini CLI can now directly access Apache Camel documentation in a format optimized for their consumption. This means:

- More accurate code suggestions for Apache Camel integrations
- Better understanding of component configurations and patterns
- Improved troubleshooting and debugging assistance
- Up-to-date knowledge of the latest Camel features and versions

## Using the Markdown Documentation

### Example: AI Assistant Using llms.txt

Here's a real example of an AI coding assistant using the llms.txt endpoint to answer Apache Camel questions:

**User:** Reply to Apache Camel-related questions using the llms.txt `https://camel.apache.org/llms.txt` follow links and redirect.
What is the latest Camel release? And the latest Camel Quarkus?

**Assistant:**
> I'll fetch the Apache Camel llms.txt file and follow any links to find the latest release information.
>
> Fetch(`https://camel.apache.org/llms.txt`)
>
> Let me follow the releases page link to get the specific version numbers:
>
> Fetch(`https://camel.apache.org/releases/index.md`)
>
> Fetch(`https://camel.apache.org/camel-quarkus/next/index.md`)
>
> Based on the Apache Camel documentation:
>
> **Latest Apache Camel Release:** 4.16.0 (released 2025-11-05)
>
> **Latest Camel Quarkus:** 3.27.x (compatible with Camel 4.14.2 and Quarkus 3.27.0)
>
> Note that the Camel Quarkus version (3.27.x) is currently tracking Camel 4.14.2, which is a few versions behind the latest Camel core release (4.16.0). This is typical as Camel Quarkus needs to maintain compatibility with both Apache Camel and Quarkus frameworks.

This demonstrates how AI assistants can autonomously navigate the llms.txt file, follow links to relevant documentation, and provide accurate, up-to-date information about Apache Camel releases.
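
The same navigation can be reproduced in a few lines of code. The Python sketch below roughly approximates what the assistant did, but it does not describe how any particular assistant works. It downloads /llms.txt and keeps the markdown links whose text or URL mentions a keyword. It then resolves relative links against the index URL and fetches those pages. The `follow_links` and `http_get` helpers are hypothetical. The example run assumes, as the transcript suggests, that the index links to the releases page.

```python
import re
import urllib.request
from typing import Callable
from urllib.parse import urljoin

LLMS_TXT_URL = "https://camel.apache.org/llms.txt"
# Any inline markdown link: [text](target)
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def http_get(url: str) -> str:
    """Download a URL as UTF-8 text; urllib follows HTTP redirects by default."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def follow_links(keyword: str, fetch: Callable[[str], str] = http_get,
                 index_url: str = LLMS_TXT_URL) -> dict[str, str]:
    """Fetch the llms.txt index, then fetch every linked page whose link text
    or target contains `keyword` (case-insensitive). Returns {url: body}."""
    index = fetch(index_url)
    pages: dict[str, str] = {}
    for text, target in MD_LINK.findall(index):
        if keyword.lower() in f"{text} {target}".lower():
            # Resolve relative links such as /releases/index.md
            url = urljoin(index_url, target)
            if url not in pages:
                pages[url] = fetch(url)
    return pages


if __name__ == "__main__":
    # Needs network access; prints the first line of each matching page.
    for url, body in follow_links("release").items():
        first_line = body.splitlines()[0] if body else ""
        print(url, "->", first_line)
```

The fetch function is passed in as a parameter, so you can test the logic offline with a dictionary lookup instead of real HTTP requests.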

### Accessing Markdown Documentation

For any HTML page on the Apache Camel website, a markdown version is available by appending `.md` to the URL:

- **HTML**: `https://camel.apache.org/components/4.10.x/languages/simple-language.html`
- **Markdown**: `https://camel.apache.org/components/4.10.x/languages/simple-language.html.md`
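
The mapping is simple enough to automate. This Python helper is a hypothetical sketch. It appends `.md` to page URLs as described above and drops any query string or `#fragment`. It also maps directory-style URLs ending in `/` to `index.md`. That case is an assumption inferred from the `releases/index.md` fetch in the example above, not a documented rule.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Return the URL of the markdown version of a camel.apache.org page."""
    scheme, netloc, path, _query, _fragment = urlsplit(page_url)
    if path == "" or path.endswith("/"):
        # Assumption: directory-style pages are served as .../index.md
        path = (path or "/") + "index.md"
    else:
        # Documented rule: append ".md" to the page URL
        path += ".md"
    # The .md files are static, so query strings and fragments are dropped.
    return urlunsplit((scheme, netloc, path, "", ""))


if __name__ == "__main__":
    print(markdown_url(
        "https://camel.apache.org/components/4.10.x/languages/simple-language.html"))
```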

## Try It Yourself

Visit [/llms.txt](/llms.txt) to see the implementation in action. You can access markdown versions of any documentation page by appending `.md` to the URL.

AI coding assistants such as Claude Code, Cursor, or GitHub Copilot can now give more accurate and up-to-date answers about Apache Camel by reading these markdown pages directly.

## Implementation Details

The implementation plugs into the existing website build pipeline.

### Build Pipeline Integration

The markdown generation runs automatically during the website build process:

1. **Antora** generates HTML documentation from AsciiDoc sources
2. **Hugo** builds the website structure
3. A **Gulp** task converts the HTML to markdown
4. The generated markdown files are deployed alongside the HTML pages
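
The file layout from step 4 can be sketched in a few lines of Python. This is not the Gulp task itself. `write_markdown_siblings` is a hypothetical helper that walks a build output directory and writes a `.html.md` file next to every `.html` page. That naming matches the URL scheme described earlier, where `simple-language.html` becomes `simple-language.html.md`. The HTML-to-markdown conversion is passed in as a function, because that is the part the Gulp task actually implements.

```python
from pathlib import Path
from typing import Callable


def write_markdown_siblings(site_dir: Path, convert: Callable[[str], str]) -> int:
    """Write "<page>.html.md" next to every "<page>.html" under site_dir.

    `convert` turns one HTML document into markdown. In the real build this
    work is done by a Gulp task; here it is just a parameter. Returns the
    number of markdown files written.
    """
    written = 0
    for html_file in sorted(site_dir.rglob("*.html")):
        markdown = convert(html_file.read_text(encoding="utf-8"))
        # Keep the full HTML file name so page.html is served as page.html.md
        md_file = html_file.with_name(html_file.name + ".md")
        md_file.write_text(markdown, encoding="utf-8")
        written += 1
    return written
```

Calling it as `write_markdown_siblings(Path("public"), convert)` on a hypothetical `public/` output directory returns the number of files written. Re-running it is safe, because the `*.html` pattern never matches the generated `.html.md` files.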

### Content Extraction and Cleaning

The conversion process focuses on extracting only the essential documentation content:

- Removes navigation elements, headers, and footers
- Extracts the main article content using semantic HTML selectors
- Converts tables, code blocks, and formatting to GitHub Flavored Markdown
- Preserves document structure and hierarchy
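
These cleaning rules can be illustrated with the `html.parser` module from Python's standard library. This is a simplified, hypothetical sketch, not the Gulp task's actual code. It assumes the main content sits inside an `<article>` element and drops `<nav>`, `<header>`, `<footer>`, `<script>`, and `<style>`. It emits only headings, paragraphs, list items, and fenced code blocks. Tables and inline formatting, which the real conversion handles, are left out to keep it short. The `ArticleToMarkdown` class and `html_to_markdown` function are illustrative names.

```python
from html.parser import HTMLParser

FENCE = "`" * 3
SKIPPED = {"nav", "header", "footer", "script", "style"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCKS = {"p", "li", "pre"} | HEADINGS


class ArticleToMarkdown(HTMLParser):
    """Keep only text inside <article>, drop page chrome, and emit headings,
    paragraphs, list items, and fenced code blocks as markdown."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.article_depth = 0  # > 0 while inside <article>
        self.skip_depth = 0     # > 0 while inside a SKIPPED element
        self.block = None       # outermost block element being collected
        self.buffer: list[str] = []
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in SKIPPED:
            self.skip_depth += 1
        elif (tag in BLOCKS and self.block is None
              and self.article_depth and not self.skip_depth):
            self.block, self.buffer = tag, []

    def handle_endtag(self, tag):
        if tag == "article":
            self.article_depth -= 1
        elif tag in SKIPPED:
            self.skip_depth -= 1
        elif tag == self.block:
            self._flush()

    def handle_data(self, data):
        if self.block is not None and not self.skip_depth:
            self.buffer.append(data)

    def _flush(self):
        tag, text = self.block, "".join(self.buffer)
        self.block = None
        if tag == "pre":
            code = text.strip("\n")  # code keeps its inner whitespace
            self.out.append(f"{FENCE}\n{code}\n{FENCE}")
            return
        text = " ".join(text.split())  # collapse whitespace elsewhere
        if not text:
            return
        if tag in HEADINGS:
            self.out.append("#" * int(tag[1]) + " " + text)
        elif tag == "li":
            self.out.append("- " + text)
        else:
            self.out.append(text)


def html_to_markdown(html: str) -> str:
    parser = ArticleToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.out) + "\n"
```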

## Results and Impact

The implementation generates:

- **5,355+ markdown pages** automatically during each build
- **Coverage** of components, languages, data formats, and user guides
- **Clean, structured content** optimized for LLM consumption
- **No manual maintenance required**: fully automated as part of the build pipeline

## Conclusion

By implementing the llms.txt specification, we've made Apache Camel documentation more accessible to both AI training pipelines and developer-facing coding assistants. This automated solution requires no manual maintenance while providing structured, clean documentation in a format optimized for LLM consumption.