Commit 1557c09

align to llms.txt (#27)
1 parent fed840b commit 1557c09

11 files changed: +584 -29 lines changed

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -1,5 +1,19 @@
 # Changelog

+## 0.10.0 (2025-10-27)
+- [Feature] **llms.txt Specification Compliance** - Updated output format to fully comply with the llms.txt specification from llmstxt.org.
+- **Metadata Format**: Metadata now appears within the description field using parentheses and comma separators: `- [title](url): description (tokens:450, updated:2025-10-13, priority:high)`
+- **Optional Descriptions**: Parser now correctly handles links without descriptions: `- [title](url)` per spec
+- **Multi-Section Support**: Documents automatically organized into `Documentation`, `Examples`, and `Optional` sections based on priority
+- **Body Content Support**: Added optional `body` config parameter for custom content between description and sections
+- Priority-based categorization: 1-3 → Documentation, 4-5 → Examples, 6-7 → Optional
+- Empty sections are automatically omitted from output
+- Updated parser regex from `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m` to `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/` to make descriptions optional
+- Fixed multiline regex greedy matching issue that was capturing only one link per section
+- [Test] Added comprehensive test suite for spec compliance (8 new parser tests, 7 new generator tests)
+- [Docs] Updated README with multi-section organization examples and body content usage
+- **Breaking Change**: Metadata format has changed from `tokens:450 updated:2025-10-13` to `(tokens:450, updated:2025-10-13)` for spec compliance
+
 ## 0.9.4 (2025-10-27)
 - [Feature] **Auto-Exclude Hidden Directories** - Hidden directories (starting with `.`) are now automatically excluded by default to prevent noise from `.git`, `.lint`, `.github`, etc.
 - Adds `include_hidden: false` as default behavior

Gemfile.lock

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    llm-docs-builder (0.9.4)
+    llm-docs-builder (0.10.0)
       zeitwerk (~> 2.6)

 GEM

README.md

Lines changed: 66 additions & 3 deletions
@@ -109,6 +109,7 @@ docs: ./docs
 base_url: https://myproject.io
 title: My Project
 description: Brief description
+body: Optional body content between description and sections
 output: llms.txt
 suffix: .llm
 verbose: false
@@ -289,9 +290,9 @@ Generate enriched llms.txt files with token counts, timestamps, and priority lab

 **Enhanced llms.txt (with metadata enabled):**
 ```markdown
-- [Getting Started](https://myproject.io/docs/Getting-Started.md) tokens:450 updated:2025-10-13 priority:high
-- [Configuration](https://myproject.io/docs/Configuration.md) tokens:2800 updated:2025-10-12 priority:high
-- [Advanced Topics](https://myproject.io/docs/Advanced.md) tokens:5200 updated:2025-09-15 priority:medium
+- [Getting Started](https://myproject.io/docs/Getting-Started.md): Quick start guide (tokens:450, updated:2025-10-13, priority:high)
+- [Configuration](https://myproject.io/docs/Configuration.md): Configuration options (tokens:2800, updated:2025-10-12, priority:high)
+- [Advanced Topics](https://myproject.io/docs/Advanced.md): Deep dive topics (tokens:5200, updated:2025-09-15, priority:medium)
 ```

 **Benefits:**
@@ -309,6 +310,68 @@ include_priority: true # Show priority labels (high/medium/low)
 calculate_compression: true # Show compression ratios (slower, requires transformation)
 ```

+**Note:** Following the llms.txt specification, metadata appears within the description field, wrapped in parentheses and separated by commas.
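
As a rough illustration of the metadata format described in the note, here is a minimal Ruby sketch. The helper name `format_link_line` and its keyword arguments are hypothetical and are not part of the gem's API; the sketch only mirrors the line shape shown in this commit, including the case where a link has no description and the metadata follows the colon directly.

```ruby
# Hypothetical helper (not the gem's API): formats one llms.txt link line.
def format_link_line(title:, url:, description: nil, metadata: {})
  # Keep only metadata entries that have a value, e.g. { tokens: 450 }.
  parts = metadata.reject { |_key, value| value.nil? }
                  .map { |key, value| "#{key}:#{value}" }
  metadata_str = parts.empty? ? nil : "(#{parts.join(', ')})"

  base = "- [#{title}](#{url})"
  has_description = description && !description.empty?
  base = "#{base}: #{description}" if has_description
  return base unless metadata_str

  # With no description, the metadata takes the description slot after the colon.
  has_description ? "#{base} #{metadata_str}" : "#{base}: #{metadata_str}"
end

puts format_link_line(
  title: 'Getting Started',
  url: 'https://myproject.io/docs/Getting-Started.md',
  description: 'Quick start guide',
  metadata: { tokens: 450, updated: '2025-10-13', priority: 'high' }
)
```

With these inputs the printed line has the same shape as the "Getting Started" entry in the enhanced example above.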
+
+### Multi-Section Organization
+
+Documents are automatically organized into multiple sections based on priority, following the llms.txt specification:
+
+**Priority-based categorization:**
+- **Documentation** (priority 1-3): Essential docs like README, getting started guides, user guides
+- **Examples** (priority 4-5): Tutorials and example files
+- **Optional** (priority 6-7): Advanced topics and reference documentation
+
+**Example output:**
+```markdown
+# My Project
+
+> Project description
+
+## Documentation
+
+- [README](README.md): Main documentation
+- [Getting Started](getting-started.md): Quick start guide
+
+## Examples
+
+- [Basic Tutorial](tutorial.md): Step-by-step tutorial
+- [Code Examples](examples.md): Example code
+
+## Optional
+
+- [Advanced Topics](advanced.md): Deep dive into advanced features
+- [API Reference](reference.md): Complete API reference
+```
+
+Empty sections are automatically omitted. The "Optional" section aligns with the llms.txt spec for marking secondary content that can be skipped when context windows are limited.
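
The priority bands and the empty-section rule above can be sketched in a few lines of Ruby. `SECTION_RANGES` and `group_by_section` are hypothetical names used only for illustration. Note one deliberate difference: the generator in this commit uses open-ended comparisons (`<= 3`, `>= 6`), while this sketch uses closed ranges, so a priority outside 1-7 would simply not be placed in any section here.

```ruby
# Hypothetical section table; the ranges mirror the priority bands listed above.
SECTION_RANGES = {
  'Documentation' => (1..3),
  'Examples' => (4..5),
  'Optional' => (6..7)
}.freeze

# Groups docs by section and leaves out sections that end up empty.
def group_by_section(docs)
  SECTION_RANGES.each_with_object({}) do |(name, range), sections|
    matching = docs.select { |doc| range.cover?(doc[:priority]) }
    sections[name] = matching unless matching.empty?
  end
end

docs = [
  { title: 'README', priority: 1 },
  { title: 'API Reference', priority: 7 }
]

# Prints the Documentation and Optional headings; Examples is skipped.
group_by_section(docs).each_key { |name| puts "## #{name}" }
```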
+
+### Body Content
+
+Add custom body content between the description and documentation sections:
+
+```yaml
+# llm-docs-builder.yml
+title: My Project
+description: Brief description
+body: |
+  This framework is built on Ruby and focuses on performance.
+  Key concepts: streaming, batching, and parallel processing.
+docs: ./docs
+```
+
+This produces:
+```markdown
+# My Project
+
+> Brief description
+
+This framework is built on Ruby and focuses on performance.
+Key concepts: streaming, batching, and parallel processing.
+
+## Documentation
+...
+```
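
To make the placement of the body concrete, here is a minimal sketch of how the header portion of the file might be assembled. `build_header` is a hypothetical helper, not the gem's method. It also chomps the body, on the assumption that a YAML block scalar (`|`) leaves a trailing newline; the generator in this commit appends the body as-is.

```ruby
# Hypothetical helper: assembles the title, description, and optional body.
def build_header(title:, description: nil, body: nil)
  lines = ["# #{title}", '']
  lines.push("> #{description}", '') if description
  # A nil or blank body is skipped so no stray empty block appears.
  lines.push(body.chomp, '') if body && !body.strip.empty?
  lines.join("\n")
end

puts build_header(
  title: 'My Project',
  description: 'Brief description',
  body: "This framework is built on Ruby and focuses on performance.\n"
)
```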
+
 ## Advanced Compression Options

 All compression features can be used individually for fine-grained control:

lib/llm_docs_builder/config.rb

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ def merge_with_options(options)
         base_url: options[:base_url] || self['base_url'],
         title: options[:title] || self['title'],
         description: options[:description] || self['description'],
+        body: options[:body] || self['body'],
         output: options[:output] || self['output'] || 'llms.txt',
         convert_urls: if options.key?(:convert_urls)
                         options[:convert_urls]

lib/llm_docs_builder/generator.rb

Lines changed: 54 additions & 21 deletions
@@ -210,7 +210,11 @@ def apply_transformations(content, file_path)

     # Constructs llms.txt content from analyzed documentation files
     #
-    # Combines title, description, and documentation links into formatted output
+    # Combines title, description, body content, and documentation links into formatted output.
+    # Organizes documents into sections based on priority:
+    # - Priority 1-3: Documentation (essential docs like README, getting started)
+    # - Priority 4-5: Examples (tutorials, example files)
+    # - Priority 6-7: Optional (advanced topics, reference docs)
     #
     # @param docs [Array<Hash>] analyzed file metadata
     # @return [String] formatted llms.txt content
@@ -224,31 +228,60 @@ def build_llms_txt(docs)
       content << "> #{description}" if description
       content << ''

-      if docs.any?
-        content << '## Documentation'
+      # Add optional body content
+      if options[:body] && !options[:body].empty?
+        content << options[:body]
         content << ''
+      end

-        docs.each do |doc|
-          url = build_url(doc[:path])
-          line = if doc[:description] && !doc[:description].empty?
-                   "- [#{doc[:title]}](#{url}): #{doc[:description]}"
-                 else
-                   "- [#{doc[:title]}](#{url})"
-                 end
-
-          # Append metadata if enabled
-          if options[:include_metadata]
-            metadata_parts = []
-            metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
-            metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
-            metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
-            metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
-
-            line += " #{metadata_parts.join(' ')}" unless metadata_parts.empty?
+      if docs.any?
+        # Categorize docs by priority into sections
+        sections = {
+          'Documentation' => docs.select { |d| d[:priority] <= 3 },
+          'Examples' => docs.select { |d| d[:priority] >= 4 && d[:priority] <= 5 },
+          'Optional' => docs.select { |d| d[:priority] >= 6 }
+        }
+
+        # Build each section (skip empty ones)
+        sections.each do |section_name, section_docs|
+          next if section_docs.empty?
+
+          content << "## #{section_name}"
+          content << ''
+
+          section_docs.each do |doc|
+            url = build_url(doc[:path])
+
+            # Build metadata string if enabled
+            metadata_str = nil
+            if options[:include_metadata]
+              metadata_parts = []
+              metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
+              metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
+              metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
+              metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
+
+              metadata_str = "(#{metadata_parts.join(', ')})" unless metadata_parts.empty?
+            end
+
+            # Build line according to spec: - [title](url): description (metadata)
+            line = if doc[:description] && !doc[:description].empty?
+                     base = "- [#{doc[:title]}](#{url}): #{doc[:description]}"
+                     metadata_str ? "#{base} #{metadata_str}" : base
+                   else
+                     # No description: - [title](url): (metadata)
+                     base = "- [#{doc[:title]}](#{url})"
+                     metadata_str ? "#{base}: #{metadata_str}" : base
+                   end
+
+            content << line
           end

-          content << line
+          content << ''
         end
+
+        # Remove trailing empty line
+        content.pop if content.last == ''
       end

       "#{content.join("\n")}\n"

lib/llm_docs_builder/parser.rb

Lines changed: 5 additions & 3 deletions
@@ -85,19 +85,21 @@ def save_section(sections, section_name, content)

     # Extracts markdown links from section content into structured format
     #
-    # Scans for markdown list items with links and descriptions. Returns raw content
+    # Scans for markdown list items with links and optional descriptions. Returns raw content
     # if no links are found in the expected format.
     #
     # @param content [String] raw section content
     # @return [Array<Hash>, String] array of link hashes or raw content if no links found
     def parse_section_content(content)
       links = []

-      content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m) do |title, url, description|
+      # Updated regex: description is optional (non-capturing group with ?)
+      # Use [^\n]* instead of .* to avoid matching across lines
+      content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/) do |title, url, description|
         links << {
           title: title,
           url: url,
-          description: description.strip
+          description: description&.strip || ''
         }
       end
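
The effect of the regex change can be checked in isolation. The sketch below copies both patterns from this diff into constants (the names `OLD_LINK` and `NEW_LINK` and the sample section are illustrative only) and scans the same three-link section with each. In Ruby the `/m` flag lets `.` match newlines, so the old greedy `(.*)` swallows everything after the first link.

```ruby
# Patterns copied from the diff; the constant names are illustrative.
OLD_LINK = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m
NEW_LINK = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/

section = <<~MARKDOWN
  - [README](README.md): Main documentation
  - [Changelog](CHANGELOG.md)
  - [Guide](guide.md): User guide
MARKDOWN

old_links = section.scan(OLD_LINK)
new_links = section.scan(NEW_LINK).map do |title, url, description|
  # description is nil when the link has no ": description" part.
  { title: title, url: url, description: description&.strip || '' }
end

puts old_links.length                            # 1: the first match spans all lines
puts new_links.map { |link| link[:title] }.inspect # ["README", "Changelog", "Guide"]
```

One edge case that may be worth a test: because `\s*` can match a newline, a line ending in a bare colon (for example `- [X](x.md):`) can absorb the following line as its description.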
lib/llm_docs_builder/version.rb

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@

 module LlmDocsBuilder
   # Current version of the LlmDocsBuilder gem
-  VERSION = '0.9.4'
+  VERSION = '0.10.0'
 end

spec/config_spec.rb

Lines changed: 28 additions & 0 deletions
@@ -51,6 +51,34 @@
     end
   end

+  describe '#merge_with_options' do
+    it 'includes body parameter from config file' do
+      config_file = Tempfile.new(['config', '.yml'])
+      config_file.write(YAML.dump({ 'body' => 'Custom body content from config' }))
+      config_file.close
+
+      config = described_class.new(config_file.path)
+      merged = config.merge_with_options({})
+
+      expect(merged[:body]).to eq('Custom body content from config')
+
+      config_file.unlink
+    end
+
+    it 'allows CLI options to override body parameter' do
+      config_file = Tempfile.new(['config', '.yml'])
+      config_file.write(YAML.dump({ 'body' => 'Config body' }))
+      config_file.close
+
+      config = described_class.new(config_file.path)
+      merged = config.merge_with_options({ body: 'CLI body override' })
+
+      expect(merged[:body]).to eq('CLI body override')
+
+      config_file.unlink
+    end
+  end
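
The precedence these specs exercise comes down to the `options[:body] || self['body']` expression in `config.rb`. A stripped-down sketch, using a hypothetical `merge_body` function and plain hashes in place of the real config object, shows the three cases, including one the specs above do not cover: an explicit empty string from the CLI.

```ruby
# Hypothetical stand-in for the merge step: CLI options win over file values.
def merge_body(cli_options, file_config)
  cli_options[:body] || file_config['body']
end

file_config = { 'body' => 'Config body' }

puts merge_body({}, file_config)                            # falls back to the file value
puts merge_body({ body: 'CLI body override' }, file_config) # CLI value wins
p merge_body({ body: '' }, file_config)                     # '' is truthy in Ruby, so it also wins
```

Since the generator skips an empty body, passing an empty string on the CLI would, under this reading, suppress a body set in the config file.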
+
   describe 'error handling' do
     it 'raises GenerationError for invalid YAML syntax' do
       config_file = Tempfile.new(['config', '.yml'])
