# CLAUDE.md

llm-docs-builder is a Ruby gem that generates [llms.txt](https://llmstxt.org/) files from existing markdown documentation and transforms markdown files to be AI-friendly. It provides both a CLI tool and a Ruby API.

## Project Overview

**Key functionality:**
- Generates llms.txt files from documentation directories by scanning markdown files, extracting metadata, and ordering entries by priority
- Transforms individual markdown files by expanding relative links to absolute URLs
- Bulk-transforms entire documentation trees with customizable suffixes and exclusion patterns
- Supports both a config file and direct options for all operations

## Development Commands

### Testing
```bash
# Run all tests
./bin/rspecs

# Run a specific test file
bundle exec rspec spec/llm_docs_builder_spec.rb

# Run the example at a specific line
bundle exec rspec spec/llm_docs_builder_spec.rb:42
```

### Code Quality
```bash
# Run RuboCop linter
bundle exec rubocop

# Auto-fix RuboCop violations
bundle exec rubocop -a

# Run all checks (tests + linting)
bundle exec rake
```

### CLI Testing
```bash
# Test CLI locally
bundle exec bin/llm-docs-builder generate --docs ./docs
bundle exec bin/llm-docs-builder transform --docs README.md
bundle exec bin/llm-docs-builder bulk-transform --docs ./docs

# Test the compare command (requires network access)
bundle exec bin/llm-docs-builder compare --url https://karafka.io/docs/Getting-Started.html
bundle exec bin/llm-docs-builder compare --url https://example.com/page.html --file docs/local.md
```

### Building and Installing
```bash
# Build gem locally
bundle exec rake build

# Install locally built gem
gem install pkg/llm-docs-builder-*.gem

# Release (maintainers only)
bundle exec rake release
```

## Architecture

### Core Components

**LlmDocsBuilder Module** (`lib/llm_docs_builder.rb`)
- Main API entry point with class methods for all operations
- Uses Zeitwerk for autoloading
- Delegates to specialized classes for generation, transformation, and validation
- All methods support both config file and direct options via `Config#merge_with_options`

**Generator** (`lib/llm_docs_builder/generator.rb`)
- Scans documentation directories recursively using `Find.find`
- Extracts the title from the first H1 heading and the description from the first paragraph
- Prioritizes files: README (1), getting started (2), guides (3), tutorials (4), API (5), reference (6), others (7)
- Builds a formatted llms.txt with links and descriptions
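
A minimal sketch of the title and description extraction, assuming "first paragraph" means the first run of non-blank, non-heading lines. `extract_metadata` is a hypothetical helper for illustration, not the Generator's actual method.

```ruby
# Hypothetical helper (not the gem's API).
# Returns [title, description] for a markdown string.
def extract_metadata(markdown)
  lines = markdown.lines.map(&:chomp)

  # Title: text of the first ATX H1 ("# Title"); "## Sub" does not match.
  title_line = lines.find { |line| line.match?(/\A#\s+\S/) }
  title = title_line&.sub(/\A#\s+/, '')&.strip

  # Description: first run of non-blank lines that are not headings.
  paragraph = []
  lines.each do |line|
    if line.strip.empty? || line.start_with?('#')
      # Skip leading blanks/headings; stop once a paragraph has started.
      break unless paragraph.empty?
    else
      paragraph << line.strip
    end
  end

  [title, paragraph.empty? ? nil : paragraph.join(' ')]
end

doc = "# Getting Started\n\nInstall the gem and\nrun the CLI.\n\nMore text.\n"
extract_metadata(doc) # => ["Getting Started", "Install the gem and run the CLI."]
```

In this sketch, wrapped lines of the first paragraph are joined with spaces, which keeps each description on a single llms.txt line.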

**MarkdownTransformer** (`lib/llm_docs_builder/markdown_transformer.rb`)
- Transforms individual markdown files using regex patterns
- `expand_relative_links`: Converts relative links to absolute URLs using `base_url`
- `convert_html_urls`: Rewrites `.html`/`.htm` URLs to `.md`
- Leaves absolute URLs and anchor links unchanged

**BulkTransformer** (`lib/llm_docs_builder/bulk_transformer.rb`)
- Recursively processes all markdown files in a directory
- Uses `MarkdownTransformer` for each file
- Generates output paths with configurable suffix (default: `.llm`)
- Empty suffix (`""`) enables in-place transformation
- Supports glob-based exclusion patterns via `File.fnmatch`
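
Exclusion with `File.fnmatch` can be sketched as a filter over paths relative to the docs directory. Passing `File::FNM_PATHNAME` is an assumption here (the gem may use other flags); it makes `*` stop at `/` while `**/` matches across directory levels. `excluded?` is a hypothetical helper.

```ruby
# Flags are an assumption for this sketch: with FNM_PATHNAME, "*" does not
# cross "/", while "**/" matches across directory levels.
FNMATCH_FLAGS = File::FNM_PATHNAME

# Hypothetical helper: true if the path matches any exclusion glob.
def excluded?(relative_path, patterns)
  patterns.any? { |pattern| File.fnmatch(pattern, relative_path, FNMATCH_FLAGS) }
end

files = ['README.md', 'guides/setup.md', 'drafts/wip.md', 'private/notes.md']
excludes = ['drafts/*', '**/notes.md']

kept = files.reject { |path| excluded?(path, excludes) }
# kept == ['README.md', 'guides/setup.md']
```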

**Comparator** (`lib/llm_docs_builder/comparator.rb`)
- Measures context window savings by comparing content sizes
- Fetches a URL with different User-Agent headers (human browser vs. AI bot)
- Can also compare a remote URL against a local markdown file
- Fetches with `Net::HTTP`, following redirects
- Calculates reduction percentage, bytes saved, and compression factor
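
The three figures can all be derived from two byte counts. The formulas below are assumptions chosen to match the wording above (reduction relative to the human-facing size, compression factor as human size divided by AI size); `compare_sizes` and `ComparisonResult` are hypothetical names.

```ruby
# Hypothetical result type; formulas are assumptions for illustration.
ComparisonResult = Struct.new(:human_bytes, :ai_bytes, :bytes_saved,
                              :reduction_percent, :compression_factor,
                              keyword_init: true)

def compare_sizes(human_content, ai_content)
  # bytesize counts bytes, not characters (matters for UTF-8 content).
  human = human_content.bytesize
  ai = ai_content.bytesize
  saved = human - ai

  ComparisonResult.new(
    human_bytes: human,
    ai_bytes: ai,
    bytes_saved: saved,
    # Share of the original size removed, e.g. 75.0 for 75%.
    reduction_percent: human.zero? ? 0.0 : (saved * 100.0 / human).round(1),
    # How many times smaller the AI version is, e.g. 4.0 for "4x".
    compression_factor: ai.zero? ? Float::INFINITY : (human.to_f / ai).round(1)
  )
end
```

For a 4,000-byte HTML page and a 1,000-byte markdown version, this yields 3,000 bytes saved, a 75.0% reduction, and a 4.0x compression factor.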

**Config** (`lib/llm_docs_builder/config.rb`)
- Loads YAML config from file or auto-finds `llms-txt.yml`
- Merges config file options with programmatic options (programmatic takes precedence)
- Handles defaults: `suffix: '.llm'`, `output: 'llms.txt'`, `excludes: []`

**CLI** (`lib/llm_docs_builder/cli.rb`)
- Parses commands: generate, transform, bulk-transform, compare, parse, validate, version
- Uses OptionParser for flag parsing
- Loads config and merges with CLI options before delegating to main module
- Handles errors gracefully with user-friendly messages
- Compare command displays formatted output with human-readable byte sizes (bytes/KB/MB)
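
The human-readable sizes could be produced along these lines, assuming 1024-based units and one decimal place. `format_bytes` is a hypothetical name, and the gem's thresholds and formatting may differ.

```ruby
# Hypothetical formatter; assumes binary (1024-based) units.
def format_bytes(bytes)
  if bytes < 1024
    "#{bytes} bytes"
  elsif bytes < 1024 * 1024
    format('%.1f KB', bytes / 1024.0)
  else
    format('%.1f MB', bytes / (1024.0 * 1024))
  end
end

format_bytes(512)  # => "512 bytes"
format_bytes(1536) # => "1.5 KB"
```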

### Configuration Precedence

Options are resolved in this order (highest to lowest priority):
1. Direct method arguments (e.g., `LlmDocsBuilder.generate_from_docs('./docs', title: 'Override')`)
2. CLI flags (e.g., `--docs ./docs`)
3. Config file values (e.g., `llms-txt.yml`)
4. Defaults (e.g., `suffix: '.llm'`, `output: 'llms.txt'`)
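
One way to picture this order is a chain of `Hash#merge` calls from lowest to highest priority, where later hashes win. This is a sketch only: `DEFAULTS` and `resolve_options` are hypothetical, and CLI flags and direct arguments are modeled as separate hashes purely to show the ordering.

```ruby
# Hypothetical defaults mirroring the values listed above.
DEFAULTS = { suffix: '.llm', output: 'llms.txt', excludes: [] }.freeze

# Layers are merged lowest-priority first, so later layers win.
# nil values are dropped so an unset flag does not clobber a lower layer.
def resolve_options(config_file: {}, cli_flags: {}, direct: {})
  [DEFAULTS, config_file, cli_flags, direct].reduce({}) do |acc, layer|
    acc.merge(layer.compact)
  end
end

resolve_options(
  config_file: { output: 'docs/llms.txt', title: 'From YAML' },
  cli_flags: { docs: './docs', title: nil },
  direct: { title: 'Override' }
)
# => { suffix: '.llm', output: 'docs/llms.txt', excludes: [], title: 'Override', docs: './docs' }
```

Dropping `nil` values is a design choice in this sketch: without it, an option the user never passed would silently erase the config-file value.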

### File Priority System

When generating llms.txt, files are automatically ordered by importance:
- Priority 1: README files (always listed first)
- Priority 2: Getting started guides
- Priority 3: General guides
- Priority 4: Tutorials
- Priority 5: API documentation
- Priority 6: Reference documentation
- Priority 7: All other files
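
A sketch of how file names might map onto these seven levels. The keyword patterns are guesses based on the category names above, not the Generator's actual matching rules; `PRIORITY_RULES`, `file_priority`, and `sort_by_priority` are hypothetical.

```ruby
# Hypothetical keyword rules, checked in priority order (first match wins).
PRIORITY_RULES = [
  [1, /readme/i],
  [2, /getting[-_ ]?started/i],
  [3, /guide/i],
  [4, /tutorial/i],
  [5, /\bapi\b/i],
  [6, /reference/i]
].freeze

def file_priority(path)
  name = File.basename(path, '.*')
  rule = PRIORITY_RULES.find { |_priority, pattern| name.match?(pattern) }
  rule ? rule.first : 7 # everything else goes last
end

def sort_by_priority(paths)
  # Tie-break on the path so the output order is deterministic.
  paths.sort_by { |path| [file_priority(path), path] }
end
```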

### Link Transformation Logic

**Relative Link Expansion** (when `base_url` provided):
- Converts `[text](./path.md)` → `[text](https://base.url/path.md)`
- Converts `[text](../other.md)` → `[text](https://base.url/other.md)`
- Skips URLs starting with `http://`, `https://`, `//`, or `#`
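
A regex-based sketch of these rules, assuming inline `[text](target)` links only and that leading `./` and `../` segments are simply stripped before joining with `base_url`, which matches the `../other.md` example above. `expand_relative_links_sketch` is a standalone illustration, not the transformer's `expand_relative_links` method.

```ruby
# Targets left untouched, per the rules above.
SKIP_PREFIXES = ['http://', 'https://', '//', '#'].freeze

# Handles inline links only; link titles and reference-style links are ignored.
def expand_relative_links_sketch(markdown, base_url)
  base = base_url.chomp('/')

  markdown.gsub(/\[([^\]]*)\]\(([^)\s]+)\)/) do
    text = Regexp.last_match(1)
    target = Regexp.last_match(2)

    if target.start_with?(*SKIP_PREFIXES)
      "[#{text}](#{target})"
    else
      # Drop leading "./" and "../" segments (and a leading "/"), then join.
      path = target.sub(%r{\A(?:\.{1,2}/)+}, '').delete_prefix('/')
      "[#{text}](#{base}/#{path})"
    end
  end
end
```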

**URL Conversion** (when `convert_urls: true`):
- Changes `https://example.com/page.html` → `https://example.com/page.md`
- Changes `https://example.com/doc.htm` → `https://example.com/doc.md`
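
The extension rewrite can be sketched with a single regex. This pattern is an assumption, not the gem's exact one: it only touches `http(s)` URLs and leaves any `#fragment` or `?query` after the extension in place. `convert_html_urls_sketch` is a hypothetical name.

```ruby
# http(s) URL ending in .html or .htm, followed by a query, fragment,
# whitespace, a closing paren/quote/bracket, or end of string.
HTML_URL = %r{(https?://[^\s)"'<>]+?)\.html?(?=[?#\s)"'<>]|\z)}

def convert_html_urls_sketch(text)
  # Only the extension is replaced; the lookahead leaves what follows intact.
  text.gsub(HTML_URL) { "#{Regexp.last_match(1)}.md" }
end

convert_html_urls_sketch('[Guide](https://example.com/guide.html#setup)')
# => "[Guide](https://example.com/guide.md#setup)"
```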

### In-Place vs Separate Files

**Separate Files** (`suffix: '.llm'` - default):
- Creates new files: `README.md` → `README.llm.md`
- Preserves originals for human-readable documentation
- Useful for dual-serving human and AI versions

**In-Place** (`suffix: ""`):
- Overwrites originals: `README.md` → `README.md` (transformed)
- Used in build pipelines (e.g., Karafka framework)
- Transforms documentation before deployment
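
The output-path logic for both modes fits in a few lines. This sketch assumes the suffix is inserted between the base name and the `.md` extension, as in the `README.md` → `README.llm.md` example; `output_path_for` is a hypothetical name.

```ruby
# Hypothetical helper: derive the output path for a markdown file.
# An empty suffix returns the input path unchanged, i.e. an in-place overwrite.
def output_path_for(path, suffix = '.llm')
  return path if suffix.empty?

  ext = File.extname(path)        # e.g. ".md"
  base = path.delete_suffix(ext)  # keeps the directory part
  "#{base}#{suffix}#{ext}"
end

output_path_for('README.md')          # => "README.llm.md"
output_path_for('docs/guide.md', '')  # => "docs/guide.md"
```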

## Testing Strategy

- RSpec for all tests, with SimpleCov coverage tracking
- Unit tests for each component in isolation
- Integration tests in `spec/integrations/` for end-to-end workflows
- RSpec example statuses persisted to `spec/examples.txt` (used by `--only-failures` and `--next-failure`)
- CI runs the suite against Ruby 3.2, 3.3, and 3.4 via GitHub Actions

## Dependencies

- **zeitwerk**: Autoloading and code organization
- **optparse**: Built-in Ruby CLI parsing (no external CLI framework)
- **rspec**: Testing framework
- **rubocop**: Code linting and style enforcement
- **simplecov**: Test coverage reporting

## Code Style

- Ruby 3.2+ syntax and features required
- `# frozen_string_literal: true` magic comment in all files
- Explicit module nesting (no `class Foo::Bar`)
- Comprehensive YARD documentation for public APIs
- Private methods clearly marked and documented
- RuboCop enforces consistent style
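
A short file following these conventions might look like this. `Example::Text::Slugger` is invented purely to illustrate the layout (magic comment, nested modules, YARD tags, a marked private section); it is not part of the gem.

```ruby
# frozen_string_literal: true

module Example
  module Text
    # Hypothetical class, for style illustration only.
    class Slugger
      # Converts a heading into a URL-friendly anchor.
      #
      # @param heading [String] heading text, e.g. "Getting Started"
      # @return [String] lowercase, hyphen-separated slug
      def slugify(heading)
        normalize(heading).gsub(/\s+/, '-')
      end

      private

      # Lowercases and removes characters other than letters, digits,
      # whitespace, and hyphens.
      #
      # @param text [String]
      # @return [String]
      def normalize(text)
        text.downcase.gsub(/[^a-z0-9\s-]/, '').strip
      end
    end
  end
end
```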