Commit da28aee

rename and docker (#14)
1 parent eb7e32c commit da28aee

35 files changed: +786 and -203 lines changed

.dockerignore

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# Git
+.git
+.github
+.gitignore
+
+# Development
+spec/
+coverage/
+.rspec
+.rubocop.yml
+
+# Documentation
+*.md
+!README.md
+CLAUDE.md
+
+# Build artifacts
+*.gem
+pkg/
+vendor/bundle
+
+# IDE
+.vscode/
+.idea/
+.claude/
+*.swp
+*.swo
+*~
+
+# Temp files
+tmp/
+*.log
+.DS_Store
+
+# Config examples
+*.example
+llms-txt.yml
+config-output.txt
+
+# CI
+.github/workflows/
+
+# Ruby
+# Keep Gemfile.lock for reproducible builds

.github/workflows/docker.yml

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+name: Docker
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+on:
+  push:
+    branches:
+      - master
+    tags:
+      - 'v*'
+  pull_request:
+    branches:
+      - master
+  schedule:
+    # Rebuild weekly to get latest base image security updates
+    - cron: '0 2 * * 0'
+
+permissions:
+  contents: read
+  packages: write
+
+jobs:
+  docker:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5
+        with:
+          fetch-depth: 0
+
+      - name: Docker meta
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          images: |
+            mensfeld/llm-docs-builder
+            ghcr.io/${{ github.repository }}
+          tags: |
+            type=ref,event=branch
+            type=ref,event=pr
+            type=semver,pattern={{version}}
+            type=semver,pattern={{major}}.{{minor}}
+            type=semver,pattern={{major}}
+            type=raw,value=latest,enable={{is_default_branch}}
+
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Login to Docker Hub
+        if: github.event_name != 'pull_request'
+        uses: docker/login-action@v3
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      - name: Login to GitHub Container Registry
+        if: github.event_name != 'pull_request'
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Build and push
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          platforms: linux/amd64,linux/arm64
+          push: ${{ github.event_name != 'pull_request' }}
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+
+      - name: Test Docker image
+        run: |
+          docker run --rm ${{ fromJSON(steps.meta.outputs.json).tags[0] }} version
+          docker run --rm ${{ fromJSON(steps.meta.outputs.json).tags[0] }} --help
+
+  docker-success:
+    name: Docker Success
+    runs-on: ubuntu-latest
+    if: always()
+    needs:
+      - docker
+    steps:
+      - name: Check all jobs passed
+        if: |
+          contains(needs.*.result, 'failure') ||
+          contains(needs.*.result, 'cancelled') ||
+          contains(needs.*.result, 'skipped')
+        run: exit 1
+      - run: echo "Docker workflow completed successfully!"

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
@@ -1,6 +1,21 @@
 # Changelog

 ## Unreleased
+- [Breaking] **Project renamed from `llms-txt-ruby` to `llm-docs-builder`** to better reflect expanded functionality beyond just llms.txt generation.
+  - Gem name: `llms-txt-ruby` → `llm-docs-builder`
+  - Module name: `LlmsTxt` → `LlmDocsBuilder`
+  - CLI command: `llms-txt` → `llm-docs-builder`
+  - Config file: `llms-txt.yml` → `llm-docs-builder.yml`
+  - Docker images: `mensfeld/llms-txt-ruby` → `mensfeld/llm-docs-builder`
+  - Repository: `llms-txt-ruby` → `llm-docs-builder`
+  - Updated all documentation, examples, and tests
+- [Feature] Added Docker support for easy CLI usage without Ruby installation.
+  - Multi-stage Dockerfile for minimal image size (~78MB)
+  - Multi-architecture support (linux/amd64, linux/arm64)
+  - Published to Docker Hub (`mensfeld/llm-docs-builder`) and GitHub Container Registry
+  - GitHub Actions workflow for automated Docker builds and publishing
+  - Comprehensive Docker usage documentation with examples for all commands
+  - CI/CD integration examples (GitHub Actions, GitLab CI, Jenkins)
 - [Feature] Added `compare` command to measure context window savings by comparing content sizes between human and AI versions.
   - Compare remote URL with different User-Agents (human browser vs AI bot)
   - Compare remote URL with local markdown file

CLAUDE.md

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
+# CLAUDE.md
+
+llm-docs-builder is a Ruby gem that generates [llms.txt](https://llmstxt.org/) files from existing markdown documentation and transforms markdown files to be AI-friendly. It provides both a CLI tool and Ruby API.
+
+## Project Overview
+
+llm-docs-builder is a Ruby gem that generates [llms.txt](https://llmstxt.org/) files from existing markdown documentation and transforms markdown files to be AI-friendly. It provides both a CLI tool and Ruby API.
+
+**Key functionality:**
+- Generates llms.txt files from documentation directories by scanning markdown files, extracting metadata, and organizing by priority
+- Transforms individual markdown files by expanding relative links to absolute URLs
+- Bulk transforms entire documentation trees with customizable suffixes and exclusion patterns
+- Supports both config file and direct options for all operations
+
+## Development Commands
+
+### Testing
+```bash
+# Run all tests
+./bin/rspecs
+
+# Run specific test file
+bundle exec rspec spec/llm_docs_builder_spec.rb
+
+# Run specific test line
+bundle exec rspec spec/llm_docs_builder_spec.rb:42
+```
+
+### Code Quality
+```bash
+# Run RuboCop linter
+bundle exec rubocop
+
+# Auto-fix RuboCop violations
+bundle exec rubocop -a
+
+# Run all checks (tests + linting)
+bundle exec rake
+```
+
+### CLI Testing
+```bash
+# Test CLI locally
+bundle exec bin/llm-docs-builder generate --docs ./docs
+bundle exec bin/llm-docs-builder transform --docs README.md
+bundle exec bin/llm-docs-builder bulk-transform --docs ./docs
+
+# Test compare command (requires network)
+bundle exec bin/llm-docs-builder compare --url https://karafka.io/docs/Getting-Started.html
+bundle exec bin/llm-docs-builder compare --url https://example.com/page.html --file docs/local.md
+```
+
+### Building and Installing
+```bash
+# Build gem locally
+bundle exec rake build
+
+# Install locally built gem
+gem install pkg/llm-docs-builder-*.gem
+
+# Release (maintainers only)
+bundle exec rake release
+```
+
+## Architecture
+
+### Core Components
+
+**LlmDocsBuilder Module** (`lib/llm_docs_builder.rb`)
+- Main API entry point with class methods for all operations
+- Uses Zeitwerk for autoloading
+- Delegates to specialized classes for generation, transformation, and validation
+- All methods support both config file and direct options via `Config#merge_with_options`
+
+**Generator** (`lib/llm_docs_builder/generator.rb`)
+- Scans documentation directories recursively using `Find.find`
+- Extracts title from first H1 header, description from first paragraph
+- Prioritizes files: README (1), getting started (2), guides (3), tutorials (4), API (5), reference (6), others (7)
+- Builds formatted llms.txt with links and descriptions
+
+**MarkdownTransformer** (`lib/llm_docs_builder/markdown_transformer.rb`)
+- Transforms individual markdown files using regex patterns
+- `expand_relative_links`: Converts relative links to absolute URLs using base_url
+- `convert_html_urls`: Changes .html/.htm URLs to .md format
+- Leaves absolute URLs and anchor links unchanged
+
+**BulkTransformer** (`lib/llm_docs_builder/bulk_transformer.rb`)
+- Recursively processes all markdown files in a directory
+- Uses `MarkdownTransformer` for each file
+- Generates output paths with configurable suffix (default: `.llm`)
+- Empty suffix (`""`) enables in-place transformation
+- Supports glob-based exclusion patterns via `File.fnmatch`
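Note on the exclusion bullet: the snippet below is a minimal sketch of how glob-based exclusion with `File.fnmatch` could look. The `excluded?` helper and the choice of the `File::FNM_PATHNAME` flag are assumptions made for illustration; the commit does not show how `BulkTransformer` actually calls `File.fnmatch`.

```ruby
# Hypothetical helper (not taken from the gem): true when the path matches
# any of the exclusion globs.
def excluded?(path, excludes)
  excludes.any? do |pattern|
    # FNM_PATHNAME keeps "*" from crossing "/" and lets "**/" match
    # nested directories. Using this flag is an assumption.
    File.fnmatch(pattern, path, File::FNM_PATHNAME)
  end
end

excludes = ['docs/private/*', '**/draft.md']

puts excluded?('docs/private/notes.md', excludes)
puts excluded?('docs/a/b/draft.md', excludes)
puts excluded?('docs/guide.md', excludes)
```

With `File::FNM_PATHNAME`, a pattern such as `docs/private/*` only covers direct children of that directory; nested files would need a `**/` pattern.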
+
+**Comparator** (`lib/llm_docs_builder/comparator.rb`)
+- Measures context window savings by comparing content sizes
+- Fetches URLs with different User-Agents (human browser vs AI bot)
+- Can compare remote URL with local markdown file
+- Uses Net::HTTP for fetching with redirect support
+- Calculates reduction percentage, bytes saved, and compression factor
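Note on the Comparator metrics: the three figures can be derived from two byte counts. The sketch below assumes the usual definitions (bytes saved = human size minus AI size, reduction as a share of the human size, factor as the ratio of the two); `size_savings` and its result keys are hypothetical names, not the gem's API.

```ruby
# Hypothetical helper (not taken from the gem): derives savings metrics
# from the sizes of the human-facing and AI-facing versions.
def size_savings(human_bytes, ai_bytes)
  saved = human_bytes - ai_bytes
  {
    bytes_saved: saved,
    # Share of the human version that was removed, in percent.
    reduction_percent: human_bytes.zero? ? 0.0 : (saved * 100.0 / human_bytes).round(1),
    # How many times smaller the AI version is; nil when it is empty.
    factor: ai_bytes.zero? ? nil : (human_bytes.to_f / ai_bytes).round(1)
  }
end

p size_savings(1000, 250)
```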
+
+**Config** (`lib/llm_docs_builder/config.rb`)
+- Loads YAML config from file or auto-finds `llms-txt.yml`
+- Merges config file options with programmatic options (programmatic takes precedence)
+- Handles defaults: `suffix: '.llm'`, `output: 'llms.txt'`, `excludes: []`
+
+**CLI** (`lib/llm_docs_builder/cli.rb`)
+- Parses commands: generate, transform, bulk-transform, compare, parse, validate, version
+- Uses OptionParser for flag parsing
+- Loads config and merges with CLI options before delegating to main module
+- Handles errors gracefully with user-friendly messages
+- Compare command displays formatted output with human-readable byte sizes (bytes/KB/MB)
+
+### Configuration Precedence
+
+Options are resolved in this order (highest to lowest priority):
+1. Direct method arguments (e.g., `LlmDocsBuilder.generate_from_docs('./docs', title: 'Override')`)
+2. CLI flags (e.g., `--docs ./docs`)
+3. Config file values (e.g., `llms-txt.yml`)
+4. Defaults (e.g., `suffix: '.llm'`, `output: 'llms.txt'`)
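Note on the precedence list: one way to picture it is a layered hash merge in which later layers win. The sketch below is illustrative only. `resolve_options` and its keyword arguments are hypothetical, the defaults are copied from the Config bullets, and the gem's real merging lives in `Config#merge_with_options`, which this commit does not show.

```ruby
# Defaults as listed in the Config bullets.
DEFAULTS = { suffix: '.llm', output: 'llms.txt', excludes: [] }.freeze

# Hypothetical helper: merges layers from lowest to highest priority.
def resolve_options(file_config: {}, cli_flags: {}, direct: {})
  # Drop nil values so an unset flag does not mask a lower-priority value.
  layers = [DEFAULTS, file_config, cli_flags, direct].map(&:compact)
  layers.reduce({}) { |merged, layer| merged.merge(layer) }
end

options = resolve_options(
  file_config: { suffix: '.ai', title: 'From file' },
  cli_flags: { suffix: nil, docs: './docs' },
  direct: { title: 'Override' }
)
p options
```

Here `suffix` comes from the config file because the CLI flag was left unset, while `title` comes from the direct argument.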
+
+### File Priority System
+
+When generating llms.txt, files are automatically ordered by importance:
+- Priority 1: README files (always listed first)
+- Priority 2: Getting started guides
+- Priority 3: General guides
+- Priority 4: Tutorials
+- Priority 5: API documentation
+- Priority 6: Reference documentation
+- Priority 7: All other files
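Note on the priority list: the ordering could be sketched as a lookup over path keywords followed by a sort. The keyword patterns below are assumptions, since the commit does not show how `Generator` classifies files, and `priority_for` and `order_by_priority` are hypothetical names.

```ruby
# Assumed keyword rules, checked in order; the first match wins.
PRIORITY_RULES = [
  [/readme/i, 1],
  [/getting[-_ ]?started/i, 2],
  [/guide/i, 3],
  [/tutorial/i, 4],
  [/api/i, 5],
  [/reference/i, 6]
].freeze

# Hypothetical helper: priority 7 covers every file that matches no rule.
def priority_for(path)
  rule = PRIORITY_RULES.find { |pattern, _priority| path.match?(pattern) }
  rule ? rule.last : 7
end

# Sort by priority first, then by path to keep the order stable.
def order_by_priority(paths)
  paths.sort_by { |path| [priority_for(path), path] }
end

puts order_by_priority(%w[
  docs/changelog.md docs/api/client.md docs/README.md
  docs/guides/setup.md docs/getting-started.md
])
```

Plain substring patterns are naive (for example, `/api/i` would also match a file named `capital.md`), so a real classifier would likely match on path segments instead.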
+
+### Link Transformation Logic
+
+**Relative Link Expansion** (when `base_url` provided):
+- Converts `[text](./path.md)` → `[text](https://base.url/path.md)`
+- Converts `[text](../other.md)` → `[text](https://base.url/other.md)`
+- Skips URLs starting with `http://`, `https://`, `//`, or `#`
+
+**URL Conversion** (when `convert_urls: true`):
+- Changes `https://example.com/page.html` → `https://example.com/page.md`
+- Changes `https://example.com/doc.htm` → `https://example.com/doc.md`
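Note on the link rules: both transformations can be approximated with `String#gsub` and a regular expression. The sketch below reuses the method names from the MarkdownTransformer bullets but is not the gem's implementation. The patterns, the `LINK` constant, and the choice to drop leading `./` and `../` segments (which is what the examples imply) are assumptions.

```ruby
# Matches inline markdown links: [text](target)
LINK = /\[([^\]]*)\]\(([^)\s]+)\)/

# Sketch: prefix relative targets with base_url, leave the rest untouched.
def expand_relative_links(markdown, base_url)
  base = base_url.chomp('/')
  markdown.gsub(LINK) do
    text = Regexp.last_match(1)
    target = Regexp.last_match(2)
    if target.start_with?('http://', 'https://', '//', '#')
      "[#{text}](#{target})"
    else
      # Drop leading "/", "./" and "../" segments before joining.
      path = target.sub(%r{\A(?:\.{0,2}/)+}, '')
      "[#{text}](#{base}/#{path})"
    end
  end
end

# Sketch: rewrite .html/.htm endings of absolute URLs to .md.
def convert_html_urls(markdown)
  markdown.gsub(%r{(https?://[^\s)]+)\.html?\b}, '\1.md')
end

puts expand_relative_links('[a](./path.md) [b](../other.md) [c](#top)', 'https://base.url')
puts convert_html_urls('See https://example.com/page.html and https://example.com/doc.htm')
```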
+
+### In-Place vs Separate Files
+
+**Separate Files** (`suffix: '.llm'` - default):
+- Creates new files: `README.md` → `README.llm.md`
+- Preserves originals for human-readable documentation
+- Useful for dual-serving human and AI versions
+
+**In-Place** (`suffix: ""`):
+- Overwrites originals: `README.md` → `README.md` (transformed)
+- Used in build pipelines (e.g., Karafka framework)
+- Transforms documentation before deployment
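Note on the two modes: the difference comes down to how the output path is computed from the suffix. A minimal sketch, assuming the suffix is inserted before the file extension as the `README.llm.md` example suggests; `output_path_for` is a hypothetical helper, not the gem's API.

```ruby
# Hypothetical helper: computes where a transformed file would be written.
def output_path_for(path, suffix)
  # An empty suffix means in-place: write back to the original path.
  return path if suffix.to_s.empty?

  ext = File.extname(path)
  "#{path.delete_suffix(ext)}#{suffix}#{ext}"
end

puts output_path_for('docs/README.md', '.llm')
puts output_path_for('docs/README.md', '')
```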
+
+## Testing Strategy
+
+- RSpec for all tests with SimpleCov coverage tracking
+- Unit tests for each component in isolation
+- Integration tests in `spec/integrations/` for end-to-end workflows
+- Example outputs saved in `spec/examples.txt` for persistence
+- CI tests against Ruby 3.2, 3.3, 3.4 via GitHub Actions
+
+## Dependencies
+
+- **zeitwerk**: Autoloading and code organization
+- **optparse**: Built-in Ruby CLI parsing (no external CLI framework)
+- **rspec**: Testing framework
+- **rubocop**: Code linting and style enforcement
+- **simplecov**: Test coverage reporting
+
+## Code Style
+
+- Ruby 3.2+ syntax and features required
+- Frozen string literals in all files
+- Explicit module nesting (no `class Foo::Bar`)
+- Comprehensive YARD documentation for public APIs
+- Private methods clearly marked and documented
+- RuboCop enforces consistent style
