Commit adff28c

feat: Add TOON format, relevance filtering, and token budget management
Major Features:
- TOON format as default output (30-60% token reduction)
  - Token-Oriented Object Notation inspired by johannschopplich/toon
  - Compact tabular arrays and inline dependencies
  - Auto-detection from .toon file extension
- Relevance filtering with multi-factor scoring
  - Prioritize files by keywords (--relevant flag)
  - Filename (10x), directory (5x), imports (3x), content (1x) weights
  - Smart file prioritization based on entry points and depth
- Token budget management (--max-tokens flag)
  - Limit output to stay within AI model context windows
  - Automatic exclusion of lower-priority files
  - Detailed exclusion summary with token breakdown
- Format auto-detection
  - Detect format from output file extension (.toon, .md, .xml)
  - Conflict warnings when flag and extension mismatch
- Enhanced output display
  - Show "included/total" file counts when budget applied
  - Display full project tokens vs included tokens
  - List first 5 excluded files with token counts
  - Filter directory tree to show only included files

Documentation:
- Updated README with new features and examples
- Created relevance-filtering.md documentation
- Updated output-formats.md with TOON format details
- Updated getting-started.md with new CLI examples
- Updated index.md with feature highlights

Technical:
- New internal/relevance package for scoring
- New internal/format/toon.go encoder
- Enhanced processor with file prioritization
- Comprehensive test coverage for all new features
1 parent f8fbf27 commit adff28c

29 files changed (+3780 -949 lines)

README.md

Lines changed: 79 additions & 16 deletions
@@ -11,14 +11,18 @@ promptext is a code context extraction tool designed for AI assistant interactio
 
 ## Key Features
 
-- Smart file filtering with .gitignore support and intelligent defaults
-- Accurate token counting using tiktoken (GPT-3.5/4 compatible)
-- Comprehensive project analysis (entry points, configs, core files, tests, docs)
-- Multiple output formats (Markdown, XML)
-- Configurable via CLI flags or configuration files
-- Project metadata extraction (language, version, dependencies)
-- Git repository information extraction
-- Performance monitoring and debug logging
+- **TOON Format Output** - Default token-optimized format (30-60% smaller than JSON/Markdown), inspired by [johannschopplich/toon](https://github.com/johannschopplich/toon)
+- **Smart Relevance Filtering** - Multi-factor scoring prioritizes files by keywords (filename, directory, imports, content)
+- **Token Budget Management** - Limit output to specific token count, automatically excluding lower-priority files
+- **Format Auto-Detection** - Automatically detects output format from file extension (.toon, .md, .xml)
+- **Smart File Filtering** - .gitignore support and intelligent defaults
+- **Accurate Token Counting** - Using tiktoken (GPT-3.5/4 compatible)
+- **Comprehensive Project Analysis** - Entry points, configs, core files, tests, docs
+- **Multiple Output Formats** - TOON (default), Markdown, XML
+- **Flexible Configuration** - CLI flags or configuration files
+- **Project Metadata Extraction** - Language, version, dependencies
+- **Git Repository Information** - Branch, commit, message
+- **Performance Monitoring** - Debug logging and timing analysis
 
 ## Install
 
@@ -45,7 +49,7 @@ See our [documentation](https://1broseidon.github.io/promptext/) for more instal
 ## Basic Usage
 
 ```bash
-# Process current directory (output copied to clipboard)
+# Process current directory (TOON format copied to clipboard)
 prx
 
 # Process specific directory with positional argument
@@ -57,8 +61,13 @@ prx -e .go,.js,.ts
 # Show project summary only
 prx -i
 
-# Export as XML to file
-prx -f xml -o project.xml
+# Auto-detect format from file extension
+prx -o context.toon # TOON format
+prx -o context.md # Markdown format
+prx -o project.xml # XML format
+
+# Explicit format specification
+prx -f markdown -o context.md
 
 # Process with custom exclusions and view output in terminal
 prx -x "test/,vendor/" --verbose
@@ -70,6 +79,58 @@ prx --dry-run -e .go
 prx -q -o output.md
 ```
 
+## Advanced Features
+
+### Relevance Filtering
+
+Prioritize files matching specific keywords using multi-factor scoring:
+
+```bash
+# Prioritize authentication-related files
+prx --relevant "auth login OAuth"
+prx -r "database SQL postgres"
+
+# Multi-factor scoring weights:
+# - Filename matches: 10x
+# - Directory matches: 5x
+# - Import matches: 3x
+# - Content matches: 1x
+```
+
+### Token Budget Management
+
+Limit output to stay within token limits for AI models:
+
+```bash
+# Limit to 8000 tokens (fits Claude Haiku context)
+prx --max-tokens 8000
+
+# Combine with relevance to prioritize important files
+prx -r "api routes handlers" --max-tokens 5000
+
+# Cost-optimized queries
+prx --max-tokens 3000 -o quick-context.toon
+```
+
+When the budget is exceeded, promptext:
+- Shows which files were included vs excluded
+- Displays token breakdown for excluded files
+- Filters directory tree to show only included files
+
+```
+╭───────────────────────────────────────────────╮
+│ 📦 promptext (Go) │
+│ Included: 7/18 files • ~4,847 tokens │
+│ Full project: 18 files • ~19,512 tokens │
+╰───────────────────────────────────────────────╯
+
+⚠️ Excluded 11 files due to token budget:
+• internal/cli/commands.go (~784 tokens)
+• internal/app/app.go (~60 tokens)
+... and 9 more files (~8,453 tokens)
+Total excluded: ~9,297 tokens
+```
+
 ## Configuration
 
 Configuration is loaded with the following precedence (highest to lowest):
@@ -90,7 +151,7 @@ excludes:
 - vendor/
 - node_modules/
 - "*.test.go"
-format: markdown
+format: toon # Options: toon, markdown, xml
 verbose: false
 ```
 
@@ -106,7 +167,7 @@ extensions:
 excludes:
 - vendor/
 - __pycache__/
-format: markdown
+format: toon
 ```
 
 ## Documentation
@@ -115,10 +176,12 @@ Visit our [documentation site](https://1broseidon.github.io/promptext/) for comp
 
 - Getting Started Guide
 - Configuration Options
-- File Filtering Rules
-- Token Analysis
+- File Filtering Rules
+- **Relevance Filtering** - Smart file prioritization
+- **Token Budget Management** - Optimize for AI model context windows
+- Token Analysis & Counting
 - Project Analysis Features
-- Output Format Specifications
+- Output Format Specifications (TOON, Markdown, XML)
 - Performance Tips
 
 ## License
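
The weights listed in the README changes (filename 10x, directory 5x, imports 3x, content 1x) suggest a simple additive score. A minimal Go sketch of that idea follows. It is not the code from `internal/relevance`, which this page does not show; the names `splitKeywords` and `scoreFile`, the use of substring matching, and counting every content occurrence are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Weights as stated in the README; everything else here is illustrative.
const (
	filenameWeight  = 10
	directoryWeight = 5
	importWeight    = 3
	contentWeight   = 1
)

// splitKeywords accepts comma- or space-separated keywords (as the help
// text describes for --relevant) and lowercases them.
func splitKeywords(raw string) []string {
	fields := strings.FieldsFunc(raw, func(r rune) bool {
		return r == ',' || r == ' ' || r == '\t'
	})
	for i, f := range fields {
		fields[i] = strings.ToLower(f)
	}
	return fields
}

// scoreFile is a hypothetical scorer: each keyword adds the filename,
// directory, and import weights when it appears there, plus the content
// weight once per occurrence in the file body.
func scoreFile(path string, imports []string, content string, keywords []string) int {
	name := strings.ToLower(filepath.Base(path))
	dir := strings.ToLower(filepath.ToSlash(filepath.Dir(path)))
	body := strings.ToLower(content)

	score := 0
	for _, kw := range keywords {
		if strings.Contains(name, kw) {
			score += filenameWeight
		}
		if strings.Contains(dir, kw) {
			score += directoryWeight
		}
		for _, imp := range imports {
			if strings.Contains(strings.ToLower(imp), kw) {
				score += importWeight
			}
		}
		score += contentWeight * strings.Count(body, kw)
	}
	return score
}

func main() {
	kws := splitKeywords("auth, login OAuth")
	fmt.Println("keywords:", kws)
	fmt.Println("score:", scoreFile(
		"internal/auth/login.go",
		[]string{"golang.org/x/oauth2"},
		"func Login() { /* auth */ }",
		kws,
	))
}
```

Under this reading, sorting files by score in descending order would give the priority order that `--max-tokens` then trims from the bottom.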

cmd/promptext/main.go

Lines changed: 60 additions & 4 deletions
@@ -4,6 +4,8 @@ import (
 	"fmt"
 	"log"
 	"os"
+	"path/filepath"
+	"strings"
 
 	"github.com/1broseidon/promptext/internal/processor"
 	"github.com/spf13/pflag"
@@ -43,7 +45,7 @@ FILTERING OPTIONS:
     Examples: vendor/,node_modules/ or *.test.go,dist/
 
 OUTPUT OPTIONS:
-  -f, --format FORMAT  Output format: markdown, md, xml (default: markdown)
+  -f, --format FORMAT  Output format: toon, markdown, md, xml (default: toon for AI-optimized structure)
   -o, --output FILE  Write output to file instead of clipboard
   -n, --no-copy  Don't copy output to clipboard
   -i, --info  Show only project summary (no file contents)
@@ -53,6 +55,12 @@ PROCESSING OPTIONS:
   --dry-run  Preview files that would be processed without reading content
   -q, --quiet  Suppress non-essential output for scripting
 
+RELEVANCE & TOKEN BUDGET:
+  -r, --relevant KEYWORDS  Keywords to prioritize files (comma or space separated)
+    Uses multi-factor scoring: filename (10x), directory (5x), imports (3x), content (1x)
+  --max-tokens NUMBER  Maximum token budget for output (excludes lower-priority files when exceeded)
+    Works best with --relevant to prioritize important files first
+
 DEBUG OPTIONS:
   -D, --debug  Enable debug logging and timing information
   -h, --help  Show this help message
@@ -71,6 +79,9 @@ EXAMPLES:
   # Export specific file types to XML with debug info
   prx -e .js,.ts,.json -f xml -o project.xml -D
 
+  # Use TOON format for AI-optimized structure (better scannability)
+  prx -f toon -o project.toon
+
   # Process with custom exclusions and see output in terminal
   prx -x "vendor/,*.test.go,dist/" -v
 
@@ -86,6 +97,19 @@ EXAMPLES:
   # Quiet mode for use in scripts (minimal output)
   prx -q -f xml -o output.xml
 
+  # Auto-detect format from output file extension
+  prx -o context.toon # Automatically uses TOON format
+  prx -o context.md # Automatically uses markdown format
+
+  # Prioritize authentication-related files
+  prx --relevant "auth login OAuth"
+
+  # Limit output to 8000 tokens, prioritizing database files
+  prx --relevant "database" --max-tokens 8000
+
+  # Combined: relevant files with token budget
+  prx -r "api routes handlers" --max-tokens 5000 -o api-context.toon
+
 CONFIGURATION:
   Create a .promptext.yml file in your project root for persistent settings:
 
@@ -96,7 +120,7 @@ CONFIGURATION:
   excludes:
   - vendor/
   - node_modules/
-  format: markdown
+  format: toon
   verbose: false
 
   CLI flags override configuration file settings.
@@ -131,7 +155,7 @@ func main() {
 	exclude := pflag.StringP("exclude", "x", "", "Patterns to exclude (comma-separated, e.g., vendor/,*.test.go)")
 
 	// Output options
-	format := pflag.StringP("format", "f", "markdown", "Output format: markdown, md, or xml")
+	format := pflag.StringP("format", "f", "toon", "Output format: toon, markdown, md, or xml (default: toon)")
 	outFile := pflag.StringP("output", "o", "", "Write output to file instead of clipboard")
 	noCopy := pflag.BoolP("no-copy", "n", false, "Don't copy output to clipboard")
 	infoOnly := pflag.BoolP("info", "i", false, "Show only project summary without file contents")
@@ -141,6 +165,11 @@ func main() {
 	dryRun := pflag.Bool("dry-run", false, "Preview files that would be processed without reading content")
 	quiet := pflag.BoolP("quiet", "q", false, "Suppress non-essential output for scripting")
 
+	// Relevance and token budget options
+	relevant := pflag.StringP("relevant", "r", "", "Keywords to prioritize files (comma or space separated, multi-factor scoring)")
+	maxTokens := pflag.Int("max-tokens", 0, "Maximum token budget for output (excludes lower-priority files when exceeded)")
+	explainSelection := pflag.Bool("explain-selection", false, "Show detailed priority scoring breakdown for file selection")
+
 	// Debug options
 	debug := pflag.BoolP("debug", "D", false, "Enable debug logging and timing information")
 
@@ -162,7 +191,34 @@ func main() {
 		*dirPath = args[0]
 	}
 
-	if err := processor.Run(*dirPath, *extension, *exclude, *noCopy, *infoOnly, *verbose, *format, *outFile, *debug, *gitignore, *useDefaultRules, *dryRun, *quiet); err != nil {
+	// Format auto-detection from output file extension
+	if *outFile != "" {
+		ext := strings.ToLower(filepath.Ext(*outFile))
+		detectedFormat := ""
+		switch ext {
+		case ".toon":
+			detectedFormat = "toon"
+		case ".md", ".markdown":
+			detectedFormat = "markdown"
+		case ".xml":
+			detectedFormat = "xml"
+		}
+
+		// Check for format conflict and warn
+		if detectedFormat != "" && *format != detectedFormat {
+			// User explicitly set format flag
+			formatFlag := pflag.Lookup("format")
+			if formatFlag.Changed {
+				// Warn about conflict
+				fmt.Fprintf(os.Stderr, "⚠️ Warning: format flag '%s' conflicts with output extension '%s' - using '%s' (flag takes precedence)\n", *format, ext, *format)
+			} else {
+				// Auto-detect format from extension since flag wasn't explicitly set
+				*format = detectedFormat
+			}
+		}
+	}
+
+	if err := processor.Run(*dirPath, *extension, *exclude, *noCopy, *infoOnly, *verbose, *format, *outFile, *debug, *gitignore, *useDefaultRules, *dryRun, *quiet, *relevant, *maxTokens, *explainSelection); err != nil {
 		log.Fatal(err)
 	}
 }
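
The auto-detection block added to `main()` can be read as a pure function of three inputs: the `--format` value, whether the flag was set explicitly (`pflag.Lookup("format").Changed`), and the output path. The sketch below restates that decision so it can be tested in isolation. `detectFormat` and `resolveFormat` are hypothetical names, not functions from this commit, and the boolean return stands in for the warning printed to stderr.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectFormat maps an output file extension to a format name.
// It returns "" for extensions that the diff above does not handle.
func detectFormat(outFile string) string {
	switch strings.ToLower(filepath.Ext(outFile)) {
	case ".toon":
		return "toon"
	case ".md", ".markdown":
		return "markdown"
	case ".xml":
		return "xml"
	}
	return ""
}

// resolveFormat mirrors the precedence in the diff: an explicitly set
// flag always wins (with a conflict reported when it disagrees with the
// extension); otherwise a recognised extension overrides the default.
func resolveFormat(flagValue string, flagChanged bool, outFile string) (format string, conflict bool) {
	detected := detectFormat(outFile)
	if detected == "" || detected == flagValue {
		return flagValue, false
	}
	if flagChanged {
		return flagValue, true
	}
	return detected, false
}

func main() {
	cases := []struct {
		flag    string
		changed bool
		out     string
	}{
		{"toon", false, "context.md"}, // default flag, extension decides
		{"xml", true, "context.md"},   // explicit flag wins, conflict reported
		{"toon", false, "notes.txt"},  // unknown extension, default kept
	}
	for _, c := range cases {
		format, conflict := resolveFormat(c.flag, c.changed, c.out)
		fmt.Printf("flag=%s changed=%v out=%s -> format=%s conflict=%v\n",
			c.flag, c.changed, c.out, format, conflict)
	}
}
```

One case worth checking against the real code: the help text accepts `md` as a format alias, so `-f md -o notes.md` compares `md` with the detected `markdown`. If no normalization happens in the lines not shown on this page, that combination would print the conflict warning even though both values mean the same format.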

docs-astro/src/content/docs/getting-started.md

Lines changed: 81 additions & 8 deletions
@@ -34,17 +34,22 @@ Download binaries from [GitHub Releases](https://github.com/1broseidon/promptext
 ### Simple Commands
 
 ```bash
-# Process current directory
+# Process current directory (TOON format to clipboard)
 promptext
 
+# Use alias for convenience
+prx
+
 # Process specific directory
 promptext -d /path/to/project
 
 # Show project overview only
-promptext -info
+promptext -i
 
-# Export to file
-promptext -o output.md
+# Export to file (format auto-detected from extension)
+promptext -o context.toon
+promptext -o context.md
+promptext -o project.xml
 ```
 
 ### Common Options
@@ -54,10 +59,13 @@ promptext -o output.md
 | `-d` | Directory to process |
 | `-e` | File extensions (`.go,.js`) |
 | `-x` | Exclude patterns |
-| `-f` | Format (`markdown`, `xml`) |
-| `-o` | Output file |
+| `-f` | Format (`toon`, `markdown`, `xml`) |
+| `-o` | Output file (auto-detects format) |
 | `-i` | Info mode only |
+| `-r` | Relevant keywords for prioritization |
+| `--max-tokens` | Token budget limit |
 | `-v` | Verbose output |
+| `-q` | Quiet mode for scripting |
 
 ### Examples
 
@@ -71,9 +79,74 @@ promptext -e .go,.js,.ts
 promptext -x "node_modules/,vendor/,test/"
 ```
 
-**Generate XML report:**
+**Generate reports:**
 ```bash
+# TOON format (default, token-optimized)
+promptext -o context.toon
+
+# Markdown format
+promptext -f markdown -o context.md
+
+# XML format for automation
 promptext -f xml -o report.xml
 ```
 
-Continue with [Configuration](configuration) to customize behavior.
+**Prioritize relevant files:**
+```bash
+# Focus on authentication code
+promptext -r "auth login OAuth"
+
+# Database-related files
+promptext -r "database SQL migration"
+```
+
+**Stay within token budgets:**
+```bash
+# Limit to 8000 tokens (Claude Haiku)
+promptext --max-tokens 8000
+
+# Combine with relevance for smart selection
+promptext -r "api routes" --max-tokens 5000
+```
+
+## Quick Workflows
+
+### For AI Queries
+
+```bash
+# Quick context (3k tokens)
+prx -r "auth" --max-tokens 3000
+
+# Standard context (8k tokens)
+prx -r "api database" --max-tokens 8000
+
+# Full codebase (within limits)
+prx --max-tokens 50000
+```
+
+### For Documentation
+
+```bash
+# Export project overview
+prx -i -o overview.md
+
+# Export full context in markdown
+prx -f markdown -o full-context.md
+```
+
+### For CI/CD
+
+```bash
+# Machine-readable XML
+prx -f xml -o build/context.xml
+
+# Quiet mode for scripting
+prx -q -o context.toon
+```
+
+## Next Steps
+
+- [Output Formats](output-formats) - TOON, Markdown, and XML formats
+- [Relevance Filtering](relevance-filtering) - Smart file prioritization
+- [Configuration](configuration) - Customize behavior
+- [File Filtering](file-filtering) - Advanced filtering rules
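
How `--max-tokens` picks files is not visible in the hunks on this page, since the processor changes live in other files of the commit. One plausible reading of "automatically excluding lower-priority files" is a greedy pass over files sorted by priority. The sketch below assumes that approach. `fileEntry`, `applyBudget`, the skip-and-continue behaviour for files that do not fit, and treating `0` (the flag's default) as "no limit" are illustrative assumptions, not the commit's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// fileEntry is a hypothetical record: a path, its token count, and a
// priority score (for example, a relevance score).
type fileEntry struct {
	Path   string
	Tokens int
	Score  int
}

// applyBudget sorts by score (highest first) and greedily includes each
// file that still fits within maxTokens. Files that do not fit are
// excluded, and the pass continues so smaller files can still be added.
// A maxTokens of 0 or less is treated as "no budget".
func applyBudget(files []fileEntry, maxTokens int) (included, excluded []fileEntry) {
	ranked := make([]fileEntry, len(files))
	copy(ranked, files) // leave the caller's slice untouched
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Score > ranked[j].Score
	})
	if maxTokens <= 0 {
		return ranked, nil
	}

	used := 0
	for _, f := range ranked {
		if used+f.Tokens <= maxTokens {
			included = append(included, f)
			used += f.Tokens
		} else {
			excluded = append(excluded, f)
		}
	}
	return included, excluded
}

func main() {
	files := []fileEntry{
		{Path: "internal/auth/login.go", Tokens: 3000, Score: 10},
		{Path: "internal/cli/commands.go", Tokens: 4000, Score: 5},
		{Path: "internal/app/app.go", Tokens: 1500, Score: 1},
	}
	included, excluded := applyBudget(files, 5000)
	fmt.Printf("Included: %d/%d files\n", len(included), len(files))
	for _, f := range excluded {
		fmt.Printf("Excluded: %s (~%d tokens)\n", f.Path, f.Tokens)
	}
}
```

A stricter variant would stop at the first file that does not fit, which keeps the included set a strict prefix of the priority order at the cost of leaving some budget unused.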
