Problem
When processing large directories, it's hard to know which files and subdirectories are contributing the most to the output size. This makes it difficult to optimize what to include/exclude for LLM context windows.
Proposed Solution
Add a --verbose or -v flag that shows a summary after processing (sketched after the list below), including:
- Number of files processed vs ignored
- Total size in tokens (estimated)
- Top N largest files by token count
- Token counts by subdirectory
- Total lines processed
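A minimal sketch of how these stats could be aggregated, assuming the tool already yields (path, content) pairs for the files it includes; collect_stats and its signature are illustrative, not part of files-to-prompt:

```python
import os
from collections import Counter

def collect_stats(files, count_tokens):
    """Aggregate per-file token counts, first-level directory totals, and line counts.

    `files` is an iterable of (path, content) pairs; `count_tokens` is any
    callable that returns an estimated token count for a string.
    """
    per_file = {}
    per_dir = Counter()
    total_lines = 0
    for path, content in files:
        tokens = count_tokens(content)
        per_file[path] = tokens
        # Attribute tokens to the first path component, or "root" for top-level files
        parts = path.split(os.sep)
        per_dir[parts[0] if len(parts) > 1 else "root"] += tokens
        total_lines += content.count("\n") + 1
    top = sorted(per_file.items(), key=lambda kv: kv[1], reverse=True)[:20]
    return top, per_dir, sum(per_file.values()), total_lines
```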
Example
files-to-prompt . --verbose
Would output the normal content, followed by:
Top 20 file size token count breakdown:
17,563 engine/analysis.py
15,036 ui/run_benchmark_tab.py
11,975 tests/test_benchmark.py
10,719 engine/call_llm.py
10,014 docs/future_work/litellm.txt
9,254 tests/test_cli_batch.py
9,109 engine/io_xlsx_export.py
9,081 ui/app.py
8,513 engine/rate_limiter.py
6,063 tests/test_analysis.py
5,921 cli/generate_report.py
5,855 tests/test_rate_limiter.py
5,836 docs/ui_refactor_plan.md
5,296 tests/test_integration.py
5,185 tests_nodeids.txt
4,459 tests/test_llm_integration.py
4,011 cli/cli.py
3,851 aibo_mcp_server.py
3,754 tests/fixtures.py
3,688 tests/test_ui.py
First-level subdirectories:
tests: 62,835 tokens
engine: 62,082 tokens
ui: 25,627 tokens
docs: 15,850 tokens
cli: 15,701 tokens
root: 11,708 tokens
cosinesim: 8,361 tokens
.streamlit: 155 tokens
.claude: 95 tokens
.cursor: 63 tokens
Total number of files processed: 69
Total tokens in all files: 202,477
Total lines: 21,363
Implementation Notes
- Token counting would use tiktoken, or fall back to a simple approximation (chars/3) when it isn't installed
- Would you want the summary to go to stderr so it doesn't interfere with piped output? (Both points are sketched below.)
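A hedged sketch of both notes. The tiktoken calls (get_encoding("cl100k_base") and encode) are the library's real API; print_summary and its parameters are hypothetical names for illustration:

```python
import sys

def count_tokens(text):
    """Count tokens with tiktoken if available, else approximate as chars/3."""
    try:
        import tiktoken
        encoding = tiktoken.get_encoding("cl100k_base")
        return len(encoding.encode(text))
    except ImportError:
        return len(text) // 3

def print_summary(top, per_dir, total_tokens, total_lines, num_files):
    # Writing to stderr keeps the summary out of piped/redirected stdout.
    err = sys.stderr
    print(f"Top {len(top)} file size token count breakdown:", file=err)
    for path, tokens in top:
        print(f"{tokens:>8,}  {path}", file=err)
    print("First-level subdirectories:", file=err)
    for name, tokens in sorted(per_dir.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {tokens:,} tokens", file=err)
    print(f"Total number of files processed: {num_files}", file=err)
    print(f"Total tokens in all files: {total_tokens:,}", file=err)
    print(f"Total lines: {total_lines:,}", file=err)
```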
Use Case
- When the output is too large for the target context window, quickly identify what to trim
- Spot surprisingly large files and subdirectories that are worth excluding