Commit 7b7643d

feat: add token counting, markdown format, and YAML escaping fixes
- Add token counting module with tiktoken support and fallback approximation
- Add o200k_harmony encoding for newer models
- Add warning when --token-encoding used without --tokens
- Fix YAML escaping for \n, \r, \0, \x85, \u2028, \u2029 in filenames
- Add markdown output format with language-aware code fences
- Add comprehensive tests for tokens (23), markdown (56), YAML escaping (11)
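
The YAML escaping fix in the commit message concerns filenames that contain characters YAML treats as line breaks or cannot carry raw in a scalar. A minimal sketch of the idea, assuming filenames are emitted as double-quoted YAML scalars; the helper name `quote_yaml_name` and the escape table are illustrative and not taken from the commit:

```python
# Hypothetical helper: escape a filename for use as a double-quoted YAML scalar.
# The \xNN and \uNNNN forms are valid escapes inside YAML double quotes.
_YAML_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\0": "\\0",
    "\x85": "\\x85",      # NEL (next line)
    "\u2028": "\\u2028",  # line separator
    "\u2029": "\\u2029",  # paragraph separator
}


def quote_yaml_name(name: str) -> str:
    """Return `name` as a double-quoted YAML scalar with line breaks escaped."""
    return '"' + "".join(_YAML_ESCAPES.get(ch, ch) for ch in name) + '"'


if __name__ == "__main__":
    print(quote_yaml_name("bad\nname\u2028.txt"))
```

Escaping the backslash first in the table matters only conceptually here: each character is mapped exactly once, so an escaped sequence is never escaped a second time.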
1 parent 7b087b8 commit 7b7643d

26 files changed: +1684 -347 lines changed

.github/workflows/cd.yml

Lines changed: 16 additions & 1 deletion
@@ -40,7 +40,8 @@ jobs:

 - name: Check that we're on main branch
 run: |
-CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
+# Use GitHub context instead of git command to avoid detached HEAD issues
+CURRENT_BRANCH="${{ github.ref_name }}"
 if [ "$CURRENT_BRANCH" != "main" ]; then
 echo "Error: Releases can only be created from the main branch. Current branch: $CURRENT_BRANCH"
 exit 1
@@ -87,6 +88,20 @@ jobs:
 echo "Commit SHA: $COMMIT_SHA"
 echo "commit_sha=$COMMIT_SHA" >> $GITHUB_OUTPUT

+- name: Check tag doesn't already exist
+run: |
+TAG="v${{ github.event.inputs.version }}"
+git fetch --tags origin 2>/dev/null || true
+if git rev-parse "$TAG" >/dev/null 2>&1; then
+echo "Error: Tag $TAG already exists locally"
+exit 1
+fi
+if git ls-remote --tags origin | grep -q "refs/tags/$TAG$"; then
+echo "Error: Tag $TAG already exists on remote"
+exit 1
+fi
+echo "Tag $TAG does not exist, proceeding..."
+
 - name: Create local tag (no push yet)
 run: |
 git tag -a "v${{ github.event.inputs.version }}" -m "Release version ${{ github.event.inputs.version }}"
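
One detail of the remote check in this workflow: `grep -q "refs/tags/$TAG$"` treats the tag as a regular expression, so the dots in a version such as `v1.2.3` match any character, and annotated tags also appear in `git ls-remote --tags` output a second time with a peeled `^{}` suffix. A sketch of an exact comparison, written in Python for illustration only; the function name `tag_in_ls_remote` and the sample output are hypothetical and not part of the workflow:

```python
def tag_in_ls_remote(ls_remote_output: str, tag: str) -> bool:
    """Return True if `tag` is an exact ref name in `git ls-remote --tags` output."""
    wanted = f"refs/tags/{tag}"
    for line in ls_remote_output.splitlines():
        parts = line.split("\t", 1)  # each line is "<sha>\t<ref>"
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        ref = parts[1].strip()
        # Annotated tags are listed twice; the peeled entry ends in "^{}".
        if ref.endswith("^{}"):
            ref = ref[: -len("^{}")]
        if ref == wanted:
            return True
    return False


# Made-up sample output for demonstration.
SAMPLE = (
    "1111111111111111111111111111111111111111\trefs/tags/v1.2.3\n"
    "2222222222222222222222222222222222222222\trefs/tags/v1.2.3^{}\n"
    "3333333333333333333333333333333333333333\trefs/tags/v1x2y3\n"
)

if __name__ == "__main__":
    print(tag_in_ls_remote(SAMPLE, "v1.2.3"))
```

In shell, a comparable effect could be had with a fixed-string match, but the exact flags are left to the workflow author.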

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ repos:
 # Order matters: isort must run before black to avoid conflicts
 # ============================================================================
 - repo: https://github.com/pycqa/isort
-rev: 7.0.0
+rev: 5.13.2
 hooks:
 - id: isort

CLAUDE.md

Lines changed: 37 additions & 15 deletions
@@ -39,31 +39,49 @@ children:
 ## Usage

 ```bash
-treemapper . # YAML to stdout
+treemapper . # YAML to stdout + token count
 treemapper . -o tree.yaml # save to file
+treemapper . -o # save to tree.yaml (default filename)
 treemapper . -o - # explicit stdout output
-treemapper . --format json # JSON format
-treemapper . --format text # tree-style text
+treemapper . -f json # JSON format
+treemapper . -f txt # plain text with indentation
+treemapper . -f md # Markdown with headings and fenced code blocks
+treemapper . -f yml # YAML format (alias for yaml)
 treemapper . --no-content # structure only (no file contents)
 treemapper . --max-depth 3 # limit directory depth
 treemapper . --max-file-bytes 10000 # skip files larger than 10KB
+treemapper . --max-file-bytes 0 # no limit (include all files)
 treemapper . -i custom.ignore # custom ignore patterns
 treemapper . --no-default-ignores # disable .gitignore/.treemapperignore (custom -i still works)
-treemapper . -v 2 # verbose output (0=ERROR, 1=WARNING, 2=INFO, 3=DEBUG)
-treemapper . -c # copy output to clipboard (also outputs to stdout)
-treemapper . --copy-only # copy to clipboard only (no stdout output)
-treemapper --version # show version
+treemapper . --log-level info # log level (error/warning/info/debug)
+treemapper . -c # copy to clipboard (no stdout)
+treemapper . -c -o tree.yaml # copy to clipboard + save to file
+treemapper -v # show version
+```
+
+## Token Counting
+
+Token count and size are always displayed on stderr:
+
+```
+12,847 tokens (o200k_base), 52.3 KB
+Copied to clipboard
 ```

+For large outputs (>1MB), approximate counts are shown with `~` prefix:
+```
+~125,000 tokens (o200k_base), 5.2 MB
+```
+
+Uses tiktoken with `o200k_base` encoding (GPT-4o tokenizer).
+
 ## Clipboard Support

 Copy output directly to clipboard with `-c` or `--copy`:

 ```bash
-treemapper . -c # copy to clipboard + stdout
+treemapper . -c # copy to clipboard (no stdout)
 treemapper . -c -o tree.yaml # copy to clipboard + save to file
-treemapper . --copy-only # copy to clipboard only
-treemapper . --copy-only -o tree.yaml # copy to clipboard + save to file (no stdout)
 ```

 **System Requirements:**
@@ -72,10 +90,12 @@ treemapper . --copy-only -o tree.yaml # copy to clipboard + save to file (no std
 - **Linux/FreeBSD (Wayland):** `wl-copy` (install: `sudo apt install wl-clipboard`)
 - **Linux/FreeBSD (X11):** `xclip` or `xsel` (install: `sudo apt install xclip`)

+If clipboard is unavailable, output falls back to stdout with a warning on stderr.
+
 ## Python API

 ```python
-from treemapper import map_directory, to_yaml, to_json, to_text
+from treemapper import map_directory, to_yaml, to_json, to_text, to_markdown

 # Full function signature
 tree = map_directory(
@@ -95,7 +115,8 @@ tree = map_directory(".", max_file_bytes=50000, ignore_file="custom.ignore")
 # Serialize to string
 yaml_str = to_yaml(tree)
 json_str = to_json(tree)
-text_str = to_text(tree)
+text_str = to_text(tree) # or to_txt(tree)
+md_str = to_markdown(tree) # or to_md(tree)
 ```

 ## Ignore Patterns
@@ -111,8 +132,8 @@ Features:
 ## Content Placeholders

 When file content cannot be read normally, placeholders are used:
-- `<file too large: N bytes>` — file exceeds `--max-file-bytes` limit
-- `<binary file: N bytes>` — file detected as binary (contains null bytes)
+- `<file too large: N bytes>` — file exceeds `--max-file-bytes` limit (default: 10 MB)
+- `<binary file: N bytes>` — binary file (detected by extension or null bytes)
 - `<unreadable content: not utf-8>` — file is not valid UTF-8
 - `<unreadable content>` — file cannot be read (permission denied, I/O error)
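
The placeholder rules in this hunk could be implemented along the following lines. This is a sketch under assumptions, not the code from `tree.py`: the function name `read_content`, the extension set, the order of the checks, the 1024-based reading of "10 MB", and the treatment of `0` as "no limit" (inferred from the `--max-file-bytes 0` usage line) are all illustrative.

```python
from pathlib import Path

# Assumed, illustrative set; the real list of binary extensions is not shown in this diff.
BINARY_EXTENSIONS = {".png", ".jpg", ".zip", ".pdf"}
DEFAULT_MAX_FILE_BYTES = 10 * 1024 * 1024  # "default: 10 MB" per CLAUDE.md (unit base assumed)


def read_content(path: Path, max_file_bytes: int = DEFAULT_MAX_FILE_BYTES) -> str:
    """Return the file's text, or one of the placeholders documented in CLAUDE.md."""
    try:
        size = path.stat().st_size
        if max_file_bytes > 0 and size > max_file_bytes:  # 0 means no limit
            return f"<file too large: {size} bytes>"
        if path.suffix.lower() in BINARY_EXTENSIONS:
            return f"<binary file: {size} bytes>"
        data = path.read_bytes()
    except OSError:
        # Covers permission errors, missing files, and other I/O failures.
        return "<unreadable content>"
    if b"\x00" in data:
        return f"<binary file: {len(data)} bytes>"
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return "<unreadable content: not utf-8>"
```

Checking the size and extension before reading avoids loading large or obviously binary files into memory at all.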

@@ -134,8 +155,9 @@ Integration tests only - test against real filesystem. No mocking.
 src/treemapper/
 ├── cli.py # argument parsing
 ├── ignore.py # gitignore/treemapperignore handling
+├── tokens.py # token counting (tiktoken)
 ├── tree.py # directory traversal
-├── writer.py # YAML/JSON/text output
+├── writer.py # YAML/JSON/text/Markdown output
 ├── treemapper.py # main entry point
 ```
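
The Token Counting section added to CLAUDE.md describes a stderr summary line with an exact or approximate (`~`) count. A minimal sketch of that behaviour follows. It assumes a fallback of roughly four characters per token whenever tiktoken or its encoding data is unavailable, which is a simpler trigger than the ">1MB" rule in the docs. `count_tokens`, `format_size`, and `format_summary` are hypothetical names, and the 1024-based size units are also an assumption; only `tiktoken.get_encoding` and `Encoding.encode` are real tiktoken calls.

```python
def count_tokens(text: str, encoding_name: str = "o200k_base") -> tuple[int, bool]:
    """Return (token_count, is_exact); estimate the count if tiktoken cannot be used."""
    try:
        import tiktoken  # imported lazily so the sketch still runs without it

        encoding = tiktoken.get_encoding(encoding_name)
        # disallowed_special=() makes special-token text encode as ordinary text.
        return len(encoding.encode(text, disallowed_special=())), True
    except Exception:
        # Assumed heuristic: about 4 characters per token.
        estimate = max(1, len(text) // 4) if text else 0
        return estimate, False


def format_size(num_bytes: int) -> str:
    """Format a byte count using 1024-based units (an assumption)."""
    if num_bytes >= 1024 * 1024:
        return f"{num_bytes / (1024 * 1024):.1f} MB"
    if num_bytes >= 1024:
        return f"{num_bytes / 1024:.1f} KB"
    return f"{num_bytes} B"


def format_summary(count: int, encoding_name: str, num_bytes: int, exact: bool) -> str:
    """Build the summary line; approximate counts get a '~' prefix."""
    prefix = "" if exact else "~"
    return f"{prefix}{count:,} tokens ({encoding_name}), {format_size(num_bytes)}"


if __name__ == "__main__":
    import sys

    text = "def hello():\n    return 'world'\n"
    count, exact = count_tokens(text)
    summary = format_summary(count, "o200k_base", len(text.encode("utf-8")), exact)
    print(summary, file=sys.stderr)
```

Writing the summary to stderr keeps stdout clean for the tree output itself, which matches the behaviour the section describes.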

docs/Token Counting.md

Lines changed: 0 additions & 81 deletions
This file was deleted.

pyproject.toml

Lines changed: 3 additions & 2 deletions
@@ -37,8 +37,8 @@ files = ["src"]

 [tool.commitizen]
 name = "cz_conventional_commits"
-version_provider = "pep621"
 tag_format = "v$version"
+version_files = ["src/treemapper/version.py:__version__"]

 [project]
 name = "treemapper"
@@ -50,7 +50,7 @@ description = "Export codebase structure and contents for AI/LLM context"
 readme = "README.md"
 requires-python = ">=3.9"
 license = { file = "LICENSE" }
-keywords = ["code-analysis", "directory-tree", "yaml", "json", "llm", "ai", "codebase", "context", "chatgpt", "claude", "code-context", "export", "tree"]
+keywords = ["code-analysis", "directory-tree", "yaml", "json", "llm", "ai", "codebase", "context", "chatgpt", "claude", "code-context", "export", "tree", "gpt-context", "llm-context", "code-to-prompt", "claude-context"]
 classifiers = [
 "Development Status :: 5 - Production/Stable",
 "Environment :: Console",
@@ -70,6 +70,7 @@ classifiers = [
 dependencies = [
 "pathspec>=0.11,<2.0",
 "pyyaml>=6.0.2,<8.0",
+"tiktoken>=0.7,<1.0",
 ]

 [project.urls]

src/treemapper/__init__.py

Lines changed: 14 additions & 1 deletion
@@ -5,13 +5,16 @@
 from .ignore import get_ignore_specs
 from .tree import TreeBuildContext, build_tree
 from .version import __version__
-from .writer import write_tree_json, write_tree_text, write_tree_yaml
+from .writer import write_tree_json, write_tree_markdown, write_tree_text, write_tree_yaml

 __all__ = [
     "__version__",
     "map_directory",
     "to_json",
+    "to_markdown",
+    "to_md",
     "to_text",
+    "to_txt",
     "to_yaml",
 ]

@@ -63,3 +66,13 @@ def to_text(tree: dict[str, Any]) -> str:
     buf = io.StringIO()
     write_tree_text(buf, tree)
     return buf.getvalue()
+
+
+def to_markdown(tree: dict[str, Any]) -> str:
+    buf = io.StringIO()
+    write_tree_markdown(buf, tree)
+    return buf.getvalue()
+
+
+to_md = to_markdown
+to_txt = to_text
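
The commit message mentions a Markdown format with language-aware code fences, but `write_tree_markdown` itself is not shown in this excerpt. The following is therefore only a sketch of the general technique: pick a language tag from the file extension and choose a fence longer than any backtick run in the content. The names `LANG_BY_EXT` and `fenced_block` are hypothetical.

```python
import re
from pathlib import PurePath

# Illustrative mapping; the real writer may recognise a different set.
LANG_BY_EXT = {
    ".py": "python",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".json": "json",
    ".md": "markdown",
}


def fenced_block(filename: str, content: str) -> str:
    """Wrap `content` in a Markdown fence that the content cannot close early."""
    lang = LANG_BY_EXT.get(PurePath(filename).suffix.lower(), "")
    # Find the longest run of backticks so the fence can be made one longer.
    longest_run = max((len(run) for run in re.findall(r"`+", content)), default=0)
    fence = "`" * max(3, longest_run + 1)
    body = content if content.endswith("\n") else content + "\n"
    return f"{fence}{lang}\n{body}{fence}\n"


if __name__ == "__main__":
    print(fenced_block("example.py", "x = 1"))
```

Lengthening the fence matters for a tool like this because exported files (READMEs, for instance) routinely contain their own fenced blocks.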
