| 1 | +## Plan: Improved Markdown (`md`) output format |
| 2 | + |
| 3 | +Add `md` as a first-class output format that produces a single, readable Markdown document with: |
| 4 | + |
| 5 | +* **dynamic `#` headings** based on directory depth |
| 6 | +* **directories distinguished with trailing `/`** |
| 7 | +* **file contents in fenced blocks** using inferred languages (for syntax highlighting) |
| 8 | +* **robust handling** of deep nesting, placeholder content, and backticks inside files |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +## Goals |
| 13 | + |
| 14 | +1. Generate a Markdown document that’s pleasant in GitHub/Markdown viewers *and* copy-pastable into LLMs. |
| 15 | +2. Preserve file content **verbatim (as text)**: the only transformation is the trailing-`\n` normalization that `tree.py` already applies. |
| 16 | +3. Avoid broken Markdown in real-world repos (especially due to backticks in code). |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## Output specification |
| 21 | + |
| 22 | +### 1) Headings & hierarchy |
| 23 | + |
| 24 | +* Root directory node: |
| 25 | + |
| 26 | + * `# <root>/` |
| 27 | +* Each child directory/file: |
| 28 | + |
| 29 | + * heading level = `depth + 1` (root depth = 0 → `#`) |
| 30 | +* Max heading level: |
| 31 | + |
| 32 | + * Use headings only up to `######` (depth ≤ 5) |
| 33 | + * Beyond that: switch to **indented bullet + bold name** to preserve structure |
| 34 | + |
| 35 | +Example deep nesting rendering (depth > 5): |
| 36 | + |
| 37 | +```md |
| 38 | +###### l5/ |
| 39 | +- **l6/** |
| 40 | +  - **l7/** |
| 41 | +``` |
| 42 | + |
| 43 | +### 2) Directory vs file names |
| 44 | + |
| 45 | +* Directories: `name/` |
| 46 | +* Files: `name` (no slash) |
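The heading and naming rules in sections 1 and 2 could be combined into one small helper. This is a minimal sketch, not the project's code: the name `heading_line`, its parameters, and the two-space indent per extra level are assumptions made for illustration.

```python
MAX_HEADING_LEVEL = 6  # Markdown defines headings only from # to ######


def heading_line(name: str, depth: int, is_dir: bool) -> str:
    """Return the Markdown line that introduces a node (hypothetical helper)."""
    # Directories get a trailing slash, files do not.
    label = f"{name}/" if is_dir else name
    level = depth + 1  # root depth 0 maps to a single '#'
    if level <= MAX_HEADING_LEVEL:
        return f"{'#' * level} {label}"
    # Past ###### fall back to an indented bullet with a bold name.
    # Assumption: two spaces of indent per level beyond depth 6.
    indent = "  " * (depth - MAX_HEADING_LEVEL)
    return f"{indent}- **{label}**"
```

With this sketch, depth 5 still renders as a `######` heading, depth 6 becomes an unindented bullet, and each deeper level adds one indent step.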
| 47 | + |
| 48 | +### 3) File content blocks |
| 49 | + |
| 50 | +* If a node has `"content"`: |
| 51 | + |
| 52 | + * Emit a fenced block with a language tag: `` ```python ``, `` ```toml ``, etc. |
| 53 | + * If language unknown → plain fence (no language) |
| 54 | + |
| 55 | +### 4) Placeholder content (binary / unreadable / too large) |
| 56 | + |
| 57 | +TreeMapper already emits placeholders like: |
| 58 | + |
| 59 | +* `<file too large: N bytes>` |
| 60 | +* `<binary file: N bytes>` |
| 61 | +* `<unreadable content: not utf-8>` |
| 62 | +* `<unreadable content>` |
| 63 | + |
| 64 | +For these, emit **italic inline** (no code fence), e.g.: |
| 65 | + |
| 66 | +```md |
| 67 | +_<binary file: 2048 bytes>_ |
| 68 | +``` |
| 69 | + |
| 70 | +### 5) Backticks inside content (critical robustness fix) |
| 71 | + |
| 72 | +If file content contains triple backticks, a normal three-backtick fence closes early and breaks the Markdown. |
| 73 | + |
| 74 | +Solution: |
| 75 | + |
| 76 | +* Choose a fence delimiter longer than any run of backticks in the content. |
| 77 | +* Example: if content contains `` ``` ``, use `` ```` `` as the fence. |
| 78 | + |
| 79 | +This keeps the Markdown valid, because a closing fence must be at least as long as the opening one, while leaving file content unchanged. |
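The fence rule above can be sketched in a few lines. The function name `choose_fence` and its `minimum` parameter are hypothetical, chosen here only to illustrate the idea.

```python
import re


def choose_fence(content: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run in content."""
    # Collect every maximal run of consecutive backticks in the content.
    runs = re.findall(r"`+", content)
    longest = max((len(run) for run in runs), default=0)
    # One backtick longer than the longest run, but never shorter than 3.
    return "`" * max(minimum, longest + 1)
```

Content without backticks gets the usual three-backtick fence. Content containing a run of three gets a fence of four, so no line inside the file can close the block.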
| 80 | + |
| 81 | +--- |
| 82 | + |
| 83 | +## Implementation |
| 84 | + |
| 85 | +### A) CLI: add `md` format |
| 86 | + |
| 87 | +File: `src/treemapper/cli.py` |
| 88 | + |
| 89 | +* Extend `--format` choices to `["yaml", "json", "text", "md"]` |
| 90 | + |
| 91 | +Optionally (nice usability): |
| 92 | + |
| 93 | +* support alias `"markdown"` too (mapped to `"md"` internally), but **not required** for v1. |
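As a rough illustration of the CLI change, assuming `cli.py` builds its options with `argparse` (not confirmed here), the new choice and the optional alias might look like the sketch below. `build_parser` and `normalize_format` are hypothetical names, and the real parser has more arguments than shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, reduced parser showing only the --format option."""
    parser = argparse.ArgumentParser(prog="treemapper")
    parser.add_argument(
        "--format",
        # "markdown" is the optional alias; drop it if v1 ships without it.
        choices=["yaml", "json", "text", "md", "markdown"],
    )
    return parser


def normalize_format(value: str) -> str:
    """Map the optional alias to the internal format name."""
    return "md" if value == "markdown" else value
```

Normalizing right after parsing means the writer only ever sees `"md"`.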
| 94 | + |
| 95 | +--- |
| 96 | + |
| 97 | +### B) Writer: add Markdown serializer |
| 98 | + |
| 99 | +File: `src/treemapper/writer.py` |
| 100 | + |
| 101 | +Add: |
| 102 | + |
| 103 | +1. **Language mapping** |
| 104 | + |
| 105 | +* Use the richer mapping you provided (`EXTENSION_TO_LANG`, `FILENAME_TO_LANG`) |
| 106 | +* Keep `Path(filename).suffix.lower()` for extension lookup |
| 107 | +* Use `filename.lower()` for filename lookup |
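A minimal sketch of the lookup, for illustration only. The two dictionaries hold a tiny subset of entries, since the full mapping is not reproduced in this plan. The helper name `infer_language` is hypothetical, as is the choice to check filenames before extensions.

```python
from pathlib import Path

# Illustrative subsets only; the real tables are assumed to be much larger.
EXTENSION_TO_LANG = {".py": "python", ".toml": "toml", ".yml": "yaml", ".yaml": "yaml"}
FILENAME_TO_LANG = {"makefile": "makefile", "dockerfile": "dockerfile"}


def infer_language(filename: str) -> str:
    """Return a fence language tag, or '' when the type is unknown."""
    # Assumption: whole-filename matches take priority over extensions,
    # so that extensionless files such as Makefile are recognized.
    by_name = FILENAME_TO_LANG.get(filename.lower())
    if by_name:
        return by_name
    return EXTENSION_TO_LANG.get(Path(filename).suffix.lower(), "")
```

Returning an empty string for unknown types lets the caller emit a plain fence without a special case.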
| 108 | + |
| 109 | +2. **Fence length selection** |
| 110 | + |
| 111 | +* Find longest run of backticks in content and pick a longer fence |
| 112 | +* Always at least 3 backticks |
| 113 | + |
| 114 | +3. **Placeholder detection** |
| 115 | + Avoid false positives (e.g., real HTML file that starts with `<tag>`). |
| 116 | + Implement a strict check against known placeholder patterns TreeMapper produces. |
| 117 | + |
| 118 | +Recommended: |
| 119 | + |
| 120 | +* `content_stripped = content.strip()` |
| 121 | +* return `True` if: |
| 122 | + |
| 123 | + * `content_stripped == "<unreadable content>"` |
| 124 | + * `content_stripped == "<unreadable content: not utf-8>"` |
| 125 | + * `content_stripped.startswith("<binary file:") and content_stripped.endswith(">")` |
| 126 | + * `content_stripped.startswith("<file too large:") and content_stripped.endswith(">")` |
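The recommended check translates almost directly into code. This is a sketch: the function and constant names are hypothetical.

```python
# Exact placeholder strings listed in the output specification.
_EXACT_PLACEHOLDERS = frozenset(
    {"<unreadable content>", "<unreadable content: not utf-8>"}
)
# Placeholders that carry a byte count are matched by prefix and closing '>'.
_PLACEHOLDER_PREFIXES = ("<binary file:", "<file too large:")


def is_placeholder(content: str) -> bool:
    """Return True only for the placeholder patterns named in this plan."""
    stripped = content.strip()
    if stripped in _EXACT_PLACEHOLDERS:
        return True
    return stripped.startswith(_PLACEHOLDER_PREFIXES) and stripped.endswith(">")
```

An ordinary HTML file such as `<html>...</html>` fails both checks and is still rendered in a code fence.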
| 127 | + |
| 128 | +4. **Deep nesting formatting** |
| 129 | + |
| 130 | +* For depth ≤ 5: headings `#`…`######` |
| 131 | +* For depth ≥ 6: use indentation + bullet + bold, preserving hierarchy |
| 132 | + |
| 133 | +--- |
| 134 | + |
| 135 | +### C) Writer dispatcher: route `md` |
| 136 | + |
| 137 | +File: `src/treemapper/writer.py` |
| 138 | +Update `write_tree_to_file()` dispatcher: |
| 139 | + |
| 140 | +* `elif output_format == "md": write_tree_markdown(f, tree)` |
| 141 | + |
| 142 | +--- |
| 143 | + |
| 144 | +### D) Public API: `to_markdown` |
| 145 | + |
| 146 | +File: `src/treemapper/__init__.py` |
| 147 | + |
| 148 | +* Import `write_tree_markdown` |
| 149 | +* Add `to_markdown(tree) -> str` |
| 150 | +* Add to `__all__` |
| 151 | + |
| 152 | +Optional convenience (safe): |
| 153 | + |
| 154 | +* add `to_md = to_markdown` alias (doesn’t break anything, just adds a shorter name) |
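One plausible shape for the wrapper, assuming the existing `to_yaml`/`to_json`/`to_text` helpers serialize into an in-memory buffer (an assumption, not verified against the codebase). The `write_tree_markdown` below is a stand-in so the sketch runs on its own. In the package it would be imported from `writer.py`.

```python
import io
from typing import Any, Dict, TextIO


def write_tree_markdown(file: TextIO, tree: Dict[str, Any]) -> None:
    """Stand-in for the real serializer; writes only the root heading."""
    file.write(f"# {tree['name']}/\n")


def to_markdown(tree: Dict[str, Any]) -> str:
    """Serialize a tree to a Markdown string via an in-memory buffer."""
    buffer = io.StringIO()
    write_tree_markdown(buffer, tree)
    return buffer.getvalue()


# Optional shorter alias; it is the same function object.
to_md = to_markdown
```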
| 155 | + |
| 156 | +--- |
| 157 | + |
| 158 | +## Proposed function behavior (pseudocode structure) |
| 159 | + |
| 160 | +* `write_tree_markdown(file, tree)`: |
| 161 | + |
| 162 | + * recursive `walk(node, depth)` |
| 163 | + * if directory: |
| 164 | + |
| 165 | + * emit heading (or bullet+bold if deep) |
| 166 | + * recurse children |
| 167 | + * else file: |
| 168 | + |
| 169 | + * emit heading (or bullet+bold if deep) |
| 170 | + * if content is missing (e.g. with `--no-content`): emit only the heading and a blank line |
| 171 | + * else if placeholder: italic line |
| 172 | + * else: |
| 173 | + |
| 174 | + * fence = longest_backtick_run+1 (min 3) |
| 175 | + * lang = inferred |
| 176 | + * emit fence + lang + content + closing fence |
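The outline above could be fleshed out roughly as follows. This is a sketch under stated assumptions: tree nodes are dicts with `name`, `type` (`"directory"` or `"file"`), optional `children`, and optional `content`. The `_LANGS` table is a small illustrative subset, and the private helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Any, Dict, TextIO

# Illustrative subset; the real extension table is assumed to be larger.
_LANGS = {".py": "python", ".toml": "toml", ".yml": "yaml"}
_EXACT = {"<unreadable content>", "<unreadable content: not utf-8>"}
_PREFIXES = ("<binary file:", "<file too large:")


def _is_placeholder(content: str) -> bool:
    stripped = content.strip()
    return stripped in _EXACT or (
        stripped.startswith(_PREFIXES) and stripped.endswith(">")
    )


def write_tree_markdown(file: TextIO, tree: Dict[str, Any]) -> None:
    """Write the tree as Markdown headings and fenced code blocks."""

    def walk(node: Dict[str, Any], depth: int) -> None:
        is_dir = node.get("type") == "directory"
        label = node["name"] + ("/" if is_dir else "")
        if depth <= 5:
            # Depth 0..5 maps to heading levels 1..6.
            file.write(f"{'#' * (depth + 1)} {label}\n\n")
        else:
            # Deeper nodes become indented bullets with bold names.
            file.write(f"{'  ' * (depth - 6)}- **{label}**\n\n")
        if is_dir:
            for child in node.get("children", []):
                walk(child, depth + 1)
            return
        content = node.get("content")
        if content is None:
            return  # heading only, e.g. when content was not collected
        if _is_placeholder(content):
            file.write(f"_{content.strip()}_\n\n")
            return
        # Fence must be longer than any backtick run inside the content.
        longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
        fence = "`" * max(3, longest + 1)
        lang = _LANGS.get(Path(node["name"]).suffix.lower(), "")
        # Content is expected to end with a newline already; guard anyway so
        # the closing fence always starts on its own line.
        body = content if content.endswith("\n") else content + "\n"
        file.write(f"{fence}{lang}\n{body}{fence}\n\n")

    walk(tree, 0)
```

This sketch does not indent fenced blocks under the deep-nesting bullets. How file contents should render at depth 6 and beyond is left open by the plan.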
| 177 | + |
| 178 | +--- |
| 179 | + |
| 180 | +## Documentation updates |
| 181 | + |
| 182 | +* `CLAUDE.md` (and/or README): |
| 183 | + |
| 184 | + * add examples: |
| 185 | + |
| 186 | + * `treemapper . --format md` |
| 187 | + * `treemapper . --format md -o context.md` |
| 188 | + * briefly explain headings + fenced code blocks |
| 189 | + |
| 190 | +--- |
| 191 | + |
| 192 | +## Testing plan |
| 193 | + |
| 194 | +### Unit tests (add to existing test suite) |
| 195 | + |
| 196 | +1. **Basic structure** |
| 197 | + |
| 198 | +* root heading `# root/` |
| 199 | +* nested dir headings increment |
| 200 | +* file headings render at correct depth |
| 201 | + |
| 202 | +2. **Language detection** |
| 203 | + |
| 204 | +* `.py` → `python` |
| 205 | +* `Makefile` → `makefile` |
| 206 | +* `.yml` → `yaml` |
| 207 | +* unknown ext → empty language |
| 208 | + |
| 209 | +3. **Placeholder formatting** |
| 210 | + |
| 211 | +* `<binary file: ...>` results in italic and **no code fence** for that file |
| 212 | + |
| 213 | +4. **Deep nesting** |
| 214 | + |
| 215 | +* depth 5 uses `######` |
| 216 | +* depth 6 uses bullet+bold with indentation |
| 217 | + |
| 218 | +5. **Backtick safety** |
| 219 | + |
| 220 | +* file content contains `` ``` `` |
| 221 | +* output uses a longer fence (e.g. `` ```` ``) and remains balanced |
| 222 | + |
| 223 | +--- |
| 224 | + |
| 225 | +## Verification commands |
| 226 | + |
| 227 | +```bash |
| 228 | +treemapper . --format md --max-depth 2 |
| 229 | +treemapper . --format md -o codebase.md |
| 230 | +treemapper . --format md --no-content |
| 231 | +``` |
| 232 | + |
| 233 | +--- |
| 234 | + |
| 235 | +## Notes on compatibility with your current codebase |
| 236 | + |
| 237 | +* Fits your existing architecture cleanly: |
| 238 | + |
| 239 | + * CLI just adds a choice |
| 240 | + * writer gains a new serializer and dispatcher branch |
| 241 | + * public API mirrors existing `to_yaml/to_json/to_text` |
| 242 | +* No new dependencies, no changes to tree-building logic, and no changes required to ignore handling. |
| 243 | + |
| 244 | +If you want, I can also produce a patch-style diff for the exact files (`cli.py`, `writer.py`, `__init__.py`) that matches your current structure and naming conventions. |