Commit b81d55f
feat(B2): add Microsoft Word (.docx) support
Implements ROADMAP task B2 — full .docx scraping support via mammoth +
python-docx, producing SKILL.md + references/ output identical to other
source types.
New files:
- src/skill_seekers/cli/word_scraper.py — WordToSkillConverter class +
main() entry point (~600 lines); mammoth → BeautifulSoup pipeline;
handles headings, code detection (incl. monospace <p><br> blocks),
tables, images, metadata extraction
- src/skill_seekers/cli/arguments/word.py — add_word_arguments() +
WORD_ARGUMENTS dict
- src/skill_seekers/cli/parsers/word_parser.py — WordParser for unified
CLI parser registry
- tests/test_word_scraper.py — comprehensive test suite (~300 lines)
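For readers unfamiliar with the pipeline shape named above, the HTML-to-sections half can be sketched as follows. This is a minimal illustration, not the code in word_scraper.py: it assumes the HTML has already been produced (in the real pipeline it would come from mammoth), and it uses the standard library's html.parser as a dependency-free stand-in for BeautifulSoup. The names HeadingSectionParser and split_sections are hypothetical.

```python
from html.parser import HTMLParser


class HeadingSectionParser(HTMLParser):
    """Group paragraph text under the most recent h1-h6 heading (hypothetical helper)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.sections = []  # each: {"heading": str, "level": int, "paragraphs": [str]}
        self._target = None  # "heading", "p", or None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            # A heading opens a new section; its level is the digit in the tag name.
            self.sections.append({"heading": "", "level": int(tag[1]), "paragraphs": []})
            self._target = "heading"
        elif tag == "p" and self.sections:
            # Paragraphs before the first heading are ignored in this sketch.
            self.sections[-1]["paragraphs"].append("")
            self._target = "p"

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self._target = None

    def handle_data(self, data):
        # Inline tags such as <strong> do not change the target, so their text is kept.
        if self._target == "heading":
            self.sections[-1]["heading"] += data
        elif self._target == "p":
            self.sections[-1]["paragraphs"][-1] += data


def split_sections(html):
    """Return the list of heading-delimited sections found in an HTML string."""
    parser = HeadingSectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections
```

Calling split_sections on a string such as "<h1>Intro</h1><p>Hello</p>" yields one section whose heading is "Intro"; tables, images, and code blocks, which the real converter handles, are out of scope here.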
Modified files:
- src/skill_seekers/cli/main.py — registered "word" command module
- src/skill_seekers/cli/source_detector.py — .docx auto-detection +
_detect_word() classmethod
- src/skill_seekers/cli/create_command.py — _route_word() + --help-word
- src/skill_seekers/cli/arguments/create.py — WORD_ARGUMENTS + routing
- src/skill_seekers/cli/arguments/__init__.py — export word args
- src/skill_seekers/cli/parsers/__init__.py — register WordParser
- src/skill_seekers/cli/unified_scraper.py — _scrape_word() integration
- src/skill_seekers/cli/pdf_scraper.py — fix: real enhancement instead
of stub; remove [:3] reference file limit; capture run_workflows return
- src/skill_seekers/cli/github_scraper.py — fix: remove arbitrary
open_issues[:20] / closed_issues[:10] reference file limits
- pyproject.toml — skill-seekers-word entry point + docx optional dep
- tests/test_cli_parsers.py — update parser count 21→22
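The .docx auto-detection added to source_detector.py is, at its core, an extension check. A minimal sketch under stated assumptions: the function name detect_source_type and the returned labels are hypothetical, and the real _detect_word() classmethod may do more than this.

```python
from pathlib import Path


def detect_source_type(source: str) -> str:
    """Map a file path to a source-type label by extension (illustrative labels only)."""
    suffix = Path(source).suffix.lower()  # case-insensitive: "Report.DOCX" matches too
    if suffix == ".docx":
        return "word"
    if suffix == ".pdf":
        return "pdf"
    return "unknown"
```

Lower-casing the suffix before comparing keeps files saved as .DOCX on case-preserving filesystems from falling through to the default branch.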
Bug fixes applied during real-world testing:
- Code detection: detect monospace <p><br> blocks as code (mammoth
renders Courier paragraphs this way, not as <pre>/<code>)
- Language detector: fix wrong method name detect_from_text →
detect_from_code
- Description inference: pass None from main() so extract_docx() can
infer description from Word document subject/title metadata
- Bullet-point guard: exclude prose starting with •/-/* from code scoring
- Enhancement: implement real API/LOCAL enhancement (was stub)
- pip install message: add quotes around skill-seekers[docx]
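The code-detection and bullet-point-guard fixes above can be illustrated with a small heuristic. This is a sketch only: the actual scoring in word_scraper.py is not shown in this commit message, so the function looks_like_code, the hint pattern, and the threshold are all assumptions. The input is taken to be the lines of one paragraph, already split at its <br> tags.

```python
import re

# Characters that mark a prose bullet, per the commit message: •, -, *
BULLET_PREFIXES = ("•", "-", "*")

# Assumed hints: common code punctuation, or a line starting with a Python keyword.
CODE_HINTS = re.compile(r"[{}();=]|^\s*(def|class|import|from|return|if|for|while)\b")


def looks_like_code(lines, threshold=0.5):
    """Return True if enough non-empty lines carry a code hint (hypothetical scorer)."""
    non_empty = [ln for ln in lines if ln.strip()]
    if not non_empty:
        return False
    # Bullet-point guard: a paragraph that opens with a bullet is treated as prose.
    if non_empty[0].lstrip().startswith(BULLET_PREFIXES):
        return False
    hits = sum(1 for ln in non_empty if CODE_HINTS.search(ln))
    return hits / len(non_empty) >= threshold
```

Running the guard before scoring means a bulleted line such as "- x = 1" is rejected even though it contains code-like punctuation, which is the behaviour the bullet-point fix describes.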
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent e42aade, commit b81d55f
17 files changed: +2215 -68 lines