feat(docs): update CLAUDE.md and README.md to reflect support for 30+ document formats and improve clarity on features
feat(docs): enhance documentation in DEVELOPMENT.md and kbignore.md for better understanding of file management and indexing
feat(docs): add detailed extraction methods in extract.py for various document formats
feat(ingest): refactor index_directory to support indexing across multiple document formats and implement file size checks
feat(cli): introduce new commands for allowing large files and listing indexed documents in cli.py
fix(config): add configuration options for indexing code files and managing file size limits in config.py
test: update tests to reflect changes in indexing and extraction functionalities, ensuring compatibility with new features
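The format filtering and file size check described in this commit message might look roughly like the sketch below. This is not the project's `index_directory` implementation: the names `MAX_FILE_BYTES`, `SUPPORTED_EXTENSIONS`, and `should_index`, the 10 MB limit, and the `allow_large` flag are all hypothetical and chosen only to illustrate the idea.

```python
from pathlib import Path

# Hypothetical values for illustration only; the real limit and
# extension list are defined by the project's own configuration.
MAX_FILE_BYTES = 10 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".md", ".pdf", ".docx", ".epub", ".html", ".odt", ".rtf", ".txt"}


def should_index(path: Path, size_bytes: int, allow_large: bool = False) -> bool:
    """Return True if the file has a supported extension and passes the size check."""
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return False
    if size_bytes > MAX_FILE_BYTES and not allow_large:
        return False
    return True
```

Passing the size in as an argument keeps the check independent of the filesystem; a caller would typically supply `path.stat().st_size`.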
-CLI RAG tool for your docs. Index markdown + PDFs, hybrid search (semantic + keyword), ask questions and get sourced answers. Built on [sqlite-vec](https://github.com/asg017/sqlite-vec).
+CLI RAG tool for your docs. Index 30+ document formats (markdown, PDF, DOCX, EPUB, HTML, ODT, RTF, plain text, email, and more), hybrid search (semantic + keyword), ask questions and get sourced answers. Built on [sqlite-vec](https://github.com/asg017/sqlite-vec).
## Features
@@ -12,7 +12,8 @@ CLI RAG tool for your docs. Index markdown + PDFs, hybrid search (semantic + key
- **Incremental indexing** — content-hash per chunk, only re-embeds changes
- **LLM rerank** — `ask` over-fetches candidates, LLM ranks by relevance, keeps the best
- **Pre-search filters** — file globs, date ranges, keyword inclusion/exclusion
-- **PDF support** — install with `kb[pdf]` or `kb[all]`
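As a rough illustration of the "content-hash per chunk" idea in the feature list, the sketch below hashes each chunk and returns only the chunks whose hash is not already stored. The function names and the choice of SHA-256 are assumptions made for this example; the diff does not show how kb actually computes or stores its hashes.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Content hash for one chunk (SHA-256 is an assumption, not taken from the source)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunks_to_embed(chunks: list[str], stored_hashes: set[str]) -> list[str]:
    """Return only the chunks whose hash is not already in the index."""
    return [c for c in chunks if chunk_hash(c) not in stored_hashes]


# Hypothetical usage: one chunk is unchanged, one was edited.
stored = {chunk_hash(c) for c in ["intro", "setup"]}
changed = chunks_to_embed(["intro", "setup v2"], stored)
```

Here `changed` contains only the edited chunk, so an indexer following this approach would skip the embedding call for the unchanged one.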
| Office |`.docx`, `.pptx`, `.xlsx`|`kb[office]` or `kb[all]`|
+| RTF |`.rtf`|`kb[rtf]` or `kb[all]`|
+
+**Code files (opt-in):** Set `index_code = true` in config to also index source code — `.py`, `.js`, `.ts`, `.go`, `.rs`, `.java`, `.c`, `.cpp`, and 60+ more extensions.
+
+Run `kb stats` to see which formats are available in your installation.
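One plausible way an opt-in flag like `index_code` could widen the set of indexed extensions is sketched below. It assumes the config has already been parsed into a dictionary; `DOC_EXTENSIONS`, `CODE_EXTENSIONS`, and `indexable_extensions` are hypothetical names, and both sets are small illustrative subsets rather than the project's real lists.

```python
# Illustrative subsets only; the real lists are longer and live in the project.
DOC_EXTENSIONS = {".md", ".pdf", ".docx", ".rtf"}
CODE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".cpp"}


def indexable_extensions(config: dict) -> set[str]:
    """Document extensions are always included; code extensions only when opted in."""
    extensions = set(DOC_EXTENSIONS)  # copy so the module-level set is not mutated
    if config.get("index_code", False):
        extensions |= CODE_EXTENSIONS
    return extensions
```

Defaulting the flag to `False` when the key is absent matches the "opt-in" wording: an installation with no such setting would index documents only.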
|[Khoj](https://github.com/khoj-ai/khoj)| Self-hosted AI second brain with web UI, mobile, Obsidian/Emacs plugins | Optional | No | Docker or pip, runs a web server |
|[Reor](https://github.com/reorproject/reor)| Desktop note-taking app with auto-linking and local LLM | Yes | No | Electron app, uses LanceDB + Ollama |
|[LlamaIndex](https://github.com/run-llama/llama_index)| Framework for building RAG pipelines | Depends | No | Python library, you build the app |
@@ -187,7 +224,7 @@ kb ask "question"
**When to use what:**

-- **kb** — you want a CLI RAG tool that indexes docs (markdown, PDFs) and answers questions from them
+- **kb** — you want a CLI RAG tool that indexes docs (markdown, PDFs, DOCX, EPUB, HTML, and more) and answers questions from them
- **grepai** — you want semantic search over code (find by intent, trace call graphs), no RAG
- **Khoj** — you want a full-featured app with web UI, phone access, Obsidian integration, and agent capabilities
- **Reor** — you want an Obsidian-like desktop editor that auto-links notes using local AI