Changelog

[1.0.7] - 2026-02-18

Changes

  • LLM: add LiquidAI LFM2-1.2B as an alternative base model for query expansion fine-tuning. LFM2's hybrid architecture (convolutions + attention) is 2x faster at decode/prefill vs standard transformers — good fit for on-device inference.
  • CLI: support multiple -c flags to search across several collections at once (e.g. qmd search -c notes -c journals "query"). #191 (thanks @openclaw)

Fixes

  • Return empty JSON array [] instead of no output when --json search finds no results.
  • Resolve relative paths passed to --index so they don't produce malformed config entries.
  • Respect XDG_CONFIG_HOME for collection config path instead of always using ~/.config. #190 (thanks @openclaw)
  • CLI: empty-collection hint now shows the correct collection add command. #200 (thanks @vincentkoc)
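
The XDG_CONFIG_HOME fix above can be illustrated with a small resolver. This is only a sketch: `resolveConfigDir` is a hypothetical name, the environment and home directory are passed in as arguments to keep the example self-contained, and the `qmd` subdirectory is taken from the `~/.config/qmd/index.yml` path mentioned under 0.5.0.

```typescript
// Hypothetical helper, not qmd's actual code.
// Uses $XDG_CONFIG_HOME when it is set to a non-empty absolute path,
// otherwise falls back to <home>/.config (the XDG default).
function resolveConfigDir(
  env: Record<string, string | undefined>,
  home: string,
): string {
  const xdg = env["XDG_CONFIG_HOME"];
  // The XDG spec says relative values should be treated as invalid and ignored.
  const usable = xdg !== undefined && xdg.startsWith("/");
  const base = usable ? (xdg as string) : `${home}/.config`;
  // Strip trailing slashes so the join below does not produce "//".
  return `${base.replace(/\/+$/, "")}/qmd`;
}
```

Passing the environment explicitly (for example `resolveConfigDir(process.env, os.homedir())` in Node.js) makes the fallback easy to unit-test without mutating global state.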

[1.0.6] - 2026-02-16

Changes

  • CLI: qmd status now shows models with full HuggingFace links instead of static names in --help. Model info is derived from the actual configured URIs so it stays accurate if models change.
  • Release tooling: pre-push hook handles non-interactive shells (CI, editors) gracefully — warnings auto-proceed instead of hanging on a tty prompt. Annotated tags now resolve correctly for CI checks.

[1.0.5] - 2026-02-16

The npm package now ships compiled JavaScript instead of raw TypeScript, removing the tsx runtime dependency. A new /release skill automates the full release workflow with changelog validation and git hook enforcement.

Changes

  • Build: compile TypeScript to dist/ via tsc so the npm package no longer requires tsx at runtime. The qmd shell wrapper now runs dist/qmd.js directly.
  • Release tooling: new /release skill that manages the full release lifecycle — validates changelog, installs git hooks, previews release notes, and cuts the release. Auto-populates [Unreleased] from git history when empty.
  • Release tooling: scripts/extract-changelog.sh extracts cumulative notes for the full minor series (e.g. 1.0.0 through 1.0.5) for GitHub releases. Includes [Unreleased] content in previews.
  • Release tooling: scripts/release.sh renames [Unreleased] to a versioned heading and inserts a fresh empty [Unreleased] section automatically.
  • Release tooling: pre-push git hook blocks v* tag pushes unless package.json version matches the tag, a changelog entry exists, and CI passed on GitHub.
  • Publish workflow: GitHub Actions now builds TypeScript, creates a GitHub release with cumulative notes extracted from the changelog, and publishes to npm with provenance.

[1.0.0] - 2026-02-15

QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking through parallel GPU contexts. GPU auto-detection replaces the unreliable gpu: "auto" with explicit CUDA/Metal/Vulkan probing.

Changes

  • Runtime: support Node.js (>=22) alongside Bun via a cross-runtime SQLite abstraction layer (src/db.ts). bun:sqlite on Bun, better-sqlite3 on Node. The qmd wrapper auto-detects a suitable Node.js install via PATH, then falls back to mise, asdf, nvm, and Homebrew locations.
  • Performance: parallel embedding & reranking via multiple LlamaContext instances — up to 2.7x faster on multi-core machines.
  • Performance: flash attention for ~20% less VRAM per reranking context, enabling more parallel contexts on GPU.
  • Performance: right-sized reranker context (40960 → 2048 tokens, 17x less memory) since chunks are capped at ~900 tokens.
  • Performance: adaptive parallelism — context count computed from available VRAM (GPU) or CPU math cores rather than hardcoded.
  • GPU: probe for CUDA, Metal, Vulkan explicitly at startup instead of relying on node-llama-cpp's gpu: "auto". qmd status shows device info.
  • Tests: reorganized into flat test/ directory with vitest for Node.js and bun test for Bun. New eval-bm25 and store.helpers.unit suites.
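
The adaptive parallelism entry above describes deriving the context count from available resources instead of a constant. A rough sketch of that idea follows; the function name, the per-context memory figure, and the upper bound are all assumptions made for illustration, not values taken from qmd.

```typescript
// Illustrative only: names and limits are assumptions, not qmd's real values.
interface Resources {
  freeVramBytes?: number; // undefined when no GPU was detected
  cpuCores: number;
}

function contextCount(
  res: Resources,
  perContextBytes: number,
  maxContexts = 8,
): number {
  // On GPU, fit as many contexts as free VRAM allows; on CPU, use the core count.
  const raw =
    res.freeVramBytes !== undefined
      ? Math.floor(res.freeVramBytes / perContextBytes)
      : res.cpuCores;
  // Always keep at least one context, and never exceed the cap.
  return Math.max(1, Math.min(maxContexts, raw));
}
```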

Fixes

  • Prevent VRAM waste from duplicate context creation during concurrent embedBatch calls — initialization lock now covers the full path.
  • Collection-aware FTS filtering so scoped keyword search actually restricts results to the requested collection.
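
The duplicate-context fix above is an instance of a common pattern: concurrent callers share a single in-flight initialization. A minimal sketch, assuming a hypothetical `ContextHolder` class (the real lock in qmd may be structured differently):

```typescript
// Hypothetical sketch: memoize the in-flight initialization promise so that
// concurrent callers share one context instead of each creating their own.
class ContextHolder<T> {
  private pending: Promise<T> | null = null;
  private readonly create: () => Promise<T>;
  created = 0; // number of times create() was actually invoked

  constructor(create: () => Promise<T>) {
    this.create = create;
  }

  get(): Promise<T> {
    if (this.pending === null) {
      this.created += 1;
      // Store the promise before anything is awaited, so a second caller that
      // arrives while creation is still running reuses it.
      this.pending = this.create().catch((err) => {
        this.pending = null; // allow a retry after a failed initialization
        throw err;
      });
    }
    return this.pending;
  }
}
```

Because the promise is stored synchronously, two `get()` calls issued back to back trigger only one `create()`.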

[0.9.0] - 2026-02-15

First published release on npm as @tobilu/qmd. MCP HTTP transport with daemon mode cuts warm query latency from ~16s to ~10s by keeping models loaded between requests.

Changes

  • MCP: HTTP transport with daemon lifecycle — qmd mcp --http --daemon starts a background server, qmd mcp stop shuts it down. Models stay warm in VRAM between queries. #149 (thanks @igrigorik)
  • Search: type-routed query expansion preserves lex/vec/hyde type info and routes to the appropriate backend. Eliminates ~4 wasted backend calls per query (10.0 → 6.0 calls, 1278ms → 549ms). #149 (thanks @igrigorik)
  • Search: unified pipeline — extracted hybridQuery() and vectorSearchQuery() to store.ts so CLI and MCP share identical logic. Fixes a class of bugs where results differed between the two. #149 (thanks @igrigorik)
  • MCP: dynamic instructions generated at startup from actual index state — LLMs see collection names, doc counts, and content descriptions. #149 (thanks @igrigorik)
  • MCP: tool renames (vsearch → vector_search, query → deep_search) with rewritten descriptions for better tool selection. #149 (thanks @igrigorik)
  • Integration: Claude Code plugin with inline status checks and MCP integration. #99 (thanks @galligan)
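
Type-routed query expansion, as described above, can be sketched with a discriminated union. The type and function names here are hypothetical; the mapping (lex to BM25, vec and hyde to vector search) follows the descriptions in the 0.6.0 and 0.8.0 entries.

```typescript
// Hypothetical types: each expansion carries its kind so it can be routed.
type Expansion =
  | { type: "lex"; text: string }
  | { type: "vec"; text: string }
  | { type: "hyde"; text: string };

type Backend = "bm25" | "vector";

function routeExpansion(e: Expansion): Backend {
  switch (e.type) {
    case "lex":
      return "bm25"; // keyword variations go to full-text search
    case "vec":
    case "hyde":
      return "vector"; // rephrasings and hypothetical documents get embedded
  }
}

// Group expansion texts by the single backend each one should hit.
function planCalls(expansions: Expansion[]): Record<Backend, string[]> {
  const plan: Record<Backend, string[]> = { bm25: [], vector: [] };
  for (const e of expansions) {
    plan[routeExpansion(e)].push(e.text);
  }
  return plan;
}
```

Sending each expansion to exactly one backend, rather than to every backend, is what removes the wasted calls.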

Fixes

  • BM25 score normalization — formula was inverted (1/(1+|x|) instead of |x|/(1+|x|)), so strong matches scored lowest. Broke --min-score filtering and made the "strong signal" short-circuit dead code. #76 (thanks @dgilperez)
  • Normalize Unicode paths to NFC for macOS compatibility. #82 (thanks @c-stoeckl)
  • Handle dense content (code) that tokenizes beyond expected chunk size.
  • Proper cleanup of Metal GPU resources on process exit.
  • SQLite-vec readiness verification after extension load.
  • Reactivate deactivated documents on re-index instead of creating duplicates.
  • Bun UTF-8 path corruption workaround for non-ASCII filenames.
  • Disable following symlinks in glob.scan to avoid infinite loops.
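
The BM25 normalization fix above comes down to one line of arithmetic. SQLite FTS5's bm25() returns values where a more negative number means a better match, so the magnitude has to map upward. The sketch below contrasts the two formulas quoted in the entry; the function names are illustrative.

```typescript
// Corrected form from the entry: |x| / (1 + |x|).
// Strong matches (large magnitude) approach 1, weak matches approach 0.
function normalizeBm25(raw: number): number {
  const magnitude = Math.abs(raw);
  return magnitude / (1 + magnitude);
}

// Inverted form from the entry: 1 / (1 + |x|).
// Strong matches approach 0, which is why they ended up scoring lowest.
function normalizeBm25Inverted(raw: number): number {
  return 1 / (1 + Math.abs(raw));
}
```

With the inverted form, a threshold such as --min-score discards the best results first, which matches the symptoms described in the fix.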

[0.8.0] - 2026-01-28

Fine-tuned query expansion model trained with GRPO replaces the stock Qwen3 0.6B. The training pipeline scores expansions on named entity preservation, format compliance, and diversity — producing noticeably better lexical variations and HyDE documents.

Changes

  • LLM: deploy GRPO-trained (Group Relative Policy Optimization) query expansion model, hosted on HuggingFace and auto-downloaded on first use. Better preservation of proper nouns and technical terms in expansions.
  • LLM: /only:lex mode for single-type expansions — useful when you know which search backend will help.
  • LLM: HyDE output moved to first position so vector search can start embedding while other expansions generate.
  • LLM: session lifecycle management via withLLMSession() pattern — ensures cleanup even on failure, similar to database transactions.
  • Integration: org-mode title extraction support. #50 (thanks @sh54)
  • Integration: SQLite extension loading in Nix devshell. #48 (thanks @sh54)
  • Integration: AI agent discovery via skills.sh. #64 (thanks @Algiras)
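
The session lifecycle pattern mentioned above is essentially try/finally wrapped in a helper. The following is a generic sketch, not the actual signature of withLLMSession(): `withSession`, `Session`, and the `open` callback are assumptions introduced for illustration.

```typescript
// Hypothetical minimal session shape: anything that can be closed.
interface Session {
  close(): void;
}

// Run fn with a freshly opened session and always close it afterwards,
// whether fn returns normally or throws.
async function withSession<S extends Session, T>(
  open: () => S,
  fn: (session: S) => T | Promise<T>,
): Promise<T> {
  const session = open();
  try {
    return await fn(session);
  } finally {
    session.close();
  }
}
```

As with a database transaction helper, callers cannot forget the cleanup step because they never hold the session outside the callback.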

Fixes

  • Use sequential embedding on CPU-only systems: parallel contexts contended for the same CPU cores, which made embedding slower rather than faster. #54 (thanks @freeman-jiang)
  • Fix collectionName column in vector search SQL (was still using old collectionId from before YAML migration). #61 (thanks @jdvmi00)
  • Fix Qwen3 sampling params to prevent repetition loops — stock temperature/top-p caused occasional infinite repeat patterns.
  • Add --index option to CLI argument parser (was documented but not wired up). #84 (thanks @Tritlo)
  • Fix DisposedError during slow batch embedding. #41 (thanks @wuhup)

[0.7.0] - 2026-01-09

First community contributions. The project gained external contributors, surfacing bugs that only appear in diverse environments — Homebrew sqlite-vec paths, case-sensitive model filenames, and sqlite-vec JOIN incompatibilities.

Changes

  • Indexing: native realpathSync() replaces readlink -f subprocess spawn per file. On a 5000-file collection this eliminates 5000 shell spawns, ~15% faster. #8 (thanks @burke)
  • Indexing: single-pass tokenization. The chunking algorithm previously tokenized each document twice (count, then split); it now tokenizes once and reuses the tokens. #9 (thanks @burke)

Fixes

  • Fix vsearch and query hanging: sqlite-vec's virtual table doesn't support the JOIN pattern that was used, so the query was rewritten as a subquery. #23 (thanks @mbrendan)
  • Fix MCP server exiting immediately after startup — process had no active handles keeping the event loop alive. #29 (thanks @mostlydev)
  • Fix collection filter SQL to properly restrict vector search results.
  • Support non-ASCII filenames in collection filter.
  • Skip empty files during indexing instead of crashing on zero-length content.
  • Fix case sensitivity in Qwen3 model filename resolution. #15 (thanks @gavrix)
  • Fix sqlite-vec loading on macOS with Homebrew (BREW_PREFIX detection). #42 (thanks @komsit37)
  • Fix Nix flake to use correct src/qmd.ts path. #7 (thanks @burke)
  • Fix docid lookup with quotes support in get command. #36 (thanks @JoshuaLelon)
  • Fix query expansion model size in documentation. #38 (thanks @odysseus0)

[0.6.0] - 2025-12-28

Replaced Ollama HTTP API with node-llama-cpp for all LLM operations. Ollama adds convenience but also a running server dependency. node-llama-cpp loads GGUF models directly in-process — zero external dependencies. Models auto-download from HuggingFace on first use.

Changes

  • LLM: structured query expansion via JSON schema grammar constraints. Model produces typed expansions — lexical (BM25 keywords), vector (semantic rephrasings), HyDE (hypothetical document excerpts) — so each routes to the right backend instead of sending everything everywhere.
  • LLM: lazy model loading with 2-minute inactivity auto-unload. Keeps memory low when idle while avoiding ~3s model load on every query.
  • Search: conditional query expansion — when BM25 returns strong results, the expensive LLM expansion is skipped entirely.
  • Search: multi-chunk reranking — documents with multiple relevant chunks scored by aggregating across all chunks rather than best single chunk.
  • Search: cosine distance for vector search (was L2).
  • Search: embeddinggemma nomic-style prompt formatting.
  • Testing: evaluation harness with synthetic test documents and Hit@K metrics for BM25, vector, and hybrid RRF.
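
Lazy loading with an inactivity timeout, as described above, can be sketched with a timer that is reset on every use. `LazyModel` is a hypothetical class, and loading is synchronous here for brevity, whereas loading a real model would be asynchronous.

```typescript
// Hypothetical sketch: load on first use, unload after a period of inactivity.
class LazyModel<T> {
  private model: T | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly load: () => T;
  private readonly idleMs: number;
  loads = 0; // how many times the loader actually ran

  constructor(load: () => T, idleMs: number = 2 * 60 * 1000) {
    this.load = load;
    this.idleMs = idleMs;
  }

  get(): T {
    if (this.model === null) {
      this.model = this.load();
      this.loads += 1;
    }
    // Every access restarts the inactivity countdown.
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.unload(), this.idleMs);
    return this.model;
  }

  unload(): void {
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = null;
    this.model = null; // drop the reference so memory can be reclaimed
  }

  get isLoaded(): boolean {
    return this.model !== null;
  }
}
```

Back-to-back queries reuse the loaded model, while an idle process releases it and reloads on the next call.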

[0.5.0] - 2025-12-13

Collections and contexts moved from SQLite tables to YAML at ~/.config/qmd/index.yml. SQLite was overkill for config — you can't share it, and it's opaque. YAML is human-readable and version-controllable. The migration was extensive (35+ commits) because every part of the system that touched collections or contexts had to be updated.

Changes

  • Config: YAML-based collections and contexts replace SQLite tables. collections and path_contexts tables dropped from schema. Collections support an optional update: command (e.g., git pull) before re-index.
  • CLI: qmd collection add/list/remove/rename commands with --name and --mask glob pattern support.
  • CLI: qmd ls virtual file tree — list collections, files in a collection, or files under a path prefix.
  • CLI: qmd context add/list/check/rm with hierarchical context inheritance. A query to qmd://notes/2024/jan/ inherits context from notes/, notes/2024/, and notes/2024/jan/.
  • CLI: qmd context add / "text" for global context across all collections.
  • CLI: qmd context check audit command to find paths without context.
  • Paths: qmd:// virtual URI scheme for portable document references. qmd://notes/ideas.md works regardless of where the collection lives on disk. Works in get, multi-get, ls, and context commands.
  • CLI: document IDs (docid) — first 6 chars of content hash for stable references. Shown as #abc123 in search results, usable with get and multi-get.
  • CLI: --line-numbers flag for get command output.
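
The hierarchical context inheritance described above amounts to walking the ancestor prefixes of a qmd:// path. The sketch below reproduces the example from the entry; the function names and the Map-based context store are assumptions, and the global "/" context is left out for brevity.

```typescript
// Hypothetical helper: list every directory prefix a qmd:// URI inherits from,
// from the collection root down to the most specific directory.
function contextPrefixes(uri: string): string[] {
  const scheme = "qmd://";
  if (!uri.startsWith(scheme)) {
    throw new Error(`not a qmd:// URI: ${uri}`);
  }
  const segments = uri
    .slice(scheme.length)
    .split("/")
    .filter((s) => s.length > 0);
  // A URI without a trailing slash names a file, so its last segment
  // is not a directory and contributes no prefix of its own.
  const dirs = uri.endsWith("/") ? segments : segments.slice(0, -1);
  const prefixes: string[] = [];
  for (let i = 1; i <= dirs.length; i++) {
    prefixes.push(dirs.slice(0, i).join("/") + "/");
  }
  return prefixes;
}

// Collect the context text attached to each ancestor, least specific first.
function inheritedContext(uri: string, contexts: Map<string, string>): string[] {
  const found: string[] = [];
  for (const prefix of contextPrefixes(uri)) {
    const text = contexts.get(prefix);
    if (text !== undefined) found.push(text);
  }
  return found;
}
```

For `qmd://notes/2024/jan/` this yields `notes/`, `notes/2024/`, and `notes/2024/jan/`, matching the inheritance chain described in the entry.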

[0.4.0] - 2025-12-10

MCP server for AI agent integration. Without it, agents had to shell out to qmd search and parse CLI output. The monolithic qmd.ts (1840 lines) was split into focused modules with the project's first test suite (215 tests).

Changes

  • MCP: stdio server with tools for search, vector search, hybrid query, document retrieval, and status. Runs over stdio transport for Claude Desktop and MCP clients.
  • MCP: spec-compliant with June 2025 MCP specification — removed non-spec mimeType, added isError: true to errors, structuredContent for machine-readable results, proper URI encoding.
  • MCP: simplified tool naming (qmd_search → search) since MCP already namespaces by server.
  • Architecture: extract store.ts (1221 LOC), llm.ts (539 LOC), formatter.ts (359 LOC), mcp.ts (503 LOC) from monolithic qmd.ts.
  • Testing: 215 tests (store: 96, llm: 60, mcp: 59) with mocked Ollama for fast, deterministic runs. Before this: zero tests.

[0.3.0] - 2025-12-08

Document chunking for vector search. A 5000-word document about many topics gets a single embedding that averages everything together, matching poorly for specific queries. Chunking produces one embedding per ~900-token section with focused semantic signal.

Changes

  • Search: markdown-aware chunking — prefers heading boundaries, then paragraph breaks, then sentence boundaries. 15% overlap between chunks ensures cross-boundary queries still match.
  • Search: multi-chunk scoring bonus (+0.02 per additional chunk, capped at +0.1 for 5+ chunks). Documents relevant in multiple sections rank higher.
  • CLI: display paths show collection-relative paths and extracted titles (from H1 headings or YAML frontmatter) instead of raw filesystem paths.
  • CLI: --all flag returns all matches (use with --min-score to filter).
  • CLI: byte-based progress bar with ETA for embed command.
  • CLI: human-readable time formatting ("15m 4s" instead of "904.2s").
  • CLI: documents >64KB truncated with warning during embedding.
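
The multi-chunk scoring bonus above can be written as a capped linear function. This sketch assumes "additional chunk" means every matching chunk beyond the first; under that reading the +0.1 cap is reached at six matching chunks, so the exact threshold in qmd may differ by one. The function name is illustrative.

```typescript
// Assumed reading: +0.02 for each matching chunk beyond the first, capped at +0.1.
function multiChunkBonus(matchingChunks: number): number {
  const additional = Math.max(0, matchingChunks - 1);
  return Math.min(0.1, 0.02 * additional);
}
```

The cap keeps very long documents from outranking a short document with one strongly relevant chunk purely on chunk count.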

[0.2.0] - 2025-12-08

Changes

  • CLI: --json, --csv, --files, --md, --xml output format flags. --json for programmatic access, --files for piping, --md/--xml for LLM consumption, --csv for spreadsheets.
  • CLI: qmd status shows index health — document count, size, embedding coverage, time since last update.
  • Search: weighted RRF — original query gets 2x weight relative to expanded queries since the user's actual words are a more reliable signal.
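
Weighted RRF, as described above, can be sketched by scaling each list's reciprocal-rank contribution. The names below are hypothetical, and the constant k = 60 is the value conventionally used with RRF, assumed here rather than taken from qmd.

```typescript
// One ranked result list plus the weight its contributions should receive.
interface RankedList {
  ids: string[]; // document ids, best match first
  weight: number; // e.g. 2 for the original query, 1 for an expanded query
}

// Weighted Reciprocal Rank Fusion: score(d) = sum of weight / (k + rank).
function weightedRrf(lists: RankedList[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return scores;
}
```

A document missing from one list simply receives no contribution from it, so partial results from any backend still fuse cleanly.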

[0.1.0] - 2025-12-07

Initial implementation. Built in a single day for searching personal markdown notes, journals, and meeting transcripts.

Changes

  • Search: SQLite FTS5 with BM25 ranking. Chose SQLite over Elasticsearch because QMD is a personal tool — single binary, no server dependencies.
  • Search: sqlite-vec for vector similarity. Same rationale: in-process, no external vector database.
  • Search: Reciprocal Rank Fusion to combine BM25 and vector results. RRF is parameter-free and handles missing signals gracefully.
  • LLM: Ollama for embeddings, reranking, and query expansion. Later replaced with node-llama-cpp in 0.6.0.
  • CLI: qmd add, qmd embed, qmd search, qmd vsearch, qmd query, qmd get. ~1800 lines of TypeScript in a single qmd.ts file.