Releases: tobi/qmd

v1.1.5

08 Mar 01:37
v1.1.5
4fa1168

[1.1.5] - 2026-03-07

Ambiguous queries like "performance" now produce dramatically better results
when the caller knows what they mean. The new intent parameter steers all
five pipeline stages — expansion, strong-signal bypass, chunk selection,
reranking, and snippet extraction — without searching on its own. Design and
original implementation by Ilya Grigorik (@vyalamar) in #180.

Changes

  • Intent parameter: optional intent string disambiguates queries across
    the entire search pipeline. Available via CLI (--intent flag or intent:
    line in query documents), MCP (intent field on the query tool), and
    programmatic API. Adapted from PR #180 (thanks @vyalamar).
  • Query expansion: when intent is provided, the expansion LLM prompt
    includes Query intent: {intent}, matching the finetune training data
    format for better-aligned expansions.
  • Reranking: intent is prepended to the rerank query so Qwen3-Reranker
    scores with domain context.
  • Chunk selection: intent terms scored at 0.5× weight alongside query
    terms (1.0×) when selecting the best chunk per document for reranking.
  • Snippet extraction: intent terms scored at 0.3× weight to nudge
    snippets toward intent-relevant lines without overriding query anchoring.
  • Strong-signal bypass disabled with intent: when intent is provided, the
    BM25 strong-signal shortcut is skipped — the obvious keyword match may not
    be what the caller wants.
  • MCP instructions: callers are now guided to provide intent on every
    search call for disambiguation.
  • Query document syntax: intent: is now recognized as a line type. A
    document may contain at most one intent: line, and it cannot appear
    alone. Grammar updated in docs/SYNTAX.md.
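
To make the 0.5× and 0.3× intent weights concrete, here is a minimal TypeScript sketch of weighted term scoring. The function names, the tokenizer, and the one-point-per-matching-term scoring are assumptions for illustration; QMD's actual scoring may differ.

```typescript
// Hypothetical helpers: names and tokenization are illustrative, not QMD's code.
const QUERY_WEIGHT = 1.0;
const CHUNK_INTENT_WEIGHT = 0.5; // chunk selection weight from the notes above
const SNIPPET_INTENT_WEIGHT = 0.3; // snippet extraction weight from the notes above

// Lowercase and split on anything that is not a letter, digit, '+' or '#'.
function terms(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9+#]+/).filter((t) => t.length > 0);
}

// One point per distinct query term found, a fraction per distinct intent term.
function scoreText(
  text: string,
  query: string,
  intent: string | undefined,
  intentWeight: number,
): number {
  const present = new Set(terms(text));
  let score = 0;
  for (const t of new Set(terms(query))) {
    if (present.has(t)) score += QUERY_WEIGHT;
  }
  if (intent !== undefined) {
    for (const t of new Set(terms(intent))) {
      if (present.has(t)) score += intentWeight;
    }
  }
  return score;
}

// Index of the highest-scoring chunk; ties keep the earliest chunk.
function bestChunk(chunks: string[], query: string, intent?: string): number {
  let best = 0;
  let bestScore = -Infinity;
  chunks.forEach((chunk, i) => {
    const s = scoreText(chunk, query, intent, CHUNK_INTENT_WEIGHT);
    if (s > bestScore) {
      bestScore = s;
      best = i;
    }
  });
  return best;
}
```

Because intent terms carry less weight than query terms, they can break ties between chunks that match the query equally well, but a chunk that matches only the intent cannot outrank one that matches the query itself unless several intent terms line up. Snippet scoring would call scoreText with SNIPPET_INTENT_WEIGHT instead.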

[1.1.2] - 2026-03-07

13 community PRs merged. GPU initialization replaced with node-llama-cpp's
built-in autoAttempt — deleting ~220 lines of manual fallback code and
fixing GPU issues reported across 10+ PRs in one shot. Reranking is faster
through chunk deduplication and a parallelism cap that prevents VRAM
exhaustion.

Changes

  • GPU init: use node-llama-cpp's build: "autoAttempt" instead of manual
    GPU backend detection. Automatically tries Metal/CUDA/Vulkan and falls back
    gracefully. #310 (thanks @giladgd — the node-llama-cpp author)
  • Query --explain: qmd query --explain exposes retrieval score traces
    — backend scores, per-list RRF contributions, top-rank bonus, reranker
    score, and final blended score. Works in JSON and CLI output. #242
    (thanks @vyalamar)
  • Collection ignore patterns: ignore: ["Sessions/**", "*.tmp"] in
    collection config to exclude files from indexing. #304 (thanks @sebkouba)
  • Multilingual embeddings: QMD_EMBED_MODEL env var lets you swap in
    models like Qwen3-Embedding for non-English collections. #273 (thanks
    @daocoding)
  • Configurable expansion context: QMD_EXPAND_CONTEXT_SIZE env var
    (default 2048) — previously used the model's full 40960-token window,
    wasting VRAM. #313 (thanks @0xble)
  • candidateLimit exposed: -C / --candidate-limit flag and MCP
    parameter to tune how many candidates reach the reranker. #255 (thanks
    @pandysp)
  • MCP multi-session: HTTP transport now supports multiple concurrent
    client sessions, each with its own server instance. #286 (thanks @joelev)
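
As a rough illustration of what a "per-list RRF contribution" in an --explain trace could look like, the sketch below computes reciprocal rank fusion and records how much each result list contributed to a document. The constant K = 60 is the conventional RRF default and is assumed here; the type and function names are hypothetical. The top-rank bonus and reranker blending are left out because their values are not stated in these notes.

```typescript
// Illustrative only: K = 60 is the conventional RRF constant, assumed here.
const K = 60;

interface RankedList {
  name: string; // e.g. "bm25" or "vector" (hypothetical labels)
  ids: string[]; // document ids ordered best-first, assumed unique per list
}

interface Trace {
  id: string;
  contributions: Record<string, number>; // list name -> RRF contribution
  rrf: number; // sum of all contributions
}

function rrfWithTrace(lists: RankedList[]): Trace[] {
  const traces = new Map<string, Trace>();
  for (const list of lists) {
    list.ids.forEach((id, index) => {
      const contribution = 1 / (K + index + 1); // ranks are 1-based
      const trace = traces.get(id) ?? { id, contributions: {}, rrf: 0 };
      trace.contributions[list.name] = contribution;
      trace.rrf += contribution;
      traces.set(id, trace);
    });
  }
  // Highest fused score first.
  return Array.from(traces.values()).sort((a, b) => b.rrf - a.rrf);
}
```

A document that appears in several lists accumulates one contribution per list, which is why it can outrank a document that is first in only one list.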

Fixes

  • Reranking performance: cap parallel rerank contexts at 4 to prevent
    VRAM exhaustion on high-core machines. Deduplicate identical chunk texts
    before reranking — same content from different files now shares a single
    reranker call. Cache scores by content hash instead of file path.
  • Deactivate stale docs when all files are removed from a collection and
    qmd update is run. #312 (thanks @0xble)
  • Handle emoji-only filenames (🐘.md → 1f418.md) instead of crashing.
    #308 (thanks @debugerman)
  • Skip unreadable files during indexing (e.g. iCloud-evicted files returning
    EAGAIN) instead of crashing. #253 (thanks @jimmynail)
  • Suppress progress bar escape sequences when stderr is not a TTY. #230
    (thanks @dgilperez)
  • Emit format-appropriate empty output ([] for JSON, CSV header for CSV,
    etc.) instead of plain text "No results." #228 (thanks @amsminn)
  • Correct Windows sqlite-vec package name (sqlite-vec-windows-x64) and add
    sqlite-vec-linux-arm64. #225 (thanks @ilepn)
  • Fix claude plugin setup CLI commands in README. #311 (thanks @gi11es)
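
The reranking fix above combines three ideas: deduplicate identical chunk texts, key the score cache by content hash, and cap concurrency at 4. A minimal sketch of how these might fit together follows. The Scorer type and the function name are hypothetical, SHA-256 is an assumed hash choice, and the worker-pool approach is one possible way to enforce the cap rather than a description of QMD's code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for a call into the reranker model.
type Scorer = (text: string) => Promise<number>;

const MAX_PARALLEL = 4; // cap taken from the release notes

async function rerankDeduped(chunks: string[], score: Scorer): Promise<number[]> {
  const hashOf = (text: string): string =>
    createHash("sha256").update(text).digest("hex");

  // Record each chunk's hash and keep one text per distinct hash.
  const hashes: string[] = [];
  const unique = new Map<string, string>();
  for (const text of chunks) {
    const h = hashOf(text);
    hashes.push(h);
    if (!unique.has(h)) unique.set(h, text);
  }

  // Scores are cached by content hash, not by file path.
  const cache = new Map<string, number>();
  const queue = Array.from(unique.entries());

  // Each worker pulls jobs until the queue is empty.
  const worker = async (): Promise<void> => {
    for (let job = queue.shift(); job !== undefined; job = queue.shift()) {
      const [hash, text] = job;
      cache.set(hash, await score(text));
    }
  };

  const workerCount = Math.min(MAX_PARALLEL, queue.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));

  // Fan the cached scores back out to the original chunk order.
  return hashes.map((h) => cache.get(h)!);
}
```

With this shape, two files containing the same paragraph cost one scorer call, and no more than four scorer calls are in flight at once regardless of core count.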

[1.1.1] - 2026-03-06

Fixes

  • Reranker: truncate documents exceeding the 2048-token context window
    instead of silently producing garbage scores. Long chunks (e.g. from
    PDF ingestion) now get a fair ranking.
  • Nix: add python3 and cctools to build dependencies. #214 (thanks
    @pcasaretto)

[1.1.0] - 2026-02-20

QMD now speaks in query documents — structured multi-line queries where every line is typed (lex:, vec:, hyde:), combining keyword precision with semantic recall. A single plain query still works exactly as before (it's treated as an implicit expand: and auto-expanded by the LLM). Lex now supports quoted phrases and negation ("C++ performance" -sports -athlete), making intent-aware disambiguation practical. The formal query grammar is documented in docs/SYNTAX.md.
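
To illustrate the quoted-phrase and negation syntax, the following sketch splits a lex string into plain terms, exact phrases, and exclusions. It is a simplified tokenizer written for this example; the LexQuery shape and parseLex name are hypothetical, and the authoritative grammar is the one in docs/SYNTAX.md.

```typescript
// Hypothetical result shape for a parsed lex query.
interface LexQuery {
  terms: string[]; // bare words to match
  phrases: string[]; // "quoted phrases" to match verbatim
  excluded: string[]; // -term and -"phrase" exclusions
}

function parseLex(input: string): LexQuery {
  const result: LexQuery = { terms: [], phrases: [], excluded: [] };
  // Either an optionally negated quoted phrase, or an optionally negated word.
  const token = /(-?)"([^"]+)"|(-?)(\S+)/g;
  for (let m = token.exec(input); m !== null; m = token.exec(input)) {
    const isPhrase = m[2] !== undefined;
    const negated = (isPhrase ? m[1] : m[3]) === "-";
    const text = isPhrase ? m[2] : m[4];
    if (text === undefined) continue;
    if (negated) result.excluded.push(text);
    else if (isPhrase) result.phrases.push(text);
    else result.terms.push(text);
  }
  return result;
}
```

For the example from the paragraph above, parseLex('"C++ performance" -sports -athlete') yields one phrase ("C++ performance") and two exclusions (sports, athlete).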

The npm package now uses the standard #!/usr/bin/env node bin convention, replacing the custom bash wrapper. This fixes native module ABI mismatches when installed via bun and works on any platform with node >= 22 on PATH.

Changes

  • Query document format: multi-line queries with typed sub-queries (lex:, vec:, hyde:). Plain queries remain the default (expand: implicit, but not written inside the document). First sub-query gets 2× fusion weight — put your strongest signal first. Formal grammar in docs/SYNTAX.md.
  • Lex syntax: full BM25 operator support. "exact phrase" for verbatim matching; -term and -"phrase" for exclusions. Essential for disambiguation when a term is overloaded across domains (e.g. performance -sports -athlete).
  • expand: shortcut: send a single plain query (or start the document with expand: on its only line) to auto-expand via the local LLM. Query documents themselves are limited to lex, vec, and hyde lines.
  • MCP query tool (renamed from structured_search): rewrote the tool description to fully teach AI agents the query document format, lex syntax, and combination strategy. Includes worked examples with intent-aware lex.
  • HTTP /query endpoint (renamed from /search; /search kept as silent alias).
  • collections array filter: filter by multiple collections in a single query (collections: ["notes", "brain"]). Removed the single collection string param — array only.
  • Collection include/exclude: includeByDefault: false hides a collection from all queries unless explicitly named via collections. CLI: qmd collection exclude <name> / qmd collection include <name>.
  • Collection update-cmd: attach a shell command that runs before every qmd update (e.g. git stash && git pull --rebase --ff-only && git stash pop). CLI: qmd collection update-cmd <name> '<cmd>'.
  • qmd status tips: shows actionable tips when collections lack context descriptions or update commands.
  • qmd collection subcommands: show, update-cmd, include, exclude. Bare qmd collection now prints help.
  • Packaging: replaced custom bash wrapper with standard #!/usr/bin/env node shebang on dist/qmd.js. Fixes native module ABI mismatches when installed via bun, and works on any platform where node >= 22 is on PATH.
  • Removed MCP tools search, vector_search, deep_search — all superseded by query.
  • Removed qmd context check command.
  • CLI timing: each LLM step (expand, embed, rerank) prints elapsed time inline (Expanding query... (4.2s)).
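
The query document rules in this list (typed lex:, vec:, and hyde: lines, a plain single query treated as an implicit expand:, and 2× fusion weight for the first sub-query) can be sketched as a small parser. This is an illustrative reading of the rules as stated in this release, not the grammar from docs/SYNTAX.md; the type names and error messages are assumptions.

```typescript
type SubQueryType = "lex" | "vec" | "hyde";

interface SubQuery {
  type: SubQueryType;
  text: string;
  weight: number; // fusion weight: 2 for the first sub-query, 1 otherwise
}

type ParsedQuery =
  | { kind: "expand"; text: string }
  | { kind: "document"; subQueries: SubQuery[] };

function parseQueryDocument(input: string): ParsedQuery {
  const lines = input
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) throw new Error("empty query");

  const typed = /^(lex|vec|hyde|expand):\s*(.*)$/;

  // A single line is either a plain query or an explicit expand: line.
  if (lines.length === 1) {
    const only = lines[0] ?? "";
    const m = typed.exec(only);
    if (m === null) return { kind: "expand", text: only };
    if (m[1] === "expand") return { kind: "expand", text: m[2] ?? "" };
  }

  // Otherwise every line must be lex:, vec:, or hyde:.
  const subQueries = lines.map((line, index): SubQuery => {
    const m = typed.exec(line);
    if (m === null || m[1] === "expand") {
      throw new Error(`line ${index + 1}: expected lex:, vec:, or hyde:`);
    }
    return {
      type: m[1] as SubQueryType,
      text: m[2] ?? "",
      weight: index === 0 ? 2 : 1,
    };
  });
  return { kind: "document", subQueries };
}
```

Putting the strongest signal first matters because only the first sub-query receives the doubled weight in this sketch, mirroring the advice in the first bullet.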

Fixes

  • qmd collection list shows [excluded] tag for collections with includeByDefault: false.
  • Default searches now respect includeByDefault — excluded collections are skipped unless explicitly named.
  • Fix main module detection when installed globally via npm/bun (symlink resolution).

v1.1.2

07 Mar 20:01
v1.1.2
b838f74

v1.1.1

07 Mar 18:12
v1.1.1
2ae1bab

v1.0.7

18 Feb 19:56
v1.0.7
648779a

[1.0.7] - 2026-02-18

Changes

  • LLM: add LiquidAI LFM2-1.2B as an alternative base model for query
    expansion fine-tuning. LFM2's hybrid architecture (convolutions + attention)
    is 2x faster at decode/prefill vs standard transformers — good fit for
    on-device inference.
  • CLI: support multiple -c flags to search across several collections at
    once (e.g. qmd search -c notes -c journals "query"). #191 (thanks
    @openclaw)

Fixes

  • Return empty JSON array [] instead of no output when --json search
    finds no results.
  • Resolve relative paths passed to --index so they don't produce malformed
    config entries.
  • Respect XDG_CONFIG_HOME for collection config path instead of always
    using ~/.config. #190 (thanks @openclaw)
  • CLI: empty-collection hint now shows the correct collection add command.
    #200 (thanks @vincentkoc)
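
For readers unfamiliar with the XDG convention behind the config path fix, the lookup can be sketched as follows. The XDG Base Directory Specification says a relative XDG_CONFIG_HOME should be ignored, which the sketch honors; the "qmd" subdirectory name and the configDir function are assumptions for illustration.

```typescript
import { homedir } from "node:os";
import { isAbsolute, join } from "node:path";

// Hypothetical helper: resolves the directory that would hold the config.
function configDir(
  env: Record<string, string | undefined> = process.env,
): string {
  const xdg = env.XDG_CONFIG_HOME;
  // Use XDG_CONFIG_HOME only when it is set, non-empty, and absolute.
  const base =
    xdg !== undefined && xdg.length > 0 && isAbsolute(xdg)
      ? xdg
      : join(homedir(), ".config");
  return join(base, "qmd"); // "qmd" subdirectory is an assumption
}
```

Passing the environment as a parameter keeps the function easy to test without mutating process.env.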

[1.0.6] - 2026-02-16

Changes

  • CLI: qmd status now shows models with full HuggingFace links instead of
    static names in --help. Model info is derived from the actual configured
    URIs so it stays accurate if models change.
  • Release tooling: pre-push hook handles non-interactive shells (CI, editors)
    gracefully — warnings auto-proceed instead of hanging on a tty prompt.
    Annotated tags now resolve correctly for CI checks.

[1.0.5] - 2026-02-16

The npm package now ships compiled JavaScript instead of raw TypeScript,
removing the tsx runtime dependency. A new /release skill automates the
full release workflow with changelog validation and git hook enforcement.

Changes

  • Build: compile TypeScript to dist/ via tsc so the npm package no longer
    requires tsx at runtime. The qmd shell wrapper now runs dist/qmd.js
    directly.
  • Release tooling: new /release skill that manages the full release
    lifecycle — validates changelog, installs git hooks, previews release notes,
    and cuts the release. Auto-populates [Unreleased] from git history when
    empty.
  • Release tooling: scripts/extract-changelog.sh extracts cumulative notes
    for the full minor series (e.g. 1.0.0 through 1.0.5) for GitHub releases.
    Includes [Unreleased] content in previews.
  • Release tooling: scripts/release.sh renames [Unreleased] to a versioned
    heading and inserts a fresh empty [Unreleased] section automatically.
  • Release tooling: pre-push git hook blocks v* tag pushes unless
    package.json version matches the tag, a changelog entry exists, and CI
    passed on GitHub.
  • Publish workflow: GitHub Actions now builds TypeScript, creates a GitHub
    release with cumulative notes extracted from the changelog, and publishes
    to npm with provenance.

[1.0.0] - 2026-02-15

QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking
through parallel GPU contexts. GPU auto-detection replaces the unreliable
gpu: "auto" with explicit CUDA/Metal/Vulkan probing.

Changes

  • Runtime: support Node.js (>=22) alongside Bun via a cross-runtime SQLite
    abstraction layer (src/db.ts). bun:sqlite on Bun, better-sqlite3 on
    Node. The qmd wrapper auto-detects a suitable Node.js install via PATH,
    then falls back to mise, asdf, nvm, and Homebrew locations.
  • Performance: parallel embedding & reranking via multiple LlamaContext
    instances — up to 2.7x faster on multi-core machines.
  • Performance: flash attention for ~20% less VRAM per reranking context,
    enabling more parallel contexts on GPU.
  • Performance: right-sized reranker context (40960 → 2048 tokens, 17x less
    memory) since chunks are capped at ~900 tokens.
  • Performance: adaptive parallelism — context count computed from available
    VRAM (GPU) or CPU math cores rather than hardcoded.
  • GPU: probe for CUDA, Metal, Vulkan explicitly at startup instead of
    relying on node-llama-cpp's gpu: "auto". qmd status shows device info.
  • Tests: reorganized into flat test/ directory with vitest for Node.js and
    bun test for Bun. New eval-bm25 and store.helpers.unit suites.

Fixes

  • Prevent VRAM waste from duplicate context creation during concurrent
    embedBatch calls — initialization lock now covers the full path.
  • Collection-aware FTS filtering so scoped keyword search actually restricts
    results to the requested collection.
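
The duplicate-context fix describes an initialization lock that covers the full creation path. One common way to get that behavior in TypeScript is to share a single in-flight promise among all concurrent callers, sketched below. The LazyContext class is hypothetical and stands in for whatever guards context creation in QMD.

```typescript
// Hypothetical wrapper: ensures the expensive create() runs at most once
// even when many callers request the resource concurrently.
class LazyContext<T> {
  private pending: Promise<T> | undefined;
  private readonly create: () => Promise<T>;

  constructor(create: () => Promise<T>) {
    this.create = create;
  }

  get(): Promise<T> {
    if (this.pending === undefined) {
      // Store the promise before awaiting so later callers reuse it.
      this.pending = this.create().catch((err) => {
        // Allow a retry after a failed initialization.
        this.pending = undefined;
        throw err;
      });
    }
    return this.pending;
  }
}
```

Because the promise is stored synchronously, callers that arrive while creation is still in progress receive the same promise instead of starting a second creation, which is the failure mode that would waste VRAM.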

v1.0.6

16 Feb 14:39
v1.0.6
51c03d9

v1.0.5

16 Feb 12:59
v1.0.5
8dd6cdc
