Add queryable search index for DuckLake docs #303

Merged
guillesd merged 3 commits into duckdb:main from guillesd:queriable_search_index on Mar 16, 2026

@guillesd

DuckLake Docs Search Index

Adds a queryable full-text search index of the DuckLake documentation as a .duckdb file, built automatically during site deployment.

What it does

scripts/generate_search_index.py parses all markdown files under docs/stable/ and docs/preview/, chunks them (one chunk per H2 section), and writes them to a DuckDB database with a pre-built FTS index using porter stemming.
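The chunking step described above can be sketched roughly as follows. This is a minimal illustration of splitting a page into one chunk per H2 section, not the code from scripts/generate_search_index.py. The function name `chunk_by_h2` is hypothetical, and the sketch assumes that text before the first H2 becomes an intro chunk with no section heading and that `##` lines inside fenced code blocks are not headings. Front matter handling is left out.

```python
import re
from typing import List, Optional, Tuple

# Matches "## Heading" but not "### Subheading" (a space must follow the second '#').
H2_RE = re.compile(r"^##\s+(.*?)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def chunk_by_h2(markdown: str) -> List[Tuple[Optional[str], str]]:
    """Split markdown into (section_heading, text) chunks, one per H2 section.

    Hypothetical helper: the intro before the first H2 gets heading None.
    """
    chunks: List[Tuple[Optional[str], str]] = []
    heading: Optional[str] = None
    lines: List[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(lines).strip()
        if text:  # skip empty intros
            chunks.append((heading, text))

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else H2_RE.match(line)
        if match:
            flush()  # close the previous chunk
            heading = match.group(1)
            lines = [line]  # keep the heading line in the chunk text
        else:
            lines.append(line)
    flush()
    return chunks
```

Deeper headings such as H3 stay inside their parent H2 chunk, which keeps each chunk's markdown intact for the `text` column.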

The resulting data/docs-search.duckdb contains a single docs_chunks table with ~344 rows across both versions.

How to query

ATTACH 'https://ducklake.select/data/docs-search.duckdb' AS docs (READ_ONLY);
LOAD fts;
USE docs;

SELECT chunk_id, page_title, section, url, score
FROM (
    SELECT *,
           fts_main_docs_chunks.match_bm25(chunk_id, 'attach catalog') AS score
    FROM docs_chunks
    WHERE version = 'stable'
)
WHERE score IS NOT NULL
ORDER BY score DESC
LIMIT 10;

Schema

| Column | Type | Description |
| --- | --- | --- |
| chunk_id | VARCHAR (PK) | Unique identifier, e.g. `stable/duckdb/usage/connecting#examples` |
| page_title | VARCHAR | Page title from front matter |
| section | VARCHAR | Section heading (null for page intros) |
| breadcrumb | VARCHAR | Path-derived breadcrumb, e.g. `DuckDB > Usage > Connecting` |
| url | VARCHAR | URL path with anchor, e.g. `/docs/stable/duckdb/usage/connecting#examples` |
| version | VARCHAR | `stable` or `preview` |
| text | TEXT | Full preserved markdown text of the chunk |
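
To show how the path-derived columns relate to each other, here is a small sketch that builds `chunk_id`, `url`, `version`, and `breadcrumb` from a markdown file path and a section heading. It is inferred from the examples in the schema above and is not the script's actual logic. The names `chunk_fields`, `slugify`, and `DISPLAY_NAMES` are hypothetical, and both the anchor slug rule (lowercase, hyphen-separated) and the omission of the anchor for page intros are assumptions.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Optional

# Hypothetical overrides for path segments whose display name is not plain title case.
DISPLAY_NAMES = {"duckdb": "DuckDB"}


def slugify(heading: str) -> str:
    """Assumed anchor rule: lowercase, drop punctuation, spaces become hyphens."""
    slug = re.sub(r"[^a-z0-9\s-]", "", heading.lower())
    return re.sub(r"\s+", "-", slug.strip())


def chunk_fields(md_path: str, section: Optional[str]) -> Dict[str, Optional[str]]:
    """Derive the path-based columns for one chunk (illustrative only)."""
    rel = PurePosixPath(md_path).with_suffix("")  # e.g. docs/stable/duckdb/usage/connecting
    parts = rel.parts[1:]                         # drop the leading "docs"
    version, page_parts = parts[0], parts[1:]
    page = "/".join(parts)
    anchor = f"#{slugify(section)}" if section else ""  # no anchor for page intros (assumed)
    breadcrumb = " > ".join(
        DISPLAY_NAMES.get(p, p.replace("_", " ").title()) for p in page_parts
    )
    return {
        "chunk_id": f"{page}{anchor}",
        "section": section,
        "breadcrumb": breadcrumb,
        "url": f"/docs/{page}{anchor}",
        "version": version,
    }
```

For example, `chunk_fields("docs/stable/duckdb/usage/connecting.md", "Examples")` yields the `chunk_id`, `url`, and `breadcrumb` values shown in the table.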

guillesd merged commit d0d4541 into duckdb:main on Mar 16, 2026
3 checks passed