Update duckdb extension docs (#174)

prrao87 · web-flow · commit a7ecf7f64e99 · 2026-03-03T10:30:31.000-05:00
diff --git a/docs/integrations/data/duckdb.mdx b/docs/integrations/data/duckdb.mdx
@@ -1,73 +1,108 @@
 ---
 title: "DuckDB"
 sidebarTitle: "DuckDB"
-
+description: "Learn how to use the DuckDB Lance extension to query Lance tables with SQL."
 ---
 
-import {
-  PyPlatformsDuckdbCreateTable,
-  PyPlatformsDuckdbMeanPrice,
-  PyPlatformsDuckdbQueryTable,
-} from '/snippets/integrations.mdx';
+LanceDB integrates with [DuckDB](https://duckdb.org/) through the DuckDB Lance extension. This page shows how LanceDB manages the table lifecycle while DuckDB provides SQL analytics (including joins) and search over those tables.
+
+Earlier versions of LanceDB recommended converting Lance tables to Arrow tables with `table.to_arrow()`. That method is still available, because DuckDB [natively scans Arrow tables](https://duckdb.org/2021/12/03/duck-arrow), but it is no longer the recommended way to work with Lance tables in DuckDB. Instead, this page uses the Lance extension with namespace-attached LanceDB tables, which lets DuckDB push queries down directly to the Lance layer.
 
-<Badge color="purple">OSS-only</Badge>
 
-In Python, LanceDB tables can also be queried with [DuckDB](https://duckdb.org/), an in-process SQL OLAP database.
-This means you can write complex SQL queries to analyze your data in LanceDB.
+## Install
 
-The integration is done via [Apache Arrow](https://duckdb.org/docs/guides/python/sql_on_arrow), which provides 
-zero-copy data sharing between LanceDB and DuckDB. DuckDB is capable of passing down column selections and basic
-filters to LanceDB, reducing the amount of data that needs to be scanned to perform your query. Finally, the
-integration allows streaming data from LanceDB tables, allowing you to aggregate tables that don't fit into
-memory.
+Install the DuckDB CLI by following the [DuckDB installation guide](https://duckdb.org/install), or install the Python package with `pip install duckdb`.
 
-<Tip>
-**DuckDB quacks Arrow**
+Then, open the DuckDB CLI and install and load the Lance extension as follows:
 
-All of this uses the same mechanism described in DuckDB's [blog post](https://duckdb.org/2021/12/03/duck-arrow.html)"
-on how it integrates with Apache Arrow.
-</Tip>
+```sql SQL icon="database"
+INSTALL lance;
+LOAD lance;
+```
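+
+To confirm that the extension is available in the current session, you can query DuckDB's built-in `duckdb_extensions()` table function. This is an optional sanity check, not a required step:
+
+```sql SQL icon="database"
+-- Show whether the Lance extension is installed and loaded in this session
+SELECT extension_name, installed, loaded
+  FROM duckdb_extensions()
+  WHERE extension_name = 'lance';
+```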
 
-We can demonstrate this by first installing `duckdb` and `lancedb`.
+## Attach the directory namespace in DuckDB
 
-<CodeBlock filename="bash" language="bash" icon="terminal">
-pip install duckdb lancedb
-</CodeBlock>
+Attach the LanceDB root directory (the directory that holds your LanceDB tables, here `./local_lancedb`) as a Lance namespace:
 
-We will re-use the dataset [created previously](/integrations/data/pandas_and_pyarrow/):
+```sql SQL icon="database"
+ATTACH './local_lancedb' AS lance_ns (TYPE LANCE);
+```
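+
+After attaching, `lance_ns` should appear alongside your default DuckDB database. The sketch below lists attached catalogs with `SHOW DATABASES`, then lists tables across them with `SHOW ALL TABLES`; the second statement assumes the Lance extension exposes the directory's tables through DuckDB's catalog, which the `lance_ns.main.<table_name>` references on this page suggest:
+
+```sql SQL icon="database"
+-- List attached databases (catalogs); lance_ns should be among them
+SHOW DATABASES;
+
+-- List tables in every attached catalog
+-- (assumption: the Lance catalog supports DuckDB's table listing)
+SHOW ALL TABLES;
+```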
 
-<CodeBlock filename="Python" language="Python" icon="python">
-  {PyPlatformsDuckdbCreateTable}
-</CodeBlock>
+On this page, tables are referenced as `lance_ns.main.<table_name>`, so the example table used below is `lance_ns.main.lance_duck`.
 
-The `to_lance` method converts the LanceDB table to a `LanceDataset`, which is accessible to DuckDB through the Arrow compatibility layer.
-To query the resulting Lance dataset in DuckDB, all you need to do is reference the dataset by the same name in your SQL query.
+## Write a Lance table
 
-<CodeBlock filename="Python" language="Python" icon="python">
-  {PyPlatformsDuckdbQueryTable}
-</CodeBlock>
+Create the `lance_duck` table using SQL and populate it with sample data:
 
+```sql SQL icon="database"
+CREATE OR REPLACE TABLE lance_ns.main.lance_duck AS
+SELECT *
+FROM (
+  VALUES
+    ('duck', 'quack', [0.9, 0.7, 0.1]::FLOAT[]),
+    ('horse', 'neigh', [0.3, 0.1, 0.5]::FLOAT[]),
+    ('dragon', 'roar', [0.5, 0.2, 0.7]::FLOAT[])
+) AS t(animal, noise, vector);
 ```
-┌─────────────┬─────────┬────────┐
-│   vector    │  item   │ price  │
-│   float[]   │ varchar │ double │
-├─────────────┼─────────┼────────┤
-│ [3.1, 4.1]  │ foo     │   10.0 │
-│ [5.9, 26.5] │ bar     │   20.0 │
-└─────────────┴─────────┴────────┘
+
+This table is the source of truth for all DuckDB queries below.
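+
+To check the schema DuckDB sees for the new table, you can run a standard `DESCRIBE`. The exact type reported for `vector` depends on how the extension maps the Lance schema, so treat it as something to verify rather than a guaranteed result:
+
+```sql SQL icon="database"
+-- Inspect column names and types of the Lance table as seen by DuckDB
+DESCRIBE lance_ns.main.lance_duck;
+```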
+
+## Query the table with SQL
+
+```sql SQL icon="database"
+SELECT *
+  FROM lance_ns.main.lance_duck
+  LIMIT 5;
 ```
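+
+Since DuckDB provides the SQL layer, the attached Lance table can be combined with other DuckDB tables, for example in a join. In the sketch below, the `habitats` table and its rows are hypothetical, made up for illustration; it lives in DuckDB's temporary catalog rather than in Lance:
+
+```sql SQL icon="database"
+-- Hypothetical lookup table stored in DuckDB, not in Lance
+CREATE OR REPLACE TEMP TABLE habitats AS
+SELECT *
+FROM (
+  VALUES
+    ('duck', 'pond'),
+    ('dragon', 'cave')
+) AS h(animal, habitat);
+
+-- Inner join: animals without a habitat row (here, horse) are dropped
+SELECT l.animal, l.noise, h.habitat
+  FROM lance_ns.main.lance_duck AS l
+  JOIN habitats AS h ON l.animal = h.animal
+  ORDER BY l.animal;
+```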
 
-You can very easily run any other DuckDB SQL queries on your data.
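+## Vector search
+
+`lance_vector_search` takes the table name, the vector column, and a query vector, and returns the `k` nearest rows with a `_distance` column, where smaller distances are closer matches: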
+
+```sql SQL icon="database"
+SELECT animal, noise, vector, _distance
+  FROM lance_vector_search(
+    'lance_ns.main.lance_duck',
+    'vector',
+    [0.8, 0.7, 0.2]::FLOAT[],
+    k = 1,
+    prefilter = true
+  )
+  ORDER BY _distance ASC;
+```
 
-<CodeBlock filename="Python" language="Python" icon="python">
-  {PyPlatformsDuckdbMeanPrice}
-</CodeBlock>
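+## Full-text search
+
+`lance_fts` runs a full-text search for the query text over a text column (`animal` here) and returns a relevance `_score` column, where higher scores rank first: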
+
+```sql SQL icon="database"
+SELECT animal, noise, vector, _score
+  FROM lance_fts(
+    'lance_ns.main.lance_duck',
+    'animal',
+    'the brave knight faced the dragon',
+    k = 1,
+    prefilter = true
+  )
+  ORDER BY _score DESC;
+```
 
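+## Hybrid search
+
+`lance_hybrid_search` combines both searches: it takes the vector column and query vector, followed by the text column and query text, and returns a `_hybrid_score` alongside the individual `_distance` and `_score` columns: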
+
+```sql SQL icon="database"
+SELECT animal, noise, vector, _hybrid_score, _distance, _score
+  FROM lance_hybrid_search(
+    'lance_ns.main.lance_duck',
+    'vector',
+    [0.8, 0.7, 0.2]::FLOAT[],
+    'animal',
+    'the duck surprised the dragon',
+    k = 2,
+    prefilter = false,
+    alpha = 0.5,
+    oversample_factor = 4
+  )
+  ORDER BY _hybrid_score DESC;
 ```
-┌─────────────┐
-│ mean(price) │
-│   double    │
-├─────────────┤
-│        15.0 │
-└─────────────┘
-```
+
+## Directory namespace model
+
+A directory namespace maps a LanceDB catalog root to namespace-qualified table identifiers in DuckDB. This keeps table discovery and table naming stable as your project grows.
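+
+As a sketch of how an identifier breaks down under the `ATTACH` statement used on this page: `lance_ns` is the DuckDB catalog, `main` is the schema, and `lance_duck` is the table. DuckDB's standard `USE` statement should let plain SQL drop the prefix, though this assumes the Lance catalog honors it; the search examples above pass the fully qualified name as a string, so keep that form there:
+
+```sql SQL icon="database"
+-- catalog.schema.table
+SELECT count(*) FROM lance_ns.main.lance_duck;
+
+-- Assumption: the Lance catalog supports USE, so unqualified names resolve to lance_ns.main
+USE lance_ns.main;
+SELECT count(*) FROM lance_duck;
+```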
+
+To learn more about the catalog and namespace model, see [Namespaces and the Catalog Model](/namespaces).