|
1 | 1 | --- |
2 | 2 | title: "DuckDB" |
3 | 3 | sidebarTitle: "DuckDB" |
4 | | - |
| 4 | +description: "Learn how to use the DuckDB-Lance extension to query Lance tables with SQL." |
5 | 5 | --- |
6 | 6 |
|
7 | | -import { |
8 | | - PyPlatformsDuckdbCreateTable, |
9 | | - PyPlatformsDuckdbMeanPrice, |
10 | | - PyPlatformsDuckdbQueryTable, |
11 | | -} from '/snippets/integrations.mdx'; |
| 7 | +LanceDB integrates with [DuckDB](https://duckdb.org/) through the DuckDB Lance extension. On this page, LanceDB manages the table lifecycle, while DuckDB provides SQL analytics (including joins) and search over those tables. |
| 8 | + |
| 9 | +Earlier versions of LanceDB recommended converting Lance tables to Arrow tables via `table.to_arrow()`. That method is still available (because DuckDB [natively scans Arrow tables](https://duckdb.org/2021/12/03/duck-arrow)), but it is no longer the recommended workflow for working with Lance tables in DuckDB. This page shows how to use the Lance extension with namespace-attached LanceDB tables, which lets you push SQL queries down directly to the Lance layer. |
12 | 10 |
|
13 | | -<Badge color="purple">OSS-only</Badge> |
14 | 11 |
|
15 | | -In Python, LanceDB tables can also be queried with [DuckDB](https://duckdb.org/), an in-process SQL OLAP database. |
16 | | -This means you can write complex SQL queries to analyze your data in LanceDB. |
| 12 | +## Install |
17 | 13 |
|
18 | | -The integration is done via [Apache Arrow](https://duckdb.org/docs/guides/python/sql_on_arrow), which provides |
19 | | -zero-copy data sharing between LanceDB and DuckDB. DuckDB is capable of passing down column selections and basic |
20 | | -filters to LanceDB, reducing the amount of data that needs to be scanned to perform your query. Finally, the |
21 | | -integration allows streaming data from LanceDB tables, allowing you to aggregate tables that don't fit into |
22 | | -memory. |
| 14 | +Install the DuckDB CLI by following [the DuckDB docs](https://duckdb.org/install) or, alternatively, install the Python package with `pip install duckdb`. |
23 | 15 |
|
24 | | -<Tip> |
25 | | -**DuckDB quacks Arrow** |
| 16 | +Then, open the DuckDB CLI and install and load the Lance extension as follows: |
26 | 17 |
|
27 | | -All of this uses the same mechanism described in DuckDB's [blog post](https://duckdb.org/2021/12/03/duck-arrow.html)" |
28 | | -on how it integrates with Apache Arrow. |
29 | | -</Tip> |
| 18 | +```sql SQL icon="database" |
| 19 | +INSTALL lance; |
| 20 | +LOAD lance; |
| 21 | +``` |
30 | 22 |
|
31 | | -We can demonstrate this by first installing `duckdb` and `lancedb`. |
| 23 | +## Attach the directory namespace in DuckDB |
32 | 24 |
|
33 | | -<CodeBlock filename="bash" language="bash" icon="terminal"> |
34 | | -pip install duckdb lancedb |
35 | | -</CodeBlock> |
| 25 | +Attach the LanceDB root directory as a Lance namespace: |
36 | 26 |
|
37 | | -We will re-use the dataset [created previously](/integrations/data/pandas_and_pyarrow/): |
| 27 | +```sql SQL icon="database" |
| 28 | +ATTACH './local_lancedb' AS lance_ns (TYPE LANCE); |
| 29 | +``` |
38 | 30 |
|
39 | | -<CodeBlock filename="Python" language="Python" icon="python"> |
40 | | - {PyPlatformsDuckdbCreateTable} |
41 | | -</CodeBlock> |
| 31 | +On this page, tables are referenced as `lance_ns.main.<table_name>`, so the example table `lance_duck` is addressed as `lance_ns.main.lance_duck`. |
42 | 32 |
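For readers using the Python package rather than the CLI, the install, load, and attach statements above can be sketched as follows. This is a minimal sketch, not part of the page's own examples: `run_setup` and `SETUP_STATEMENTS` are hypothetical names introduced here, and the only assumption about the connection is that it exposes an `execute` method, as the connection returned by `duckdb.connect()` does.

```python
# Setup statements copied verbatim from the SQL blocks above.
SETUP_STATEMENTS = [
    "INSTALL lance;",
    "LOAD lance;",
    "ATTACH './local_lancedb' AS lance_ns (TYPE LANCE);",
]


def run_setup(con, statements=SETUP_STATEMENTS):
    """Run each setup statement, in order, on a DuckDB-style connection.

    `con` is any object with an `execute(sql)` method (hypothetical helper,
    not part of DuckDB or LanceDB).
    """
    for stmt in statements:
        con.execute(stmt)
    return con


# Usage (assumes the `duckdb` package is installed and the extension
# can be downloaded on this machine):
#   import duckdb
#   con = run_setup(duckdb.connect())
```

Keeping the statements in a list makes it easy to reuse the same setup in scripts and notebooks without retyping the attach path.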
|
43 | | -The `to_lance` method converts the LanceDB table to a `LanceDataset`, which is accessible to DuckDB through the Arrow compatibility layer. |
44 | | -To query the resulting Lance dataset in DuckDB, all you need to do is reference the dataset by the same name in your SQL query. |
| 33 | +## Write Lance table |
45 | 34 |
|
46 | | -<CodeBlock filename="Python" language="Python" icon="python"> |
47 | | - {PyPlatformsDuckdbQueryTable} |
48 | | -</CodeBlock> |
| 35 | +Create the `lance_duck` table using SQL and populate it with sample data: |
49 | 36 |
|
| 37 | +```sql SQL icon="database" |
| 38 | +CREATE OR REPLACE TABLE lance_ns.main.lance_duck AS |
| 39 | +SELECT * |
| 40 | +FROM ( |
| 41 | + VALUES |
| 42 | + ('duck', 'quack', [0.9, 0.7, 0.1]::FLOAT[]), |
| 43 | + ('horse', 'neigh', [0.3, 0.1, 0.5]::FLOAT[]), |
| 44 | + ('dragon', 'roar', [0.5, 0.2, 0.7]::FLOAT[]) |
| 45 | +) AS t(animal, noise, vector); |
50 | 46 | ``` |
51 | | -┌─────────────┬─────────┬────────┐ |
52 | | -│ vector │ item │ price │ |
53 | | -│ float[] │ varchar │ double │ |
54 | | -├─────────────┼─────────┼────────┤ |
55 | | -│ [3.1, 4.1] │ foo │ 10.0 │ |
56 | | -│ [5.9, 26.5] │ bar │ 20.0 │ |
57 | | -└─────────────┴─────────┴────────┘ |
| 47 | + |
| 48 | +This table is the source of truth for all DuckDB queries below. |
| 49 | + |
| 50 | +## Query the table with SQL |
| 51 | + |
| 52 | +```sql SQL icon="database" |
| 53 | +SELECT * |
| 54 | + FROM lance_ns.main.lance_duck |
| 55 | + LIMIT 5; |
58 | 56 | ``` |
59 | 57 |
|
60 | | -You can very easily run any other DuckDB SQL queries on your data. |
| 58 | +## Vector search |
| 59 | + |
| 60 | +```sql SQL icon="database" |
| 61 | +SELECT animal, noise, vector, _distance |
| 62 | + FROM lance_vector_search( |
| 63 | + 'lance_ns.main.lance_duck', |
| 64 | + 'vector', |
| 65 | + [0.8, 0.7, 0.2]::FLOAT[], |
| 66 | + k = 1, |
| 67 | + prefilter = true |
| 68 | + ) |
| 69 | + ORDER BY _distance ASC; |
| 70 | +``` |
61 | 71 |
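To build intuition for which row the query above would return with `k = 1`, the nearest-neighbor ranking can be reproduced by hand over the three sample rows. This is a sketch under an assumption: it uses squared Euclidean (L2) distance, and the page does not state which metric `lance_vector_search` applies, so the actual `_distance` values may differ. The helper names `squared_l2` and `nearest` are introduced here for illustration only.

```python
# Sample rows and query vector copied from the SQL examples above.
ROWS = {
    "duck": [0.9, 0.7, 0.1],
    "horse": [0.3, 0.1, 0.5],
    "dragon": [0.5, 0.2, 0.7],
}
QUERY = [0.8, 0.7, 0.2]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(rows, query, k=1):
    """Return the k (name, distance) pairs closest to `query`, nearest first."""
    ranked = sorted(rows.items(), key=lambda item: squared_l2(item[1], query))
    return [(name, squared_l2(vec, query)) for name, vec in ranked[:k]]


top = nearest(ROWS, QUERY, k=1)
print(top[0][0])  # prints "duck"
```

Under this assumed metric, `duck` is the closest row, followed by `dragon` and then `horse`.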
|
62 | | -<CodeBlock filename="Python" language="Python" icon="python"> |
63 | | - {PyPlatformsDuckdbMeanPrice} |
64 | | -</CodeBlock> |
| 72 | +## Full-text search |
| 73 | + |
| 74 | +```sql SQL icon="database" |
| 75 | +SELECT animal, noise, vector, _score |
| 76 | + FROM lance_fts( |
| 77 | + 'lance_ns.main.lance_duck', |
| 78 | + 'animal', |
| 79 | + 'the brave knight faced the dragon', |
| 80 | + k = 1, |
| 81 | + prefilter = true |
| 82 | + ) |
| 83 | + ORDER BY _score DESC; |
| 84 | +``` |
65 | 85 |
|
| 86 | +## Hybrid search |
| 87 | + |
| 88 | +```sql SQL icon="database" |
| 89 | +SELECT animal, noise, vector, _hybrid_score, _distance, _score |
| 90 | + FROM lance_hybrid_search( |
| 91 | + 'lance_ns.main.lance_duck', |
| 92 | + 'vector', |
| 93 | + [0.8, 0.7, 0.2]::FLOAT[], |
| 94 | + 'animal', |
| 95 | + 'the duck surprised the dragon', |
| 96 | + k = 2, |
| 97 | + prefilter = false, |
| 98 | + alpha = 0.5, |
| 99 | + oversample_factor = 4 |
| 100 | + ) |
| 101 | + ORDER BY _hybrid_score DESC; |
66 | 102 | ``` |
67 | | -┌─────────────┐ |
68 | | -│ mean(price) │ |
69 | | -│ double │ |
70 | | -├─────────────┤ |
71 | | -│ 15.0 │ |
72 | | -└─────────────┘ |
73 | | -``` |
| 103 | + |
| 104 | +## Directory namespace model |
| 105 | + |
| 106 | +A directory namespace maps a LanceDB catalog root to namespace-qualified table identifiers in DuckDB. This keeps table discovery and table naming stable as your project grows. |
| 107 | + |
| 108 | +To learn more about the catalog and namespace model, see [Namespaces and the Catalog Model](/namespaces). |