Commit a020a7b

Fil and mbostock authored
Add support for DuckDB database files to sql and DuckDBClient.of (#1065)
* Add support for DuckDB database files to sql and DuckDBClient.of

closes #1057

* let DuckDB handle any other file as a database file
* document
* a bit more doc
* clarify attach, append an example database and the associated (but inert) data loader.
* doc edits
* .{db,ddb,duckdb}

---------

Co-authored-by: Mike Bostock <[email protected]>
1 parent a631835 commit a020a7b

5 files changed: +21 -3 lines changed

docs/lib/duckdb.md

Lines changed: 16 additions & 2 deletions
@@ -1,5 +1,7 @@
 # DuckDB
 
+<div class="tip">The most convenient way to use DuckDB in Observable is the built-in <a href="../sql">SQL code blocks</a> and <a href="../sql#sql-literals"><code>sql</code> tagged template literal</a>. Use <code>DuckDBClient</code> or DuckDB-Wasm directly, as shown here, if you need greater control.</div>
+
 DuckDB is “an in-process SQL OLAP Database Management System. [DuckDB-Wasm](https://github.com/duckdb/duckdb-wasm) brings DuckDB to every browser thanks to WebAssembly.” DuckDB-Wasm is available by default as `duckdb` in Markdown, but you can explicitly import it as:
 
 ```js echo
@@ -12,7 +14,7 @@ For convenience, we provide a [`DatabaseClient`](https://observablehq.com/@obser
 import {DuckDBClient} from "npm:@observablehq/duckdb";
 ```
 
-To get a DuckDB client, pass zero or more named tables to `DuckDBClient.of`. Each table can be expressed as a [`FileAttachment`](../javascript/files), [Arquero table](./arquero), [Arrow table](./arrow), an array of objects, or a promise to the same. For example, below we load a sample of 250,000 stars from the [Gaia Star Catalog](https://observablehq.com/@cmudig/peeking-into-the-gaia-star-catalog) as a [Apache Parquet](https://parquet.apache.org/) file:
+To get a DuckDB client, pass zero or more named tables to `DuckDBClient.of`. Each table can be expressed as a [`FileAttachment`](../javascript/files), [Arquero table](./arquero), [Arrow table](./arrow), an array of objects, or a promise to the same. For file attachments, the following formats are supported: [CSV](./lib/csv), [TSV](./lib/csv), [JSON](./javascript/files#json), [Apache Arrow](./lib/arrow), and [Apache Parquet](./lib/arrow#apache-parquet). For example, below we load a sample of 250,000 stars from the [Gaia Star Catalog](https://observablehq.com/@cmudig/peeking-into-the-gaia-star-catalog) as a Parquet file:
 
 ```js echo
 const db = DuckDBClient.of({gaia: FileAttachment("gaia-sample.parquet")});
@@ -53,7 +55,17 @@ Plot.plot({
 })
 ```
 
-For externally-hosted data, you can create an empty `DuckDBClient` and load a table from a SQL query, say using [`read_parquet`](https://duckdb.org/docs/guides/import/parquet_import) or [`read_csv`](https://duckdb.org/docs/guides/import/csv_import).
+You can also [attach](https://duckdb.org/docs/sql/statements/attach) a complete database saved as DuckDB file, typically using the `.db` file extension (or `.ddb` or `.duckdb`). In this case, the associated name (below `base`) is a _schema_ name rather than a _table_ name.
+
+```js echo
+const db2 = await DuckDBClient.of({base: FileAttachment("quakes.db")});
+```
+
+```js echo
+db2.queryRow(`SELECT COUNT() FROM base.events`)
+```
+
+For externally-hosted data, you can create an empty `DuckDBClient` and load a table from a SQL query, say using [`read_parquet`](https://duckdb.org/docs/guides/import/parquet_import) or [`read_csv`](https://duckdb.org/docs/guides/import/csv_import). DuckDB offers many affordances to make this easier (in many cases it detects the file format and uses the correct loader automatically).
 
 ```js run=false
 const db = await DuckDBClient.of();
@@ -70,6 +82,8 @@ As an alternative to `db.sql`, there’s also `db.query`:
 db.query("SELECT * FROM gaia LIMIT 10")
 ```
 
+<div class="note">The <code>db.sql</code> and <code>db.query</code> methods return a promise to an <a href="./arrow">Arrow table</a>. This columnar representation is much more efficient than an array-of-objects. You can inspect the contents of an Arrow table using <a href="../inputs/table"><code>Inputs.table</code></a> and pass the data to <a href="./plot">Plot</a>.</div>
+
 And `db.queryRow`:
 
 ```js echo

docs/lib/quakes.db

524 KB
Binary file not shown.

docs/lib/quakes.db.sh

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+duckdb docs/lib/quakes.db -c "CREATE TABLE events AS (FROM 'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv');"

docs/sql.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ sql:
 
 # SQL <a href="https://github.com/observablehq/framework/releases/tag/v1.2.0" target="_blank" class="observablehq-version-badge" data-version="^1.2.0" title="Added in v1.2.0"></a>
 
-Observable Framework includes built-in support for client-side SQL powered by [DuckDB](./lib/duckdb). You can use SQL to query data from [CSV](./lib/csv), [TSV](./lib/csv), [JSON](./javascript/files#json), [Apache Arrow](./lib/arrow), and [Apache Parquet](./lib/arrow#apache-parquet) files, which can either be static or generated by [data loaders](./loaders).
+Observable Framework includes built-in support for client-side SQL powered by [DuckDB](./lib/duckdb). You can use SQL to query data from [CSV](./lib/csv), [TSV](./lib/csv), [JSON](./javascript/files#json), [Apache Arrow](./lib/arrow), [Apache Parquet](./lib/arrow#apache-parquet), and DuckDB database files, which can either be static or generated by [data loaders](./loaders).
 
 To use SQL, first register the desired tables in the page’s [front matter](./markdown#front-matter) using the **sql** option. Each key is a table name, and each value is the path to the corresponding data file. For example, to register a table named `gaia` from a Parquet file:
 

src/client/stdlib/duckdb.js

Lines changed: 3 additions & 0 deletions
@@ -255,6 +255,9 @@ async function insertFile(database, name, file, options) {
         if (/\.parquet$/i.test(file.name)) {
           return await connection.query(`CREATE VIEW '${name}' AS SELECT * FROM parquet_scan('${file.name}')`);
         }
+        if (/\.(db|ddb|duckdb)$/i.test(file.name)) {
+          return await connection.query(`ATTACH '${file.name}' AS ${name} (READ_ONLY)`);
+        }
         throw new Error(`unknown file type: ${file.mimeType}`);
     }
   } finally {

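The extension check added to `insertFile` can be exercised on its own, without a DuckDB connection. The sketch below is not part of the commit: `statementForFile` is a hypothetical helper that only mirrors the two regular expressions and SQL templates visible in the hunk above, and it returns `null` where the real function would go on to throw. It is meant as a quick way to see which file names take the new `ATTACH` path.

```javascript
// Hypothetical helper (not in the commit): returns the SQL string that the
// branches shown in the hunk would send to DuckDB, or null when neither
// extension pattern matches.
function statementForFile(name, fileName) {
  if (/\.parquet$/i.test(fileName)) {
    return `CREATE VIEW '${name}' AS SELECT * FROM parquet_scan('${fileName}')`;
  }
  // New in this commit: .db, .ddb and .duckdb files are attached read-only,
  // so `name` becomes the attached database's name rather than a table name.
  if (/\.(db|ddb|duckdb)$/i.test(fileName)) {
    return `ATTACH '${fileName}' AS ${name} (READ_ONLY)`;
  }
  return null;
}

console.log(statementForFile("base", "quakes.db"));
console.log(statementForFile("base", "QUAKES.DuckDB")); // the match is case-insensitive
console.log(statementForFile("base", "quakes.sqlite")); // not matched by either pattern
```

Note that the `ATTACH` template interpolates `name` without quoting, unlike the `CREATE VIEW` template, so in this sketch the name is assumed to be a plain SQL identifier such as `base`.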