Add support for DuckDB database files to sql and DuckDBClient.of (#1065)

Fil · mbostock · web-flow · commit a020a7bd00b5 · 2024-03-18T13:36:49.000-07:00
* Add support for DuckDB database files to sql and DuckDBClient.of

  closes #1057

* let DuckDB handle any other file as a database file
* document
* a bit more doc
* clarify attach, append an example database and the associated (but inert) data loader.
* doc edits
* .{db,ddb,duckdb}

---------

Co-authored-by: Mike Bostock <mbostock@gmail.com>
diff --git a/docs/lib/duckdb.md b/docs/lib/duckdb.md
@@ -1,5 +1,7 @@
 # DuckDB
 
+<div class="tip">The most convenient way to use DuckDB in Observable is the built-in <a href="../sql">SQL code blocks</a> and <a href="../sql#sql-literals"><code>sql</code> tagged template literal</a>. Use <code>DuckDBClient</code> or DuckDB-Wasm directly, as shown here, if you need greater control.</div>
+
 DuckDB is “an in-process SQL OLAP Database Management System. [DuckDB-Wasm](https://github.com/duckdb/duckdb-wasm) brings DuckDB to every browser thanks to WebAssembly.” DuckDB-Wasm is available by default as `duckdb` in Markdown, but you can explicitly import it as:
 
 ```js echo
@@ -12,7 +14,7 @@ For convenience, we provide a [`DatabaseClient`](https://observablehq.com/@obser
 import {DuckDBClient} from "npm:@observablehq/duckdb";
 ```
 
-To get a DuckDB client, pass zero or more named tables to `DuckDBClient.of`. Each table can be expressed as a [`FileAttachment`](../javascript/files), [Arquero table](./arquero), [Arrow table](./arrow), an array of objects, or a promise to the same. For example, below we load a sample of 250,000 stars from the [Gaia Star Catalog](https://observablehq.com/@cmudig/peeking-into-the-gaia-star-catalog) as a [Apache Parquet](https://parquet.apache.org/) file:
+To get a DuckDB client, pass zero or more named tables to `DuckDBClient.of`. Each table can be expressed as a [`FileAttachment`](../javascript/files), [Arquero table](./arquero), [Arrow table](./arrow), an array of objects, or a promise to the same. For file attachments, the following formats are supported: [CSV](./lib/csv), [TSV](./lib/csv), [JSON](./javascript/files#json), [Apache Arrow](./lib/arrow), and [Apache Parquet](./lib/arrow#apache-parquet). For example, below we load a sample of 250,000 stars from the [Gaia Star Catalog](https://observablehq.com/@cmudig/peeking-into-the-gaia-star-catalog) as a Parquet file:
 
 ```js echo
 const db = DuckDBClient.of({gaia: FileAttachment("gaia-sample.parquet")});
@@ -53,7 +55,17 @@ Plot.plot({
 })
 ```
 
-For externally-hosted data, you can create an empty `DuckDBClient` and load a table from a SQL query, say using [`read_parquet`](https://duckdb.org/docs/guides/import/parquet_import) or [`read_csv`](https://duckdb.org/docs/guides/import/csv_import).
+You can also [attach](https://duckdb.org/docs/sql/statements/attach) a complete database saved as DuckDB file, typically using the `.db` file extension (or `.ddb` or `.duckdb`). In this case, the associated name (below `base`) is a _schema_ name rather than a _table_ name.
+
+```js echo
+const db2 = await DuckDBClient.of({base: FileAttachment("quakes.db")});
+```
+
+```js echo
+db2.queryRow(`SELECT COUNT() FROM base.events`)
+```
+
+For externally-hosted data, you can create an empty `DuckDBClient` and load a table from a SQL query, say using [`read_parquet`](https://duckdb.org/docs/guides/import/parquet_import) or [`read_csv`](https://duckdb.org/docs/guides/import/csv_import). DuckDB offers many affordances to make this easier (in many cases it detects the file format and uses the correct loader automatically).
 
 ```js run=false
 const db = await DuckDBClient.of();
@@ -70,6 +82,8 @@ As an alternative to `db.sql`, there’s also `db.query`:
 db.query("SELECT * FROM gaia LIMIT 10")
 ```
 
+<div class="note">The <code>db.sql</code> and <code>db.query</code> methods return a promise to an <a href="./arrow">Arrow table</a>. This columnar representation is much more efficient than an array-of-objects. You can inspect the contents of an Arrow table using <a href="../inputs/table"><code>Inputs.table</code></a> and pass the data to <a href="./plot">Plot</a>.</div>
+
 And `db.queryRow`:
 
 ```js echo
diff --git a/docs/lib/quakes.db b/docs/lib/quakes.db
diff --git a/docs/lib/quakes.db.sh b/docs/lib/quakes.db.sh
@@ -0,0 +1 @@
+duckdb docs/lib/quakes.db -c "CREATE TABLE events AS (FROM 'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv');"
diff --git a/docs/sql.md b/docs/sql.md
@@ -5,7 +5,7 @@ sql:
 
 # SQL <a href="https://github.com/observablehq/framework/releases/tag/v1.2.0" target="_blank" class="observablehq-version-badge" data-version="^1.2.0" title="Added in v1.2.0"></a>
 
-Observable Framework includes built-in support for client-side SQL powered by [DuckDB](./lib/duckdb). You can use SQL to query data from [CSV](./lib/csv), [TSV](./lib/csv), [JSON](./javascript/files#json), [Apache Arrow](./lib/arrow), and [Apache Parquet](./lib/arrow#apache-parquet) files, which can either be static or generated by [data loaders](./loaders).
+Observable Framework includes built-in support for client-side SQL powered by [DuckDB](./lib/duckdb). You can use SQL to query data from [CSV](./lib/csv), [TSV](./lib/csv), [JSON](./javascript/files#json), [Apache Arrow](./lib/arrow), [Apache Parquet](./lib/arrow#apache-parquet), and DuckDB database files, which can either be static or generated by [data loaders](./loaders).
 
 To use SQL, first register the desired tables in the page’s [front matter](./markdown#front-matter) using the **sql** option. Each key is a table name, and each value is the path to the corresponding data file. For example, to register a table named `gaia` from a Parquet file:
 
diff --git a/src/client/stdlib/duckdb.js b/src/client/stdlib/duckdb.js
@@ -255,6 +255,9 @@ async function insertFile(database, name, file, options) {
         if (/\.parquet$/i.test(file.name)) {
           return await connection.query(`CREATE VIEW '${name}' AS SELECT * FROM parquet_scan('${file.name}')`);
         }
+        if (/\.(db|ddb|duckdb)$/i.test(file.name)) {
+          return await connection.query(`ATTACH '${file.name}' AS ${name} (READ_ONLY)`);
+        }
         throw new Error(`unknown file type: ${file.mimeType}`);
     }
   } finally {
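
The extension check added in the hunk above can be exercised on its own. The sketch below is not part of the commit: `statementFor` is a hypothetical helper that reproduces only the two branches visible in this hunk (Parquet and DuckDB database files) and returns the SQL string instead of running it on a connection. Note that the patched code throws with `file.mimeType` in the message, whereas this sketch reports the file name, since it has no file object.

```javascript
// Hypothetical helper, for illustration only: returns the SQL statement that
// the branches shown in the hunk would issue for a given table/schema name
// and file name. The real insertFile runs these on a DuckDB connection.
function statementFor(name, fileName) {
  if (/\.parquet$/i.test(fileName)) {
    // Parquet files are exposed as a view named after the table.
    return `CREATE VIEW '${name}' AS SELECT * FROM parquet_scan('${fileName}')`;
  }
  if (/\.(db|ddb|duckdb)$/i.test(fileName)) {
    // Database files are attached read-only; `name` becomes a schema name.
    return `ATTACH '${fileName}' AS ${name} (READ_ONLY)`;
  }
  // Assumption: the sketch reports the file name rather than a MIME type.
  throw new Error(`unknown file type: ${fileName}`);
}

console.log(statementFor("base", "quakes.db"));
console.log(statementFor("gaia", "gaia-sample.parquet"));
```

Because the pattern is anchored with `$` and uses the `i` flag, `QUAKES.DuckDB` matches while a loader script such as `quakes.db.sh` does not. The schema name is interpolated unquoted into the `ATTACH` statement, so in this sketch, as in the hunk, it has to be a valid SQL identifier.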
