|
| 1 | +# Ducklake |
| 2 | + |
| 3 | +Ducklake allows you to store massive amounts of data in S3 while still querying it efficiently with DuckDB, using plain SQL. |
| 4 | + |
| 5 | +<video |
| 6 | + className="border-2 rounded-lg object-cover w-full h-full dark:border-gray-800" |
| 7 | + autoPlay |
| 8 | + controls |
| 9 | + id="main-video" |
| 10 | + src="/videos/ducklake_demo.mp4" |
| 11 | +/> |
| 12 | +<br /> |
| 13 | + |
| 14 | +[Learn more about Ducklake](https://ducklake.select//) |
| 15 | + |
| 16 | +## Getting started |
| 17 | + |
| 18 | +Prerequisites: |
| 19 | + |
| 20 | +- A configured workspace storage |
| 21 | +- A Postgres or MySQL resource (optional for superusers) |
| 22 | + |
| 23 | +Superusers can use the Windmill database as a catalog for Ducklake with no additional configuration. |
| 24 | + |
| 25 | +Go to the workspace settings and configure a Ducklake: |
| 26 | + |
| 27 | + |
| 28 | + |
| 29 | +Clicking the "Explore" button opens the database manager. Your Ducklake behaves like any other database: you can perform all CRUD operations through the UI or with the SQL REPL. You can also create and delete tables. |
| 30 | + |
| 31 | + |
| 32 | + |
| 33 | +If you explore your catalog database, you will see that Ducklake created some tables for you: |
| 34 | + |
| 35 | + |
| 36 | + |
| 37 | +These metadata tables store information about your data and where it is located in S3. |
| 38 | +In your workspace storage settings, you can browse the selected workspace storage at the configured location and see your tables and their contents: |
| 39 | + |
| 40 | + |
| 41 | + |
| 42 | +## Using Ducklake in DuckDB scripts |
| 43 | + |
| 44 | +Ducklakes can be accessed in DuckDB scripts using the `ATTACH` syntax. You can use the Ducklake button in the editor bar for convenience. |
| 45 | + |
| 46 | +In the example below, we pass a list of messages with positive, neutral or negative sentiment. |
| 47 | +This list might come from a Python script which queries new reviews from the Google My Business API, |
| 48 | +and sends them to an LLM to determine their sentiment. |
| 49 | +The messages are then inserted into a Ducklake table, which effectively creates a new Parquet file. |
| 50 | + |
| 51 | +```sql |
| 52 | +-- $messages (json[]) |
| 53 | + |
| 54 | +ATTACH 'ducklake://main' AS dl; |
| 55 | +USE dl; |
| 56 | + |
| 57 | +CREATE TABLE IF NOT EXISTS messages ( |
| 58 | + content STRING NOT NULL, |
| 59 | + author STRING NOT NULL, |
| 60 | + date STRING NOT NULL, |
| 61 | + sentiment STRING |
| 62 | +); |
| 63 | + |
| 64 | +CREATE TEMP TABLE new_messages AS |
| 65 | + SELECT |
| 66 | + value->>'content' AS content, |
| 67 | + value->>'author' AS author, |
| 68 | + value->>'date' AS date, |
| 69 | + value->>'sentiment' AS sentiment |
| 70 | + FROM json_each($messages); |
| 71 | + |
| 72 | +INSERT INTO messages |
| 73 | + SELECT * FROM new_messages; |
| 74 | +``` |
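| 75 | + |
| 76 | +Once rows are inserted, the table can be read back like any other DuckDB table. The query below is a minimal sketch that assumes the `messages` table and the Ducklake named `main` from the example above; the alias `message_count` is only illustrative. |
| 77 | + |
| 78 | +```sql |
| 79 | +-- Assumes the Ducklake named `main` and the `messages` table created above |
| 80 | +ATTACH 'ducklake://main' AS dl; |
| 81 | +USE dl; |
| 82 | + |
| 83 | +-- Count messages per sentiment, most frequent first |
| 84 | +SELECT sentiment, count(*) AS message_count |
| 85 | +FROM messages |
| 86 | +GROUP BY sentiment |
| 87 | +ORDER BY message_count DESC; |
| 88 | +``` |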