|
1 | 1 | # Parser Tools |
2 | 2 |
|
3 | | -This repository is based on https://github.com/duckdb/extension-template, check it out if you want to build and ship your own DuckDB extension. |
| 3 | +An experimental DuckDB extension that exposes functionality from DuckDB's native SQL parser. |
4 | 4 |
|
5 | | ---- |
| 5 | +## Overview |
6 | 6 |
|
7 | | -This extension, ParseTables, allow you to ... <extension_goal>. |
| 7 | +`parser_tools` is a DuckDB extension designed to provide SQL parsing capabilities within the database. It allows you to analyze SQL queries and extract structural information directly in SQL. Currently, it includes a single table function: `parse_tables`, which extracts table references from a given SQL query. Future versions may expose additional aspects of the parsed query structure. |
8 | 8 |
|
| 9 | +## Features |
9 | 10 |
|
10 | | -## Building |
11 | | -### Managing dependencies |
12 | | -DuckDB extensions uses VCPKG for dependency management. Enabling VCPKG is very simple: follow the [installation instructions](https://vcpkg.io/en/getting-started) or just run the following: |
13 | | -```shell |
14 | | -git clone https://github.com/Microsoft/vcpkg.git |
15 | | -./vcpkg/bootstrap-vcpkg.sh |
16 | | -export VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake |
| 11 | +- Extract table references from a SQL query |
| 12 | +- See the **context** in which each table is used (e.g. `FROM`, `JOIN`) |
| 13 | +- Includes **schema**, **table**, and **context** information |
| 14 | +- Built on DuckDB's native SQL parser |
| 15 | +- Simple SQL interface — no external tooling required |
| 16 | + |
| 17 | +## Installation |
| 18 | + |
| 19 | +```sql |
| 20 | +INSTALL 'parser_tools'; |
| 21 | +LOAD 'parser_tools'; |
| 22 | +``` |
| 23 | + |
| 24 | +## Usage |
| 25 | + |
| 26 | +### Parse table references from a query |
| 27 | +#### Simple example |
| 28 | + |
| 29 | +```sql |
| 30 | +SELECT * FROM parse_tables('SELECT * FROM MyTable'); |
| 31 | +``` |
| 32 | + |
| 33 | +##### Output |
| 34 | + |
| 35 | +``` |
| 36 | +┌─────────┬─────────┬─────────┐ |
| 37 | +│ schema │ table │ context │ |
| 38 | +│ varchar │ varchar │ varchar │ |
| 39 | +├─────────┼─────────┼─────────┤ |
| 40 | +│ main │ MyTable │ from │ |
| 41 | +└─────────┴─────────┴─────────┘ |
| 42 | +``` |
| 43 | + |
| 44 | +This tells you that `MyTable` in the `main` schema was used in the `FROM` clause of the query. |
| 45 | + |
| 46 | +#### CTE Example |
| 47 | +```sql |
| 48 | +select * from parse_tables('with EarlyAdopters as (select * from Users where id < 10) select * from EarlyAdopters;'); |
| 49 | +``` |
| 50 | + |
| 51 | +##### Output |
| 52 | +``` |
| 53 | +┌─────────┬───────────────┬──────────┐ |
| 54 | +│ schema │ table │ context │ |
| 55 | +│ varchar │ varchar │ varchar │ |
| 56 | +├─────────┼───────────────┼──────────┤ |
| 57 | +│ │ EarlyAdopters │ cte │ |
| 58 | +│ main │ Users │ from │ |
| 59 | +│ main │ EarlyAdopters │ from_cte │ |
| 60 | +└─────────┴───────────────┴──────────┘ |
17 | 61 | ``` |
18 | | -Note: VCPKG is only required for extensions that want to rely on it for dependency management. If you want to develop an extension without dependencies, or want to do your own dependency management, just skip this step. Note that the example extension uses VCPKG to build with a dependency for instructive purposes, so when skipping this step the build may not work without removing the dependency. |
| 62 | +This tells us a few things: |
| 63 | +* `EarlyAdopters` was defined as a CTE. |
| 64 | +* The `Users` table was referenced in a `FROM` clause. |
| 65 | +* `EarlyAdopters` was referenced in a `FROM` clause (but it is a CTE, not a table). |
| 66 | + |
| 67 | +## Function Reference |
| 68 | + |
| 69 | +### `parse_tables(query TEXT) → TABLE(schema TEXT, table TEXT, context TEXT)` |
| 70 | + |
| 71 | +Parses the given SQL query and returns a list of all referenced tables along with: |
| 72 | + |
| 73 | +- `schema`: The schema name (e.g., `main`) |
| 74 | +- `table`: The table name |
| 75 | +- `context`: Where in the query the table is used. Possible values include: |
| 76 | + * from: The table appears in the FROM clause |
| 77 | + * joinleft: The table is on the left side of a JOIN |
| 78 | + * joinright: The table is on the right side of a JOIN |
| 79 | + * from_cte: The table appears in the FROM clause, but is a reference to a Common Table Expression (CTE) |
| 80 | + * `with US_Sales() |
| 81 | + * cte: The table is defined as a CTE |
| 82 | + * subquery: The table is used inside a subquery |
| 83 | + |
| 84 | + |
| 85 | +## Development |
19 | 86 |
|
20 | 87 | ### Build steps |
21 | | -Now to build the extension, run: |
| 88 | +To build the extension, run: |
22 | 89 | ```sh |
23 | | -make |
| 90 | +GEN=ninja make |
24 | 91 | ``` |
25 | 92 | The main binaries that will be built are: |
26 | 93 | ```sh |
27 | 94 | ./build/release/duckdb |
28 | 95 | ./build/release/test/unittest |
29 | | -./build/release/extension/parser/parser.duckdb_extension |
| 96 | +./build/release/extension/parser_tools/parser_tools.duckdb_extension |
30 | 97 | ``` |
31 | 98 | - `duckdb` is the binary for the duckdb shell with the extension code automatically loaded. |
32 | 99 | - `unittest` is the test runner of duckdb. Again, the extension is already linked into the binary. |
33 | | -- `parser.duckdb_extension` is the loadable binary as it would be distributed. |
| 100 | +- `parser_tools.duckdb_extension` is the loadable binary as it would be distributed. |
34 | 101 |
|
35 | 102 | ## Running the extension |
36 | | -To run the extension code, simply start the shell with `./build/release/duckdb`. |
| 103 | +To run the extension code, start the shell with `./build/release/duckdb` (which has the `parser_tools` extension built in). |
37 | 104 |
|
38 | | -Now we can use the features from the extension directly in DuckDB. The template contains a single scalar function `parse_tables()` that takes a string arguments and returns a string: |
| 105 | +Now we can use the features from the extension directly in DuckDB: |
39 | 106 | ``` |
40 | | -D select parse_tables('Jane') as result; |
41 | | -┌───────────────┐ |
42 | | -│ result │ |
43 | | -│ varchar │ |
44 | | -├───────────────┤ |
45 | | -│ ParseTables Jane 🐥 │ |
46 | | -└───────────────┘ |
| 107 | +D select * from parse_tables('select * from MyTable'); |
| 108 | +┌─────────┬─────────┬─────────┐ |
| 109 | +│ schema │ table │ context │ |
| 110 | +│ varchar │ varchar │ varchar │ |
| 111 | +├─────────┼─────────┼─────────┤ |
| 112 | +│ main │ MyTable │ from │ |
| 113 | +└─────────┴─────────┴─────────┘ |
47 | 114 | ``` |
48 | 115 |
|
49 | | -## Running the tests |
50 | | -Different tests can be created for DuckDB extensions. The primary way of testing DuckDB extensions should be the SQL tests in `./test/sql`. These SQL tests can be run using: |
51 | | -```sh |
52 | | -make test |
| 116 | +## Running the extension from a duckdb distribution |
| 117 | +To run a development build of the extension from an existing DuckDB distribution (e.g. the CLI): |
53 | 118 | ``` |
| 119 | +$ duckdb -unsigned |
54 | 120 |
|
55 | | -### Installing the deployed binaries |
56 | | -To install your extension binaries from S3, you will need to do two things. Firstly, DuckDB should be launched with the |
57 | | -`allow_unsigned_extensions` option set to true. How to set this will depend on the client you're using. Some examples: |
| 121 | +D install parser_tools from './build/release/repository/v1.2.1/osx_amd64/parser_tools.duckdb_extension'; |
| 122 | +D load parser_tools; |
58 | 123 |
|
59 | | -CLI: |
60 | | -```shell |
61 | | -duckdb -unsigned |
| 124 | +D select * from parse_tables('select * from MyTable'); |
| 125 | +┌─────────┬─────────┬─────────┐ |
| 126 | +│ schema │ table │ context │ |
| 127 | +│ varchar │ varchar │ varchar │ |
| 128 | +├─────────┼─────────┼─────────┤ |
| 129 | +│ main │ MyTable │ from │ |
| 130 | +└─────────┴─────────┴─────────┘ |
62 | 131 | ``` |
63 | 132 |
|
64 | | -Python: |
65 | | -```python |
66 | | -con = duckdb.connect(':memory:', config={'allow_unsigned_extensions' : 'true'}) |
67 | | -``` |
68 | | - |
69 | | -NodeJS: |
70 | | -```js |
71 | | -db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"}); |
72 | | -``` |
| 133 | +## Running the tests |
| 134 | +See [Writing Tests](https://duckdb.org/docs/stable/dev/sqllogictest/writing_tests.html) to learn more about DuckDB's testing philosophy. Following that approach, the tests are defined in SQL under [test/sql](test/sql/). |
73 | 135 |
|
74 | | -Secondly, you will need to set the repository endpoint in DuckDB to the HTTP url of your bucket + version of the extension |
75 | | -you want to install. To do this run the following SQL query in DuckDB: |
76 | | -```sql |
77 | | -SET custom_extension_repository='bucket.s3.eu-west-1.amazonaws.com/<your_extension_name>/latest'; |
| 136 | +The tests can be run with: |
| 137 | +```sh |
| 138 | +make test |
78 | 139 | ``` |
79 | | -Note that the `/latest` path will allow you to install the latest extension version available for your current version of |
80 | | -DuckDB. To specify a specific version, you can pass the version instead. |
81 | 140 |
|
82 | | -After running these steps, you can install and load your extension using the regular INSTALL/LOAD commands in DuckDB: |
83 | | -```sql |
84 | | -INSTALL parse_tables |
85 | | -LOAD parse_tables |
| 141 | +and easily re-run as changes are made with: |
| 142 | +```sh |
| 143 | +GEN=ninja make && make test |
86 | 144 | ``` |
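
The CTE example in the new README shows that `parse_tables` reports a reference to a CTE with the context `from_cte` rather than `from`, so a caller can tell real tables apart from CTEs. A minimal sketch of that filtering step, assuming the result rows have already been fetched into Python as `(schema, table, context)` tuples; the helper name `physical_tables` and the use of an empty string for a missing schema are illustrative assumptions, not part of the extension:

```python
# Rows copied from the CTE example output in the README: (schema, table, context).
# Assumption: the empty schema cell is represented here as "".
ROWS = [
    ("", "EarlyAdopters", "cte"),
    ("main", "Users", "from"),
    ("main", "EarlyAdopters", "from_cte"),
]

# Contexts that describe a CTE definition or a reference to one, not a real table.
CTE_CONTEXTS = {"cte", "from_cte"}


def physical_tables(rows):
    """Return sorted, de-duplicated (schema, table) pairs that are not CTEs.

    Hypothetical helper for illustration only.
    """
    return sorted(
        {(schema, table) for schema, table, context in rows
         if context not in CTE_CONTEXTS}
    )


print(physical_tables(ROWS))
```

For the example rows, only `Users` in the `main` schema remains, since both `EarlyAdopters` rows carry CTE contexts.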