Commit d5447f9

Update README with comprehensive column parsing documentation
- Add column parsing to main features list with key capabilities: alias chain tracking, nested struct field access, input/output distinction
- Document new column context types (select, where, function_arg, etc.)
- Add comprehensive parse_columns() function documentation with:
  * Complete parameter and return value descriptions
  * Basic column reference examples
  * Alias chain parsing example showing dependency tracking
  * Nested struct field access example
  * Multi-table JOIN examples
- Update overview and limitations to include column parsing
- Add column_parser_examples.sql for demonstration

Column parsing provides complete SQL dependency analysis alongside existing table and function parsing capabilities.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent 06ce78b commit d5447f9

2 files changed, with 152 additions and 2 deletions

README.md

Lines changed: 78 additions & 2 deletions
@@ -10,15 +10,19 @@ An experimental DuckDB extension that exposes functionality from DuckDB's native
 
 - **Extract table references** from a SQL query with context information (e.g. `FROM`, `JOIN`, etc.)
 - **Extract function calls** from a SQL query with context information (e.g. `SELECT`, `WHERE`, `HAVING`, etc.)
+- **Extract column references** from a SQL query with comprehensive dependency tracking
 - **Parse WHERE clauses** to extract conditions and operators
 - Support for **window functions**, **nested functions**, and **CTEs**
+- **Alias chain tracking** for complex column dependencies
+- **Nested struct field access** parsing (e.g., `table.column.field.subfield`)
+- **Input vs output column distinction** for complete dependency analysis
 - Includes **schema**, **name**, and **context** information for all extractions
 - Built on DuckDB's native SQL parser
 - Simple SQL interface — no external tooling required
 
 
 ## Known Limitations
-- Only `SELECT` statements are supported for table and function parsing
+- Only `SELECT` statements are supported for table, function, and column parsing
 - WHERE clause parsing supports additional statement types
 - Full parse tree is not exposed (only specific structural elements)
 
@@ -92,9 +96,17 @@ Context helps identify where elements are used in the query.
 - `group_by`: function in a `GROUP BY` clause
 - `nested`: function call nested within another function
 
+### Column Context
+- `select`: column in a `SELECT` clause
+- `where`: column in a `WHERE` clause
+- `having`: column in a `HAVING` clause
+- `order_by`: column in an `ORDER BY` clause
+- `group_by`: column in a `GROUP BY` clause
+- `function_arg`: column used as a function argument
+
 ## Functions
 
-This extension provides parsing functions for tables, functions, and WHERE clauses. Each category includes both table functions (for detailed results) and scalar functions (for programmatic use).
+This extension provides parsing functions for tables, functions, columns, and WHERE clauses. Each category includes both table functions (for detailed results) and scalar functions (for programmatic use).
 
 In general, errors (e.g. Parse Exception) will not be exposed to the user, but instead will result in an empty result. This simplifies batch processing. When validity is needed, [is_parsable](#is_parsablesql_query--scalar-function) can be used.
 
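The empty-result convention in the last context line of this hunk can be checked directly. The following is a minimal sketch, not taken from the commit: it assumes the extension loads as `parser_tools` (the name used in `column_parser_examples.sql`) and that `parse_columns` follows the same swallow-the-error convention as the other parsing functions. The malformed query string is made up for illustration.

```sql
LOAD parser_tools;

-- Malformed input: per the README, parse errors are not raised, so this is
-- expected to produce no rows (assumed to hold for parse_columns as well).
SELECT count(*) AS n_rows
FROM parse_columns('SELEC name FRM users');

-- An empty result is ambiguous (bad SQL vs. no column references),
-- so check validity explicitly when it matters.
SELECT is_parsable('SELEC name FRM users')   AS bad_query_ok,
       is_parsable('SELECT name FROM users') AS good_query_ok;
```

Pairing `parse_columns` with `is_parsable` in batch jobs keeps "no columns found" distinguishable from "query could not be parsed".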
@@ -190,6 +202,70 @@ SELECT list_filter(parse_functions('SELECT upper(name) FROM users WHERE lower(em
 
 ---
 
+### Column Parsing Functions
+
+These functions extract column references from SQL queries, providing comprehensive dependency tracking including alias chains, nested struct field access, and input/output column distinction.
+
+#### `parse_columns(sql_query)` – Table Function
+
+Parses a SQL `SELECT` query and returns all column references along with their context, schema qualification, and dependency information.
+
+##### Usage
+```sql
+SELECT * FROM parse_columns('SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id;');
+```
+
+##### Returns
+A table with:
+- `expression_identifiers`: JSON array of identifier paths (e.g., `[["u","name"]]` or `[["schema","table","column","field"]]`)
+- `table_schema`: schema name for table columns (NULL for aliases/expressions)
+- `table_name`: table name for table columns (NULL for aliases/expressions)
+- `column_name`: column name for simple references (NULL for complex expressions)
+- `context`: where the column appears in the query (select, where, function_arg, etc.)
+- `expression`: full expression text as it appears in the SQL
+- `selected_name`: output column name for SELECT items (NULL for input columns)
+
+##### Basic Example
+```sql
+SELECT * FROM parse_columns('SELECT name, age FROM users;');
+```
+
+| expression_identifiers | table_schema | table_name | column_name | context | expression | selected_name |
+|------------------------|--------------|------------|-------------|---------|------------|---------------|
+| [["name"]] | NULL | NULL | name | select | name | NULL |
+| [["age"]] | NULL | NULL | age | select | age | NULL |
+
+##### Alias Chain Example
+```sql
+SELECT * FROM parse_columns('SELECT 1 AS a, users.age AS b, a+b AS c FROM users;');
+```
+
+| expression_identifiers | table_schema | table_name | column_name | context | expression | selected_name |
+|------------------------|--------------|------------|-------------|--------------|------------|---------------|
+| [["users","age"]] | main | users | age | select | users.age | NULL |
+| [["users","age"]] | NULL | NULL | NULL | select | users.age | b |
+| [["a"]] | NULL | NULL | a | function_arg | a | NULL |
+| [["b"]] | NULL | NULL | b | function_arg | b | NULL |
+| [["a"],["b"]] | NULL | NULL | NULL | select | (a + b) | c |
+
+##### Nested Struct Example
+```sql
+SELECT * FROM parse_columns('SELECT users.profile.address.city FROM users;');
+```
+
+| expression_identifiers | table_schema | table_name | column_name | context | expression | selected_name |
+|------------------------------------------------|--------------|------------|-------------|---------|------------------------------|---------------|
+| [["users","profile","address","city"]] | users | profile | address | select | users.profile.address.city | NULL |
+
+##### Complex Multi-table Example
+```sql
+SELECT * FROM parse_columns('SELECT u.name, o.total, u.age + o.total AS score FROM users u JOIN orders o ON u.id = o.user_id WHERE u.status = ''active'';');
+```
+
+Shows columns from multiple tables with different contexts (select, function_arg, join conditions).
+
+---
+
 ### Table Parsing Functions
 
 #### `parse_tables(sql_query)` – Table Function
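The multi-table README example in this hunk gives no output table. A hedged sketch of how the per-row contexts could be inspected follows. It assumes the extension loads as `parser_tools` (as in `column_parser_examples.sql`) and writes the string literal as `''active''` so it is not parsed as a quoted identifier. The rows actually returned are not documented in this commit, so none are shown.

```sql
LOAD parser_tools;

-- One row per column reference, ordered so rows with the same context sit together.
SELECT context, expression, column_name, selected_name
FROM parse_columns('
    SELECT u.name, o.total, u.age + o.total AS score
    FROM users u
    JOIN orders o ON u.id = o.user_id
    WHERE u.status = ''active''
')
ORDER BY context, expression;
```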

column_parser_examples.sql

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+-- Column Parser Examples - Demonstrating Key Features
+LOAD parser_tools;
+
+SELECT '=== Example 1: Basic Column References ===' as example;
+SELECT * FROM parse_columns('SELECT name, age, email FROM customers') LIMIT 3;
+
+SELECT '=== Example 2: Alias Chain (Key Innovation) ===' as example;
+SELECT * FROM parse_columns('SELECT 1 AS a, users.age AS b, a+b AS c, b AS d FROM users');
+
+SELECT '=== Example 3: Schema-Qualified Columns ===' as example;
+SELECT * FROM parse_columns('SELECT main.customers.name, main.customers.email FROM main.customers') LIMIT 2;
+
+SELECT '=== Example 4: Nested Struct Field Access ===' as example;
+SELECT expression_identifiers, expression, table_schema, table_name, column_name
+FROM parse_columns('SELECT customers.profile.address.city, customers.profile.address.street FROM customers');
+
+SELECT '=== Example 5: Multi-table JOIN with Complex Expressions ===' as example;
+SELECT column_name, context, expression, selected_name
+FROM parse_columns('
+SELECT
+c.name AS customer_name,
+o.total AS order_amount,
+c.age + o.total AS customer_score
+FROM customers c
+JOIN orders o ON c.id = o.customer_id
+')
+WHERE column_name IS NOT NULL OR selected_name IS NOT NULL;
+
+SELECT '=== Example 6: Input vs Output Column Distinction ===' as example;
+SELECT
+CASE WHEN selected_name IS NULL THEN 'INPUT' ELSE 'OUTPUT' END as column_type,
+COALESCE(selected_name, column_name) as identifier,
+expression,
+context
+FROM parse_columns('
+SELECT
+customers.name AS customer_name,
+orders.total * 1.1 AS total_with_tax,
+customers.age
+FROM customers
+JOIN orders ON customers.id = orders.customer_id
+')
+ORDER BY column_type, identifier;
+
+SELECT '=== Example 7: Different SQL Contexts ===' as example;
+SELECT DISTINCT context, COUNT(*) as count
+FROM parse_columns('
+SELECT
+c.name,
+COUNT(*) as order_count
+FROM customers c
+LEFT JOIN orders o ON c.id = o.customer_id
+WHERE c.age > 25 AND c.status = ''active''
+GROUP BY c.id, c.name
+HAVING COUNT(*) > 2
+ORDER BY c.name
+')
+GROUP BY context
+ORDER BY context;
+
+SELECT '=== Example 8: Function Arguments vs Select Items ===' as example;
+SELECT
+context,
+column_name,
+expression,
+CASE WHEN selected_name IS NOT NULL THEN selected_name ELSE 'N/A' END as output_name
+FROM parse_columns('
+SELECT
+UPPER(c.name) AS customer_name,
+CONCAT(c.first_name, '' '', c.last_name) AS full_name,
+LENGTH(c.email) AS email_length
+FROM customers c
+')
+ORDER BY context, column_name;
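For dependency analysis, the input/output distinction from Example 6 can be narrowed to a deduplicated list of source columns. This is a minimal sketch, not part of the commit. It relies on the README's statements that `selected_name` is NULL for input columns and `column_name` is NULL for complex expressions. Whether `table_name` carries a query alias or the resolved table name is not stated in this commit, so treat that column as an assumption.

```sql
LOAD parser_tools;

-- Distinct physical column references the query reads from.
SELECT DISTINCT table_name, column_name
FROM parse_columns('
    SELECT
        customers.name AS customer_name,
        orders.total * 1.1 AS total_with_tax
    FROM customers
    JOIN orders ON customers.id = orders.customer_id
')
WHERE selected_name IS NULL      -- keep input columns only
  AND column_name IS NOT NULL    -- drop complex expressions
ORDER BY table_name, column_name;
```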
