Commit e7cb1f8

task: added pg-query and pg-query-ext
1 parent a45d776 commit e7cb1f8

3 files changed: +406 -0 lines changed

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
# PG Query Extension

- [⬅️️ Back](/documentation/introduction.md)

A compiled PHP extension for PostgreSQL query parsing using [libpg_query](https://github.com/pganalyze/libpg_query).

This extension provides low-level functions for parsing PostgreSQL SQL queries. For a higher-level, object-oriented interface with strongly-typed AST nodes, see the [pg-query library](/documentation/components/libs/pg-query.md).

## Features

- Parse PostgreSQL SQL queries into a JSON AST
- Generate query fingerprints for query grouping
- Normalize SQL queries (replace literals with placeholders)
- Parse PL/pgSQL functions
- Split multiple SQL statements
- Scan SQL into tokens

## Requirements

- PHP 8.2+
- C compiler (gcc/clang)
- git (for auto-downloading libpg_query)
- make
- protobuf-c library

## Installation

### Using PIE (Recommended)

[PIE](https://github.com/php/pie) is the modern PHP extension installer.

```bash
# Simple installation (auto-downloads libpg_query for PostgreSQL 17)
pie install flow-php/pg-query-ext

# Install with a specific PostgreSQL grammar version (15, 16, or 17)
pie install flow-php/pg-query-ext --with-pg-version=16
```

The build automatically downloads and compiles the libpg_query version matching the selected PostgreSQL grammar. The build dependencies (`protobuf-c`, `git`, `make`, and a C compiler such as `gcc`) must be available on your system.

### Supported PostgreSQL Versions

| PostgreSQL | libpg_query version |
|------------|---------------------|
| 17         | 17-6.1.0 (default)  |
| 16         | 16-5.2.0            |
| 15         | 15-4.2.4            |

## Loading the Extension

### In php.ini

```ini
extension=pg_query
```

### During Development

```bash
php -d extension=/path/to/pg_query.so your_script.php
```
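
Before calling into the extension, a script can check that the functions it needs are defined. This catches a missing `extension=` line early instead of failing on the first call. A minimal sketch: `missing_functions()` is a hypothetical helper, not part of the extension, and the list simply mirrors the functions in the Functions Reference.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of pg_query): returns the names from
 * $functions that are not defined in the current PHP process, in order.
 *
 * @param list<string> $functions
 * @return list<string>
 */
function missing_functions(array $functions): array
{
    return array_values(array_filter(
        $functions,
        static fn (string $name): bool => !function_exists($name),
    ));
}

$missing = missing_functions([
    'pg_query_parse',
    'pg_query_fingerprint',
    'pg_query_normalize',
    'pg_query_parse_plpgsql',
    'pg_query_split',
    'pg_query_scan',
]);

if ($missing === []) {
    echo 'pg_query extension is loaded' . PHP_EOL;
} else {
    echo 'pg_query extension is not loaded; missing: ' . implode(', ', $missing) . PHP_EOL;
}
```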

## Usage

```php
<?php

// Parse SQL and return JSON AST
$json = pg_query_parse('SELECT * FROM users WHERE id = 1');
$ast = json_decode($json, true);

// Generate fingerprint (same for structurally equivalent queries)
$fp = pg_query_fingerprint('SELECT * FROM users WHERE id = 1');
// Returns same fingerprint for: SELECT * FROM users WHERE id = 2

// Normalize query (replace literals with $N placeholders)
$normalized = pg_query_normalize("SELECT * FROM users WHERE name = 'John'");
// Returns: SELECT * FROM users WHERE name = $1

// Split multiple statements
$statements = pg_query_split('SELECT 1; SELECT 2; SELECT 3');
// Returns: ['SELECT 1', ' SELECT 2', ' SELECT 3']

// Parse PL/pgSQL function
$plpgsql = pg_query_parse_plpgsql('
CREATE FUNCTION add(a int, b int) RETURNS int AS $$
BEGIN
    RETURN a + b;
END;
$$ LANGUAGE plpgsql;
');

// Scan SQL into tokens (returns protobuf data)
$protobuf = pg_query_scan('SELECT 1');
```
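
The JSON returned by `pg_query_parse()` wraps every node in an object keyed by its node type, for example `{"RangeVar": {"relname": "users", ...}}`. A minimal sketch that walks the decoded array and collects the table names referenced by `RangeVar` nodes: `collect_table_names()` is hypothetical, and the sample JSON is an abbreviated, hand-written example of that shape, so compare it against the real output of your build.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the extension): recursively collects the
 * "relname" of every RangeVar node in a decoded libpg_query JSON AST.
 *
 * @return list<string>
 */
function collect_table_names(mixed $node): array
{
    if (!is_array($node)) {
        return []; // scalars (locations, names, flags) contain no nodes
    }

    $tables = [];

    // Nodes are wrapped in an object keyed by their type: {"RangeVar": {...}}
    if (isset($node['RangeVar']['relname']) && is_string($node['RangeVar']['relname'])) {
        $tables[] = $node['RangeVar']['relname'];
    }

    // Recurse into every child value (lists and node bodies alike)
    foreach ($node as $child) {
        $tables = [...$tables, ...collect_table_names($child)];
    }

    return $tables;
}

// Abbreviated sample in the shape of the JSON for "SELECT * FROM users".
// In real code: $ast = json_decode(pg_query_parse($sql), true);
$sample = '{"stmts":[{"stmt":{"SelectStmt":{"targetList":[{"ResTarget":{"val":{"ColumnRef":{"fields":[{"A_Star":{}}]}}}}],"fromClause":[{"RangeVar":{"relname":"users","inh":true}}]}}}]}';

$ast = json_decode($sample, true, 512, JSON_THROW_ON_ERROR);

echo implode(', ', collect_table_names($ast)) . PHP_EOL; // users
```

Walking the plain arrays keeps the sketch independent of any node classes; for typed access, the pg-query library's `Parser` is the better fit.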

## Functions Reference

| Function | Description | Returns |
|----------|-------------|---------|
| `pg_query_parse(string $sql)` | Parse SQL to JSON AST | `string` (JSON) |
| `pg_query_fingerprint(string $sql)` | Generate query fingerprint | `string\|false` |
| `pg_query_normalize(string $sql)` | Normalize query with placeholders | `string\|false` |
| `pg_query_parse_plpgsql(string $sql)` | Parse PL/pgSQL function | `string` (JSON) |
| `pg_query_split(string $sql)` | Split multiple statements | `array<string>` |
| `pg_query_scan(string $sql)` | Scan SQL into tokens | `string` (protobuf) |

## Error Handling

The extension throws `RuntimeException` on parse errors:

```php
<?php

try {
    $result = pg_query_parse('INVALID SQL SYNTAX');
} catch (RuntimeException $e) {
    echo "Parse error: " . $e->getMessage();
}
```
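
Because parse errors surface as `RuntimeException`, a syntax check only needs a thin wrapper. A minimal sketch: `sql_syntax_error()` is a hypothetical helper, and the parser is passed in as a callable so that `pg_query_parse(...)` or a test stub can be used.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the extension): returns null when $parse
 * accepts $sql, otherwise the message of the RuntimeException it threw.
 *
 * @param callable(string): mixed $parse for example pg_query_parse(...)
 */
function sql_syntax_error(callable $parse, string $sql): ?string
{
    try {
        $parse($sql);

        return null;
    } catch (RuntimeException $e) {
        return $e->getMessage();
    }
}

// Only call into the extension when it is actually loaded
if (function_exists('pg_query_parse')) {
    $error = sql_syntax_error(pg_query_parse(...), 'SELEC 1');
    echo $error === null ? 'valid' : "invalid: {$error}", PHP_EOL;
}
```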

## Development

### Build Commands

```bash
# Build and run tests
make test

# Build only
make build

# Rebuild extension only (without rebuilding libpg_query)
make rebuild

# Clean build artifacts
make clean

# Remove everything including libpg_query
make distclean
```

### Modifying the Extension

After modifying the C source files, rebuild the extension and re-run the tests:

```bash
# Inside nix-shell with --arg with-pg-query-ext true
cd src/extension/pg-query-ext
make rebuild

# Test your changes
make test
```

## Architecture

The extension is built on top of [libpg_query](https://github.com/pganalyze/libpg_query), which extracts PostgreSQL's query parser into a standalone library. As a result, SQL is parsed exactly as PostgreSQL itself parses it, for the grammar version the extension was built against (see Supported PostgreSQL Versions).

Key implementation details:

- **Static linking**: `libpg_query.a` is statically linked into the extension, so no separate libpg_query shared library is needed at runtime
- **Build dependency**: Requires the `protobuf-c` library for compilation (libpg_query uses protobuf internally)
- **Auto-download**: The build system automatically downloads the correct libpg_query version

## See Also

- [pg-query library](/documentation/components/libs/pg-query.md) - Higher-level PHP wrapper with strongly-typed AST nodes
- [libpg_query](https://github.com/pganalyze/libpg_query) - The underlying C library
- [Nix Development Environment](/documentation/contributing/nix.md) - Using nix-shell for development

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
# PG Query

- [⬅️️ Back](/documentation/introduction.md)

The PostgreSQL Query Parser library provides strongly-typed AST (Abstract Syntax Tree) parsing for PostgreSQL SQL queries, using the [libpg_query](https://github.com/pganalyze/libpg_query) library through the `pg_query` PHP extension.

This library wraps the low-level extension functions and provides:

- Strongly-typed AST nodes generated from protobuf definitions
- A `Parser` class for object-oriented access
- DSL helper functions for convenient usage

## Requirements

This library requires the `pg_query` PHP extension. See the [pg-query-ext documentation](/documentation/components/extensions/pg-query-ext.md) for installation instructions.

## Installation

```bash
composer require flow-php/pg-query:~--FLOW_PHP_VERSION--
```

## Usage

### Using the Parser Class

```php
<?php

use Flow\PgQuery\Parser;

$parser = new Parser();

// Parse SQL into AST
$result = $parser->parse('SELECT id, name FROM users WHERE active = true');

// Access the AST
foreach ($result->getStmts() as $stmt) {
    $node = $stmt->getStmt();
    $selectStmt = $node->getSelectStmt();
    // Work with strongly-typed AST nodes...
}
```

### Using DSL Functions

```php
<?php

use function Flow\PgQuery\DSL\pg_parse;
use function Flow\PgQuery\DSL\pg_parser;
use function Flow\PgQuery\DSL\pg_fingerprint;
use function Flow\PgQuery\DSL\pg_normalize;
use function Flow\PgQuery\DSL\pg_split;

// Parse SQL
$result = pg_parse('SELECT * FROM users');

// Get a reusable parser instance
$parser = pg_parser();

// Generate fingerprint
$fingerprint = pg_fingerprint('SELECT id FROM users WHERE id = 1');

// Normalize query
$normalized = pg_normalize('SELECT * FROM users WHERE id = 1');

// Split multiple statements
$statements = pg_split('SELECT 1; SELECT 2;');
```

## Features

### Query Parsing

Parse PostgreSQL SQL into a strongly-typed AST:

```php
<?php

use Flow\PgQuery\Parser;

$parser = new Parser();
$result = $parser->parse('SELECT id, name FROM users WHERE active = true ORDER BY name');

foreach ($result->getStmts() as $stmt) {
    $selectStmt = $stmt->getStmt()->getSelectStmt();

    if ($selectStmt === null) {
        continue; // Not a SELECT statement
    }

    // Access FROM clause
    foreach ($selectStmt->getFromClause() as $fromItem) {
        $rangeVar = $fromItem->getRangeVar();

        if ($rangeVar !== null) { // JOINs and subqueries use other node types
            echo "Table: " . $rangeVar->getRelname() . "\n";
        }
    }

    // Access target list (SELECT columns)
    foreach ($selectStmt->getTargetList() as $target) {
        // null when the target is not a plain column (e.g. a function call)
        $columnRef = $target->getResTarget()?->getVal()?->getColumnRef();
        // Process column references...
    }
}
```

### Query Fingerprinting

Generate a fingerprint that is identical for structurally equivalent queries. Because literal values do not affect the fingerprint, it is useful for grouping similar queries:

```php
<?php

use Flow\PgQuery\Parser;

$parser = new Parser();

// These queries produce the same fingerprint
$fp1 = $parser->fingerprint('SELECT * FROM users WHERE id = 1');
$fp2 = $parser->fingerprint('SELECT * FROM users WHERE id = 999');

var_dump($fp1 === $fp2); // true
```
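
A typical application is bucketing a query log so that each structurally distinct query appears once. A minimal sketch: `group_by_fingerprint()` is a hypothetical helper, not part of this library, and it takes the fingerprinting function as a callable so that `$parser->fingerprint(...)`, `pg_fingerprint(...)` or a stub can be passed in.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): groups SQL strings by the
 * fingerprint returned by $fingerprint, skipping queries it cannot fingerprint.
 *
 * @param iterable<string> $queries
 * @param callable(string): (string|false|null) $fingerprint e.g. $parser->fingerprint(...)
 * @return array<array-key, list<string>> fingerprint => queries sharing it
 */
function group_by_fingerprint(iterable $queries, callable $fingerprint): array
{
    $groups = [];

    foreach ($queries as $sql) {
        $fp = $fingerprint($sql);

        // Parser::fingerprint() returns ?string, pg_query_fingerprint() string|false
        if (!is_string($fp)) {
            continue;
        }

        $groups[$fp][] = $sql;
    }

    return $groups;
}

// Usage with the library (requires the pg_query extension):
// $groups = group_by_fingerprint($queryLog, (new \Flow\PgQuery\Parser())->fingerprint(...));
// foreach ($groups as $fp => $sqls) {
//     printf("%s: %d queries\n", $fp, count($sqls));
// }
```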

### Query Normalization

Replace literal values with parameter placeholders:

```php
<?php

use Flow\PgQuery\Parser;

$parser = new Parser();

$normalized = $parser->normalize("SELECT * FROM users WHERE name = 'John' AND age = 25");
// Returns: SELECT * FROM users WHERE name = $1 AND age = $2
```

### Statement Splitting

Split a string containing multiple SQL statements:

```php
<?php

use Flow\PgQuery\Parser;

$parser = new Parser();

$statements = $parser->split('SELECT 1; SELECT 2; SELECT 3');
// Returns: ['SELECT 1', ' SELECT 2', ' SELECT 3']
```
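
Statements after the first keep the whitespace that followed the preceding semicolon. When you need clean statements, for example to execute them one at a time, trimming them and dropping empty entries is enough. A minimal sketch; `clean_statements()` is a hypothetical helper name, not part of the library.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: trims each statement returned by split() and drops
 * entries that are empty after trimming.
 *
 * @param array<string> $statements
 * @return list<string>
 */
function clean_statements(array $statements): array
{
    return array_values(array_filter(
        array_map(trim(...), $statements),
        static fn (string $sql): bool => $sql !== '',
    ));
}

// In real code: clean_statements($parser->split($sql))
echo implode(PHP_EOL, clean_statements(['SELECT 1', ' SELECT 2', ' SELECT 3'])), PHP_EOL;
```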

## API Reference

### Parser Class

| Method | Description | Returns |
|--------|-------------|---------|
| `parse(string $sql)` | Parse SQL into AST | `ParseResult` |
| `fingerprint(string $sql)` | Generate query fingerprint | `?string` |
| `normalize(string $sql)` | Normalize query with placeholders | `?string` |
| `split(string $sql)` | Split multiple statements | `array<string>` |

### DSL Functions

| Function | Description | Returns |
|----------|-------------|---------|
| `pg_parser()` | Create a new Parser instance | `Parser` |
| `pg_parse(string $sql)` | Parse SQL into AST | `ParseResult` |
| `pg_fingerprint(string $sql)` | Generate query fingerprint | `?string` |
| `pg_normalize(string $sql)` | Normalize query | `?string` |
| `pg_split(string $sql)` | Split statements | `array<string>` |

## AST Node Types

The library includes 343 strongly-typed AST node classes generated from libpg_query's protobuf definitions of PostgreSQL's parse tree. All classes are in the `Flow\PgQuery\Protobuf\AST` namespace.

Common node types include:

- `SelectStmt` - SELECT statement
- `InsertStmt` - INSERT statement
- `UpdateStmt` - UPDATE statement
- `DeleteStmt` - DELETE statement
- `ColumnRef` - Column reference
- `A_Expr` - Operator expression (e.g. `id = 1`, `a + b`)
- `FuncCall` - Function call
- `JoinExpr` - JOIN expression
- `RangeVar` - Table/view reference

## Exception Handling

Creating a `Parser` throws `ExtensionNotLoadedException` when the `pg_query` extension is missing, and `parse()` throws `ParserException` for invalid SQL:

```php
<?php

use Flow\PgQuery\Parser;
use Flow\PgQuery\Exception\ParserException;
use Flow\PgQuery\Exception\ExtensionNotLoadedException;

try {
    $parser = new Parser();
} catch (ExtensionNotLoadedException $e) {
    // pg_query extension is not loaded; stop here, since $parser is undefined
    echo "pg_query extension is not loaded: " . $e->getMessage() . PHP_EOL;
    exit(1);
}

try {
    $result = $parser->parse('INVALID SQL SYNTAX HERE');
} catch (ParserException $e) {
    echo "Parse error: " . $e->getMessage();
}
```

## Performance

For optimal protobuf parsing performance, install the `ext-protobuf` PHP extension:

```bash
pecl install protobuf
```

The library works without it, falling back to the pure PHP implementation from `google/protobuf`, but the native extension makes AST deserialization significantly faster.
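
To see which protobuf implementation a given PHP process will use, check whether the native extension is loaded. A minimal sketch, assuming the PECL package registers itself under the module name `protobuf` (the name `php -m` would list).

```php
<?php

declare(strict_types=1);

// Assumption: the PECL extension's module name is "protobuf"
$implementation = extension_loaded('protobuf')
    ? 'native ext-protobuf'
    : 'pure PHP google/protobuf';

echo "Protobuf implementation: {$implementation}" . PHP_EOL;
```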
