Merged
236 changes: 121 additions & 115 deletions documentation/components/libs/pg-query.md
@@ -4,11 +4,6 @@

PostgreSQL Query Parser library provides strongly-typed AST (Abstract Syntax Tree) parsing for PostgreSQL SQL queries using the [libpg_query](https://github.com/pganalyze/libpg_query) library through a PHP extension.

This library wraps the low-level extension functions and provides:
- Strongly-typed AST nodes generated from protobuf definitions
- A `Parser` class for object-oriented access
- DSL helper functions for convenient usage

## Requirements

This library requires the `pg_query` PHP extension. See [pg-query-ext documentation](/documentation/components/extensions/pg-query-ext.md) for installation instructions.
@@ -19,178 +14,189 @@ This library requires the `pg_query` PHP extension. See [pg-query-ext documentat
composer require flow-php/pg-query:~--FLOW_PHP_VERSION--
```

## Usage

### Using the Parser Class
## Quick Start

```php
<?php

use Flow\PgQuery\Parser;
use function Flow\PgQuery\DSL\pg_parse;

$parser = new Parser();
$query = pg_parse('SELECT u.id, u.name FROM users u JOIN orders o ON u.id = o.user_id');

// Parse SQL into AST
$result = $parser->parse('SELECT id, name FROM users WHERE active = true');
// Get all tables
foreach ($query->tables() as $table) {
echo $table->name(); // 'users', 'orders'
echo $table->alias(); // 'u', 'o'
}

// Access the AST
foreach ($result->getStmts() as $stmt) {
$node = $stmt->getStmt();
$selectStmt = $node->getSelectStmt();
// Work with strongly-typed AST nodes...
// Get all columns
foreach ($query->columns() as $column) {
echo $column->name(); // 'id', 'name', 'id', 'user_id'
echo $column->table(); // 'u', 'u', 'u', 'o'
}

// Get columns for specific table
$userColumns = $query->columns('u');

// Get all function calls
foreach ($query->functions() as $func) {
echo $func->name(); // function name
echo $func->schema(); // schema if qualified (e.g., 'pg_catalog')
}
```

### Using DSL Functions
## Parser Class

```php
<?php

use function Flow\PgQuery\DSL\pg_parse;
use function Flow\PgQuery\DSL\pg_parser;
use function Flow\PgQuery\DSL\pg_fingerprint;
use function Flow\PgQuery\DSL\pg_normalize;
use function Flow\PgQuery\DSL\pg_split;
use Flow\PgQuery\Parser;

// Parse SQL
$result = pg_parse('SELECT * FROM users');
$parser = new Parser();

// Get a reusable parser instance
$parser = pg_parser();
// Parse SQL into ParsedQuery
$query = $parser->parse('SELECT * FROM users WHERE id = 1');

// Generate fingerprint
$fingerprint = pg_fingerprint('SELECT id FROM users WHERE id = 1');
// Generate fingerprint (same for structurally equivalent queries)
$fingerprint = $parser->fingerprint('SELECT * FROM users WHERE id = 1');

// Normalize query
$normalized = pg_normalize('SELECT * FROM users WHERE id = 1');
// Normalize query (replace literals with positional parameters)
$normalized = $parser->normalize("SELECT * FROM users WHERE name = 'John'");
// Returns: SELECT * FROM users WHERE name = $1

// Normalize also handles Doctrine-style named parameters
$normalized = $parser->normalize('SELECT * FROM users WHERE id = :id');
// Returns: SELECT * FROM users WHERE id = $1

// Split multiple statements
$statements = $parser->split('SELECT 1; SELECT 2;');
// Returns: ['SELECT 1', ' SELECT 2']
```

## DSL Functions

```php
<?php

use function Flow\PgQuery\DSL\{pg_parse, pg_parser, pg_fingerprint, pg_normalize, pg_split};

$query = pg_parse('SELECT * FROM users');
$parser = pg_parser();
$fingerprint = pg_fingerprint('SELECT * FROM users WHERE id = 1');
$normalized = pg_normalize('SELECT * FROM users WHERE id = 1');
$statements = pg_split('SELECT 1; SELECT 2;');
```

## Features
## ParsedQuery Methods

| Method | Description | Returns |
|--------|-------------|---------|
| `tables()` | Get all tables referenced in the query | `array<Table>` |
| `columns(?string $tableName)` | Get columns, optionally filtered by table/alias | `array<Column>` |
| `functions()` | Get all function calls | `array<FunctionCall>` |
| `traverse(NodeVisitor ...$visitors)` | Traverse AST with custom visitors | `void` |
| `raw()` | Access underlying protobuf ParseResult | `ParseResult` |

### Query Parsing
## Custom AST Traversal

Parse PostgreSQL SQL into a strongly-typed AST:
For advanced use cases, you can traverse the AST with custom visitors:

```php
<?php

use Flow\PgQuery\Parser;
use Flow\PgQuery\AST\NodeVisitor;
use Flow\PgQuery\Protobuf\AST\ColumnRef;

$parser = new Parser();
$result = $parser->parse('SELECT id, name FROM users WHERE active = true ORDER BY name');
use function Flow\PgQuery\DSL\pg_parse;

foreach ($result->getStmts() as $stmt) {
$selectStmt = $stmt->getStmt()->getSelectStmt();
class ColumnCounter implements NodeVisitor
{
public int $count = 0;

// Access FROM clause
foreach ($selectStmt->getFromClause() as $fromItem) {
$rangeVar = $fromItem->getRangeVar();
echo "Table: " . $rangeVar->getRelname() . "\n";
public static function nodeClass(): string
{
return ColumnRef::class;
}

public function enter(object $node): ?int
{
$this->count++;
return null;
}

// Access target list (SELECT columns)
foreach ($selectStmt->getTargetList() as $target) {
$columnRef = $target->getResTarget()->getVal()->getColumnRef();
// Process column references...
public function leave(object $node): ?int
{
return null;
}
}
```

### Query Fingerprinting
$query = pg_parse('SELECT id, name, email FROM users');

Generate unique fingerprints for structurally equivalent queries. This is useful for grouping similar queries regardless of their literal values:
$counter = new ColumnCounter();
$query->traverse($counter);

```php
<?php
echo $counter->count; // 3
```

use Flow\PgQuery\Parser;
### NodeVisitor Interface

$parser = new Parser();
```php
interface NodeVisitor
{
public const DONT_TRAVERSE_CHILDREN = 1;
public const REMOVE_NODE = 3;
public const STOP_TRAVERSAL = 2;

// These queries produce the same fingerprint
$fp1 = $parser->fingerprint('SELECT * FROM users WHERE id = 1');
$fp2 = $parser->fingerprint('SELECT * FROM users WHERE id = 999');
/** @return class-string */
public static function nodeClass(): string;

var_dump($fp1 === $fp2); // true
public function enter(object $node): ?int;
public function leave(object $node): ?int;
}
```

### Query Normalization

Replace literal values with parameter placeholders:

```php
<?php

use Flow\PgQuery\Parser;
Visitors declare which node type they handle via `nodeClass()`. Return values:
- `null` - continue traversal
- `DONT_TRAVERSE_CHILDREN` - skip children (from `enter()` only)
- `REMOVE_NODE` - remove the node from its parent array (from `leave()` only)
- `STOP_TRAVERSAL` - stop entire traversal

$parser = new Parser();
### Built-in Visitors

$normalized = $parser->normalize("SELECT * FROM users WHERE name = 'John' AND age = 25");
// Returns: SELECT * FROM users WHERE name = $1 AND age = $2
```
- `ColumnRefCollector` - collects all `ColumnRef` nodes
- `FuncCallCollector` - collects all `FuncCall` nodes
- `RangeVarCollector` - collects all `RangeVar` nodes

### Statement Splitting
## Raw AST Access

Split a string containing multiple SQL statements:
For full control, access the protobuf AST directly:

```php
<?php

use Flow\PgQuery\Parser;
use function Flow\PgQuery\DSL\pg_parse;

$parser = new Parser();
$query = pg_parse('SELECT id FROM users WHERE active = true');

$statements = $parser->split('SELECT 1; SELECT 2; SELECT 3');
// Returns: ['SELECT 1', ' SELECT 2', ' SELECT 3']
```
foreach ($query->raw()->getStmts() as $stmt) {
$select = $stmt->getStmt()->getSelectStmt();

## API Reference

### Parser Class
// Access FROM clause
foreach ($select->getFromClause() as $from) {
echo $from->getRangeVar()->getRelname();
}

| Method | Description | Returns |
|--------|-------------|---------|
| `parse(string $sql)` | Parse SQL into AST | `ParseResult` |
| `fingerprint(string $sql)` | Generate query fingerprint | `?string` |
| `normalize(string $sql)` | Normalize query with placeholders | `?string` |
| `split(string $sql)` | Split multiple statements | `array<string>` |

### DSL Functions

| Function | Description | Returns |
|----------|-------------|---------|
| `pg_parser()` | Create a new Parser instance | `Parser` |
| `pg_parse(string $sql)` | Parse SQL into AST | `ParseResult` |
| `pg_fingerprint(string $sql)` | Generate query fingerprint | `?string` |
| `pg_normalize(string $sql)` | Normalize query | `?string` |
| `pg_split(string $sql)` | Split statements | `array<string>` |

## AST Node Types

The library includes 343 strongly-typed AST node classes generated from PostgreSQL's protobuf definitions. All classes are in the `Flow\PgQuery\Protobuf\AST` namespace.

Common node types include:
- `SelectStmt` - SELECT statement
- `InsertStmt` - INSERT statement
- `UpdateStmt` - UPDATE statement
- `DeleteStmt` - DELETE statement
- `ColumnRef` - Column reference
- `A_Expr` - Expression node
- `FuncCall` - Function call
- `JoinExpr` - JOIN expression
- `RangeVar` - Table/view reference
// Access WHERE clause
$where = $select->getWhereClause();
// ...
}
```

## Exception Handling

```php
<?php

use Flow\PgQuery\Parser;
use Flow\PgQuery\Exception\ParserException;
use Flow\PgQuery\Exception\ExtensionNotLoadedException;
use Flow\PgQuery\Exception\{ParserException, ExtensionNotLoadedException};

try {
$parser = new Parser();
@@ -199,7 +205,7 @@ try {
}

try {
$result = $parser->parse('INVALID SQL SYNTAX HERE');
$parser->parse('INVALID SQL');
} catch (ParserException $e) {
echo "Parse error: " . $e->getMessage();
}
@@ -213,4 +219,4 @@ For optimal protobuf parsing performance, install the `ext-protobuf` PHP extensi
pecl install protobuf
```

The library will work without it using the pure PHP implementation from `google/protobuf`, but the native extension provides significantly better performance for AST deserialization.
The library works without it using the pure PHP implementation from `google/protobuf`, but the native extension provides significantly better performance.
60 changes: 60 additions & 0 deletions src/lib/pg-query/src/Flow/PgQuery/AST/NodeVisitor.php
@@ -0,0 +1,60 @@
<?php

declare(strict_types=1);

namespace Flow\PgQuery\AST;

/**
 * Interface for AST node visitors.
 *
 * Visitors are registered for specific node types and only receive nodes of that type.
 * Use the static nodeClass() method to declare which node type this visitor handles.
 */
interface NodeVisitor
{
    /**
     * Don't traverse children of the current node.
     */
    public const DONT_TRAVERSE_CHILDREN = 1;

    /**
     * Remove the node from its parent array.
     */
    public const REMOVE_NODE = 3;

    /**
     * Stop the entire traversal.
     */
    public const STOP_TRAVERSAL = 2;

    /**
     * Returns the fully qualified class name of the node type this visitor handles.
     *
     * @return class-string The node class this visitor is registered for
     */
    public static function nodeClass() : string;

    /**
     * Called when entering a node of the registered type.
     *
     * @param object $node The node instance (type depends on nodeClass())
     *
     * @return null|int Return value determines traversal behavior:
     *                  - null: Continue traversal
     *                  - DONT_TRAVERSE_CHILDREN: Don't traverse children
     *                  - STOP_TRAVERSAL: Stop entire traversal
     */
    public function enter(object $node) : ?int;

    /**
     * Called when leaving a node of the registered type.
     *
     * @param object $node The node instance (type depends on nodeClass())
     *
     * @return null|int Return value determines traversal behavior:
     *                  - null: Continue traversal
     *                  - REMOVE_NODE: Remove node from parent
     *                  - STOP_TRAVERSAL: Stop entire traversal
     */
    public function leave(object $node) : ?int;
}
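
To illustrate how the three return codes in this interface might interact, here is a minimal, self-contained sketch. It is not the library's traverser: the interface is redeclared locally so the snippet runs without `flow-php/pg-query` or the `pg_query` extension, and `Item`, `Pruner`, and `walk()` are hypothetical names invented for this example. The sketch also assumes that `leave()` is still called for a node whose children were skipped, and it omits the `nodeClass()` dispatch because there is only one node type.

```php
<?php

declare(strict_types=1);

// Local stand-in for Flow\PgQuery\AST\NodeVisitor (same constants and methods).
interface NodeVisitor
{
    public const DONT_TRAVERSE_CHILDREN = 1;
    public const REMOVE_NODE = 3;
    public const STOP_TRAVERSAL = 2;

    public static function nodeClass() : string;
    public function enter(object $node) : ?int;
    public function leave(object $node) : ?int;
}

// Hypothetical node type; the real nodes are generated protobuf classes.
final class Item
{
    /** @param array<int, Item> $children */
    public function __construct(public string $name, public array $children = [])
    {
    }
}

// Hypothetical visitor: skips the subtree under "skip" and removes "drop" nodes.
final class Pruner implements NodeVisitor
{
    /** @var list<string> names in the order enter() saw them */
    public array $entered = [];

    public static function nodeClass() : string
    {
        return Item::class;
    }

    public function enter(object $node) : ?int
    {
        $this->entered[] = $node->name;

        return $node->name === 'skip' ? self::DONT_TRAVERSE_CHILDREN : null;
    }

    public function leave(object $node) : ?int
    {
        return $node->name === 'drop' ? self::REMOVE_NODE : null;
    }
}

// Hypothetical depth-first traverser. Returns the code produced for $node.
function walk(Item $node, NodeVisitor $visitor) : ?int
{
    $code = $visitor->enter($node);
    if ($code === NodeVisitor::STOP_TRAVERSAL) {
        return $code;
    }

    if ($code !== NodeVisitor::DONT_TRAVERSE_CHILDREN) {
        foreach ($node->children as $index => $child) {
            $childCode = walk($child, $visitor);
            if ($childCode === NodeVisitor::STOP_TRAVERSAL) {
                return $childCode;
            }
            if ($childCode === NodeVisitor::REMOVE_NODE) {
                unset($node->children[$index]);
            }
        }
        // Re-index after removals so the children stay a plain list.
        $node->children = array_values($node->children);
    }

    return $visitor->leave($node);
}

$tree = new Item('root', [
    new Item('skip', [new Item('hidden')]),
    new Item('drop'),
    new Item('keep'),
]);

$pruner = new Pruner();
walk($tree, $pruner);

echo implode(',', $pruner->entered), "\n"; // root,skip,drop,keep
echo implode(',', array_map(static fn (Item $i) : string => $i->name, $tree->children)), "\n"; // skip,keep
```

In this sketch `hidden` is never entered because its parent returned `DONT_TRAVERSE_CHILDREN`, and `drop` disappears from the parent only after its `leave()` call returns `REMOVE_NODE`. How the actual traverser in this PR handles these cases should be checked against its implementation.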