Commit 82aa3b9

Add C.md documentation for libpg_query functions and WASM wrapper improvements
- Document all available libpg_query C functions beyond current 4 basic wrappers
- Analyze current proto.js dependency in wasm/index.js deparse function
- Provide detailed plan to eliminate 5.4MB proto.js by using pg_query_parse_protobuf()
- Suggest moving all memory management from JavaScript to C wrapper
- Document additional useful functions: normalize, scan, split statements
- Propose enhanced error handling with detailed error information
- Include implementation examples and migration strategy

Co-Authored-By: Dan Lynch <[email protected]>
1 parent 668ef75 commit 82aa3b9

1 file changed: 290 additions & 0 deletions

C.md
@@ -0,0 +1,290 @@
# libpg_query C Functions Analysis and WASM Wrapper Improvements

## Overview

This document catalogs the C functions available in libpg_query and suggests improvements to the current WASM wrapper that would reduce its JavaScript dependencies and simplify memory management.

## Current WASM Wrapper Analysis

### Current Implementation (`src/wasm_wrapper.c`)

The current wrapper exposes only five functions: four thin wrappers around libpg_query calls and one memory cleanup helper:
- `wasm_parse_query()` - wraps `pg_query_parse()`
- `wasm_deparse_protobuf()` - wraps `pg_query_deparse_protobuf()`
- `wasm_parse_plpgsql()` - wraps `pg_query_parse_plpgsql()`
- `wasm_fingerprint()` - wraps `pg_query_fingerprint()`
- `wasm_free_string()` - memory cleanup

### Current JavaScript Dependencies (`wasm/index.js`)

The current implementation has these issues:
1. **proto.js dependency**: The `deparse()` function uses `pg_query.ParseResult.fromObject()` and `pg_query.ParseResult.encode()` from proto.js (a 5.4MB file)
2. **Manual memory management**: JavaScript code manages WASM memory by hand with `_malloc()`, `_free()`, and pointer operations
3. **Limited error handling**: Errors are detected only through basic string checks

## Available libpg_query C Functions

### Core Parsing Functions
```c
// Basic parsing (returns JSON strings)
PgQueryParseResult pg_query_parse(const char* input);
PgQueryParseResult pg_query_parse_opts(const char* input, int parser_options);

// Protobuf parsing (returns binary protobuf data)
PgQueryProtobufParseResult pg_query_parse_protobuf(const char* input);
PgQueryProtobufParseResult pg_query_parse_protobuf_opts(const char* input, int parser_options);

// PL/pgSQL parsing
PgQueryPlpgsqlParseResult pg_query_parse_plpgsql(const char* input);
```

### Query Processing Functions
```c
// Normalization
PgQueryNormalizeResult pg_query_normalize(const char* input);
PgQueryNormalizeResult pg_query_normalize_utility(const char* input);

// Scanning/Tokenization
PgQueryScanResult pg_query_scan(const char* input);

// Statement splitting
PgQuerySplitResult pg_query_split_with_scanner(const char *input);
PgQuerySplitResult pg_query_split_with_parser(const char *input);

// Fingerprinting
PgQueryFingerprintResult pg_query_fingerprint(const char* input);
PgQueryFingerprintResult pg_query_fingerprint_opts(const char* input, int parser_options);

// Deparsing
PgQueryDeparseResult pg_query_deparse_protobuf(PgQueryProtobuf parse_tree);
```

### Memory Management Functions
```c
void pg_query_free_normalize_result(PgQueryNormalizeResult result);
void pg_query_free_scan_result(PgQueryScanResult result);
void pg_query_free_parse_result(PgQueryParseResult result);
void pg_query_free_split_result(PgQuerySplitResult result);
void pg_query_free_deparse_result(PgQueryDeparseResult result);
void pg_query_free_protobuf_parse_result(PgQueryProtobufParseResult result);
void pg_query_free_plpgsql_parse_result(PgQueryPlpgsqlParseResult result);
void pg_query_free_fingerprint_result(PgQueryFingerprintResult result);
void pg_query_exit(void);
```

## Key Improvement Opportunities

### 1. Eliminate proto.js Dependency

**Current Problem**: The `deparse()` function in `wasm/index.js` uses proto.js to encode JavaScript objects to protobuf:
```javascript
const msg = pg_query.ParseResult.fromObject(parseTree);
const data = pg_query.ParseResult.encode(msg).finish();
```

**Solution**: Use `pg_query_parse_protobuf()` instead of `pg_query_parse()` to get protobuf data directly from C, eliminating the need for JavaScript protobuf encoding.

**Implementation**:
```c
// New wrapper function.
// Protobuf output is binary and may contain NUL bytes, so the byte count is
// reported through out_len rather than relying on a string terminator.
// On error, the return value is the error message and *out_len is set to -1.
EMSCRIPTEN_KEEPALIVE
char* wasm_parse_query_protobuf(const char* input, int* out_len) {
    PgQueryProtobufParseResult result = pg_query_parse_protobuf(input);

    if (result.error) {
        char* error_msg = strdup(result.error->message);
        pg_query_free_protobuf_parse_result(result);
        *out_len = -1;
        return error_msg;
    }

    // Return the raw bytes (they could also be base64-encoded here)
    char* protobuf_data = malloc(result.parse_tree.len);
    if (protobuf_data) {
        memcpy(protobuf_data, result.parse_tree.data, result.parse_tree.len);
        *out_len = (int)result.parse_tree.len;
    } else {
        *out_len = -1; // Allocation failed: NULL return with no message
    }
    pg_query_free_protobuf_parse_result(result);
    return protobuf_data;
}
```
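
Whatever signature the wrapper ends up with, JavaScript needs the byte count alongside the pointer, because protobuf data is binary and cannot be treated as a NUL-terminated string. One option that keeps a single return value is to prefix the payload with its length. The following is a minimal sketch of that idea; `pack_len_prefixed()` and `packed_len()` are hypothetical helper names, not part of libpg_query or the existing wrapper, and the 4-byte little-endian header is an assumed layout.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: copies `len` bytes into a new buffer laid out as
// [4-byte little-endian length][payload]. The caller releases it with free().
// Returns NULL if the allocation fails or the length does not fit the header.
uint8_t* pack_len_prefixed(const void* data, size_t len) {
    if (len > UINT32_MAX - 4u) {
        return NULL;
    }
    uint8_t* buf = malloc(4 + len);
    if (!buf) {
        return NULL;
    }
    // Write the header byte by byte so the layout is independent of host endianness
    buf[0] = (uint8_t)(len & 0xFF);
    buf[1] = (uint8_t)((len >> 8) & 0xFF);
    buf[2] = (uint8_t)((len >> 16) & 0xFF);
    buf[3] = (uint8_t)((len >> 24) & 0xFF);
    if (len > 0) {
        memcpy(buf + 4, data, len);
    }
    return buf;
}

// Hypothetical helper: reads the length header back from a packed buffer
uint32_t packed_len(const uint8_t* buf) {
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

With this layout, the caller reads four bytes at the returned pointer to learn the payload size and then takes that many bytes starting at offset 4. An error still needs its own signal, for example a separate flag or a result struct.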

### 2. Improved Memory Management

**Current Problem**: JavaScript manually manages WASM memory with complex pointer operations.

**Solution**: Move result allocation and cleanup into the C wrapper functions, so JavaScript holds a single result pointer and releases it with one call.

**Implementation**:
```c
// Unified result structure for better memory management
typedef struct {
    char* data;
    size_t len;
    int is_error;
} WasmResult;

EMSCRIPTEN_KEEPALIVE
WasmResult* wasm_parse_query_managed(const char* input) {
    WasmResult* result = malloc(sizeof(WasmResult));
    if (!result) {
        return NULL;
    }

    PgQueryParseResult parse_result = pg_query_parse(input);

    if (parse_result.error) {
        result->data = strdup(parse_result.error->message);
        result->is_error = 1;
    } else {
        result->data = strdup(parse_result.parse_tree);
        result->is_error = 0;
    }
    // strdup() can fail, so guard the strlen() call
    result->len = result->data ? strlen(result->data) : 0;

    pg_query_free_parse_result(parse_result);
    return result;
}

EMSCRIPTEN_KEEPALIVE
void wasm_free_result(WasmResult* result) {
    if (result) {
        free(result->data);
        free(result);
    }
}
```
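
Returning a `WasmResult*` leaves one gap: to read the fields, JavaScript would have to know the struct's memory layout. A minimal sketch of accessor functions that hide the layout follows. The accessor names and the `wasm_result_new()` constructor are hypothetical; the constructor only stands in for `wasm_parse_query_managed()` so the block compiles without libpg_query, and `EMSCRIPTEN_KEEPALIVE` is defined as a no-op when Emscripten's header is not included.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifndef EMSCRIPTEN_KEEPALIVE
#define EMSCRIPTEN_KEEPALIVE /* no-op outside an Emscripten build */
#endif

typedef struct {
    char* data;
    size_t len;
    int is_error;
} WasmResult;

// Hypothetical stand-in for wasm_parse_query_managed(): wraps a copy of `text`
WasmResult* wasm_result_new(const char* text, int is_error) {
    WasmResult* r = malloc(sizeof(WasmResult));
    if (!r) {
        return NULL;
    }
    size_t n = strlen(text);
    r->data = malloc(n + 1);
    if (!r->data) {
        free(r);
        return NULL;
    }
    memcpy(r->data, text, n + 1); // includes the terminating NUL
    r->len = n;
    r->is_error = is_error;
    return r;
}

// Accessors: each takes the opaque pointer and returns one field
EMSCRIPTEN_KEEPALIVE
const char* wasm_result_data(const WasmResult* r) { return r ? r->data : NULL; }

EMSCRIPTEN_KEEPALIVE
size_t wasm_result_len(const WasmResult* r) { return r ? r->len : 0; }

// A NULL result is reported as an error so callers need only one check
EMSCRIPTEN_KEEPALIVE
int wasm_result_is_error(const WasmResult* r) { return r ? r->is_error : 1; }

EMSCRIPTEN_KEEPALIVE
void wasm_free_result(WasmResult* r) {
    if (r) {
        free(r->data);
        free(r);
    }
}
```

The design choice here is that JavaScript treats the pointer as an opaque handle: it calls the accessors, copies out what it needs, and then calls `wasm_free_result()` exactly once.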

### 3. Additional Useful Functions to Expose

**Query Normalization**:
```c
EMSCRIPTEN_KEEPALIVE
char* wasm_normalize_query(const char* input) {
    PgQueryNormalizeResult result = pg_query_normalize(input);

    if (result.error) {
        char* error_msg = strdup(result.error->message);
        pg_query_free_normalize_result(result);
        return error_msg;
    }

    // Copy the normalized text before the libpg_query result is freed
    char* normalized = strdup(result.normalized_query);
    pg_query_free_normalize_result(result);
    return normalized;
}
```

**Query Scanning/Tokenization**:
```c
EMSCRIPTEN_KEEPALIVE
char* wasm_scan_query(const char* input) {
    PgQueryScanResult result = pg_query_scan(input);

    if (result.error) {
        char* error_msg = strdup(result.error->message);
        pg_query_free_scan_result(result);
        return error_msg;
    }

    // The scan output is protobuf data: convert it to JSON or return the raw protobuf.
    // The implementation depends on the desired output format.
    pg_query_free_scan_result(result);
    return NULL; // Placeholder
}
```

**Statement Splitting**:
```c
EMSCRIPTEN_KEEPALIVE
char* wasm_split_statements(const char* input) {
    PgQuerySplitResult result = pg_query_split_with_parser(input);

    if (result.error) {
        char* error_msg = strdup(result.error->message);
        pg_query_free_split_result(result);
        return error_msg;
    }

    // Convert split results to JSON
    // Implementation needed
    pg_query_free_split_result(result);
    return NULL; // Placeholder
}
```
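
The JSON conversion left as a placeholder in the splitting wrapper could look roughly like the sketch below. It is written against a local stand-in type so it compiles on its own: `SplitStmt` mirrors the two integer fields (`stmt_location`, `stmt_len`) that libpg_query's `PgQuerySplitStmt` is assumed to carry, and the function name and the JSON key names are hypothetical choices.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Local stand-in for libpg_query's PgQuerySplitStmt (assumed field names)
typedef struct {
    int stmt_location; // byte offset of the statement in the input
    int stmt_len;      // byte length of the statement
} SplitStmt;

// Serializes n statements as [{"location":L,"length":N},...].
// Returns a malloc'd NUL-terminated string, or NULL on failure.
char* split_stmts_to_json(const SplitStmt* stmts, int n) {
    if (n < 0) {
        return NULL;
    }
    // Each entry needs at most 46 characters (fixed text plus two ints of up
    // to 11 characters), so 64 per entry plus "[", "]" and NUL is enough.
    size_t cap = 3 + (size_t)n * 64;
    char* out = malloc(cap);
    if (!out) {
        return NULL;
    }
    size_t pos = 0;
    out[pos++] = '[';
    for (int i = 0; i < n; i++) {
        int written = snprintf(out + pos, cap - pos,
                               "%s{\"location\":%d,\"length\":%d}",
                               i > 0 ? "," : "",
                               stmts[i].stmt_location, stmts[i].stmt_len);
        // Keep two bytes in reserve for the closing "]" and the NUL
        if (written < 0 || (size_t)written + 2 > cap - pos) {
            free(out);
            return NULL;
        }
        pos += (size_t)written;
    }
    out[pos++] = ']';
    out[pos] = '\0';
    return out;
}
```

Inside `wasm_split_statements()`, the entries would be filled from the split result before it is freed, and the returned string would then be released by the caller like any other wrapper string.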

### 4. Enhanced Error Handling

**Current Problem**: Basic string-based error detection in JavaScript.

**Solution**: Structured error handling in C with detailed error information.

**Implementation**:
```c
typedef struct {
    int has_error;
    char* message;
    char* funcname;
    char* filename;
    int lineno;
    int cursorpos;
    char* context;
    char* data;
    size_t data_len;
} WasmDetailedResult;

// The caller owns the returned struct and every non-NULL string in it
EMSCRIPTEN_KEEPALIVE
WasmDetailedResult* wasm_parse_query_detailed(const char* input) {
    // calloc() zero-initializes, so unset string fields stay NULL
    WasmDetailedResult* result = calloc(1, sizeof(WasmDetailedResult));
    if (!result) {
        return NULL;
    }

    PgQueryParseResult parse_result = pg_query_parse(input);

    if (parse_result.error) {
        result->has_error = 1;
        result->message = strdup(parse_result.error->message);
        result->funcname = parse_result.error->funcname ? strdup(parse_result.error->funcname) : NULL;
        result->filename = parse_result.error->filename ? strdup(parse_result.error->filename) : NULL;
        result->lineno = parse_result.error->lineno;
        result->cursorpos = parse_result.error->cursorpos;
        result->context = parse_result.error->context ? strdup(parse_result.error->context) : NULL;
    } else {
        result->data = strdup(parse_result.parse_tree);
        result->data_len = result->data ? strlen(result->data) : 0;
    }

    pg_query_free_parse_result(parse_result);
    return result;
}
```
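
The detailed result owns up to five heap strings, so it needs a matching release function, which the implementation above does not show. A minimal sketch follows, assuming the same struct layout; `wasm_free_detailed_result()` and `dup_or_null()` are hypothetical names, and `EMSCRIPTEN_KEEPALIVE` is defined as a no-op so the block compiles outside Emscripten.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifndef EMSCRIPTEN_KEEPALIVE
#define EMSCRIPTEN_KEEPALIVE /* no-op outside an Emscripten build */
#endif

typedef struct {
    int has_error;
    char* message;
    char* funcname;
    char* filename;
    int lineno;
    int cursorpos;
    char* context;
    char* data;
    size_t data_len;
} WasmDetailedResult;

// Hypothetical helper: duplicates s, or returns NULL when s is NULL.
// It replaces the repeated `x ? strdup(x) : NULL` pattern.
char* dup_or_null(const char* s) {
    if (!s) {
        return NULL;
    }
    size_t n = strlen(s) + 1;
    char* copy = malloc(n);
    if (copy) {
        memcpy(copy, s, n);
    }
    return copy;
}

// Releases every owned string, then the struct itself.
// free(NULL) is a no-op, so unset fields need no special handling.
EMSCRIPTEN_KEEPALIVE
void wasm_free_detailed_result(WasmDetailedResult* result) {
    if (!result) {
        return;
    }
    free(result->message);
    free(result->funcname);
    free(result->filename);
    free(result->context);
    free(result->data);
    free(result);
}
```

Pairing each result-returning export with exactly one free function keeps the JavaScript side to a single cleanup call per result, regardless of how many fields were populated.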

## Recommended Implementation Plan

### Phase 1: Eliminate proto.js Dependency
1. Add a `wasm_parse_query_protobuf()` function to get protobuf data directly from C
2. Modify the JavaScript `deparse()` function to use protobuf data from C instead of encoding in JS
3. Remove the proto.js import from `wasm/index.js`

### Phase 2: Improve Memory Management
1. Implement unified result structures in C
2. Move all memory allocation/deallocation to C wrapper functions
3. Simplify JavaScript code to just call C functions and handle results

### Phase 3: Expand API Surface
1. Add normalization, scanning, and splitting functions
2. Implement enhanced error handling with detailed error information
3. Add parser options support for advanced use cases

### Phase 4: Performance Optimizations
1. Implement result caching in C for repeated operations
2. Add batch processing functions for multiple queries
3. Optimize memory usage patterns

## Benefits of Proposed Changes

1. **Reduced Bundle Size**: Eliminates the 5.4MB proto.js dependency
2. **Better Memory Management**: All memory operations are handled in C, reducing the risk of leaks and the complexity of the JavaScript code
3. **Enhanced Functionality**: Access to the full libpg_query feature set
4. **Improved Error Handling**: Detailed error information with source locations
5. **Better Performance**: Fewer JavaScript/WASM boundary crossings
6. **Simplified JavaScript Code**: Less complex memory management and protobuf handling

## Compatibility Considerations

- Maintain backward compatibility for existing API functions
- Add new functions as additional exports rather than replacing existing ones
- Provide a migration guide for users wanting to adopt the new APIs
- Consider a versioning strategy for major API changes
