Hi team!
Happy New Year, hope everyone is having a good holiday break!
I had some spare time in the past few days and decided to see what would happen if I removed Protobuf (de)serialization and replaced it with raw C/Rust conversions. Basically, I had to get the AST nodes from the Postgres source code, generate Rust structs (with bindgen) and write conversion functions from C to Rust, and back (to support deparse).
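To give a feel for what one of these conversions looks like, here is a minimal sketch. `CNode`, `RustNode` and `convert` are made-up names for illustration; the real bindgen output and the conversion functions in our fork cover the actual Postgres node types and are much larger.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical C-layout node, standing in for a bindgen-generated struct.
#[repr(C)]
struct CNode {
    /// NUL-terminated string owned by the C side (may be null).
    name: *const c_char,
    /// Next node in the chain; null marks the end.
    next: *const CNode,
}

/// Hypothetical owned Rust counterpart of `CNode`.
#[derive(Debug, PartialEq)]
struct RustNode {
    name: String,
    next: Option<Box<RustNode>>,
}

/// Recursively copies a C node chain into owned Rust values.
///
/// Safety: `node` must be null or point to a valid `CNode` whose `name`
/// and `next` pointers are themselves null or valid for the whole call.
unsafe fn convert(node: *const CNode) -> Option<Box<RustNode>> {
    if node.is_null() {
        return None;
    }
    // The caller guarantees the pointer is valid (see the safety note).
    let c_node = unsafe { &*node };
    let name = if c_node.name.is_null() {
        String::new()
    } else {
        // Copy the C string so the Rust value does not borrow C memory.
        unsafe { CStr::from_ptr(c_node.name) }
            .to_string_lossy()
            .into_owned()
    };
    // One stack frame per node: this is the recursion discussed below.
    let next = unsafe { convert(c_node.next) };
    Some(Box::new(RustNode { name, next }))
}
```

The Rust side owns everything it returns, so the C memory can be freed as soon as the conversion finishes.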
The results are really great:
- 5x performance improvement for parsing queries (string to AST)
- 9x improvement for deparsing (AST to string)
Here are the raw numbers from our benchmark:
Parse
============================================================
parse_raw vs parse Benchmark
============================================================
Query: 2725 chars (CTEs + JOINs + subqueries + window functions)
┌─────────────────────────────────────────────────────────┐
│ RESULTS │
├─────────────────────────────────────────────────────────┤
│ parse_raw (direct C struct reading): │
│ Iterations: 6800 │
│ Total time: 2.03s │
│ Per iteration: 297.88 μs │
│ Throughput: 3357 queries/sec │
├─────────────────────────────────────────────────────────┤
│ parse (protobuf serialization): │
│ Iterations: 1300 │
│ Total time: 2.11s │
│ Per iteration: 1623.70 μs │
│ Throughput: 616 queries/sec │
├─────────────────────────────────────────────────────────┤
│ COMPARISON │
│ Speedup: 5.45x faster │
│ Time saved: 1325.82 μs per parse │
│ Extra queries: 2741 more queries/sec │
└─────────────────────────────────────────────────────────┘
Deparse
============================================================
deparse_raw vs deparse Benchmark
============================================================
Query: 2725 chars (CTEs + JOINs + subqueries + window functions)
┌─────────────────────────────────────────────────────────┐
│ RESULTS │
├─────────────────────────────────────────────────────────┤
│ deparse_raw (direct C struct building): │
│ Iterations: 14700 │
│ Total time: 2.01s │
│ Per iteration: 136.63 μs │
│ Throughput: 7319 queries/sec │
├─────────────────────────────────────────────────────────┤
│ deparse (protobuf serialization): │
│ Iterations: 1600 │
│ Total time: 2.11s │
│ Per iteration: 1317.66 μs │
│ Throughput: 759 queries/sec │
├─────────────────────────────────────────────────────────┤
│ COMPARISON │
│ Speedup: 9.64x faster │
│ Time saved: 1181.03 μs per deparse │
│ Extra queries: 6560 more queries/sec │
└─────────────────────────────────────────────────────────┘
The code is in our fork here: https://github.com/pgdogdev/pg_query.rs. We also forked libpg_query to add the necessary C wrappers: https://github.com/pgdogdev/libpg_query/.
So, what's the catch? Non-exhaustive list:
- The node conversions are all AI-generated (Claude). This is often a deal breaker for teams, so just letting you know early on.
- The conversions are fully recursive, so they can blow the stack on large queries. We set the stack size pretty high in PgDog to avoid this issue (32MiB right now, which is pretty high imo).
- I blew the stack a few times because the generated code misinterpreted some nodes, which caused infinite recursion. Our CI caught those, but to be confident we would basically need test queries that exercise every Postgres SQL feature: not terribly hard, but tedious and error-prone.
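For reference, the larger-stack workaround can be sketched with the standard library alone. This is not how PgDog actually wires it up; `run_with_large_stack` and `recursive_depth` are hypothetical names, and only the 32MiB figure comes from the list above.

```rust
use std::thread;

/// Stack size for the worker thread; 32 MiB mirrors the figure quoted above.
const CONVERSION_STACK_BYTES: usize = 32 * 1024 * 1024;

/// Runs `work` on a dedicated thread with an enlarged stack and returns its result.
/// Hypothetical helper, not part of pg_query.rs.
fn run_with_large_stack<T, F>(work: F) -> std::io::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handle = thread::Builder::new()
        .name("ast-conversion".into())
        .stack_size(CONVERSION_STACK_BYTES)
        .spawn(work)?;
    // Re-raise a panic from the worker instead of swallowing it.
    match handle.join() {
        Ok(value) => Ok(value),
        Err(payload) => std::panic::resume_unwind(payload),
    }
}

/// Stand-in for a recursive conversion: recursion depth equals `n`.
fn recursive_depth(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + recursive_depth(n - 1)
    }
}
```

Something like `run_with_large_stack(|| recursive_depth(10_000))` would then run the recursive work off the caller's stack. A bigger stack only moves the ceiling, though: a real stack overflow in Rust aborts the process rather than unwinding, so it can't be caught and turned into an error.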
I also tried implementing a heap-based iterative approach, but that ended up being slower than the protobuf version (surprisingly, or not perhaps?), so I left the recursive one in place for now. It's easier to read too, tbh. We can probably add a recursion limit, just like prost does, to prevent overflows.
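A recursion limit along those lines could look roughly like this. It's a sketch on a toy `Node` enum, not the real node types; `ConvertError`, `render_with_limit` and `nested` are made-up names, and the limit is whatever the caller passes in.

```rust
/// Hypothetical, heavily simplified AST node; real Postgres nodes are far richer.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Leaf(String),
    List(Vec<Node>),
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum ConvertError {
    RecursionLimitExceeded { limit: usize },
}

/// Renders a node to text, failing cleanly once `limit` nesting levels are used up.
fn render_with_limit(node: &Node, limit: usize) -> Result<String, ConvertError> {
    render_inner(node, limit, limit)
}

fn render_inner(node: &Node, remaining: usize, limit: usize) -> Result<String, ConvertError> {
    // Each visited level consumes one unit of budget; an empty budget is an
    // error returned to the caller instead of a stack overflow.
    if remaining == 0 {
        return Err(ConvertError::RecursionLimitExceeded { limit });
    }
    match node {
        Node::Leaf(text) => Ok(text.clone()),
        Node::List(items) => {
            let mut parts = Vec::with_capacity(items.len());
            for item in items {
                parts.push(render_inner(item, remaining - 1, limit)?);
            }
            Ok(format!("({})", parts.join(", ")))
        }
    }
}

/// Builds a leaf wrapped in `depth` levels of single-item lists (test helper).
fn nested(depth: usize) -> Node {
    let mut node = Node::Leaf("x".to_string());
    for _ in 0..depth {
        node = Node::List(vec![node]);
    }
    node
}
```

With this shape a runaway conversion (like the infinite recursion I hit) comes back as an `Err` that the caller can handle, and the recursive code stays about as readable as it is now.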
I know you guys maintain a large number of bindings for libpg_query, and protobuf works great as the interface between the C lib and the bindings. Maintaining raw bindings for all languages could be difficult. These days you could probably AI-generate them in a few days/weeks, but there is a pretty big maintenance burden there for sure. Let me know if this is something that might be interesting, and if yes, I'm happy to work with you to merge our fork into your code!
I'm going to be testing this with PgDog in the coming weeks and months and will let you know how it goes.
Cheers!