Added ANTLR parse tree persistent caching for procedures #4547
manisha-deshpande wants to merge 12 commits into babelfish-for-postgresql:BABEL_5_X_DEV
0e2dd88 to 751d3e8
create_date SYS.DATETIME NOT NULL,
modify_date SYS.DATETIME NOT NULL,
definition sys.NTEXT DEFAULT NULL,
antlr_parse_tree JSONB DEFAULT NULL, -- JSONB serialized ANTLR parse tree for caching
The babelfish_function_ext regression test fails.
The column can be queried from the Postgres side but not from the Babelfish side, because the data type is unsupported there.
Should the column type be TEXT instead? Would that affect storage space?
--- /home/runner/work/babelfish_extensions/babelfish_extensions/test/JDBC/./expected/babelfish_function_ext-vu-cleanup.out 2026-02-06 18:16:42.761144533 +0000
+++ /home/runner/work/babelfish_extensions/babelfish_extensions/test/JDBC/./output/babelfish_function_ext-vu-cleanup.out 2026-02-06 18:42:00.808212640 +0000
@@ -52,7 +52,7 @@
-- babelfish_function_ext entry should have been removed after dropping all these functions/procedure
SELECT * FROM sys.babelfish_function_ext WHERE funcname LIKE 'babel_2877_vu_prepare%';
GO
-~~START~~
-varchar#!#varchar#!#nvarchar#!#text#!#text#!#bigint#!#bigint#!#datetime#!#datetime#!#ntext
-~~END~~
+~~ERROR (Code: 33557097)~~
+
+~~ERROR (Message: data type jsonb is not supported yet)~~
You can ignore this error for now. We can later add a TDS sender function for JSONB (which just sends it as JSON).
"Should the column type be TEXT instead? Would that affect storage space?"
Yes, JSONB allows faster lookups than JSON/TEXT, which would require a deserialization step of their own (and that becomes a problem for bigger procedures).
751d3e8 to 6bb6abf
 *
 * This header provides the interface for serializing and deserializing
 * ANTLR PLtsql parse trees to/from JSONB format. The serialized data is
 * stored in the cross-session cache (babelfish_func_ext) to enable faster
It's a catalog rather than a cache, even though the catalog will be cached.
/* Read the value for this key */
tok = JsonbIteratorNext(&ctx.it, &v, false);
robverschoor left a comment:
Added some comments.
6316c43 to ffb844e
DROP TABLE IF EXISTS sys.babelfish_function_ext;

CREATE TABLE sys.babelfish_function_ext (
This should be ALTER TABLE ... ADD COLUMN rather than dropping and recreating the table.
antlr_parse_tree_text TEXT DEFAULT NULL, -- Native PG nodeToString() serialized parse tree
antlr_parse_tree_datums TEXT DEFAULT NULL, -- Native PG nodeToString() serialized datums array
antlr_parse_tree_modify_date SYS.DATETIME DEFAULT NULL,
antlr_parse_tree_bbf_version TEXT DEFAULT NULL,
Two columns should be sufficient, no?
- pg_node_tree - the tree
- bbf_version_for_tree
Also, why would storing it in binary improve performance? Wouldn't it have the reverse effect?
I've kept the parse tree and the datums as two separate columns for now, since a split-by-delimiter call on a large concatenated string (for large procedures) might be expensive and would add unnecessary overhead.
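To make the overhead argument concrete, here is a minimal standalone C sketch of what a single combined column would require on every read. It is not code from this PR: the function `mock_split_combined`, the `'|'` delimiter, and the sample payload are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical delimiter separating the tree from the datums. */
#define MOCK_DELIM '|'

/*
 * Split "tree|datums" into its two parts without copying.
 * strchr() has to scan every byte of the tree before it finds the
 * delimiter, so the cost grows with the size of the procedure.
 * Returns 1 on success, 0 when no delimiter is present.
 */
int
mock_split_combined(const char *combined, size_t *tree_len, const char **datums)
{
    const char *sep;

    assert(combined != NULL && tree_len != NULL && datums != NULL);

    sep = strchr(combined, MOCK_DELIM);
    if (sep == NULL)
        return 0;

    *tree_len = (size_t) (sep - combined);
    *datums = sep + 1;
    return 1;
}
```

With two separate columns each value arrives as its own field, so this scan (and the risk of the delimiter appearing inside the serialized tree) goes away.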
contrib/babelfishpg_tsql/src/guc.c (outdated)
bool pltsql_enable_create_alter_view_from_pg = false;
bool pltsql_enable_alter_owner_from_pg = false;
bool pltsql_enable_procedure_parse_cache = false;
Let's use the general term "routine", which applies to both procedures and functions.
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
…ble and type Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
…E, TRY Statements Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
…d retrieval Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Add cross-session ANTLR parse tree caching for T-SQL stored procedures.

Serialized parse trees and datums are stored in babelfish_function_ext using nodeToString/stringToNode. On procedure execution, cached results are restored to skip ANTLR re-parsing. Cache reads validate the stored bbf_version and modify_date before deserializing, skipping stale entries from different Babelfish versions or procedures modified with the GUC disabled.

Changes:
- Add antlr_parse_tree_text, antlr_parse_tree_datums, antlr_parse_tree_modify_date, and antlr_parse_tree_bbf_version columns to sys.babelfish_function_ext
- Store serialized parse tree and version in pltsql_store_func_default_positions
- Restore and validate cached parse tree in new function pltsql_restore_func_parse_result invoked prior to ANTLR parse

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
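The read-side validation described in this commit message can be sketched in standalone C. This is an illustrative sketch only, not the PR's implementation: the struct `CachedParseEntry`, the function `cache_entry_is_usable`, the integer timestamps, and the rule that the tree's stored modify date must equal the routine's own modify date are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one babelfish_function_ext row. */
typedef struct CachedParseEntry
{
    const char *parse_tree_text;   /* serialized tree, NULL if never cached */
    const char *datums_text;       /* serialized datums, NULL if never cached */
    const char *bbf_version;       /* version that wrote the cached tree */
    long long   tree_modify_date;  /* when the cached tree was written */
    long long   proc_modify_date;  /* when the routine was last modified */
} CachedParseEntry;

/*
 * Return true only when the cached entry is safe to deserialize.
 * Any mismatch means the caller should fall back to a fresh ANTLR parse.
 */
bool
cache_entry_is_usable(const CachedParseEntry *e, const char *running_version)
{
    assert(e != NULL && running_version != NULL);

    /* Nothing cached yet. */
    if (e->parse_tree_text == NULL || e->datums_text == NULL)
        return false;

    /* Written by a different version: the node layout may differ. */
    if (e->bbf_version == NULL || strcmp(e->bbf_version, running_version) != 0)
        return false;

    /* Routine changed after the tree was cached (assumed equality rule). */
    if (e->tree_modify_date != e->proc_modify_date)
        return false;

    return true;
}
```

Checking the cheap metadata first means a stale entry is rejected before any time is spent deserializing a large tree.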
Add bbf_version validation, exec-time cache repopulation, and rename/alter/dependency invalidation logic and tests
Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Move PLtsql outfuncs/readfuncs code generation entirely to the extension, eliminating the need for PLtsql-specific headers in the engine's gen_node_support.pl input files.

Key changes:
- gen_pltsql_node_support.pl now generates pltsql_nodetags.h with extension-owned T_PLtsql_* NodeTag values (offset from 1000 to avoid collision with engine's NodeTag enum). Includes ABI stability check that fails the build if node types are added without updating $last_nodetag/$last_nodetag_no.
- Wrapper files pltsql_outfuncs.c and pltsql_readfuncs.c mirror the engine's pattern: #include the generated static functions and switch fragments, expose public pltsql_outNode() and pltsql_parseNodeString() dispatch functions.
- pltsql_serialize_macros.h provides WRITE_*/READ_* macros replicated from engine internals (not exposed in any PG header).
- pl_handler.c registers outNode_hook and parseNodeString_hook in _PG_init() so the engine's outNode()/parseNodeString() delegate to extension code for PLtsql node types.
- pltsql.h includes generated pltsql_nodetags.h for T_PLtsql_* defines.
- Makefile updated: compiles wrapper .o files (not gen .o directly), with proper dependency rules for generated files.

Files changed:
- src/pltsql_serialize/gen_pltsql_node_support.pl - nodetags generation + ABI check
- src/pltsql_serialize/pltsql_outfuncs.c - new wrapper
- src/pltsql_serialize/pltsql_readfuncs.c - new wrapper
- src/pltsql_serialize/pltsql_serialize_macros.h - shared macros
- src/pltsql_serialize/pltsql_node_stubs.c - custom read/write nodes
- src/pltsql.h - include pltsql_nodetags.h
- src/pl_handler.c - register hooks
- Makefile - build rules

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
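The hook delegation described in this commit message follows a common pattern that can be sketched with a standalone mock. Everything here is hypothetical and prefixed with `mock_` or `MOCK_`: the hook signature, the node struct, and the output format are assumptions for illustration and do not reflect the real outNode_hook type or the generated PLtsql node functions. Only the idea of a tag range starting at 1000 and a hook installed at init time comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical first tag value owned by the extension. */
#define MOCK_EXT_TAG_BASE 1000

typedef struct MockNode
{
    int tag;    /* node type tag */
    int value;  /* payload, just for the demo */
} MockNode;

/* Hook type: returns nonzero when the extension serialized the node. */
typedef int (*mock_out_node_hook_type) (char *buf, size_t len,
                                        const MockNode *node);

/* Engine-side hook pointer; NULL until the extension installs a handler. */
mock_out_node_hook_type mock_out_node_hook = NULL;

/* Extension-side serializer: only handles tags in its own range. */
int
mock_ext_out_node(char *buf, size_t len, const MockNode *node)
{
    if (node->tag < MOCK_EXT_TAG_BASE)
        return 0;               /* not an extension node, let the engine do it */
    snprintf(buf, len, "{EXT :tag %d :value %d}", node->tag, node->value);
    return 1;
}

/* Engine-side entry point: try the hook first, then fall back. */
void
mock_engine_out_node(char *buf, size_t len, const MockNode *node)
{
    assert(buf != NULL && node != NULL);

    if (mock_out_node_hook != NULL && mock_out_node_hook(buf, len, node))
        return;
    snprintf(buf, len, "{ENGINE :tag %d :value %d}", node->tag, node->value);
}

/* Stand-in for the extension's init function, which installs the hook. */
void
mock_pg_init(void)
{
    mock_out_node_hook = mock_ext_out_node;
}
```

Keeping the extension's tags in a disjoint range lets the hook decide ownership with a single comparison, and the engine's own node types are unaffected when the extension is not loaded.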
84d3a71 to f7a3051
Description
[TBD]
Issues Resolved
BABEL-6037
Test Scenarios Covered
[TBD]
Use case based -
Boundary conditions -
Arbitrary inputs -
Negative test cases -
Minor version upgrade tests -
Major version upgrade tests -
Performance tests -
Tooling impact -
Client tests -
Check List
By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.