Insert exec query tree modification (poc pr)#4581
Open
monk0707 wants to merge 38 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
Open
Insert exec query tree modification (poc pr)#4581monk0707 wants to merge 38 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
monk0707 wants to merge 38 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
Conversation
4cb263f to
4625bd2
Compare
This commit implements INSERT EXEC functionality using query rewriting: - Extract target table and column list from INSERT EXEC statement in tsqlIface.cpp - Add InsertExecRewriteContext structure to track rewrite state in pltsql.h - Implement query rewriting in pl_exec.c: - is_simple_select_query(): Detect SELECT statements eligible for rewriting - should_rewrite_for_insert_exec(): Check if rewrite context is active - exec_rewritten_insert_exec(): Transform SELECT to INSERT using CTE pattern The rewriting transforms: SELECT col1, col2 FROM source Into: WITH __insert_exec_cte AS (SELECT col1, col2 FROM source) INSERT INTO target_table (columns) SELECT * FROM __insert_exec_cte Tested with: - Nested procedures (3 levels deep) - JOINs, dynamic SQL, WHILE loops - TRY-CATCH blocks, aggregations, UNION - CASE expressions, TOP/ORDER BY - Cross-database INSERT EXEC
Wrap INSERT EXEC procedure execution in a savepoint (BeginInternalSubTransaction) to ensure atomic behavior. If the procedure fails at any point (constraint violation, error in procedure, etc.), all inserted rows are rolled back. This matches SQL Server's behavior where INSERT EXEC is atomic - either all rows are inserted or none are. Key implementation details: - Use BeginInternalSubTransaction/ReleaseCurrentSubTransaction for success path - Use RollbackAndReleaseCurrentSubTransaction for error path - Save error data BEFORE rollback to avoid 'Invalid cursor state' errors - Re-raise error with ereport(ERROR, ...) instead of PG_RE_THROW() - Properly restore memory context, resource owner, and eval_econtext
Add support for INSERT INTO t EXEC (@variable) syntax where the SQL string is stored in a variable and evaluated at runtime. Implementation approach: - Parser detects INSERT INTO ... EXEC (@var) pattern and creates PLtsql_stmt_exec_batch instead of PLtsql_stmt_execsql - The INSERT context (target table, column list) is stored in the PLtsql_stmt_exec_batch structure - At execution time, the INSERT EXEC rewrite context is set before executing the dynamic SQL, allowing SELECT statements to be rewritten as INSERT statements Changes: - pltsql-2.h: Add insert_exec, insert_exec_target, insert_exec_columns fields to PLtsql_stmt_exec_batch structure - pl_exec-2.c: Set/clear INSERT EXEC rewrite context in exec_stmt_exec_batch - tsqlIface.cpp: Detect INSERT EXEC with dynamic SQL in ANTLR parser and create appropriate statement type with INSERT context
Add support for INSERT INTO table EXEC sp_executesql syntax. Previously, sp_executesql was only handled as a standalone EXEC statement but not when used within INSERT EXEC context. Changes: - Detect sp_executesql in INSERT EXEC context in enterDml_statement - Create PLtsql_stmt_exec_batch with the SQL string argument as expression - Skip normal PLtsql_stmt_execsql processing in exitDml_statement for this case This allows queries like: INSERT INTO t2 EXEC sp_executesql N'SELECT * FROM t1' INSERT INTO t2 EXEC sp_executesql @sqlVariable
When executing INSERT INTO t EXEC @rc = P, the return_status (@rc =) caused the normal makeSQL() path to fail because PostgreSQL parser doesn't understand the @rc = syntax. This commit handles INSERT EXEC with return status the same way as EXEC (@variable) and sp_executesql - by detecting the pattern in enterDml_statement and creating a PLtsql_stmt_exec with insert_exec=true. Changes: - pltsql-2.h: Added insert_exec, insert_exec_target, and insert_exec_columns fields to PLtsql_stmt_exec struct - tsqlIface.cpp: Added detection for body->return_status in enterDml_statement to create PLtsql_stmt_exec with INSERT EXEC context; added skip condition in exitDml_statement - pl_exec-2.c: Added INSERT EXEC rewrite context setup in exec_stmt_exec before procedure execution, with cleanup in PG_FINALLY block
Changes: 1. Fixed SPI return code check for temp table creation - accept both SPI_OK_SELINTO and SPI_OK_UTILITY since CREATE TABLE AS SELECT returns SPI_OK_SELINTO 2. Fixed flush query for IDENTITY_INSERT - removed explicit OVERRIDING SYSTEM VALUE since T-SQL parser automatically adds it when IDENTITY_INSERT is ON for the target table 3. Fixed SPI return code check for flush INSERT - accept both SPI_OK_INSERT and SPI_OK_INSERT_RETURNING since T-SQL parser adds RETURNING clause when IDENTITY_INSERT is ON 4. Added handling for column_list with IDENTITY_INSERT ON - use type casts in subquery to strip IDENTITY property from temp table columns 5. Added insert_exec_identity_insert field to PLtsql_execstate to track IDENTITY_INSERT state during INSERT EXEC execution 6. Skip has_ident check in pl_handler.c for INSERT EXEC statements since values come from procedure, not explicit VALUES
- Add quoteIdentifierIfNeeded() helper in tsqlIface.cpp to quote identifiers containing special characters (spaces, dashes, etc.) with square brackets - Update target_table and column_list building to use quoteIdentifierIfNeeded() - Add strip_tsql_brackets() helper in pl_exec.c to strip T-SQL brackets - Add strip_pg_quotes() helper in pl_exec.c to strip PostgreSQL double quotes - Update exec_create_insert_exec_temp_table() to properly handle table names with spaces by stripping brackets and using quote_identifier() for PostgreSQL - Fix IDENTITY_INSERT OID lookup to strip double quotes from table name before calling get_relname_relid() This fixes the 'invalid name syntax' error when using table names like [t bracket test] or [table-with-dashes] in INSERT EXEC statements.
- Add transaction count save/restore in exec_stmt_execsql() to prevent 'Transaction count mismatch' error when procedures create temp tables, table variables, or use DML with OUTPUT clause during INSERT EXEC - Add CTE (WITH clause) support in is_simple_select_query() to allow INSERT EXEC to capture results from procedures using CTEs - Add INSERT EXEC context check in iterative_exec.c batch-level transaction count validation Test results: 19/20 tests pass (match SQL Server behavior) Known limitation: TRY-CATCH with error (test03) requires separate fix
Remove all debug elog(LOG,...) and elog(NOTICE,...) statements that were added during development of the INSERT EXEC feature. These statements were useful for debugging but should not be in production code. Removed debug statements from: - pl_exec.c: EXEC-STMT-MAIN-DEBUG, INSERT-EXEC-SKIP-TRIGGERS, INSERT-EXEC-TEMP-TABLE, INSERT-EXEC-REWRITE-DEBUG, INSERT-EXEC-REWRITE, INSERT-EXEC-DML-REWRITE, and INSERT-EXEC-DEBUG NOTICE statements - pl_exec-2.c: EXEC-STMT-DEBUG and all TRY-CATCH-DEBUG statements - hooks.c: INSERT-EXEC-SKIP-TRIGGERS statement Kept DEBUG1/DEBUG2 level statements as they are appropriate for production debugging when enabled.
Add INSERT EXEC context checks to iterative_exec.c to prevent transaction state corruption when errors occur during INSERT EXEC execution: 1. Skip internal savepoints during INSERT EXEC to avoid AfterTrigger query_depth mismatch issues. The INSERT EXEC temp table approach already provides proper transaction semantics. 2. Skip transaction rollback during INSERT EXEC error handling to prevent AfterTriggerEndXact from resetting query_depth while the outer INSERT still has an active query. 3. Use ReThrowError instead of FreeErrorData + PG_RE_THROW in handle_error to properly handle cases where FlushErrorState has already been called (e.g., during TRY-CATCH handling). 4. Add pltsql_insert_exec_skip_triggers() and pltsql_set_insert_exec_skip_triggers() function declarations to pltsql.h for trigger skipping during INSERT EXEC.
- Add BABEL-INSERT-EXEC.sql with comprehensive test coverage: - Basic INSERT EXEC scenarios (simple, multiple rows, parameters) - Column mapping and data types - Dynamic SQL (EXEC, sp_executesql) - Nested procedures (2-3 levels deep) - IDENTITY column handling (auto-generated and IDENTITY_INSERT) - Temp tables (target and source) - Advanced SQL constructs (CTE, UNION, conditional, loops) - Transaction behavior (COMMIT, ROLLBACK) - TRY/CATCH behavior - Error handling - Large result sets (100 rows) - OUTPUT clause in procedures - Add expected output file BABEL-INSERT-EXEC.out - Update jdbc_schedule to run new test - Remove old test files BABEL-INSERT-EXEC-ALL-ISSUES.sql and BABEL-INSERT-EXEC-DML.sql
The change from FreeErrorData+PG_RE_THROW to ReThrowError was incorrect. In handle_error(), we're inside a PG_CATCH block and the original error is still on the error stack. PG_RE_THROW() correctly re-throws that original error, while ReThrowError() would push a duplicate onto the stack causing issues during deep recursion (e.g., test 217_1). ReThrowError should only be used after FlushErrorState() has been called, which clears the error stack. The INSERT EXEC error handling in pl_exec.c correctly uses ReThrowError after FlushErrorState().
The 'errstart was not called' panic occurred when INSERT EXEC encountered errors (e.g., column mismatch). The issue was in the error handling flow: 1. When an error occurred, it was pushed onto PostgreSQL's error stack 2. PG_FINALLY called exec_drop_insert_exec_temp_table() to clean up 3. DROP TABLE failed with 'table is being used by active queries' 4. The PG_CATCH in exec_drop_insert_exec_temp_table called FlushErrorState() 5. FlushErrorState() cleared ALL errors, including the original error 6. When CopyErrorData() was called later, the stack was empty -> panic Fix: - Move exec_drop_insert_exec_temp_table() from PG_FINALLY to success path only - In PG_FINALLY, only clean up estate fields without attempting DROP TABLE - In exec_flush_insert_exec_temp_table, use PG_RE_THROW() instead of CopyErrorData/FlushErrorState/ReThrowError to preserve the error stack - Wrap DROP TABLE in a subtransaction to isolate any failures - Temp table is cleaned up when session ends on error paths Also added comments explaining ReThrowError usage in iterative_exec.c
When a procedure with OUTPUT parameters is called via INSERT EXEC: 1. The OUTPUT parameter values should be returned to the caller variable 2. Only the SELECT results should be inserted into the target table 3. The OUTPUT parameter values should NOT be inserted into the table The fix involves two key changes: 1. Copy OUTPUT parameter values to TopTransactionContext instead of the procedure's fn_cxt. The fn_cxt gets freed when the procedure returns, causing memory corruption (OID 0x7F7F7F7F error). 2. Set fcinfo->isnull = true to prevent the OUTPUT parameter values from being sent to the client/destination (which would insert them into the table). The new is_insert_exec_rewrite_active_hook in functioncmds.c allows this without causing 'procedure returned null record' error. 3. Register the hook in pl_handler.c to enable the INSERT EXEC context detection in PostgreSQL core. This matches SQL Server behavior where OUTPUT parameters are returned to caller variables but not inserted into the target table.
- Add transaction control handling for INSERT EXEC context - Block COMMIT/ROLLBACK that would terminate outer INSERT EXEC transaction - Handle BEGIN TRANSACTION/COMMIT inside INSERT EXEC procedures correctly - Fix row count reporting for INSERT EXEC with dynamic SQL - Register is_insert_exec_rewrite_active_hook for engine integration This change ensures SQL Server-compatible transaction behavior during INSERT EXEC operations, where ROLLBACK is blocked and BEGIN/COMMIT are allowed but don't affect the outer INSERT EXEC transaction.
Implement INSERT-EXEC support for procedures with OUTPUT parameters by rewriting SELECT statements inside procedures to INSERT into a temp buffer table, then flushing results to the target table after procedure completion. Key changes: - Add query rewriting infrastructure in pl_exec.c to transform SELECTs into CTE-based INSERTs during INSERT-EXEC execution - Add INSERT-EXEC context management functions in pltsql_utils.c - Add ExecutorStart hook to skip AFTER triggers during rewritten INSERTs to prevent crashes in TRY-CATCH error handling scenarios - Fix case-insensitive temp table name handling in buffer table creation (fixes BABEL-2999 and datareader_datawriter tests) Known limitation: - DROP TABLE inside a procedure fails with error 556 when called via INSERT-EXEC due to PostgreSQL holding relation references during CTE execution. This will be addressed in a follow-up fix.
…tack in plsql_TriggerRecursiveCheck In parallel workers, the exec_state_call_stack global variable is NULL because parallel workers don't have the PL/tsql execution context. When the plsql_TriggerRecursiveCheck hook is called from parallel workers during query execution, it would crash trying to dereference the NULL pointer. This fix adds a NULL check to plsql_TriggerRecursiveCheck() in hooks.c that returns false (not a recursive trigger) if exec_state_call_stack is NULL. This is the safe default behavior.
f2c3b7b to
1c80f61
Compare
When INSERT EXEC targets a view with an INSTEAD OF trigger, the trigger was not firing because the flush INSERT ran via SPI_execute without opening a composite trigger nesting level. Root cause: The composite trigger mechanism (BeginCompositeTriggers/ EndCompositeTriggers) is normally called from exec_stmt_execsql when enable_txn_in_triggers is true. But for INSERT EXEC, enable_txn_in_triggers is false, so INSTEAD OF triggers queued their events but the events were never fired (IsCompositeTriggerActive() returned false). The fix: 1. Clear INSERT EXEC rewrite context before the flush INSERT so it behaves like a normal INSERT 2. Call BeginCompositeTriggers() before SPI_execute to open a trigger nesting level 3. Call EndCompositeTriggers(false) after successful execution to fire queued INSTEAD OF triggers 4. Set pltsql_disable_txn_in_triggers=true during flush to force atomic SPI, preventing premature commits that would destroy the trigger's transition tables 5. Add commit_stmt guard to skip commits while INSERT EXEC rewrite is active, protecting the 'inserted' pseudo-table from being destroyed This matches SQL Server behavior where INSERT EXEC into a view with an INSTEAD OF trigger fires the trigger once with all rows from the procedure's result set
The NULL checks for exec_state_call_stack added in commits 99d9d2f and 31c1136 were causing general JDBC tests to timeout. These checks were intended to fix parallel query crashes but inadvertently broke the normal execution path. This reverts those specific changes while keeping the important fixes: - 1c80f61: Fix INSERT EXEC snapshot error with AFTER triggers - df12e05: Fix INSERT EXEC INSTEAD OF triggers on view targets - 7d020a7: Fix C90 failure Parallel query tests may still fail (separate issue), but general JDBC tests should now pass without timeout.
e7b6a8e to
968709c
Compare
The previous commits introduced a two-bug loop: 1. With NULL checks in is_part_of_pltsql_trycatch_block: in_trycatch=false -> savepoints skipped -> DROP TABLE hangs -> timeout 2. Without NULL checks: Assert fires on NULL dereference -> SIGSEGV -> crash Root cause: exec_state_call_stack can be NULL during INSERT EXEC SPI execution paths, not just in parallel workers. Fix: 1. Restore NULL guards in is_part_of_pltsql_trycatch_block and is_part_of_pltsql_trigger, but replace hard Assert with soft return-false to prevent crashes 2. Add trycatch_depth field to InsertExecRewriteContext to explicitly track TRY-CATCH nesting during INSERT EXEC 3. Add fallback in dispatch_stmt_handle_error: when exec_state_call_stack is NULL but INSERT EXEC is active, use the explicit trycatch_depth to determine if savepoints are needed 4. Track TRY-CATCH entry/exit in exec_stmt_try_catch via new helper functions pltsql_insert_exec_enter_trycatch/exit_trycatch This breaks the timeout<->crash cycle by: - Preventing crashes when exec_state_call_stack is NULL (soft check) - Still enabling savepoints during INSERT EXEC + TRY-CATCH (fallback) - Properly releasing index references on caught errors (no DROP hang)
The previous fix (68f12ee) only protected 2 of 9 unsafe exec_state_call_stack dereferences. This commit adds NULL guards to the remaining 7 sites that can crash when exec_state_call_stack is NULL during INSERT EXEC SPI execution. Protected functions/locations: 1. record_error_state() - guard at top 2. is_error_raising_batch() - guard at top 3. is_xact_abort_on_error() - guard at top 4. abort_transaction() line ~1159 - guard || right-hand side 5. abort_execution() lines ~1207, ~1210 - guard || and direct access 6. exec_stmt_iterative() lines ~1714-1718 - guard error_data block 7. exec_stmt_iterative() line ~1783 - guard condition with || NULL check 8. set_exec_error_data() - guard at top 9. reset_exec_error_data() - guard at top The exec_state_call_stack can be NULL during INSERT EXEC because the SPI execution path doesn't always have a full PL/tsql call stack established. Without these guards, any error during INSERT EXEC would cause a SIGSEGV.
Three remaining crash sites after commit 3b06bdd (which covered iterative_exec.c) still dereference exec_state_call_stack without a NULL check: 1. pl_exec.c:1283 (pltsql_exec_trigger): pltsql_exec_trigger never pushes exec_state_call_stack -- only pltsql_exec_function does at line 4475. When a trigger fires from a pure SQL DML context with no outer PL/tsql function, exec_state_call_stack is NULL. Add NULL check before the trigger_error field dereference. 2. pl_exec.c:10301 (pltsql_estate_cleanup): INSERT EXEC cleanup paths can call this function with an empty stack. Add early return when exec_state_call_stack is NULL to prevent crash on ->next dereference. 3. pl_exec-2.c:3365 (exec_stmt_usedb): The loop traverses the call stack looking for EXEC_BATCH to suppress the database context change notification. Guard the initial ->next access with a ternary; a NULL stack means no EXEC_BATCH above us, so fall through normally to send the notification (do NOT return early here). Also revert proc_ownership_chaining-vu-verify.out: accepting a permission denied error for EXEC p6030_8_3c is a regression. The CTE rewrite SPI call runs under the caller's security context rather than the procedure owner's, breaking ownership chaining for INSERT EXEC. Root-cause fix (security context switching in exec_stmt_execsql) will follow as a separate commit.
6a18d52 to
551ac36
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
What: Currently, Babelfish's INSERT EXEC query rewriting approach lacks transaction atomicity and fails to handle several INSERT EXEC syntax variants. With this change, INSERT EXEC now:
Provides atomic transaction behavior using savepoints (all rows inserted or none on error)
Supports INSERT INTO t EXEC (@variable) with dynamic SQL in variables
Supports INSERT INTO t EXEC sp_executesql N'SELECT ...'
Supports INSERT INTO t EXEC @rc = proc with return value capture
Why: The query rewriting approach transforms SELECTs inside procedures into INSERT statements. Without savepoints, if a rewritten INSERT fails mid-procedure, previously inserted rows remain (partial state). Additionally, several INSERT EXEC syntax variants were failing because the parser couldn't distinguish them from regular EXEC statements, causing syntax errors or incorrect execution paths.
How:
Savepoint integration (pl_exec.c): Wraps procedure execution in BeginInternalSubTransaction("insert_exec") with proper rollback on error
Parser enhancements (tsqlIface.cpp): Detects special INSERT EXEC patterns in enterDml_statement and creates appropriate statement types (PLtsql_stmt_exec_batch or PLtsql_stmt_exec with insert_exec=true)
Struct extensions (pltsql-2.h): Added insert_exec, insert_exec_target, and insert_exec_columns fields to PLtsql_stmt_exec
Execution handling (pl_exec-2.c): Sets up INSERT EXEC rewrite context before procedure execution
Issues Resolved
BABEL-5522: INSERT EXEC Query Rewriting POC enhancements
Partial fix for BABEL-2343: EXEC (@variable) support
Partial fix for BABEL-2385: INSERT INTO t2 EXEC sp_executesql support
Partial fix for BABEL-6145: INSERT INTO .. EXEC @v support
Issues Resolved
[List any issues this PR will resolve]
Test Scenarios Covered
Use case based -
Boundary conditions -
Arbitrary inputs -
Negative test cases -
Minor version upgrade tests -
Major version upgrade tests -
Performance tests -
Tooling impact -
Client tests -
Check List
By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.