Skip to content

Insert exec query tree modification (poc pr)#4581

Open
monk0707 wants to merge 38 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
monk0707:insert-exec-savepoint-transaction
Open

Insert exec query tree modification (poc pr)#4581
monk0707 wants to merge 38 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
monk0707:insert-exec-savepoint-transaction

Conversation

@monk0707
Copy link
Copy Markdown

Description

What: Currently, Babelfish's INSERT EXEC query rewriting approach lacks transaction atomicity and fails to handle several INSERT EXEC syntax variants. With this change, INSERT EXEC now:

Provides atomic transaction behavior using savepoints (all rows inserted or none on error)
Supports INSERT INTO t EXEC (@variable) with dynamic SQL in variables
Supports INSERT INTO t EXEC sp_executesql N'SELECT ...'
Supports INSERT INTO t EXEC @rc = proc with return value capture
Why: The query rewriting approach transforms SELECTs inside procedures into INSERT statements. Without savepoints, if a rewritten INSERT fails mid-procedure, previously inserted rows remain (partial state). Additionally, several INSERT EXEC syntax variants were failing because the parser couldn't distinguish them from regular EXEC statements, causing syntax errors or incorrect execution paths.

How:

Savepoint integration (pl_exec.c): Wraps procedure execution in BeginInternalSubTransaction("insert_exec") with proper rollback on error
Parser enhancements (tsqlIface.cpp): Detects special INSERT EXEC patterns in enterDml_statement and creates appropriate statement types (PLtsql_stmt_exec_batch or PLtsql_stmt_exec with insert_exec=true)
Struct extensions (pltsql-2.h): Added insert_exec, insert_exec_target, and insert_exec_columns fields to PLtsql_stmt_exec
Execution handling (pl_exec-2.c): Sets up INSERT EXEC rewrite context before procedure execution
Issues Resolved
BABEL-5522: INSERT EXEC Query Rewriting POC enhancements
Partial fix for BABEL-2343: EXEC (@variable) support
Partial fix for BABEL-2385: INSERT INTO t2 EXEC sp_executesql support
Partial fix for BABEL-6145: INSERT INTO .. EXEC @v support

Issues Resolved

[List any issues this PR will resolve]

Test Scenarios Covered

  • Use case based -

  • Boundary conditions -

  • Arbitrary inputs -

  • Negative test cases -

  • Minor version upgrade tests -

  • Major version upgrade tests -

  • Performance tests -

  • Tooling impact -

  • Client tests -

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@monk0707 monk0707 force-pushed the insert-exec-savepoint-transaction branch 2 times, most recently from 4cb263f to 4625bd2 Compare February 27, 2026 09:38
monk0707 added 28 commits March 6, 2026 06:39
This commit implements INSERT EXEC functionality using query rewriting:
- Extract target table and column list from INSERT EXEC statement in tsqlIface.cpp
- Add InsertExecRewriteContext structure to track rewrite state in pltsql.h
- Implement query rewriting in pl_exec.c:
  - is_simple_select_query(): Detect SELECT statements eligible for rewriting
  - should_rewrite_for_insert_exec(): Check if rewrite context is active
  - exec_rewritten_insert_exec(): Transform SELECT to INSERT using CTE pattern

The rewriting transforms:
  SELECT col1, col2 FROM source
Into:
  WITH __insert_exec_cte AS (SELECT col1, col2 FROM source)
  INSERT INTO target_table (columns) SELECT * FROM __insert_exec_cte

Tested with:
- Nested procedures (3 levels deep)
- JOINs, dynamic SQL, WHILE loops
- TRY-CATCH blocks, aggregations, UNION
- CASE expressions, TOP/ORDER BY
- Cross-database INSERT EXEC
Wrap INSERT EXEC procedure execution in a savepoint (BeginInternalSubTransaction)
to ensure atomic behavior. If the procedure fails at any point (constraint
violation, error in procedure, etc.), all inserted rows are rolled back.

This matches SQL Server's behavior where INSERT EXEC is atomic - either all
rows are inserted or none are.

Key implementation details:
- Use BeginInternalSubTransaction/ReleaseCurrentSubTransaction for success path
- Use RollbackAndReleaseCurrentSubTransaction for error path
- Save error data BEFORE rollback to avoid 'Invalid cursor state' errors
- Re-raise error with ereport(ERROR, ...) instead of PG_RE_THROW()
- Properly restore memory context, resource owner, and eval_econtext
Add support for INSERT INTO t EXEC (@variable) syntax where the SQL
string is stored in a variable and evaluated at runtime.

Implementation approach:
- Parser detects INSERT INTO ... EXEC (@var) pattern and creates
  PLtsql_stmt_exec_batch instead of PLtsql_stmt_execsql
- The INSERT context (target table, column list) is stored in the
  PLtsql_stmt_exec_batch structure
- At execution time, the INSERT EXEC rewrite context is set before
  executing the dynamic SQL, allowing SELECT statements to be
  rewritten as INSERT statements

Changes:
- pltsql-2.h: Add insert_exec, insert_exec_target, insert_exec_columns
  fields to PLtsql_stmt_exec_batch structure
- pl_exec-2.c: Set/clear INSERT EXEC rewrite context in exec_stmt_exec_batch
- tsqlIface.cpp: Detect INSERT EXEC with dynamic SQL in ANTLR parser
  and create appropriate statement type with INSERT context
Add support for INSERT INTO table EXEC sp_executesql syntax.
Previously, sp_executesql was only handled as a standalone EXEC statement
but not when used within INSERT EXEC context.

Changes:
- Detect sp_executesql in INSERT EXEC context in enterDml_statement
- Create PLtsql_stmt_exec_batch with the SQL string argument as expression
- Skip normal PLtsql_stmt_execsql processing in exitDml_statement for this case

This allows queries like:
  INSERT INTO t2 EXEC sp_executesql N'SELECT * FROM t1'
  INSERT INTO t2 EXEC sp_executesql @sqlVariable
When executing INSERT INTO t EXEC @rc = P, the return_status (@rc =)
caused the normal makeSQL() path to fail because PostgreSQL parser
doesn't understand the @rc = syntax.

This commit handles INSERT EXEC with return status the same way as
EXEC (@variable) and sp_executesql - by detecting the pattern in
enterDml_statement and creating a PLtsql_stmt_exec with insert_exec=true.

Changes:
- pltsql-2.h: Added insert_exec, insert_exec_target, and insert_exec_columns
  fields to PLtsql_stmt_exec struct
- tsqlIface.cpp: Added detection for body->return_status in enterDml_statement
  to create PLtsql_stmt_exec with INSERT EXEC context; added skip condition
  in exitDml_statement
- pl_exec-2.c: Added INSERT EXEC rewrite context setup in exec_stmt_exec
  before procedure execution, with cleanup in PG_FINALLY block
Changes:
1. Fixed SPI return code check for temp table creation - accept both
   SPI_OK_SELINTO and SPI_OK_UTILITY since CREATE TABLE AS SELECT
   returns SPI_OK_SELINTO

2. Fixed flush query for IDENTITY_INSERT - removed explicit OVERRIDING
   SYSTEM VALUE since T-SQL parser automatically adds it when
   IDENTITY_INSERT is ON for the target table

3. Fixed SPI return code check for flush INSERT - accept both
   SPI_OK_INSERT and SPI_OK_INSERT_RETURNING since T-SQL parser adds
   RETURNING clause when IDENTITY_INSERT is ON

4. Added handling for column_list with IDENTITY_INSERT ON - use type
   casts in subquery to strip IDENTITY property from temp table columns

5. Added insert_exec_identity_insert field to PLtsql_execstate to track
   IDENTITY_INSERT state during INSERT EXEC execution

6. Skip has_ident check in pl_handler.c for INSERT EXEC statements
   since values come from procedure, not explicit VALUES
- Add quoteIdentifierIfNeeded() helper in tsqlIface.cpp to quote identifiers
  containing special characters (spaces, dashes, etc.) with square brackets
- Update target_table and column_list building to use quoteIdentifierIfNeeded()
- Add strip_tsql_brackets() helper in pl_exec.c to strip T-SQL brackets
- Add strip_pg_quotes() helper in pl_exec.c to strip PostgreSQL double quotes
- Update exec_create_insert_exec_temp_table() to properly handle table names
  with spaces by stripping brackets and using quote_identifier() for PostgreSQL
- Fix IDENTITY_INSERT OID lookup to strip double quotes from table name before
  calling get_relname_relid()

This fixes the 'invalid name syntax' error when using table names like
[t bracket test] or [table-with-dashes] in INSERT EXEC statements.
- Add transaction count save/restore in exec_stmt_execsql() to prevent
  'Transaction count mismatch' error when procedures create temp tables,
  table variables, or use DML with OUTPUT clause during INSERT EXEC

- Add CTE (WITH clause) support in is_simple_select_query() to allow
  INSERT EXEC to capture results from procedures using CTEs

- Add INSERT EXEC context check in iterative_exec.c batch-level
  transaction count validation

Test results: 19/20 tests pass (match SQL Server behavior)
Known limitation: TRY-CATCH with error (test03) requires separate fix
Remove all debug elog(LOG,...) and elog(NOTICE,...) statements that were
added during development of the INSERT EXEC feature. These statements
were useful for debugging but should not be in production code.

Removed debug statements from:
- pl_exec.c: EXEC-STMT-MAIN-DEBUG, INSERT-EXEC-SKIP-TRIGGERS,
  INSERT-EXEC-TEMP-TABLE, INSERT-EXEC-REWRITE-DEBUG, INSERT-EXEC-REWRITE,
  INSERT-EXEC-DML-REWRITE, and INSERT-EXEC-DEBUG NOTICE statements
- pl_exec-2.c: EXEC-STMT-DEBUG and all TRY-CATCH-DEBUG statements
- hooks.c: INSERT-EXEC-SKIP-TRIGGERS statement

Kept DEBUG1/DEBUG2 level statements as they are appropriate for
production debugging when enabled.
Add INSERT EXEC context checks to iterative_exec.c to prevent transaction
state corruption when errors occur during INSERT EXEC execution:

1. Skip internal savepoints during INSERT EXEC to avoid AfterTrigger
   query_depth mismatch issues. The INSERT EXEC temp table approach
   already provides proper transaction semantics.

2. Skip transaction rollback during INSERT EXEC error handling to prevent
   AfterTriggerEndXact from resetting query_depth while the outer INSERT
   still has an active query.

3. Use ReThrowError instead of FreeErrorData + PG_RE_THROW in handle_error
   to properly handle cases where FlushErrorState has already been called
   (e.g., during TRY-CATCH handling).

4. Add pltsql_insert_exec_skip_triggers() and pltsql_set_insert_exec_skip_triggers()
   function declarations to pltsql.h for trigger skipping during INSERT EXEC.
- Add BABEL-INSERT-EXEC.sql with comprehensive test coverage:
  - Basic INSERT EXEC scenarios (simple, multiple rows, parameters)
  - Column mapping and data types
  - Dynamic SQL (EXEC, sp_executesql)
  - Nested procedures (2-3 levels deep)
  - IDENTITY column handling (auto-generated and IDENTITY_INSERT)
  - Temp tables (target and source)
  - Advanced SQL constructs (CTE, UNION, conditional, loops)
  - Transaction behavior (COMMIT, ROLLBACK)
  - TRY/CATCH behavior
  - Error handling
  - Large result sets (100 rows)
  - OUTPUT clause in procedures

- Add expected output file BABEL-INSERT-EXEC.out
- Update jdbc_schedule to run new test
- Remove old test files BABEL-INSERT-EXEC-ALL-ISSUES.sql and BABEL-INSERT-EXEC-DML.sql
The change from FreeErrorData+PG_RE_THROW to ReThrowError was incorrect.
In handle_error(), we're inside a PG_CATCH block and the original error
is still on the error stack. PG_RE_THROW() correctly re-throws that
original error, while ReThrowError() would push a duplicate onto the
stack causing issues during deep recursion (e.g., test 217_1).

ReThrowError should only be used after FlushErrorState() has been called,
which clears the error stack. The INSERT EXEC error handling in pl_exec.c
correctly uses ReThrowError after FlushErrorState().
The 'errstart was not called' panic occurred when INSERT EXEC encountered
errors (e.g., column mismatch). The issue was in the error handling flow:

1. When an error occurred, it was pushed onto PostgreSQL's error stack
2. PG_FINALLY called exec_drop_insert_exec_temp_table() to clean up
3. DROP TABLE failed with 'table is being used by active queries'
4. The PG_CATCH in exec_drop_insert_exec_temp_table called FlushErrorState()
5. FlushErrorState() cleared ALL errors, including the original error
6. When CopyErrorData() was called later, the stack was empty -> panic

Fix:
- Move exec_drop_insert_exec_temp_table() from PG_FINALLY to success path only
- In PG_FINALLY, only clean up estate fields without attempting DROP TABLE
- In exec_flush_insert_exec_temp_table, use PG_RE_THROW() instead of
  CopyErrorData/FlushErrorState/ReThrowError to preserve the error stack
- Wrap DROP TABLE in a subtransaction to isolate any failures
- Temp table is cleaned up when session ends on error paths

Also added comments explaining ReThrowError usage in iterative_exec.c
When a procedure with OUTPUT parameters is called via INSERT EXEC:
1. The OUTPUT parameter values should be returned to the caller variable
2. Only the SELECT results should be inserted into the target table
3. The OUTPUT parameter values should NOT be inserted into the table

The fix involves two key changes:

1. Copy OUTPUT parameter values to TopTransactionContext instead of
   the procedure's fn_cxt. The fn_cxt gets freed when the procedure
   returns, causing memory corruption (OID 0x7F7F7F7F error).

2. Set fcinfo->isnull = true to prevent the OUTPUT parameter values
   from being sent to the client/destination (which would insert them
   into the table). The new is_insert_exec_rewrite_active_hook in
   functioncmds.c allows this without causing 'procedure returned
   null record' error.

3. Register the hook in pl_handler.c to enable the INSERT EXEC
   context detection in PostgreSQL core.

This matches SQL Server behavior where OUTPUT parameters are returned
to caller variables but not inserted into the target table.
- Add transaction control handling for INSERT EXEC context
- Block COMMIT/ROLLBACK that would terminate outer INSERT EXEC transaction
- Handle BEGIN TRANSACTION/COMMIT inside INSERT EXEC procedures correctly
- Fix row count reporting for INSERT EXEC with dynamic SQL
- Register is_insert_exec_rewrite_active_hook for engine integration

This change ensures SQL Server-compatible transaction behavior during
INSERT EXEC operations, where ROLLBACK is blocked and BEGIN/COMMIT
are allowed but don't affect the outer INSERT EXEC transaction.
Implement INSERT-EXEC support for procedures with OUTPUT parameters by
rewriting SELECT statements inside procedures to INSERT into a temp
buffer table, then flushing results to the target table after procedure
completion.

Key changes:
- Add query rewriting infrastructure in pl_exec.c to transform SELECTs
  into CTE-based INSERTs during INSERT-EXEC execution
- Add INSERT-EXEC context management functions in pltsql_utils.c
- Add ExecutorStart hook to skip AFTER triggers during rewritten INSERTs
  to prevent crashes in TRY-CATCH error handling scenarios
- Fix case-insensitive temp table name handling in buffer table creation
  (fixes BABEL-2999 and datareader_datawriter tests)

Known limitation:
- DROP TABLE inside a procedure fails with error 556 when called via
  INSERT-EXEC due to PostgreSQL holding relation references during
  CTE execution. This will be addressed in a follow-up fix.
monk0707 added 2 commits March 6, 2026 06:39
…tack in plsql_TriggerRecursiveCheck

In parallel workers, the exec_state_call_stack global variable is NULL
because parallel workers don't have the PL/tsql execution context.
When the plsql_TriggerRecursiveCheck hook is called from parallel
workers during query execution, it would crash trying to dereference
the NULL pointer.

This fix adds a NULL check to plsql_TriggerRecursiveCheck() in hooks.c
that returns false (not a recursive trigger) if exec_state_call_stack
is NULL. This is the safe default behavior.
@monk0707 monk0707 force-pushed the insert-exec-savepoint-transaction branch from f2c3b7b to 1c80f61 Compare March 6, 2026 06:39
monk0707 added 4 commits March 6, 2026 08:28
When INSERT EXEC targets a view with an INSTEAD OF trigger, the trigger
was not firing because the flush INSERT ran via SPI_execute without
opening a composite trigger nesting level.

Root cause: The composite trigger mechanism (BeginCompositeTriggers/
EndCompositeTriggers) is normally called from exec_stmt_execsql when
enable_txn_in_triggers is true. But for INSERT EXEC, enable_txn_in_triggers
is false, so INSTEAD OF triggers queued their events but the events were
never fired (IsCompositeTriggerActive() returned false).

The fix:
1. Clear INSERT EXEC rewrite context before the flush INSERT so it
   behaves like a normal INSERT
2. Call BeginCompositeTriggers() before SPI_execute to open a trigger
   nesting level
3. Call EndCompositeTriggers(false) after successful execution to fire
   queued INSTEAD OF triggers
4. Set pltsql_disable_txn_in_triggers=true during flush to force atomic
   SPI, preventing premature commits that would destroy the trigger's
   transition tables
5. Add commit_stmt guard to skip commits while INSERT EXEC rewrite is
   active, protecting the 'inserted' pseudo-table from being destroyed

This matches SQL Server behavior where INSERT EXEC into a view with an
INSTEAD OF trigger fires the trigger once with all rows from the
procedure's result set
The NULL checks for exec_state_call_stack added in commits 99d9d2f
and 31c1136 were causing general JDBC tests to timeout. These checks
were intended to fix parallel query crashes but inadvertently broke
the normal execution path.

This reverts those specific changes while keeping the important fixes:
- 1c80f61: Fix INSERT EXEC snapshot error with AFTER triggers
- df12e05: Fix INSERT EXEC INSTEAD OF triggers on view targets
- 7d020a7: Fix C90 failure

Parallel query tests may still fail (separate issue), but general
JDBC tests should now pass without timeout.
@monk0707 monk0707 force-pushed the insert-exec-savepoint-transaction branch from e7b6a8e to 968709c Compare March 7, 2026 13:04
monk0707 added 3 commits March 7, 2026 14:54
The previous commits introduced a two-bug loop:
1. With NULL checks in is_part_of_pltsql_trycatch_block: in_trycatch=false
   -> savepoints skipped -> DROP TABLE hangs -> timeout
2. Without NULL checks: Assert fires on NULL dereference -> SIGSEGV -> crash

Root cause: exec_state_call_stack can be NULL during INSERT EXEC SPI
execution paths, not just in parallel workers.

Fix:
1. Restore NULL guards in is_part_of_pltsql_trycatch_block and
   is_part_of_pltsql_trigger, but replace hard Assert with soft
   return-false to prevent crashes

2. Add trycatch_depth field to InsertExecRewriteContext to explicitly
   track TRY-CATCH nesting during INSERT EXEC

3. Add fallback in dispatch_stmt_handle_error: when exec_state_call_stack
   is NULL but INSERT EXEC is active, use the explicit trycatch_depth
   to determine if savepoints are needed

4. Track TRY-CATCH entry/exit in exec_stmt_try_catch via new helper
   functions pltsql_insert_exec_enter_trycatch/exit_trycatch

This breaks the timeout<->crash cycle by:
- Preventing crashes when exec_state_call_stack is NULL (soft check)
- Still enabling savepoints during INSERT EXEC + TRY-CATCH (fallback)
- Properly releasing index references on caught errors (no DROP hang)
The previous fix (68f12ee) only protected 2 of 9 unsafe exec_state_call_stack
dereferences. This commit adds NULL guards to the remaining 7 sites that can
crash when exec_state_call_stack is NULL during INSERT EXEC SPI execution.

Protected functions/locations:
1. record_error_state() - guard at top
2. is_error_raising_batch() - guard at top
3. is_xact_abort_on_error() - guard at top
4. abort_transaction() line ~1159 - guard || right-hand side
5. abort_execution() lines ~1207, ~1210 - guard || and direct access
6. exec_stmt_iterative() lines ~1714-1718 - guard error_data block
7. exec_stmt_iterative() line ~1783 - guard condition with || NULL check
8. set_exec_error_data() - guard at top
9. reset_exec_error_data() - guard at top

The exec_state_call_stack can be NULL during INSERT EXEC because the SPI
execution path doesn't always have a full PL/tsql call stack established.
Without these guards, any error during INSERT EXEC would cause a SIGSEGV.
Three remaining crash sites after commit 3b06bdd (which covered
iterative_exec.c) still dereference exec_state_call_stack without
a NULL check:

1. pl_exec.c:1283 (pltsql_exec_trigger): pltsql_exec_trigger never
   pushes exec_state_call_stack -- only pltsql_exec_function does at
   line 4475. When a trigger fires from a pure SQL DML context with
   no outer PL/tsql function, exec_state_call_stack is NULL. Add NULL
   check before the trigger_error field dereference.

2. pl_exec.c:10301 (pltsql_estate_cleanup): INSERT EXEC cleanup paths
   can call this function with an empty stack. Add early return when
   exec_state_call_stack is NULL to prevent crash on ->next dereference.

3. pl_exec-2.c:3365 (exec_stmt_usedb): The loop traverses the call
   stack looking for EXEC_BATCH to suppress the database context
   change notification. Guard the initial ->next access with a ternary;
   a NULL stack means no EXEC_BATCH above us, so fall through normally
   to send the notification (do NOT return early here).

Also revert proc_ownership_chaining-vu-verify.out: accepting a
permission denied error for EXEC p6030_8_3c is a regression. The
CTE rewrite SPI call runs under the caller's security context rather
than the procedure owner's, breaking ownership chaining for INSERT
EXEC. Root-cause fix (security context switching in exec_stmt_execsql)
will follow as a separate commit.
@monk0707 monk0707 force-pushed the insert-exec-savepoint-transaction branch from 6a18d52 to 551ac36 Compare March 8, 2026 00:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant