Skip to content

Add remote procedure execution via TDS protocol with RPC out validation for linked servers#4292

Open
fizzlr wants to merge 23 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
amazon-aurora:remote-proc-exec-wip
Open

Add remote procedure execution via TDS protocol with RPC out validation for linked servers#4292
fizzlr wants to merge 23 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
amazon-aurora:remote-proc-exec-wip

Conversation

@fizzlr
Copy link
Copy Markdown
Contributor

@fizzlr fizzlr commented Nov 25, 2025

Implement remote stored procedure execution for Babelfish linked servers through TDS RPC calls with security validations. Key changes include:

  • Add RPC call handling and parameter marshaling in linked servers related files
  • Extend catalog functions for remote procedure metadata lookup
  • Implement result processing and error handling for remote calls
  • Add RPC out server option validation for security
  • Add SELECT-only statement validation for remote procedures
  • Update parser to detect and route remote procedure calls

Description

What is the change?
Currently, Babelfish does not support executing stored procedures on remote linked servers. With this change, users can now execute stored procedures on Babelfish linked servers using four-part naming syntax (e.g., EXEC [linked_server].[database].[schema].[procedure_name]). The execution is performed via TDS RPC calls to the remote server.

Why was the change made?
Linked server support is a critical feature for SQL Server compatibility, allowing users to execute queries and procedures across multiple database instances. Remote procedure execution enables distributed database applications and cross-server workflows. Without this capability, users migrating to Babelfish would need to refactor applications that rely on remote procedure calls.

How was the code changed?

  • Parser and executor enhancements: Added detection logic to identify remote procedure calls from four-part names and route them to the appropriate handler

  • Linked server infrastructure: Extended linked servers related files with RPC call handling, parameter marshaling, and TDS protocol communication

  • Catalog integration: Enhanced catalog functions to look up remote procedure metadata

  • Security validations: Implemented RPC out server option checking and SELECT-only statement validation to ensure secure remote execution

  • Result processing: Added handling for remote procedure results, error propagation, and transaction coordination

Issues Resolved

Jira: 4178

Tests

RUNNING remote-proc-exec-vu-prepare
RUNNING remote-proc-exec-vu-verify
RUNNING remote-proc-exec-vu-cleanup
All tests executed successfully!
###########################################################################
################################ SUMMARY ################################
###########################################################################
remote-proc-exec-vu-prepare: Passed! (869/120000ms OK)
remote-proc-exec-vu-verify: Passed! (71685/120000ms OK)
remote-proc-exec-vu-cleanup: Passed! (134/120000ms OK)
###########################################################################
TOTAL TESTS: 3
TESTS PASSED: 3
TESTS FAILED: 0
###########################################################################
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 73.386 s - in com.sqlsamples.TestQueryFile
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:14 min
[INFO] Finished at: 2025-12-15T19:14:55Z
[INFO] ------------------------------------------------------------------------

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

int connect_timeout = 0;

/* LOG: Function entry with context - proves this function was called */
elog(LOG, "DEBUG_LINKED_SERVER: linked_server_establish_connection() called for server: %s (context: %s)",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this log for ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's exposing the server url in LOG info ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific one logs server name, will check if other logs log server url

LINKED_SERVER_DEBUG("LINKED SERVER: (Metadata) - Closing connections to remote server");
LINKED_SERVER_EXIT();
LINKED_SERVER_DEBUG("LINKED SERVER: (Metadata) - Closing connection to remote server");
LINKED_SERVER_CANCEL(lsproc);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add a LINKED_SERVER_CANCEL in here ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added LINKED_SERVER_CANCEL before LINKED_SERVER_CLOSE to ensure proper cleanup of pending TDS operations and prevent connection leaks in error paths when metadata queries fail. This also prevent race condition of validation query and actual execution query for their tds connections, found out in local testing.

* Map PostgreSQL/Babelfish type OID to TDS type for RPC parameter binding
*/
int
get_tds_type_from_pg_oid(Oid pgtype)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have similar function already in the code ?

Copy link
Copy Markdown
Contributor Author

@fizzlr fizzlr Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementations are specific to TDS RPC parameter binding (converting PG types to TDS wire format for dbrpcparam). Tried existing type functions, but it throws errors and inconsistent to types required for freetds library

* Returns palloc'd buffer containing the data
*/
void
convert_datum_to_tds_bytes(Datum value, Oid valtype, int32 valtypmod, bool isnull,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have existed function in the code ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementations are specific to TDS RPC parameter binding (converting PG types to TDS wire format for dbrpcparam), which has different requirements than general type conversion.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be multiple instances of duplicated code. I think we can combine multiple conditionals in this function

Copy link
Copy Markdown
Contributor Author

@fizzlr fizzlr Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted datum_text_to_tds_bytes() and datum_output_to_tds_bytes() helpers, replacing duplicate code blocks.

ListCell *lc;

/* Build the full procedure name: database.schema.procedure */
full_proc_name = psprintf("%s.%s.%s",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a proper way to do a format like this ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

psprintf is PG standard memory-safe string formatting function and it's widely used in the codebase for dynamic string construction.


/* Build query to fetch procedure definition */
initStringInfo(&validation_query);
appendStringInfo(&validation_query,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL concation in here will leave a SQL inject leaking, pls try to use prepared stmt to do so.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, will update with prepared statements to prevent sql injection

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Replaced string concatenation with sp_executesql and proper parameter binding (@proc_name, @schema_name parameters) to prevent SQL injection. The query template is now passed to sp_executesql which handles parameter substitution securely


/* Convert Datum to raw bytes */
convert_datum_to_tds_bytes(val, valtype, valtypmod, isnull,
&param_data, &param_len);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not consider pltsql_exec_tsql_cast_value ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems doing the same thing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pltsql_exec_tsql_cast_value and convert_datum_to_tds_bytes serve different purposes. The former performs T-SQL type casting during expression evaluation (Datum → Datum), while the latter serializes already-typed Datums to TDS wire format (Datum → raw bytes) for RPC parameter binding.

PushActiveSnapshot(GetTransactionSnapshot());

/* Create a Portal to wrap the tuplestore for sending to client */
portal = SPI_cursor_open_with_args(NULL, "SELECT 1", 0, NULL, NULL, NULL, true, 0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "select 1" , where's this from ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"SELECT 1" is a dummy query required by the SPI_cursor_open_with_args() API to create a Portal. The actual result data comes from the remote procedure's tuplestore which is assigned to portal->holdStore. This is a workaround to use PG Portal mechanism for transmitting result sets to the client.

LINKED_SERVER_EXIT();
LINKED_SERVER_DEBUG("LINKED SERVER: (OPENQUERY) - Closing connection to remote server");
LINKED_SERVER_CANCEL(lsproc);
LINKED_SERVER_CLOSE(lsproc);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added LINKED_SERVER_CANCEL before LINKED_SERVER_CLOSE to ensure proper cleanup of pending TDS operations and prevent connection leaks in error paths when metadata queries fail. This also prevent race condition of validation query and actual execution query for their tds connections, found out in local testing.


if (colcount > 0)
{
char bind_definition[65536] = {0x00}; /* Large buffer for procedure body */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to change this. definition column in sys.sql_modules is nvarchar(max). There is no guarantee that it will fit within 65536 characters.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Removed buffer binding entirely and now fetch data directly using LINKED_SERVER_DATA()/LINKED_SERVER_DATA_LEN(). This matches the pattern used in openquery_imp() and handles nvarchar(max) columns.

/* Perform static analysis - case-insensitive search for forbidden keywords */
def_lower = lowerstr(definition);

/* Check for DML operations */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we are performing static analysis here doesn't seem conclusive. Can't we use our own ANTLR parser to check for DML/DDL?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated on this. Will look into the new comments on rev2

return rc;
}

#ifdef ENABLE_TDS_LIB
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason we want to keep this code outside of linked_server.c?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's put here since it's tightly integrated with pltsql estate, var and etc. Moving it to linked_server.c would require exposing many internal PLtsql structures. However, if we do want to separate out linked servers from executor code, I can try to refactor and adjust.


/* Build the full procedure name: database.schema.procedure */
full_proc_name = psprintf("%s.%s.%s",
stmt->db_name ? stmt->db_name : "master",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assumption to default to master database is not correct. The default database is configurable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Replaced hard-coded "master" with get_cur_db_name() which respects the user's current database context

new_record[Anum_bbf_servers_def_connect_timeout - 1] = Int32GetDatum(timeout);
}
}
else if (optname && strlen(optname) == 7 && strncmp(optname, "rpc out", 7) == 0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets start adding automated test cases for all new changes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comprehensive automated test suite (100 test cases) into JDBC tests which is linked servers tests reside.


if (data && data_len > 0)
{
/* Allocate exactly the size needed based on actual data length */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use StringInfoBuf here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other places are using StringInfoBuf to build strings incrementally while here it's one time use for known size data. It also avoids the StringInfo's 1024-byte initial overhead. When the data is large, it also avoids the enlarging buffer multiple times.

Comment on lines +311 to +313
~~ERROR (Code: 33557097)~~

~~ERROR (Message: TDS client library error: DB #: 20114, DB Msg: When passing a SYBINTN, SYBDATETIMN, SYBMONEYN, or SYBFLTN parameter via dbrpcparam, it is necessary to specify the parameter's maximum or actual length so that DB-Library can recognize it as a SYINT1, SYBINT2, SYBINT4, SYBMONEY, SYBMONEY4, and so on, OS #: 0, OS Msg: Success, Level: 7)~~
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is error expected here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, fixed it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look fixed

GO
~~ERROR (Code: 33557097)~~

~~ERROR (Message: TDS client library error: Msg #: 33557097, Msg state: 1, Msg: timestamp out of range, Server: BABELFISH, Process: , Line: 1, Level: 16)~~
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is error expected here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not look fixed


#ifdef ENABLE_TDS_LIB
/*
* Validate remote procedure definition using PostgreSQL parser.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using PG parser to validate definition of remote procedure?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. Changed from static analysis to antlr parser. so that it's fixing here using tsql parser.

SPI_connect();

/* Query babelfish_servers catalog for data_source */
plan = SPI_prepare("SELECT data_source FROM sys.babelfish_servers WHERE name = $1",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no sys.babelfish_servers catalog. Have we defined one for this feature?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed by removing it. It should be doing the same things at this step similar to existing function. Using the establish connection function now

GO
~~ERROR (Code: 33557097)~~

~~ERROR (Message: relation "sys.babelfish_servers" does not exist)~~
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be happening

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, fixed.

GO
~~ERROR (Code: 33557097)~~

~~ERROR (Message: relation "sys.babelfish_servers" does not exist)~~
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be happening

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, fixed.


-- Test 55: Procedure with INSERT statement (violates SELECT-only)
-- Expected: Should throw error - INSERT not allowed
EXEC bbf_rpe_server.master.dbo.sp_WithInsert;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have thrown error right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, fixed by switching to tsql parser instead of static analysis

GO
~~START~~
int
2
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, fixed by switching to tsql parser instead of static analysis

GO
~~START~~
int
0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, fixed by switching to tsql parser instead of static analysis

Oid bit_oid = (*common_utility_plugin_ptr->lookup_tsql_datatype_oid)("bit");

if (pgtype == varchar_oid)
return SYBVARCHAR;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid referencing direct FreeTDS type macros. Instead use or define macros for these in linked_server.h . This is necessary so that This is necessary to minimise code change if we ever decide to use another client library.

Also please ensure the code compiles with and without the ENABLE_TDS_LIB compiler flag.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defined LS_TYPE_* abstraction macros in linked_servers.h (e.g., LS_TYPE_VARCHAR, LS_TYPE_INT4) with corresponding #else stubs, replacing all direct FreeTDS references like SYBVARCHAR.

Oid uniqueidentifier_oid = (*common_utility_plugin_ptr->lookup_tsql_datatype_oid)("uniqueidentifier");
Oid bit_oid = (*common_utility_plugin_ptr->lookup_tsql_datatype_oid)("bit");

if (pgtype == varchar_oid)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to use switch ... case

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidated get_tds_type_from_pg_oid() — Babelfish OIDs are runtime values so they use a single flat if-chain with category comments; standard PG OIDs use a proper switch/case.

}

/* Check for Babelfish-specific types first */
if (common_utility_plugin_ptr && common_utility_plugin_ptr->lookup_tsql_datatype_oid)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be multiple instances of duplicated code. I think we can combine multiple conditionals here

Copy link
Copy Markdown
Contributor Author

@fizzlr fizzlr Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidated 4 separate if (common_utility_plugin_ptr ...) blocks into a single block with all 23 Babelfish OID lookups in one place.

* Returns palloc'd buffer containing the data
*/
void
convert_datum_to_tds_bytes(Datum value, Oid valtype, int32 valtypmod, bool isnull,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be multiple instances of duplicated code. I think we can combine multiple conditionals in this function

procedure_name, strlen(definition));

/* Use ANTLR parser for accurate T-SQL validation */
validate_remote_procedure_select_only_antlr(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we throw error inside this function, will we free definition and query.data?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added null-safe pfree() calls for both definition and query.data on all paths, and initialized query.data = NULL so PG_FINALLY can safely check it.


-- Test 82: SQL Injection attempt in procedure name (should fail safely)
-- Expected: Should throw error - procedure not found (not SQL injection)
EXEC bbf_rpe_server.master.dbo.[sp_NoParams'; DROP TABLE test; --];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try injection in server name, database name and procedure name. Also, I don't think the procedure name injection is actually getting tested here. Having ' in a procedure name is perfectly valid in T-SQL if it wrapped in []

GO
~~START~~
varchar#!#int
Test ? ? ? ?#!#12
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should see the actual symbols

~~ERROR (Code: 33557097)~~

~~ERROR (Message: Remote procedure/function reference with 4-part object name is not currently supported in Babelfish)~~
~~ERROR (Message: improper qualified name (too many dotted names): bo.babel_5222_xml_exist_t1.xmlcolumn.exist)~~
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message should not have been thrown

~~ERROR (Code: 33557097)~~

~~ERROR (Message: Remote procedure/function reference with 4-part object name is not currently supported in Babelfish)~~
~~ERROR (Message: improper qualified name (too many dotted names): bo.babel_5223_xml_value_t1.xmlcolumn.value)~~
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message should not have been thrown

@@ -0,0 +1,1211 @@

-- ========================================
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run these or add new tests for upgrade

query_timeout INT,
connect_timeout INT
connect_timeout INT,
rpc_out BOOLEAN DEFAULT FALSE
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We can't change that directly, should use the upgrade script to alter the table add new columns
  2. Also in this script, we should also use alter table after create table.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can modify this. This will only be run on a fresh provisioning of Babelfish database. This is not the upgrade script. The ALTER statement should be part of the upgrade script.

* otheriwse, a T-SQL info
* otherwise, a T-SQL info
*/
appendStringInfo(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why deleting those ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These statements were development-time debugging logs that were producing excessive output in production PG log files. Every single remote procedure call was generating 40+ LOG-level messages (connection parameters, query text, row-by-row data, etc.).

I converted them to a two-tier debug strategy:

  • 19 retained as LINKED_SERVER_DEBUG(...) (maps to elog(DEBUG1, ...)) — visible only with SET client_min_messages TO debug1
  • 46 removed entirely — redundant messages (e.g., "about to call function X" followed by "called function X successfully")

The LINKED_SERVER_DEBUG macro is defined in linked_servers.h and matches the existing pattern used by the OPENQUERY code path.

));

/* LOG: Foreign server retrieved successfully */
elog(LOG, "DEBUG_LINKED_SERVER: Retrieved foreign server configuration for: %s", servername);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of those logs ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed 46 development-time elog(LOG) statements; retained 19 as LINKED_SERVER_DEBUG() (only visible at DEBUG1 level) for troubleshooting.

* Returns List of NestedProcedureInfo structures
*/
static List *
extract_nested_procedure_calls(const char *definition)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example for what this function will return ?

GO
~~ERROR (Code: 33557097)~~

~~ERROR (Message: RPC out is not enabled for the specified server. Use sp_serveroption to enable it.)~~
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error msg is not 'server not found' , so it's different from expected

GO
~~ERROR (Code: 33557097)~~

~~ERROR (Message: Could not fetch definition for remote procedure bbf_rpe_server.master.invalid_schema.sp_noparams)~~
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is not 'schema not found'

GO
~~ERROR (Code: 33557097)~~

~~ERROR (Message: Remote procedure contains UPDATE statement)~~
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need change the error msg to Update not allowed ?

GO
~~ERROR (Code: 33557097)~~

~~ERROR (Message: Remote procedure contains DELETE statement)~~
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test case like :
insert into local table with exec from a remote server, thanks

@fizzlr fizzlr reopened this Feb 26, 2026
@fizzlr fizzlr force-pushed the remote-proc-exec-wip branch from 0d772dd to f3bad22 Compare February 27, 2026 19:10
/* Skip during Babelfish restore */
if (!babelfish_dump_restore && (relationId == sysdatabases_oid ||
/* Skip during Babelfish restore or extension creation/upgrade */
if (!babelfish_dump_restore && !creating_extension && (relationId == sysdatabases_oid ||
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need this change ?

bool rpc_out_enabled = false; /* Default to disabled */

bbf_servers_def_rel = table_open(get_bbf_servers_def_oid(),
RowExclusiveLock);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why using RowExclusiveLock in here ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks good find, it's changed recently in https://github.com/babelfish-for-postgresql/babelfish_extensions/pull/4320/changes#diff-21156ac859506a645f7ccb05bfd9cde8e8c967aaa736531bdbca14de0f427044. That updated the lock of previous server option implementation. Will update.

* handles the character encoding internally when we use XSYBNVARCHAR/XSYBNCHAR
* types. We send UTF-8 data and FreeTDS converts it to UTF-16 LE as needed
* for the TDS protocol wire format.
*/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment useful in here ?

switch (error_code)
{
case 201: /* Missing or invalid parameter */
appendStringInfo(&buf, "Procedure parameter error: %s", clean_msg);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the error code mapping behavior the same as SQL Server ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, we need to make sure the error code mapping is the same as SQL Server.

get_tds_type_from_pg_oid(Oid pgtype)
{
/* Check for Babelfish-specific types (runtime OIDs, cannot use switch) */
if (common_utility_plugin_ptr && common_utility_plugin_ptr->lookup_tsql_datatype_oid)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those code duplicates with code defined in typecodes.c

* single if-chain first. Standard PostgreSQL types use a switch statement.
*/
void
convert_datum_to_tds_bytes(Datum value, Oid valtype, int32 valtypmod, bool isnull,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to reuse the tds_typeio.h codes, not re-implement the convert datum code again.

RUI LUO added 12 commits March 14, 2026 00:40
…on for linked servers

Implement remote stored procedure execution for Babelfish linked servers
through TDS RPC calls with security validations. Key changes include:
- Add RPC call handling and parameter marshaling in linked servers related files
- Extend catalog functions for remote procedure metadata lookup
- Implement result processing and error handling for remote calls
- Add RPC out server option validation for security
- Add SELECT-only statement validation for remote procedures
- Update parser to detect and route remote procedure calls
…execution

Critical Security Fixes:
- Fix SQL injection in SELECT-only validation using sp_executesql with parameter binding
- Fix buffer overflow by replacing fixed 65KB buffer with dynamic StringInfo and chunked reading
- Prevent arbitrary SQL execution and handle nvarchar(max) procedure definitions safely

Critical Correctness Fixes:
- Fix incorrect "master" default database assumption
- Query babelfish_servers catalog for configured default database from data_source
- Parse 'Database=' or 'Initial Catalog=' parameter from connection string

Security Validation Improvements:
- Replace string-based validation with parser-based validation using raw_parser()
- Analyze procedure AST to detect forbidden DML/DDL/dynamic SQL operations
- Cannot be bypassed with comments or string literals

Connection Management:
- Use separate TDS connections for validation queries vs RPC execution
- Prevent FreeTDS error 20019 from mixing SQL and RPC on same connection
- Add proper LINKED_SERVER_CANCEL before LINKED_SERVER_CLOSE

Implementation Fixes:
- Fix use-after-free bug by deferring parameter buffer cleanup until after dbrpcsend()
- Implement get_tds_type_from_pg_oid() for RPC parameter type mapping
- Implement convert_datum_to_tds_bytes() for RPC parameter marshaling
- Handle Babelfish types: varchar/nvarchar, datetime types, numeric/binary types

Test Coverage:
- Add comprehensive JDBC test suite in test/JDBC/
- Test RPC out validation, SELECT-only enforcement, parameter binding
- Test UTF-16 handling, output parameters, various data types
… and comprehensive JDBC tests; and fixed a few bugs
- Add FreeTDS type abstraction layer for library independence
- Fix memory leak in validation error paths with PG_TRY/CATCH
- Fix XML method detection to exclude exist/value/query/nodes/modify
- Replace inadequate SQL injection test with comprehensive security tests
- Add 3-file upgrade test suite for 5.3.0 → 5.4.0
remote procedure execution feature via linked servers. Includes fixes for
return code propagation, memory safety, FreeTDS build abstraction, and
significant code consolidation.

Blocking Items Resolved:

- Return code propagation: Use dbretstatus() to capture the remote
  procedure's actual RETURN value instead of hardcoding 0. Store in
  pltsql_proc_return_code for the caller to read via LINKED_SERVER_RET_STATUS
  macro.

- Memory safety in validate_remote_procedure_recursive(): Initialize
  query.data to NULL before PG_TRY so PG_FINALLY can safely check it.
  Guard all pfree() calls with NULL checks to prevent double-free on
  error paths.

- FreeTDS abstraction layer: Add comprehensive no-op stubs in
  linked_servers.h for non-FreeTDS builds covering RPC operations
  (LINKED_SERVER_RPC_INIT/PARAM/SEND/EXEC), OUTPUT parameter retrieval
  (LINKED_SERVER_NUM_RETS/RET_NAME/RET_DATA/RET_LEN/RET_TYPE/RET_STATUS),
  TDS type constants (LS_TYPE_VARCHAR, LS_TYPE_INT4, etc.), and FreeTDS
  data type aliases (LS_DBFLT8, LS_DBINT, etc.).

- sp_serveroption: Accept "yes"/"no" in addition to "true"/"false" for
  rpc out option using pg_strcasecmp for case-insensitive comparison.

Code Quality Improvements:

- Debug logging cleanup: Remove 46 debug log statements, convert
  remaining 19 to DEBUG level using LINKED_SERVER_DEBUG macro instead
  of elog(LOG, ...).

- Code consolidation: Refactor get_tds_type_from_pg_oid() and
  convert_datum_to_tds_bytes() to eliminate duplicate type-mapping
  logic.
…ix type conversion errors returning unmapped code 33557097 by mapping 22P02 to 245. Add 20 RPC error compatibility tests covering parameter errors, RAISERROR, runtime errors, nested errors, and edge cases.
@fizzlr fizzlr force-pushed the remote-proc-exec-wip branch from 72f8882 to 17b2f9e Compare March 14, 2026 00:44
RUI LUO added 11 commits March 16, 2026 19:40
… split RPC test for timeout

- Remove #include "linked_servers.h" from tsqlIface.cpp to prevent FreeTDS
  sybdb.h macros (ON, FAIL, SUCCEED, TRUE, FALSE) from corrupting the
  ANTLR-generated parser. Use forward declaration of NestedProcedureInfo instead.
  Fixes xml_exist/xml_value test regressions across all 17 CI checks.

- Split remote-proc-exec-vu-verify (94s) into 3 parts under 40s each:
  vu (34s), xp2 (24s), xp3 (31s). xp prefix ensures correct alphabetical
  ordering after vu-prepare when CI uses 'all' keyword.
- Add '-- single_db_mode_expected' marker to remote-proc-exec-xp2-vu-verify.sql
  to activate the single_db/ expected output override in the JDBC test framework.
  Without this marker, checkSingleDbModeExpected stays false and the framework
  ignores the single_db/remote-proc-exec-xp2-vu-verify.out override file.

- Add remote-proc-exec-txn tests covering transaction isolation, error
  propagation, RPC out configuration, implicit transactions, and edge cases
  for remote procedure execution via linked servers.
Add visitTransaction_statement() to ANTLR RemoteProcedureSelectOnlyValidator
to block BEGIN TRAN, COMMIT, ROLLBACK, and SAVE TRAN in remote procedures.
- Add RPC error capture mode in msg_handler/err_handler to format remote
  errors as error 7215 with server name, msg number, level, state, line
- RPC OUT disabled errors now produce error 7412 with server name
- err_handler throws captured error immediately on DB 20018 to prevent
  partial result leaks from procedures that SELECT before RAISERROR
- Store linked server name in capture state for correct error attribution
  since FreeTDS svr_name callback returns hostname not linked server name
- OPENQUERY path completely unaffected as capture mode defaults to false
- Add JDBC test schedule entries and update expected outputs for all 12
  remote-proc-exec test files across main/single_db/chinese_prc_ci_as
- Update TestErrorHelperFunctions.out to include 3 new HV error mappings
  (HV004→7215, HV000→7412, HV000→7202) added by RPC error handling
- Update four-part-names-vu-verify.out (multi-db and single_db) to reflect
  new error code 7412 and message format for RPC OUT not enabled
- Update chinese_prc_ci_as collation override for remote-proc-exec-xp3 to
  use new SQL Server-compatible error format instead of raw TDS client format
…02/7215/7412

The TestErrorHelperFunctionsUpgrade-vu-verify.out expected output was
missing the 3 new error codes (7202, 7215, 7412) added to
error_mapping.txt for linked server RPC error handling. This caused
all 3 version upgrade tests involving v13 paths to fail.

Changes:
- Add 7202, 7215, 7412 to sorted error code list
- Add 7215, 7412, 7202 to both fn_mapped_system_error_list query results
- Update total error count from 167 to 170
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants