Last Updated: 2025-12-28
Status: Production Ready
Version: Unified Connection Model with Backend Session Pooling
- Overview
- XA Architecture
- Backend Session Pooling
- Configuration
- Dual-Condition Session Lifecycle
- Dynamic Pool Rebalancing
- Multinode XA Support
- Performance and Monitoring
- Troubleshooting
OJP provides comprehensive XA (distributed transaction) support with backend session pooling for PostgreSQL and other XA-capable databases. The implementation uses Apache Commons Pool 2 for managing XA backend sessions on the server side, while maintaining JDBC XA spec compliance on the client side.
- Backend Session Pooling: Server-side pooling of XAConnection and backend sessions using Apache Commons Pool 2
- Dual-Condition Lifecycle: Sessions returned to pool only when BOTH transaction complete AND OJP XAConnection closed
- Dynamic Pool Rebalancing: Automatic pool resizing based on cluster health and server failures
- Unified Connection Model: Both XA and non-XA connections connect to ALL servers for optimal failover
- Connection Reuse: Physical PostgreSQL connections stay open and reused across multiple XA transactions
-
Separation of Concerns:
- Application Side: Applications may use HikariCP or other pools to pool OJP JDBC connections
- Server Side: Apache Commons Pool 2 (XA) or HikariCP (non-XA) pools PostgreSQL backend sessions via SPIs
-
Pooling Contract:
- Pool's
makeObject()→open()creates physical PostgreSQL XAConnection - Pool's
passivateObject()→reset()cleans state, keeps connection open - Pool's
destroyObject()→close()destroys physical connection (eviction only)
- Pool's
-
XA Spec Compliance:
- Sessions support multiple sequential XA transactions
- Connection properties persist across transaction boundaries
- No connection recreation between transactions (JDBC XA requirement)
Client Application
↓
OjpXADataSource.getXAConnection()
↓
[gRPC Call] StatementService.connect(isXA=true)
↓
OJP Server - CommonsPool2XADataSource
↓ borrowSession()
Apache Commons Pool 2
↓ makeObject() / borrowObject()
PostgreSQL XAConnection (backend session)
↓
PostgreSQL Database
OjpXADataSource: Implementsjavax.sql.XADataSourceOjpXAConnection: Wraps server XA session, providesXAResourceandConnectionOjpXALogicalConnection: Logical connection from XAConnection, blocks direct commit/rollbackOjpXAResource: Delegates XA operations to server via gRPC
CommonsPool2XADataSource: Manages backend session pool using Apache Commons Pool 2XATransactionRegistry: Tracks active XA transactions and session lifecycleXABackendSession: Wraps PostgreSQL XAConnection with pooling lifecycle methodsStatementServiceImpl: Routes XA operations to appropriate backend sessions
PGXADataSource: Creates XA-capable connectionsPGXAConnection: PostgreSQL's XA connection implementationPGXAResource: PostgreSQL's XA resource for two-phase commit
-
Connection Establishment:
Client: OjpXADataSource.getXAConnection() → Server: borrowSession() from Commons Pool 2 → Pool: makeObject() if needed, or reuse idle session → Backend: Open or reuse PostgreSQL XAConnection -
XA Transaction Execution:
Client: xaResource.start(xid) → Server: Delegate to PostgreSQL XAResource.start(xid) → Execute SQL operations → Client: xaResource.end(xid) → Client: xaResource.prepare(xid) → Client: xaResource.commit(xid) -
Connection Release:
Client: xaConnection.close() → Server: Mark transaction complete, keep in registry → When BOTH conditions met (transaction complete AND XAConnection closed): Return session to pool → Pool: passivateObject() → reset() (clean state, keep connection open)
Problem: XA connections cannot use HikariCP (non-XA only) Solution: Apache Commons Pool 2 manages PostgreSQL XA backend sessions on server side
public class XABackendSession implements Poolable {
private XAConnection xaConnection; // PostgreSQL XAConnection
private Connection connection; // Regular connection from XAConnection
private XAResource xaResource; // PostgreSQL XAResource
// Pooling lifecycle methods
public void open() { /* Create XAConnection */ }
public void reset() { /* Clean state, keep connection */ }
public void close() { /* Destroy connection (eviction only) */ }
}Manages the pool with dynamic resizing:
public class CommonsPool2XADataSource {
private GenericObjectPool<XABackendSession> pool;
public XABackendSession borrowSession() { return pool.borrowObject(); }
public void returnSession(XABackendSession session) { pool.returnObject(session); }
// Dynamic resizing for failover
public void setMaxTotal(int maxTotal) { /* Resize pool */ }
public void setMinIdle(int minIdle) { /* Pre-warm connections */ }
}Default values (per server):
maxTotal: 11 (maximum active sessions)minIdle: 10 (pre-warmed idle sessions)maxWait: 20 seconds (borrow timeout)
Multinode division:
- 2 servers: 22 total → 11 per server
- Pool sizes automatically rebalance on server failures
-
Idle in Pool:
- Physical PostgreSQL XAConnection open
- Session wrapper in pool's idle queue
- Ready for immediate reuse
-
Active (Borrowed):
- Session in use by OJP XA session
- Executing XA transactions
- NOT yet returned to pool
-
Transaction Complete, XAConnection Open:
- Transaction committed/rolled back
- OJP XAConnection still open (client hasn't closed)
- Session stays in XA registry (not returned to pool)
-
Ready for Return:
- BOTH conditions met:
- Transaction complete (commit/rollback called)
- OJP XAConnection closed (client called close())
- Session returned to pool via
returnSession() - Pool calls
passivateObject()→reset()
- BOTH conditions met:
-
Evicted from Pool:
- Pool evicts idle session (max lifetime reached)
- Pool calls
destroyObject()→close() - Physical PostgreSQL XAConnection destroyed
Note: All pool configuration properties are documented in detail at documents/configuration/ojp-jdbc-configuration.md. This section provides a high-level overview of XA-specific configuration.
XA connections use separate pool configuration from non-XA connections. Configuration is provided by the client and applied on the server side when creating XA datasources.
For N servers with maxTotal=M:
- Each server gets: M / N sessions
- Example: 2 servers, maxTotal=22 → 11 per server
Pool sizes automatically rebalance when:
- Server fails → remaining servers expand to compensate for lost sessions on Server 1
- Server recovers → Both servers rebalance, dividing the total sessions among all servers in the cluster
Problem: Sessions were immediately returned to pool after commit(), but OJP XAConnection was still open. Next transaction on same OJP connection failed because backend session was already back in the pool.
Solution: Sessions only returned when BOTH conditions met:
- XA transaction complete (
commit()orrollback()called) - OJP XAConnection closed (client called
xaConnection.close())
1. Client: xaConnection = xaDataSource.getXAConnection()
→ Server: Borrow session from pool (idle → active)
2. Client: xaResource.start(xid)
→ Server: Register in XATransactionRegistry
→ Registry: Mark transactionComplete = false
3. Client: Execute SQL operations
4. Client: xaResource.end(xid), prepare(xid), commit(xid)
→ Server: Mark transactionComplete = true
→ Session STAYS in registry (OJP XAConnection still open)
5. Client: xaConnection.close()
→ Server: Call registry.returnCompletedSessions()
→ Registry: Find sessions where transactionComplete = true
→ Return those sessions to pool
→ Pool: passivateObject() → reset()
1. Client: xaConnection = xaDataSource.getXAConnection()
→ Server: Borrow session from pool
2. Client: Transaction 1
xaResource.start(xid1) → end() → commit()
→ Server: Mark transactionComplete = true
→ Session stays in registry (XAConnection still open)
3. Client: Transaction 2 (same XAConnection)
xaResource.start(xid2) → end() → commit()
→ Server: Mark transactionComplete = true again
→ Session stays in registry
4. Client: xaConnection.close()
→ Server: Return session to pool once
→ Connection reused across both transactions ✓
public class TxContext {
private String xid;
private String ojpSessionId; // OJP XAConnection UUID
private XABackendSession backendSession; // Pooled backend session
private boolean transactionComplete; // Set on commit/rollback
}public class XATransactionRegistry {
private ConcurrentHashMap<String, TxContext> transactions;
public void markTransactionComplete(String xid) {
TxContext ctx = transactions.get(xid);
ctx.setTransactionComplete(true);
// Session stays in registry
}
public void returnCompletedSessions(String ojpSessionId) {
// Called when OJP XAConnection closes
List<TxContext> completed = transactions.values().stream()
.filter(ctx -> ctx.getOjpSessionId().equals(ojpSessionId))
.filter(TxContext::isTransactionComplete)
.collect(Collectors.toList());
for (TxContext ctx : completed) {
poolProvider.returnSession(ctx.getBackendSession());
transactions.remove(ctx.getXid());
}
}
}┌─────────────────────────────────────────────────────────────────┐
│ Pool (Idle Sessions) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Session │ │ Session │ │ Session │ ... │
│ └─────────┘ └─────────┘ └─────────┘ │
└──────┬──────────────────────────────────────────────────────────┘
│ borrowSession()
↓
┌─────────────────────────────────────────────────────────────────┐
│ XA Transaction Registry (Active Transactions) │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ TxContext: xid=tx1, ojpSessionId=uuid1, │ │
│ │ transactionComplete=false, backendSession=s1 │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ xaResource.commit() → transactionComplete = true │
│ Session STAYS in registry (XAConnection open) │
│ │
│ xaConnection.close() → returnCompletedSessions(uuid1) │
│ ↓ │
└────────┼─────────────────────────────────────────────────────────┘
│ returnSession(s1)
↓
┌─────────────────────────────────────────────────────────────────┐
│ Pool (Idle Sessions) - Session s1 returned │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Session │ │ Session │ │ Session │ │ s1 │ ... │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────────────────────────────┘
In multinode deployments, server failures require immediate pool resizing:
- Server 1 fails → Server 2 must expand to compensate for lost sessions on Server 1
- Server 1 recovers → Both servers rebalance, dividing the total sessions among all servers in the cluster
-
Cluster Health Changes:
- Health check detects server failure/recovery
processClusterHealth()called with updated health status- Pool resizing triggered for affected servers
-
Connection Establishment:
connect()called with cluster health information- Pools resized BEFORE borrowing session
- Prevents pool exhaustion on first connection attempt
-
XA Operations:
xaStart(),xaPrepare(),xaCommit()propagate cluster health- Ensures pools stay balanced during transaction execution
Pool resizing is handled by StatementServiceImpl.processClusterHealth() and CommonsPool2XADataSource classes. The algorithm divides the configured total pool size among healthy servers:
- Count healthy servers in the cluster
- Divide total maxTotal and minIdle by healthy server count
- Call
setMaxTotal()andsetMinIdle()on each healthy server's pool - Pre-warm idle connections to the new minIdle target
When setMinIdle() is called, the pool ensures maxIdle is updated and attempts to create idle connections immediately using preparePool(). If under load, a manual creation loop ensures connections are created even if the pool is actively being used.
Critical: processClusterHealth() called BEFORE borrowSession()
- Ensures pool resized before first connection attempt
- Prevents pool exhaustion cycle where:
- All sessions borrowed
- Pool exhausted
- Health check can't run (needs session)
- Pool never resizes
Implementation:
public XABackendSession connect(ConnectionDetails details) {
// 1. Process cluster health FIRST
processClusterHealth(details.getClusterHealth());
// 2. THEN borrow session (pool already resized)
return borrowSession();
}Scenario: 2 servers, maxTotal=22, minIdle=20
Initial State:
- Server 1: maxTotal=11, minIdle=10, idle=10
- Server 2: maxTotal=11, minIdle=10, idle=10
Server 1 Fails:
- Health check detects failure
processClusterHealth()called with Server 1=DOWN, Server 2=UP- Server 2 pool resized: maxTotal=22, minIdle=20
setMinIdle(20)pre-warms 10 additional connections- Server 2 now handles all traffic with 20 idle + capacity for 2 more active
Server 1 Recovers:
- Health check detects recovery (periodic 5-second heartbeat)
processClusterHealth()called with both servers UP- Both pools resized: maxTotal=11, minIdle=10
- Excess connections naturally evicted over time (idleTimeout)
Both XA and non-XA connections connect to ALL servers:
- Eliminates XA-specific connection path complexity
- Faster XA failover (sessions already on all servers)
- Consistent load balancing for all connection types
Uses SessionTracker for accurate load distribution:
public ServerEndpoint selectByLeastConnections() {
ServerEndpoint leastLoaded = null;
int minSessions = Integer.MAX_VALUE;
for (ServerEndpoint server : healthyServers) {
int sessionCount = sessionTracker.getSessionCount(server);
if (sessionCount < minSessions) {
minSessions = sessionCount;
leastLoaded = server;
}
}
return leastLoaded;
}Periodic Health Checks (every 5 seconds):
- Detect server failures/recoveries automatically
- No dependency on new connection attempts
- Trigger pool rebalancing on state changes
Health Check Implementation:
// Scheduled executor runs performHealthCheck() every 5 seconds
public boolean performHealthCheck(ServerEndpoint server) {
try {
// Lightweight heartbeat query
SessionInfo session = connect(server, heartbeatConnectionDetails);
terminateSession(session);
return true;
} catch (Exception e) {
return false; // Server unhealthy
}
}Distributed Transaction Example (2 databases, 2 OJP servers):
Transaction Manager (e.g., Narayana, Bitronix)
│
├─→ OJP Server 1 (XAResource 1)
│ ↓
│ PostgreSQL DB 1 (XA backend)
│
└─→ OJP Server 2 (XAResource 2)
↓
PostgreSQL DB 2 (XA backend)
1. TM: xaResource1.start(xid, branch1)
2. TM: xaResource2.start(xid, branch2)
3. Execute SQL on both branches
4. TM: xaResource1.end() → prepare() → commit()
5. TM: xaResource2.end() → prepare() → commit()
Supported Transaction Managers: Narayana, Bitronix, and other JTA-compliant transaction managers.
Note on Atomikos: Atomikos is currently not supported due to its architecture requiring pooled XAConnections at the application side. OJP manages all pooling on the server side, so XA connections obtained by the application must be unpooled (open/close directly without intermediate pooling). Atomikos does not support unpooled XA connections in its free version, making it incompatible with OJP's server-side pooling model.
Each OJP server:
- Manages its own backend session pool
- Delegates XA operations to PostgreSQL XAResource
- Returns sessions to pool after transaction completes
Enable DEBUG logging for pool diagnostics:
-Dorg.slf4j.simpleLogger.log.org.openjproxy=DEBUGLog Output:
XA pool initialized: server=localhost:5432, maxTotal=11, minIdle=10
Session borrowed successfully: poolState=[active=2, idle=8, maxTotal=11]
Session returned to pool: poolState=[active=1, idle=9, maxTotal=11]
Pool resized due to cluster health change: maxTotal=11→22, minIdle=10→20
Pre-warmed 10 idle connections successfully
Connection Reuse:
- ✅ Physical PostgreSQL XAConnection stays open across transactions
- ✅ No connection recreation overhead between transactions
- ✅ Session reset (10-50ms) vs. connection creation (100-500ms)
Pool Pre-Warming:
- ✅ Idle connections pre-created at startup (minIdle)
- ✅ Zero latency on first borrow (connection already open)
- ✅ Automatic expansion during failover (setMinIdle pre-warms immediately)
Load Distribution:
- ✅ SessionTracker provides accurate session counts
- ✅ Load-aware selection distributes evenly (15-20% better than round-robin)
- ✅ All connection types use same load algorithm
Apache Commons Pool 2 exposes JMX metrics:
org.openjproxy:type=CommonsPool2XADataSource,name=localhost:5432
- NumActive: Current active (borrowed) sessions
- NumIdle: Current idle sessions in pool
- MaxTotal: Maximum pool size
- MinIdle: Minimum idle sessions maintained
- MaxWaitMillis: Borrow timeout
Symptom:
SQLException: [XA-POOL-BORROW] POOL EXHAUSTED: maxTotal=11, active=11, idle=0, timeout=20000ms
Causes:
- Too many concurrent XA transactions
- Sessions not being returned (connection leak)
- maxTotal too low for workload
Solutions:
- Increase
ojp.xa.connection.pool.maxTotal - Check for connection leaks (sessions not closed in finally blocks)
- Review connection timeout settings
Symptom:
SQLException: Session not opened
Cause: This was a bug in OJP versions before dual-condition lifecycle fix (commit c5492f5)
Fixed In: Version with dual-condition session lifecycle
- Sessions no longer closed prematurely
- Pool only contains open, healthy sessions
Symptom: After server restarts, connection count drops to zero
Causes:
- Health check not detecting server recovery
- Pool not being re-initialized after recovery
Fixed In: Periodic health checker (every 5 seconds)
- Automatic recovery detection
- Pool rebalancing triggered on recovery
Symptom: Long delay before connections work after server failure
Causes:
- Pool rebalancing after borrowSession() (too late)
- Health check not propagating cluster state
Fixed In: Cluster health propagation during connect()
- Pool resized BEFORE borrowing session
- Immediate failover response
Enable comprehensive XA debugging:
# Core OJP logging
-Dorg.slf4j.simpleLogger.log.org.openjproxy=DEBUG
# Reduce ResultSet verbosity
-Dorg.slf4j.simpleLogger.log.org.openjproxy.jdbc.ResultSet=INFOLog Analysis:
# Pool operations
grep "Session borrowed" server.log
grep "Session returned" server.log
grep "Pool resized" server.log
# Transaction lifecycle
grep "XA transaction" server.log
grep "transactionComplete" server.log
# Health checks
grep "Health check" server.log
grep "Server.*UP" server.log
Verify Pool State:
-- PostgreSQL: Check active XA connections
SELECT * FROM pg_stat_activity WHERE backend_type = 'client backend';
-- Check prepared transactions (should be 0 after commit)
SELECT * FROM pg_prepared_xacts;Verify Session Counts:
// Enable SessionTracker logging
sessionTracker.getSessionCount(server) // Should match pool.getNumActive()Old Configuration (deprecated, no longer supported):
ojp.xa.maxTransactions=50
ojp.xa.startTimeoutMillis=60000New Configuration:
# Backend session pool configuration
ojp.xa.connection.pool.maxTotal=22
ojp.xa.connection.pool.minIdle=20
ojp.xa.connection.pool.connectionTimeout=20000Changes:
- ❌ Removed:
ojp.xa.maxTransactions(replaced by pool-based concurrency control) - ❌ Removed:
ojp.xa.startTimeoutMillis(no longer needed) - ✅ Added: Full pool configuration for XA backend sessions
- ✅ Added: Dual-condition session lifecycle
- ✅ Added: Dynamic pool rebalancing
No client code changes required!
XA application code remains identical:
// Same code works with both old and new implementation
XADataSource xaDataSource = new OjpXADataSource(url, user, password);
XAConnection xaConnection = xaDataSource.getXAConnection();
XAResource xaResource = xaConnection.getXAResource();
Connection connection = xaConnection.getConnection();
// Execute XA transaction
xaResource.start(xid, XAResource.TMNOFLAGS);
connection.createStatement().execute("INSERT INTO...");
xaResource.end(xid, XAResource.TMSUCCESS);
xaResource.prepare(xid);
xaResource.commit(xid, false);
xaConnection.close(); // Now properly returns session to poolTest Checklist:
- XA transactions complete successfully
- Multiple sequential transactions on same XAConnection work
- Connections properly released (check pool metrics)
- Failover works (kill server, verify pool resizing)
- No connection leaks (monitor for 1+ hours)
- XA Transaction Flow - Detailed XA operation flow diagrams
- Multinode Architecture - Overall multinode design
- JDBC Configuration - Client-side configuration
- Adding XA Support - Add XA for new databases
Last Updated: 2025-12-28
Implemented In: Commit c5492f5 and later
Test Coverage: MultinodeXAIntegrationTest (300+ queries, server failure/recovery cycles)