ADR-001: Transaction Isolation Strategy for Search Index Updates

Status: ✅ Implemented (Fixed 2026-02-13)
Date: 2025-12-12
Implementation Date: 2025-12-17
Fix Date: 2026-02-13
Decision Makers: Development Team, Architecture Review Board
Related Issues: Finding 2.1 (CRITICAL), FK Table Indexing Bug
Implementation Commits: 378ec8b, 3026c09
Fix Commits: [pending]


Context

The current SearchIndexEventHandler reuses the persistent object's (PO's) transaction context for search index updates. This tightly couples business-logic transactions to index maintenance, violating iDempiere best practice and causing data-inconsistency issues.

Current Implementation

// SearchIndexEventHandler.java:107
trxName = eventPO.get_TrxName();  // Reuses PO's transaction

// SearchIndexEventHandler.java:197
provider.createIndex(ctx, builder.build().getData(false), trxName);

Problems

  1. Index failure blocks business operations

    • If index update fails, entire business transaction rolls back
    • Example: Product save fails because Elasticsearch is down
  2. Transaction rollback causes index staleness

    • Business logic rolls back → index update also rolls back
    • Index not updated even though error was transient
  3. Long-running transactions

    • Index updates hold database locks
    • Reduces system throughput
  4. Violation of Single Responsibility Principle

    • Business logic transaction responsible for index maintenance
    • Tight coupling between concerns
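
To make problem 1 concrete, the sketch below traces the failure path under the current shared-transaction code. The MProduct setup is illustrative (mandatory columns omitted); ctx, businessTrxName, and provider refer to the names in the listings above.

// 1. User code: save a product inside a business transaction
MProduct product = new MProduct(ctx, 0, businessTrxName);
product.setValue("WIDGET-01");
product.setName("Widget");
product.saveEx();  // fires the model event synchronously, on the same thread

// 2. Inside SearchIndexEventHandler (same transaction):
//    provider.createIndex(ctx, data, businessTrxName);
//    If Elasticsearch is down, createIndex throws; the exception escapes
//    doHandleEvent and the framework rolls back businessTrxName, so the
//    product save fails because of a search index problem.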

Decision

We will implement a Hybrid Transaction Strategy for search index operations:

  1. Use business transaction for reading source data (to see uncommitted changes)
  2. Use separate transaction for writing to index tables (to isolate failures)

Strategy: Hybrid Transaction Approach

@Override
protected void doHandleEvent(Event event) {
    PO eventPO = getPO(event);
    Properties ctx = Env.getCtx();                   // Local variable, not instance state
    String businessTrxName = eventPO.get_TrxName();  // For reading data

    // Phase 1: Read data using business transaction (see uncommitted changes)
    PO[] mainPOArr = getMainPOs(eventPO, indexedTables, ctx, businessTrxName);

    // Phase 2: Write index using separate transaction (isolate failures)
    Trx indexTrx = Trx.get(Trx.createTrxName("SearchIdx"), true);
    try {
        String indexTrxName = indexTrx.getTrxName();

        SearchIndexConfigBuilder builder = new SearchIndexConfigBuilder()
            .setTrxName(businessTrxName)  // Still uses business trx for reading
            .build();

        // Perform index write with separate transaction
        provider.createIndex(ctx, builder.getData(false), indexTrxName);

        indexTrx.commit();
    } catch (Exception e) {
        indexTrx.rollback();
        log.log(Level.SEVERE, "Failed to update search index", e);
        // Don't throw - index failure shouldn't fail original transaction
    } finally {
        indexTrx.close();
    }
}
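
For reference, the same write phase can be expressed with iDempiere's Trx.run(...), which handles commit, rollback, and close internally; failures surface as an unchecked exception, which we still catch so they cannot reach the business transaction. A sketch, reusing provider and builder from the listing above:

// Alternative write phase: Trx.run manages the transaction lifecycle
try {
    Trx.run(Trx.createTrxName("SearchIdx"), trxName ->
        provider.createIndex(ctx, builder.getData(false), trxName));
} catch (RuntimeException e) {
    // Log and swallow: index failures must not fail the business transaction
    log.log(Level.SEVERE, "Failed to update search index", e);
}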

Consequences

Positive

  1. Business logic independence

    • Product saves successfully even if index update fails
    • Resilient to external service failures (Elasticsearch down)
  2. Better error handling

    • Index errors logged but don't propagate
    • Retry mechanisms possible
  3. Improved performance

    • Shorter transaction duration
    • Reduced lock contention
  4. iDempiere best practice compliance

    • Event handlers use separate transactions
    • Follows Observer pattern correctly

Negative

  1. Eventual consistency

    • Small time window where data and index out of sync
    • Mitigation: Immediate separate transaction minimizes window
  2. Complexity

    • Additional transaction management code
    • Need retry logic for transient failures
  3. Testing requirements

    • Must test rollback scenarios
    • Need integration tests for consistency

Alternatives Considered

Alternative 1: Asynchronous Event Queue

EventQueue.getInstance().queue(new IndexUpdateEvent(
    searchIndex.getAD_SearchIndex_ID(),
    po.get_Table_ID(),
    po.get_ID()
));

Pros:

  • Zero impact on business transaction performance
  • Natural retry mechanism
  • Batch processing possible

Cons:

  • Larger consistency window (seconds to minutes)
  • Requires background worker infrastructure
  • More complex failure recovery
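
The "background worker infrastructure" cost is easy to underestimate. The sketch below is the minimum consumer loop, using the hypothetical EventQueue / IndexUpdateEvent names from the snippet above plus a hypothetical reindex(...) helper, scheduled with the JDK's java.util.concurrent executor:

// Hypothetical consumer for Alternative 1 (EventQueue, IndexUpdateEvent and
// reindex are illustrative names, not existing iDempiere APIs)
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleWithFixedDelay(() -> {
    IndexUpdateEvent evt;
    while ((evt = EventQueue.getInstance().poll()) != null) {
        try {
            reindex(evt);                         // fresh transaction per event
        } catch (Exception e) {
            EventQueue.getInstance().queue(evt);  // requeue for a later retry
        }
    }
}, 5, 5, TimeUnit.SECONDS);                       // poll interval bounds the consistency window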

Decision: Rejected for the initial implementation due to complexity. Consider as a future enhancement if the immediate separate transaction proves insufficient.


Alternative 2: Compensating Transaction

// Original transaction commits
// If index update fails, compensate by marking record for re-index
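
A minimal sketch of the compensation step, assuming a new, hypothetical AD_SearchIndexQueue table (the "additional database tables" cost noted under Cons); DB.executeUpdateEx with a null trxName commits independently of the failed index write:

// On index failure, record the key for a later re-index pass
DB.executeUpdateEx(
    "INSERT INTO AD_SearchIndexQueue (AD_Table_ID, Record_ID, IsProcessed) VALUES (?, ?, 'N')",
    new Object[] {po.get_Table_ID(), po.get_ID()},
    null);  // null trxName -> auto-commit, independent of the rolled-back index trx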

Pros:

  • Guaranteed eventual consistency
  • Clear audit trail

Cons:

  • Complex compensation logic
  • Additional database tables required

Decision: Rejected as over-engineered for current requirements.


Alternative 3: Two-Phase Commit (2PC)

Use distributed transaction coordinator to ensure atomicity across database and index.

Pros:

  • Strong consistency guarantee

Cons:

  • Performance penalty (2x slower)
  • Not supported by Elasticsearch
  • Overkill for search index use case

Decision: Rejected - search index is not critical enough to require 2PC.


Implementation Plan

Phase 1: Refactor Event Handler (2 days)

  1. Modify SearchIndexEventHandler.java

    // Remove instance variables (thread safety issue)
    - private Properties ctx = null;
    - private String trxName = null;
    
    // Use local variables in doHandleEvent()
    + Properties ctx = Env.getCtx();
    + Trx indexTrx = Trx.get(Trx.createTrxName("SearchIdx"), true);
  2. Add transaction helper methods

    private void executeIndexUpdate(ISearchIndexProvider provider, ...) {
        Trx indexTrx = Trx.get(Trx.createTrxName("SearchIdx"), true);
        try {
            provider.createIndex(..., indexTrx.getTrxName());
            indexTrx.commit();
        } catch (Exception e) {
            indexTrx.rollback();
            handleIndexError(e);
        } finally {
            indexTrx.close();
        }
    }
  3. Implement error handling

    private void handleIndexError(Exception e) {
        log.log(Level.SEVERE, "Search index update failed", e);
    
        // Mark for re-index (future: use queue)
        // For now: log only, don't block
    }
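
When the retry logic mentioned under Consequences is added, it can wrap the per-attempt transaction without changing the handler's control flow. A sketch (not yet implemented; the attempt count, backoff, and the data parameter's type are illustrative):

// Retry transient index failures before giving up
private void executeIndexUpdateWithRetry(ISearchIndexProvider provider,
        Properties ctx, Object data, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        Trx indexTrx = Trx.get(Trx.createTrxName("SearchIdx"), true);
        try {
            provider.createIndex(ctx, data, indexTrx.getTrxName());
            indexTrx.commit();
            return;                                // success
        } catch (Exception e) {
            indexTrx.rollback();
            if (attempt == maxAttempts) {
                handleIndexError(e);               // final failure: log only
            } else {
                try {
                    Thread.sleep(1000L * attempt); // linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        } finally {
            indexTrx.close();
        }
    }
}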

Phase 2: Testing (1 day)

  1. Unit tests

    • Test transaction isolation
    • Test rollback scenarios
    • Test error handling
  2. Integration tests

    • Test PO save + index update success
    • Test PO save success + index update failure (sketched after this list)
    • Test PO save failure (index update should not occur)
  3. Stress tests

    • 100+ concurrent PO changes
    • Verify no transaction deadlocks
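
The second integration test is the one that proves the isolation property. A JUnit-style sketch, assuming hypothetical fixture helpers: installFailingSearchIndexProvider() swaps in a provider whose createIndex always throws, and newTestProduct() fills the product's mandatory columns:

@Test
void poSaveSurvivesIndexFailure() {
    installFailingSearchIndexProvider();   // hypothetical fixture helper

    MProduct product = newTestProduct();   // hypothetical: mandatory columns set
    product.saveEx();                      // must NOT throw despite index failure

    assertTrue(product.get_ID() > 0);      // business write committed
}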

Phase 3: Migration (0.5 days)

  1. No database schema changes required
  2. Deploy as drop-in replacement
  3. Monitor transaction logs

Post-Implementation Issues & Fixes

Issue: FK Table Indexing Failure (2026-02-13)

Problem: The initial implementation used a null transaction for ALL operations, including reading parent records:

// BROKEN (378ec8b):
PO[] mainPOArr = getMainPOs(eventPO, indexedTables, ctx, null);  // Can't see uncommitted data

Symptom:

GenericPO.load: NO Data found for M_Product_ID=11330103

Root Cause: When an import process creates M_Product + M_Product_PO in the same transaction:

  1. M_Product created (not committed)
  2. M_Product_PO saved → fires event
  3. Event handler tries to load M_Product with null transaction
  4. M_Product not visible → exception → business transaction fails

Fix: Use business transaction for reading, separate transaction for writing:

// FIXED:
String businessTrxName = eventPO.get_TrxName();
PO[] mainPOArr = getMainPOs(eventPO, indexedTables, ctx, businessTrxName);  // Can see uncommitted data

Lesson Learned: "Separate transaction" only applies to index WRITES, not data READS.
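
The same lesson in code: a parent record created in the still-open import transaction is only visible through that transaction. A sketch using MTable (the record ID is the one from the symptom above):

MTable table = MTable.get(ctx, MProduct.Table_ID);
PO visible = table.getPO(11330103, eventPO.get_TrxName()); // sees the uncommitted row
PO missing = table.getPO(11330103, null);                  // the "NO Data found" path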

Monitoring & Validation

Success Criteria

  • ✅ Business transactions complete in <500ms (no regression)
  • ✅ Index updates complete in <1s (separate transaction)
  • ⚠️ Zero business transaction failures due to index errors (fixed 2026-02-13)
  • ✅ Index consistency >99.9% (measured by reconciliation job)

Metrics to Track

  1. Transaction Duration

    -- Business transaction (should decrease)
    SELECT AVG(duration_ms) FROM trx_log WHERE trx_name LIKE 'PO%'
    
    -- Index transaction (new metric)
    SELECT AVG(duration_ms) FROM trx_log WHERE trx_name LIKE 'SearchIdx%'
  2. Index Error Rate

    SELECT TRUNC(Created, 'DD') AS error_day, COUNT(*)
    FROM ad_changelog
    WHERE ChangeLog LIKE '%Search index update failed%'
    GROUP BY TRUNC(Created, 'DD')
  3. Consistency Check

    -- Count records not in index
    SELECT COUNT(*) FROM M_Product p
    WHERE NOT EXISTS (
        SELECT 1 FROM searchindex_product si
        WHERE si.record_id = p.M_Product_ID
          AND si.ad_client_id = p.AD_Client_ID
    )

Alerting

  • Alert if index error rate >1% per hour
  • Alert if consistency check finds >10 missing records
  • Alert if transaction duration >5s (p95)

Rollback Plan

If separate transaction isolation causes issues:

  1. Immediate rollback

    # Revert to previous bundle version
    osgi> uninstall com.cloudempiere.searchindex
    osgi> install file:///path/to/old/version.jar
    osgi> start <bundle-id>
  2. Re-enable shared transaction (emergency only)

    // Revert to old code (keep as commented fallback)
    // trxName = eventPO.get_TrxName();
  3. Run index reconciliation

    -- Mark all indexes for rebuild
    UPDATE AD_SearchIndex SET IsValid='N'


Approval

Approved: [Pending]
Approved By: [Name]
Approval Date: [Date]
Implementation Target: Week 2 (Phase 1)


Next ADR: ADR-002: SQL Injection Prevention