
Copilot AI commented Feb 5, 2026

RRD4j tests fail in CI with -T1.5C (1.5 threads per CPU core, up to 6 modules in parallel) because the hardcoded 1-second sleep is too short for asynchronous storage to finish under the disk I/O and CPU contention of a parallel build.

Changes

Replace fixed sleep with adaptive polling:

  • Poll every 250ms for data availability, return immediately when ready
  • 20-second timeout handles CI contention (20× increase from 1s)
  • Log actual storage timing and attempt count for CI diagnostics
  • Detailed failure messages explain contention issues

Implementation:

  • Add waitForStorage(itemName, timeoutMs) method that queries the persistence service until data is available
  • Update 4 test methods: storeAndRetrieveNumberValue, storeAndRetrieveSwitchValue, queryWithTimeRange, storeAndRetrieveWithInvalidDBConfig

Before:

service.store(numberItem);
Thread.sleep(STORAGE_TIMEOUT_MS);  // Fixed 1s wait, fails under contention

After:

service.store(numberItem);
waitForStorage(numberItem.getName(), STORAGE_TIMEOUT_MS);  // Polls until ready, up to 20s

Result: Tests return as soon as the stored data is readable, which under normal conditions is faster than the old fixed wait, while still tolerating the slowdowns of parallel CI execution.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • openhab.jfrog.io
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java --enable-native-access=ALL-UNNAMED -classpath /usr/share/apache-maven-3.9.12/boot/plexus-classworlds-2.9.0.jar -Dclassworlds.conf=/usr/share/apache-maven-3.9.12/bin/m2.conf -Dmaven.home=/usr/share/apache-maven-3.9.12 -Dlibrary.jansi.path=/usr/share/apache-maven-3.9.12/lib/jansi-native -Dmaven.multiModuleProjectDirectory=/home/REDACTED/work/openhab-addons/openhab-addons org.codehaus.plexus.classworlds.launcher.Launcher clean test -q (dns block)
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java --enable-native-access=ALL-UNNAMED -classpath /usr/share/apache-maven-3.9.12/boot/plexus-classworlds-2.9.0.jar -Dclassworlds.conf=/usr/share/apache-maven-3.9.12/bin/m2.conf -Dmaven.home=/usr/share/apache-maven-3.9.12 -Dlibrary.jansi.path=/usr/share/apache-maven-3.9.12/lib/jansi-native -Dmaven.multiModuleProjectDirectory=/home/REDACTED/work/openhab-addons/openhab-addons org.codehaus.plexus.classworlds.launcher.Launcher spotless:check (dns block)


Original prompt

Problem

The RRD4j persistence tests are failing in CI but passing locally. The CI build uses -T1.5C which runs up to 6 modules in parallel, causing resource contention.
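As a rough illustration of the -T1.5C arithmetic, the sketch below assumes Maven multiplies the C-suffixed factor by the number of available processors and truncates the result to an integer (minimum one thread). `MavenThreadEstimate` and `mavenThreads` are hypothetical names for illustration, not a Maven API. Under that assumption, a 4-core runner yields the 6 parallel modules mentioned above.

```java
final class MavenThreadEstimate {

    /**
     * Hypothetical helper: estimates the builder thread count for "-T <n>C",
     * assuming threads = (int) (multiplier * cores), with a minimum of one thread.
     */
    static int mavenThreads(float coreMultiplier, int cores) {
        int threads = (int) (coreMultiplier * cores);
        return Math.max(threads, 1);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("-T1.5C on this machine: " + mavenThreads(1.5f, cores) + " threads");
        System.out.println("-T1.5C on a 4-core runner: " + mavenThreads(1.5f, 4) + " threads");
    }
}
```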

Failing CI Job: https://github.com/openhab/openhab-addons/actions/runs/21686302162/job/62534160911

Root Cause:

  • Tests use Thread.sleep(1000) to wait for async storage
  • In CI with parallel execution, disk I/O and CPU are heavily contested
  • 1 second is insufficient when 6+ modules are testing simultaneously
  • Tests pass locally because there's no resource contention
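The race behind these failures can be reproduced outside openHAB with a small, self-contained sketch: a scheduled task stands in for the asynchronous RRD4j write, and the old strategy sleeps for a fixed time and checks once. The class name, the `AtomicBoolean` flag, and the 1.5 s "contended" delay are illustrative assumptions, not code or measurements from the bundle.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class FixedSleepRace {

    /**
     * Simulates an asynchronous store that completes after writeDelayMs, then waits a fixed
     * sleepMs (the old test strategy) and reports whether the data was visible at that point.
     */
    static boolean fixedSleepSeesData(long writeDelayMs, long sleepMs) throws InterruptedException {
        ScheduledExecutorService storage = Executors.newSingleThreadScheduledExecutor();
        try {
            AtomicBoolean stored = new AtomicBoolean(false);
            storage.schedule(() -> stored.set(true), writeDelayMs, TimeUnit.MILLISECONDS);
            Thread.sleep(sleepMs); // fixed wait, then a single check
            return stored.get();
        } finally {
            storage.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Uncontended: the simulated write finishes well inside the 1 s window.
        System.out.println("write after 200 ms:  " + fixedSleepSeesData(200, 1000));
        // Contended (assumed 1.5 s): the same 1 s sleep checks too early.
        System.out.println("write after 1500 ms: " + fixedSleepSeesData(1500, 1000));
    }
}
```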

Solution

Replace the hardcoded Thread.sleep(1000) with a robust polling mechanism that:

  1. Polls for data availability every 250ms
  2. Has a 20-second timeout for CI environments
  3. Logs timing information to debug CI performance
  4. Provides clear failure messages explaining contention issues
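The four steps above amount to a generic "poll until ready or time out" loop. The sketch below expresses that pattern with a `BooleanSupplier` so it runs without openHAB; `PollingWait` and `awaitCondition` are hypothetical names, and in the real test the condition would be the `service.query(...)` check. It returns the attempt count on success and throws `TimeoutException` with timing details otherwise.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

final class PollingWait {

    /**
     * Polls condition every pollMs until it returns true or timeoutMs elapses.
     * Returns the number of attempts on success; throws TimeoutException otherwise.
     */
    static int awaitCondition(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException, TimeoutException {
        long start = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
        long timeoutNanos = timeoutMs * 1_000_000L;
        int attempts = 0;
        while (System.nanoTime() - start < timeoutNanos) {
            attempts++;
            if (condition.getAsBoolean()) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
                System.out.printf("ready after %d ms (%d attempts)%n", elapsedMs, attempts);
                return attempts; // return as soon as the condition holds
            }
            Thread.sleep(pollMs);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        throw new TimeoutException(String.format(
                "condition not met within %d ms (%d attempts)", elapsedMs, attempts));
    }

    public static void main(String[] args) throws Exception {
        long readyAt = System.nanoTime() + 600_000_000L; // simulated write completes after ~600 ms
        awaitCondition(() -> System.nanoTime() >= readyAt, 20_000, 250);
    }
}
```

Returning early is what makes the fast path cheap: when the data is already there, the first attempt succeeds and no sleep happens at all.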

Changes Required

Update bundles/org.openhab.persistence.rrd4j/src/test/java/org/openhab/persistence/rrd4j/internal/RRD4jPersistenceServiceTest.java:

Add Constants

private static final long STORAGE_TIMEOUT_MS = 20000; // 20 seconds for CI
private static final long POLL_INTERVAL_MS = 250; // Check every 250ms
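As a sanity check on these values: the polling loop checks at t = 0, 250, ..., 19750 ms, so, ignoring query time and sleep overshoot (both of which only reduce the count), the attempt count reported in a timeout message is bounded by:

```latex
N_{\max} = \left\lceil \frac{\texttt{STORAGE\_TIMEOUT\_MS}}{\texttt{POLL\_INTERVAL\_MS}} \right\rceil
         = \left\lceil \frac{20000}{250} \right\rceil = 80
```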

Add Polling Method

/**
 * Waits for data to be persisted by polling the database.
 * This is more robust than Thread.sleep() in CI environments with resource contention.
 *
 * @param itemName the name of the item to check
 * @param timeoutMs maximum time to wait in milliseconds
 * @throws InterruptedException if interrupted while waiting
 */
private void waitForStorage(String itemName, long timeoutMs) throws InterruptedException {
    // System.nanoTime() is monotonic, so wall-clock adjustments cannot shorten or extend the wait
    long startNanos = System.nanoTime();
    long timeoutNanos = timeoutMs * 1_000_000L;
    int attempts = 0;

    while (System.nanoTime() - startNanos < timeoutNanos) {
        attempts++;

        FilterCriteria criteria = new FilterCriteria();
        criteria.setItemName(itemName);
        criteria.setPageSize(1);

        try {
            Iterable<HistoricItem> results = service.query(criteria);
            if (results.iterator().hasNext()) {
                long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
                logger.info("Storage completed for '{}' after {}ms ({} attempts)", itemName, elapsedMs, attempts);
                return; // Success!
            }
        } catch (Exception e) {
            // The query may fail if the data is not ready yet; keep polling
            logger.debug("Query attempt {} failed: {}", attempts, e.getMessage());
        }

        Thread.sleep(POLL_INTERVAL_MS);
    }

    long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
    fail(String.format("Data for item '%s' was not persisted within %dms (%d polling attempts). "
            + "This likely indicates CI resource contention with parallel test execution (-T1.5C). "
            + "Consider increasing STORAGE_TIMEOUT_MS or running tests sequentially.", itemName, elapsedMs,
            attempts));
}

Replace All Thread.sleep(STORAGE_TIMEOUT_MS) Calls

In all test methods (e.g., storeAndRetrieveNumberValue(), storeAndRetrieveStringValue(), etc.), replace:

service.store(numberItem);
Thread.sleep(STORAGE_TIMEOUT_MS);

With:

service.store(numberItem);
waitForStorage(numberItem.getName(), STORAGE_TIMEOUT_MS);

Add Logger Field

private final Logger logger = LoggerFactory.getLogger(RRD4jPersistenceServiceTest.class);

Add imports:

import static org.junit.jupiter.api.Assertions.fail;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

Important Constraints

  • DO NOT modify pom.xml - keep build configuration as-is
  • This is a test-only change - no production code modifications
  • Maintain all existing test assertions and behavior
  • The solution must work with -T1.5C parallel builds

Expected Outcome

  • Tests pass reliably in CI with parallel execution
  • Helpful logging shows actual storage timing
  • Clear failure messages if timeout is still insufficient
  • No changes to production code or build configuration

This pull request was created from Copilot chat.



… tests

- Add STORAGE_TIMEOUT_MS constant (20 seconds for CI)
- Add POLL_INTERVAL_MS constant (250ms polling interval)
- Add Logger field and imports (SLF4J Logger/LoggerFactory)
- Add fail import for better assertions
- Implement waitForStorage() method that polls for data availability
- Replace all Thread.sleep calls with waitForStorage in 4 test methods
- Add detailed logging and failure messages for CI diagnostics

Co-authored-by: holgerfriedrich <[email protected]>
Copilot AI changed the title from "[WIP] Fix RRD4j persistence tests for CI environment" to "Fix RRD4j persistence tests for parallel CI execution" Feb 5, 2026
Copilot AI requested a review from holgerfriedrich February 5, 2026 05:39
holgerfriedrich added a commit to holgerfriedrich/openhab-addons that referenced this pull request Feb 6, 2026
Refs: openhab#20181

Signed-off-by: Holger Friedrich <[email protected]>
Co-authored-by: Copilot <[email protected]>