|
| 1 | +# Deployment Timing Tests Analysis |
| 2 | + |
| 3 | +## Test Run: 20068613208 (LDEV-5955 verified) |
| 4 | + |
| 5 | +### Results Summary - ALL PASSING ✅ |
| 6 | + |
| 7 | +| Test | Result | |
| 8 | +|------|--------| |
| 9 | +| 6.2-snapshot-immediate | PASS | |
| 10 | +| 6.2-snapshot-delayed | PASS | |
| 11 | +| 6.2-snapshot-stress | PASS | |
| 12 | +| 7.0-snapshot-immediate | PASS | |
| 13 | +| 7.0-snapshot-delayed | PASS | |
| 14 | +| 7.0-snapshot-stress | PASS | |
| 15 | +| 7.1-native-immediate | PASS | |
| 16 | +| 7.1-native-delayed | PASS | |
| 17 | +| 7.1-native-stress | PASS | |
| 18 | + |
| 19 | +### Analysis |
| 20 | + |
| 21 | +**LDEV-5955 fix verified working** - startup hooks now execute across all versions. |
| 22 | + |
| 23 | +Fixes applied to 7.0: |
| 24 | +- LDEV-5957: Don't uninstall bundle after registering startup hook |
| 25 | +- LDEV-5955: Trigger `config.getStartups()` at load time + after extension install |
| 26 | + |
| 27 | +### Current Status |
| 28 | + |
| 29 | +Startup hooks work. Now we can investigate the actual deploy timing issue (LDEV-5960/LDEV-5478). |
| 30 | + |
| 31 | +The slow-startup extension has a 5-second sleep in its startup hook - this widens the timing window to expose race conditions where requests are served before extensions are ready. |
| 32 | + |
| 33 | +### Next Steps: LDEV-5960 Investigation |
| 34 | + |
| 35 | +Per PLAN-LDEV-5960-deploy-timing.md: |
| 36 | + |
| 37 | +1. **Find where queue mode changes** from BLOCKING to ENABLED in 7.0 |
| 38 | +2. **Ensure config load completes** (including deployed extensions) before mode change |
| 39 | +3. **Add debug logging** to trace timing between deploy and first request |
| 40 | +4. **Reproduce the bug** - currently tests pass because startup hooks block, but what if extension doesn't have a startup hook? |
| 41 | + |
| 42 | +Key code locations: |
| 43 | +- `ThreadQueueImpl.java` - queue mode constants |
| 44 | +- `ConfigServerImpl.java` - initial BLOCKING mode |
| 45 | +- `ConfigFactoryImpl.java:224` - where mode changes to ENABLED |
| 46 | +- `DeployHandler.java` - deploy folder processing |
| 47 | +- `CFMLEngineImpl.java` - startup sequence |
| 48 | + |
| 49 | +### Key Finding: ESAPI Has No Startup Hook |
| 50 | + |
| 51 | +**Root cause identified**: ESAPI extension doesn't have a startup hook! |
| 52 | + |
| 53 | +Flow: |
| 54 | + |
| 55 | +1. Extension deployed (files copied, bundle registered with Felix) |
| 56 | +2. Queue becomes ENABLED when first web context created (ConfigFactoryImpl:224) |
| 57 | +3. First request calls ESAPI function (e.g., `encodeForHTML`) |
| 58 | +4. `FunctionSupport` calls `PropertyDeployer.deployIfNecessary()` which sets system property |
| 59 | +5. **BUT** ESAPI's `DefaultSecurityConfiguration` static init may have already run |
| 60 | +6. Static init caches `org.owasp.esapi.resources` as null |
| 61 | +7. ESAPI throws "ConfigurationException: ESAPI.properties could not be loaded" |
| 62 | + |
| 63 | +**Why our tests pass**: Our slow-startup extension HAS a startup hook that blocks for 5 seconds. The hook runs synchronously during deploy, so by the time queue goes ENABLED, everything is ready. |
| 64 | + |
| 65 | +**The ESAPI fix options**: |
| 66 | + |
| 67 | +1. **Add startup hook to ESAPI extension** - call `PropertyDeployer.deployIfNecessary()` during startup |
| 68 | +2. **Use BundleActivator** - OSGi activator runs when bundle starts |
| 69 | +3. **Static initialiser in FunctionSupport** - ensure property is set before any ESAPI class loads |
| 70 | + |
| 71 | +### Test Plan: Reproduce Without Startup Hook |
| 72 | + |
| 73 | +Create a test extension WITHOUT a startup hook that: |
| 74 | + |
| 75 | +- Has a BIF that reads a system property |
| 76 | +- That system property is set by a lazy deployer (like ESAPI does) |
| 77 | +- Demonstrates race condition on first request |
| 78 | + |
| 79 | +This will prove the timing issue exists and give us a test case for fixing it. |
| 80 | + |
| 81 | +--- |
| 82 | + |
| 83 | +## Testing Process |
| 84 | + |
| 85 | +### Running Tests |
| 86 | + |
| 87 | +Tests run via GitHub Actions workflow `deployment-tests.yml`. |
| 88 | + |
| 89 | +To trigger a run, push to the repo or manually trigger the workflow. |
| 90 | + |
| 91 | +### Viewing Results |
| 92 | + |
| 93 | +1. Check workflow status: |
| 94 | + ```bash |
| 95 | + gh run list --repo lucee/lucee-testlab --workflow=deployment-tests.yml --limit 5 |
| 96 | + ``` |
| 97 | + |
| 98 | +2. Download artifacts (logs from failed tests): |
| 99 | + ```bash |
| 100 | + rm -rf D:/work/lucee-testlab/test-output/test-deployment-timing |
| 101 | + mkdir -p D:/work/lucee-testlab/test-output/test-deployment-timing |
| 102 | + gh run download <RUN_ID> --repo lucee/lucee-testlab --dir D:/work/lucee-testlab/test-output/test-deployment-timing |
| 103 | + ``` |
| 104 | + |
| 105 | +3. Inspect logs: |
| 106 | + ```bash |
| 107 | + ls D:/work/lucee-testlab/test-output/test-deployment-timing/ |
| 108 | + # Each failed job has a lucee-logs-<test-name> folder with: |
| 109 | + # - express/logs/catalina.out |
| 110 | + # - express/lucee-server/context/logs/*.log |
| 111 | + ``` |
| 112 | + |
| 113 | +### Test Matrix |
| 114 | + |
| 115 | +| Test Name | Extension | Has Startup Hook | Expected Result | |
| 116 | +|-----------|-----------|------------------|-----------------| |
| 117 | +| 6.2-snapshot-* | slow-startup | YES | PASS | |
| 118 | +| 7.0-snapshot-* | slow-startup | YES | PASS | |
| 119 | +| 7.1-native-* | slow-startup | YES | PASS | |
| 120 | +| 6.2-lazy-immediate | lazy-deploy | NO | ? (testing LDEV-5960) | |
| 121 | +| 7.0-lazy-immediate | lazy-deploy | NO | ? (testing LDEV-5960) | |
| 122 | +| 7.0-lazy-delayed | lazy-deploy | NO | ? (testing LDEV-5960) | |
| 123 | + |
| 124 | +### Local gitignore |
| 125 | + |
| 126 | +The `test-output/` directory is excluded via `.git/info/exclude` - don't commit test artifacts. |
0 commit comments