╔════════════════════════════════════════════════════════════════════════════╗
║ SESSION 36: EXECUTION CHECKLIST ║
║ Real-time tracking for DO/CHECK/ACT phases ║
║ March 6, 2026 - LIVE ║
╚════════════════════════════════════════════════════════════════════════════╝
- Infrastructure readiness reviewed
- Container App access verified
- All team members online
- Slack #eva-deployment active
- Monitoring windows open
Status: Ready to start Owner: DevOps Lead ETA: 15 minutes
Reference: PHASE-3-DO-INTEGRATION-GUIDE.md → Task 1
# Execute in terminal
cd C:\AICOE\eva-foundry\37-data-model
# Step 1: Deploy Redis
.\scripts\deploy-redis-infrastructure.ps1
# Expected output:
# ✅ Redis instance created: myredis.redis.cache.windows.net
# ✅ SKU: Standard C1 (1GB)
# ✅ Connection string capturedCheckpoint 1.1: Redis Deployment
- Script execution started
- Redis instance in progress
- ETA to completion: _____ (note timestamp)
Checkpoint 1.2: Credentials Capture
- Redis host: ___________________
- Redis key: ___________________
- Shared with Backend Lead: ✅
Checkpoint 1.3: Verification
- Health check passed
- Container App secrets updated
- Redis connectivity confirmed
Task 1 Decision Gate
- ✅ GO - All checkpoints passed, proceed to Task 2
-
⚠️ CONDITIONAL - Minor issues, can proceed with notes: ___________ - ❌ NO-GO - Blocker found: ___________
Task 1 Complete Time: ___________ ET
Reference: PHASE-3-DO-INTEGRATION-GUIDE.md → Task 2
Checkpoint 2.1: Template Generation
- Router wrapper templates created
- All 41 layer routers processed
- Code compiles without errors
# Verify: Can import cache adapters
python -c "from api.cache import CachedLayerRouter; print('✅ Imports OK')"Output: ___________
Checkpoint 2.2: main.py Integration
- Cache initialization code added
- FastAPI lifecycle events configured
- Feature flags set correctly
# Verify: Check main.py has cache setup
grep -n "initialize_cache" main.py
grep -n "CacheManager" main.pyLines: ___________
Checkpoint 2.3: Local Testing
- CACHE_ENABLED=true environment variable set
- Local server starts without errors
- Health endpoints responding
$env:CACHE_ENABLED="true"
python -m uvicorn main:app --reload --port 8000
# In another terminal:
curl http://localhost:8000/healthResponse: ___________
Checkpoint 2.4: Code Review
- Peer review completed (2 reviewers min)
- All comments resolved
- Approved for deployment
Reviewers:
Task 2 Decision Gate
- ✅ GO - Code ready for staging
-
⚠️ CONDITIONAL - Minor fixes needed: ___________ - ❌ NO-GO - Significant issues: ___________
Task 2 Complete Time: ___________ ET
Reference: PHASE-3-DO-INTEGRATION-GUIDE.md → Task 3
Checkpoint 3.1: Docker Build
- Docker build started
- Build completed successfully
- Image size: _________ MB
- Image tag: eva/eva-data-model:cache-[DATE]
Docker Build Command:
docker build -t eva/eva-data-model:20260306-cache .
docker images | grep eva-data-modelCheckpoint 3.2: Registry Push
- Image pushed to registry
- Registry verification complete
- Pull from staging registry successful
Checkpoint 3.3: Staging Deployment
- Container App updated
- Environment variables configured
- Deployment in progress
az containerapp update `
-g EVA-Sandbox-dev `
-n msub-eva-data-model-staging `
--image eva/eva-data-model:20260306-cacheCheckpoint 3.4: Health Verification
- Staging endpoint responding
- Cache layer initialized
- No startup errors observed
curl https://msub-eva-data-model-staging.azurecontainerapps.io/healthResponse: ___________
Checkpoint 3.5: Cache Warming
- 5 minutes of test traffic
- Cache hit rate observed: ________%
- Baseline latencies collected
Task 3 Decision Gate
- ✅ GO - Staging ready for integration tests
-
⚠️ CONDITIONAL - Minor issues: ___________ - ❌ NO-GO - Deployment failed: ___________
Task 3 Complete Time: ___________ ET
Reference: PHASE-3-DO-INTEGRATION-GUIDE.md → Task 4
Checkpoint 4.1: Test Execution
- pytest tests/test_cache_integration.py running
- All 8 tests collected
pytest tests/test_cache_integration.py -v
# Expected:
# test_get_with_cache PASSED
# test_create_invalidates_cache PASSED
# [... 6 more]Test Results: ___________
Checkpoint 4.2: Performance Metrics
- Load test executed (50 concurrent)
- P50 latency: _________ ms
- Error rate: _________ %
Checkpoint 4.3: Baseline Report
- Metrics collected
- Compared to staging baseline
- No regressions detected
Task 4 Decision Gate
- ✅ GO - All tests passing, proceed to production prep
-
⚠️ CONDITIONAL - Minor test failures: ___________ - ❌ NO-GO - Critical test failures: ___________
Task 4 Complete Time: ___________ ET
Reference: PHASE-3-DO-INTEGRATION-GUIDE.md → Task 5
Checkpoint 5.1: Feature Flags
- CACHE_ENABLED=true set
- REDIS_ENABLED=true set
- ROLLOUT_PERCENTAGE=10 set (for Stage 1)
Checkpoint 5.2: Rollback Testing
# Test 1: Disable cache
$env:CACHE_ENABLED="false"
# Verify: App works without cache
curl https://production.../health
# Test 2: Re-enable
$env:CACHE_ENABLED="true"- Rollback test 1 passed
- Rollback test 2 passed
- Rollback scripts verified
Checkpoint 5.3: Leadership Review
- Product manager approval: ___________
- Engineering lead sign-off: ___________
- DevOps approval: ___________
- Go/no-go decision: GO ✅
Task 5 Decision Gate
- ✅ GO - All systems ready for CHECK phase
- ❌ NO-GO - Issues found: ___________
Task 5 Complete Time: ___________ ET
Reference: PHASE-3-CHECK-VALIDATION-GUIDE.md → Gate 1
Criteria to Verify:
python tests/test_preintegration_imports.py
# Must show:
# ✅ All 20+ cache components imported successfully
# ✅ CacheManager singleton working
# ✅ MemoryCache operations working- Test 1: All imports successful
- Test 2: CacheManager singleton
- Test 3: MemoryCache operations
- Test 4: All exports available
Gate 1 Result:
- ✅ PASS - Proceed to Gate 2
- ❌ FAIL - Fix and retest
- Issue: ___________
- Fix ETA: ___________
Gate 1 Complete Time: ___________ ET
Reference: PHASE-3-CHECK-VALIDATION-GUIDE.md → Gate 2
Test Execution:
pytest tests/test_integration_validation.py -v
# Must pass:
# test_cached_router_creation
# test_cache_flow_get
# test_cache_flow_create
# test_fastapi_startup_integration
# test_health_endpointCriteria:
- Router creation successful
- GET cache flow working
- CREATE invalidation working
- FastAPI integration functional
- Health endpoints responding
Gate 2 Result:
- ✅ PASS - Proceed to Gate 3
- ❌ FAIL - Fix and retest
- Issue: ___________
- Fix ETA: ___________
Gate 2 Complete Time: ___________ ET
Reference: PHASE-3-CHECK-VALIDATION-GUIDE.md → Gate 3
Load Test:
python scripts/validate-performance.py https://msub-eva-data-model-staging... Expected Results:
| Metric | Target | Actual | Status |
|---|---|---|---|
| P50 Latency | <100ms | ___ms | [ ] |
| P95 Latency | <200ms | ___ms | [ ] |
| Error Rate | <0.1% | __% | [ ] |
| 5x+ Improvement | YES | [ ] | [ ] |
Gate 3 Result:
- ✅ PASS - Performance targets met
-
⚠️ CONDITIONAL - Performance marginal (3-5x)- Concern: ___________
- Allow proceeding? [YES] [NO]
- ❌ FAIL - Below 3x improvement
- Issue: ___________
- Fix ETA: ___________
Gate 3 Complete Time: ___________ ET
Reference: PHASE-3-CHECK-VALIDATION-GUIDE.md → Gate 4
Consistency Test:
python scripts/validate-consistency.pyValid Scenarios:
- Create entity → Read from cache ✅
- Update entity → Cache invalidated ✅
- Delete entity → Cache cleaned ✅
- Concurrent operations → Safe ✅
- TTL expiration → Working ✅
Gate 4 Result:
- ✅ PASS - Data consistency confirmed
- ❌ FAIL - Consistency issue detected
- Issue: ___________
- Root cause: ___________
- Fix ETA: ___________
Gate 4 Complete Time: ___________ ET
All Gates Status:
Gate 1: ✅ [ ] or ❌ [ ]
Gate 2: ✅ [ ] or ❌ [ ]
Gate 3: ✅ [ ] or ⚠️ [ ] or ❌ [ ]
Gate 4: ✅ [ ] or ❌ [ ]
Final Decision:
- ✅ GO TO ACT PHASE - All gates passed
- ❌ HALT - Fix gates and retest
Decision Made By: ___________ Decision Time: ___________ ET
Next Phase:
IF GO: → ACT PHASE BEGINS (06:00 PM ET)
IF HALT: → SESSION 36B (Investigation & Remediation)
Reference: PHASE-3-ACT-PRODUCTION-ROLLOUT-GUIDE.md → Task 1
Checkpoint 1.1: Dashboard Creation
- Application Insights dashboard created
- Cache hit rate panel added
- Latency panels (P50/P95/P99) added
- RU consumption graphs added
Checkpoint 1.2: Alert Configuration
- Alert 1: Error rate > 0.1%
- Alert 2: P95 latency > 150ms
- Alert 3: Cache hit rate < 40%
- Alert 4: Redis connection lost
- Alert 5: Cosmos RU throttling detected
All 5 Alerts Status: ___________
Checkpoint 1.3: KQL Queries
- Query 1: Hit rate over time ✅
- Query 2: Latency percentiles ✅
- Query 3: RU consumption comparison ✅
- Query 4: Error rate by endpoint ✅
- Query 5: Cache operations analysis ✅
Task 1 Complete Time: ___________ ET
Reference: PHASE-3-ACT-PRODUCTION-ROLLOUT-GUIDE.md → Task 2
Start Time: ___________ ET
Pre-Deployment:
- All monitoring dashboards open
- On-call engineer ready
- Slack notifications enabled
- Team on standby
Deployment:
az containerapp env update `
-n EVA-Sandbox-dev `
--set-env-vars ROLLOUT_PERCENTAGE=10Checkpoint 1.1: Deployment Successful
- Container App updated
- Pods started
- Health endpoint responding
- No errors in logs
Checkpoint 1.2: Monitoring (15 minutes)
- Request count: ___________
- Error rate: ___________% (target: <1%)
- P50 latency: ___________ms
- Cache hit rate: ___________% (baseline)
Stage 1 Decision:
- ✅ GO TO 25% - All metrics good
-
⚠️ HOLD AT 10% - Investigating: ___________ - ❌ ROLLBACK - Issue detected: ___________
Stage 1 Complete Time: ___________ ET
Start Time: ___________ ET
Deployment:
az containerapp env update `
-n EVA-Sandbox-dev `
--set-env-vars ROLLOUT_PERCENTAGE=25Checkpoint 2.1: Deployment Successful
- Pods restarted
- Health check passed
- Monitoring updated
Checkpoint 2.2: Monitoring (30 minutes)
- Request count: ___________
- Error rate: ___________% (target: <0.1%)
- P50 latency: ___________ms
- P95 latency: ___________ms
- Cache hit rate: ___________% (target: >40%)
- RU consumption: ___________/sec (target: <250)
Stage 2 Decision:
- ✅ GO TO 50% - All metrics good
-
⚠️ HOLD AT 25% - Investigating: ___________ - ❌ ROLLBACK - Issue detected: ___________
Stage 2 Complete Time: ___________ ET
Start Time: ___________ ET
Skip Condition: If 25% metrics excellent, can fast-track to 100% Proceed Condition: If any hesitation, validate at 50% first
Decision: [ ] Skip to 100% [ ] [ ] Proceed to 50% [ ]
If proceeding:
az containerapp env update `
-n EVA-Sandbox-dev `
--set-env-vars ROLLOUT_PERCENTAGE=50Checkpoint 3.1: Monitoring (15 minutes)
- All metrics stable
- No new errors detected
- Performance maintained
Stage 3 Decision:
- ✅ GO TO 100%
- ❌ ROLLBACK
Stage 3 Complete Time: ___________ ET
Start Time: ___________ ET
Deployment:
az containerapp env update `
-n EVA-Sandbox-dev `
--set-env-vars ROLLOUT_PERCENTAGE=100Checkpoint 4.1: Full Deployment
- 100% traffic on cache layer
- All endpoints responding
- Monitoring dashboards showing unified data
- Zero errors observed
Checkpoint 4.2: Immediate Validation
- Chat with team: Everything good?
- Monitoring looks normal?
- Error alerts triggered? (should be NONE)
Stage 4 Status: ✅ PRODUCTION LIVE
Stage 4 Complete Time: ___________ ET
Start Time: ___________ ET End Time: ___________ ET (Next morning)
Monitoring Checklist (Check hourly):
Hour 1 [ ] - Error rate: ___%, Hit rate: ___%
Hour 2 [ ] - Error rate: ___%, Hit rate: ___%
Hour 3 [ ] - Error rate: ___%, Hit rate: ___%
Hour 4 [ ] - Error rate: ___%, Hit rate: ___%
Hour 5 [ ] - Error rate: ___%, Hit rate: ___%
Hour 6 [ ] - Error rate: ___%, Hit rate: ___%
Hour 7 [ ] - Error rate: ___%, Hit rate: ___%
Hour 8 [ ] - Error rate: ___%, Hit rate: ___%
Hour 9 [ ] - Error rate: ___%, Hit rate: ___%
Hour 10 [ ] - Error rate: ___%, Hit rate: ___%
Hour 11 [ ] - Error rate: ___%, Hit rate: ___%
Hour 12 [ ] - Error rate: ___%, Hit rate: ___%
Hour 13 [ ] - Error rate: ___%, Hit rate: ___%
Hour 14 [ ] - Error rate: ___%, Hit rate: ___%
Hour 15 [ ] - Error rate: ___%, Hit rate: ___%
Hour 16 [ ] - Error rate: ___%, Hit rate: ___%
Hour 17 [ ] - Error rate: ___%, Hit rate: ___%
Hour 18 [ ] - Error rate: ___%, Hit rate: ___%
Hour 19 [ ] - Error rate: ___%, Hit rate: ___%
Hour 20 [ ] - Error rate: ___%, Hit rate: ___%
Hour 21 [ ] - Error rate: ___%, Hit rate: ___%
Hour 22 [ ] - Error rate: ___%, Hit rate: ___%
Hour 23 [ ] - Error rate: ___%, Hit rate: ___%
Hour 24 [ ] - Error rate: ___%, Hit rate: ___%
Critical Alerts Observed:
- None (EXCELLENT ✅)
- [Describe]: ___________
24-Hour Wrap-Up Report:
- Generated
- Reviewed
- All metrics within targets
Task 3 Complete Time: ___________ ET (Next morning)
Start Time: ___________ ET
End Time: ___________ ET
Duration: ___________ hours
Status: ✅ COMPLETE [ ] ❌ INCOMPLETE [ ]
Tasks Completed: [ ] 5/5
All deliverables: ✅ YES [ ] ❌ NO [ ]
Start Time: ___________ ET
End Time: ___________ ET
Duration: ___________ hours
Status: ✅ COMPLETE [ ] ❌ INCOMPLETE [ ]
Gates Passed: [ ] 4/4 [ ] 3/4 [ ] <3/4
Decision: ✅ GO [ ] ❌ NO-GO [ ]
Start Time: ___________ ET
End Time: ___________ ET (+ 24h monitoring)
Duration: ___________ hours
Status: ✅ COMPLETE [ ] ❌ INCOMPLETE [ ]
Stages Deployed: [ ] 4/4 [ ] 3/4 [ ] 2/4
Production Status: ✅ LIVE [ ] ⚠️ PARTIAL [ ] ❌ ROLLED BACK [ ]
Monitoring: ✅ 24h STABLE [ ] ⚠️ ISSUES [ ]
╔════════════════════════════════════════════════════════════════╗
║ ║
║ SESSION 36 EXECUTION COMPLETE: ✅ YES [ ] ❌ NO [ ] ║
║ ║
║ Project Status: 85% → 100% COMPLETE [ ] ║
║ ║
║ DO Phase: ✅ [ ] CHECK Phase: ✅ [ ] ACT Phase: ✅ [ ]
║ ║
║ Next Steps: Deploy to next region / Scale / Optimize ║
║ ║
╚════════════════════════════════════════════════════════════════╝
Checklist Status: LIVE & IN USE Last Updated: ___________ Owner: Session 36 Team Next Review: Every 30 minutes during execution