## Summary
Add post-processing capabilities to Zathras that convert benchmark results into structured JSON documents and export them to external data warehouses (OpenSearch, Horreum) for long-term storage, querying, and analysis.
## Problem Statement / Current State

Currently, Zathras:
- ✅ Successfully orchestrates benchmark execution across cloud and bare metal systems
- ✅ Collects extensive metadata (hardware config, system config, cloud metadata)
- ✅ Retrieves test results as compressed tarballs (`results_<test>.zip`)
- ✅ Stores results in directory structures: `results_prefix/os_vendor/cloud_type/instance_type_N/`
However:
- ❌ Results are stored as unstructured tarballs on the controller filesystem
- ❌ No centralized database for historical result queries
- ❌ Difficult to perform trend analysis across multiple test runs
- ❌ No automated regression detection capabilities
- ❌ Limited visibility into performance trends over time
- ❌ Results are isolated per test run with no cross-run correlation
- ❌ Executive reporting requires manual data extraction
**Impact:**
- Performance engineers must manually extract and analyze results
- Historical comparisons require custom scripts
- No dashboards or visualization of trends
- Difficult to answer questions like:
  - "How has STREAM performance on m5.xlarge changed over the last 6 months?"
  - "Which tuned profile performs best for linpack across instance types?"
  - "Are we seeing performance regressions after OS updates?"
## Proposed Solution
Implement a post-processing and export pipeline that:
- Extracts results from Zathras archive directories
- Transforms test outputs into structured JSON documents
- Enriches with metadata (hardware, system config, cloud details)
- Validates data integrity and schema compliance
- Exports to configurable data warehouse targets
## Architecture

```
post_processing/
├── main.py                    # Orchestrator script
├── processors/
│   ├── base_processor.py      # Abstract base class
│   ├── fio_processor.py       # FIO-specific parser
│   ├── streams_processor.py   # STREAM-specific parser
│   └── ...                    # One per test type
├── exporters/
│   ├── opensearch_exporter.py # OpenSearch integration
│   └── horreum_exporter.py    # Horreum integration
└── utils/
    ├── metadata_extractor.py  # Parse Zathras metadata
    └── archive_handler.py     # Handle zip/tar extraction
```
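For reference, the `base_processor.py` contract could be as small as the following sketch (class and field names are illustrative, not final); FIO serves as the first concrete processor since it already emits JSON:

```python
import json
from abc import ABC, abstractmethod


class BaseProcessor(ABC):
    """Parses one test type's raw output into the unified schema's "results" block."""

    #: test name as it appears in results_<test>.zip
    test_name: str

    @abstractmethod
    def parse(self, raw_output: str) -> dict:
        """Return {"metrics": {...}, "raw_output": raw_output}."""


class FioProcessor(BaseProcessor):
    """FIO already outputs JSON, so parsing is mostly a projection."""

    test_name = "fio"

    def parse(self, raw_output: str) -> dict:
        data = json.loads(raw_output)
        # Keep a few headline metrics; the full document stays in raw_output.
        job = data["jobs"][0]
        metrics = {
            "read_iops": job["read"]["iops"],
            "write_iops": job["write"]["iops"],
        }
        return {"metrics": metrics, "raw_output": raw_output}
```

Each test-specific processor then only implements `parse()`; the orchestrator never needs per-test knowledge.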
## Unified JSON Schema

```json
{
  "test_run": {
    "id": "uuid",
    "timestamp": "ISO-8601",
    "zathras_version": "3.2"
  },
  "infrastructure": {
    "type": "aws|azure|gcp|local",
    "instance_type": "m5.xlarge",
    "region": "us-east-1",
    "os": { "vendor": "rhel", "version": "9.3" }
  },
  "hardware": {
    "cpu": { "model": "...", "cores": 16 /* ... */ },
    "memory": { "total_gb": 32 /* ... */ }
  },
  "test": {
    "name": "streams",
    "version": "v1.0",
    "status": "passed|failed",
    "duration_seconds": 235
  },
  "results": {
    "metrics": { /* test-specific */ },
    "raw_output": "..."
  }
}
```

## Justification
### Why This Approach?

**Option 1 (considered):** modify all test wrappers to output JSON
- ❌ Requires coordinating changes across 18+ separate repositories
- ❌ Each wrapper maintained by different teams
- ❌ No standardization guarantee
- ❌ Months of coordination overhead
- ❌ Breaks backward compatibility
**Option 2 (proposed):** post-processing at the Zathras level
- ✅ Single point of implementation - change only Zathras
- ✅ Wrapper independence - no changes to external test repos
- ✅ Standardized schema - consistent structure across all tests
- ✅ Metadata enrichment - combine test results with infrastructure context
- ✅ Faster deployment - weeks instead of months
- ✅ Backward compatible - doesn't break existing workflows
- ✅ Historical data support - can reprocess old results
### Why Not Store in the Zathras Database?
- Performance testing requires time-series analysis capabilities
- Need sophisticated querying across multiple dimensions
- Require visualization/dashboard integration
- Benefit from existing data warehouse infrastructure
- Scale to millions of data points over time
## Benefits

### For Performance Engineers
- 📊 Real-time dashboards showing current and historical trends
- 🔍 Ad-hoc queries across any dimension (test, OS, instance type, date range)
- 📈 Trend visualization to spot performance changes over time
- 🚨 Regression detection via automated alerts
- 📝 Automated reporting for stakeholders
### For Engineering Managers
- 📉 Executive dashboards with performance KPIs
- 💰 Cost analysis (test runtime × cloud pricing = spend tracking)
- 🎯 Goal tracking (performance targets vs actual results)
- 📊 Team productivity metrics (tests run, pass rates)
### For CI/CD Integration
- 🔄 Automated performance gates (fail CI if regression detected)
- 📧 Notifications on performance changes
- 🔗 Integration with existing monitoring systems
- 📦 Data portability (JSON export to other tools)
### For Research & Analysis
- 🔬 Cross-test correlation (does better STREAM predict better HammerDB?)
- 🌡️ Environmental impact (how do tuned profiles affect results?)
- 📐 Statistical analysis (standard deviation, percentiles)
- 🗂️ Dataset creation for ML models
## Data Warehouse Options

### Option A: OpenSearch (Elasticsearch fork)

**Best for:** general-purpose search, analytics, and visualization

**Pros:**
- ✅ Powerful query language (SQL + DSL)
- ✅ Kibana dashboards for visualization
- ✅ Time-series analysis built-in
- ✅ Widely adopted, large community
- ✅ Real-time indexing and search
- ✅ Flexible schema (add fields without migration)
- ✅ REST API for easy integration
**Cons:**
- ⚠️ Not performance-testing-specific
- ⚠️ Requires infrastructure setup
- ⚠️ May need custom dashboards

**Use Cases:**
- Real-time monitoring during test runs
- Ad-hoc queries across result history
- Executive dashboards
- Log correlation with performance data
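As a concrete example of the OpenSearch integration, unified documents can be shipped in batches through the `_bulk` REST endpoint, which accepts newline-delimited JSON. A small helper to build such a payload (the index name mirrors the proposed config; the HTTP call itself is omitted):

```python
import json


def to_bulk_payload(docs, index="zathras-results"):
    """Build an NDJSON body for OpenSearch's _bulk endpoint: one action line
    followed by one document line per doc, with the trailing newline the API
    requires."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

The returned string would be POSTed to `<opensearch-url>/_bulk` with `Content-Type: application/x-ndjson`.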
### Option B: Horreum (Performance Test Results Repository)

**Best for:** performance regression tracking and historical comparison

**Pros:**
- ✅ Built specifically for performance testing
- ✅ Automatic regression detection
- ✅ Change point analysis (detects when performance shifts)
- ✅ Comparison views (before/after, baseline/candidate)
- ✅ Schema validation for test results
- ✅ Native understanding of performance metrics
- ✅ Integration with CI/CD pipelines
- ✅ Run comparison and annotation
**Cons:**
- ⚠️ More specialized, smaller community
- ⚠️ Less flexible for non-performance queries
- ⚠️ Steeper learning curve

**Use Cases:**
- Performance regression tracking in CI/CD
- Baseline management (golden results)
- Automated alerting on regressions
- Historical performance comparison
### Option C: Both (Recommended)

**Strategy:** support multiple exporters and let users configure targets based on their needs.
```yaml
# In scenario file
global:
  results_export:
    enabled: true
    targets:
      - opensearch:
          url: "https://opensearch.example.com"
          index: "zathras-results"
      - horreum:
          url: "https://horreum.example.com"
          test: "zathras-benchmark-suite"
```

**Benefits:**
- OpenSearch for real-time monitoring and dashboards
- Horreum for regression detection and CI/CD gates
- Users choose based on infrastructure and needs
- Not mutually exclusive
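Exporter selection for such a scenario file could work roughly as follows, assuming the YAML has already been parsed into a plain dict (e.g. with PyYAML); the registry and stub classes here are illustrative, not final code:

```python
# Registry mapping target names in the scenario file to exporter classes.
EXPORTERS = {}


def register(name):
    """Class decorator that makes an exporter selectable by config name."""
    def wrap(cls):
        EXPORTERS[name] = cls
        return cls
    return wrap


@register("opensearch")
class OpenSearchExporter:
    def __init__(self, url, index):
        self.url, self.index = url, index


@register("horreum")
class HorreumExporter:
    def __init__(self, url, test):
        self.url, self.test = url, test


def build_exporters(config):
    """Instantiate one exporter per configured target, skipping unknown names."""
    export_cfg = config["global"]["results_export"]
    if not export_cfg.get("enabled", False):
        return []
    exporters = []
    for target in export_cfg["targets"]:
        for name, options in target.items():
            if name in EXPORTERS:
                exporters.append(EXPORTERS[name](**options))
    return exporters
```

New warehouse targets (Option D below) would then only need a class plus a `@register(...)` line.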
### Option D: Other Data Warehouses
The architecture supports adding exporters for:
- InfluxDB (time-series focused)
- PostgreSQL/TimescaleDB (SQL queries)
- Prometheus (metrics and alerting)
- Custom REST APIs (organization-specific systems)
## Implementation Approach

### Phase 1: Foundation
- Create the `post_processing/` directory structure
- Implement the base processor interface
- Build the metadata extractor (parse `hw_config.yml`, `ansible_vars.yml`)
- Implement the archive handler (zip/tar extraction)
- Create one processor (FIO, which already outputs JSON)
- Build the OpenSearch exporter
- Create the `main.py` orchestrator
- Test end-to-end with sample FIO results

**Deliverable:** a working prototype that processes FIO results into OpenSearch
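The archive-handler piece of Phase 1 could start as little more than a walk over the existing results tree (function names are illustrative; the layout matches the directory structure described earlier):

```python
import zipfile
from pathlib import Path


def find_result_archives(results_prefix):
    """Yield (archive_path, test_name) for every results_<test>.zip found
    under the results_prefix/os_vendor/cloud_type/instance_type_N/ tree."""
    for archive in Path(results_prefix).rglob("results_*.zip"):
        test_name = archive.stem.removeprefix("results_")
        yield archive, test_name


def extract_archive(archive_path, dest_dir):
    """Unpack one results archive so processors can read its files."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
    return Path(dest_dir)
```

Discovery and extraction stay separate so historical trees can be scanned cheaply before deciding what to reprocess.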
### Phase 2: Expansion
- Add 3-5 more processors (streams, linpack, coremark, uperf)
- Implement Horreum exporter
- Add configuration file support
- Error handling and retry logic
- Logging and debugging capabilities
- CLI improvements (dry-run, batch mode)
**Deliverable:** multi-test support with dual exporter capability
### Phase 3: Production Ready
- Complete remaining processors
- Integration tests with real scenarios
- Documentation (README, usage examples)
- Schema documentation
- Performance optimization
- Optional integration into burden script
**Deliverable:** a production-ready, documented solution
## Key Design Decisions
### 1. Standalone vs. Integrated

**Decision:** build as a standalone tool first, integrate later

**Rationale:**
- Allows independent development and testing
- Can process historical data
- Users can run manually or in cron jobs
- Optional integration into burden when stable
### 2. Push vs. Pull

**Decision:** push model (Zathras pushes to the data warehouse)

**Rationale:**
- Simpler architecture
- Real-time availability
- The data warehouse never needs inbound access to Zathras
- Standard pattern for observability
### 3. Synchronous vs. Asynchronous

**Decision:** synchronous initially, asynchronous later if needed

**Rationale:**
- Simpler implementation
- Export time is small relative to test runtime
- Can be optimized later if it becomes a bottleneck
### 4. Strict vs. Flexible Schema

**Decision:** flexible schema with core required fields

**Rationale:**
- Tests evolve over time
- New metadata may be added
- Graceful degradation if parsing fails
- OpenSearch handles schema evolution well
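One way to realize "flexible with core required fields" is a light pre-export check against a list of dotted paths; the paths below mirror the unified schema, and this is a sketch, not a full JSON Schema validator:

```python
# Dotted paths for the schema's core fields; any extra fields pass through untouched.
REQUIRED_PATHS = [
    "test_run.id",
    "test_run.timestamp",
    "test.name",
    "test.status",
    "results.metrics",
]


def missing_core_fields(doc):
    """Return the list of required dotted paths absent from doc."""
    missing = []
    for path in REQUIRED_PATHS:
        node = doc
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing
```

A non-empty return value would mark the document for quarantine or logging rather than hard failure, matching the graceful-degradation goal.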
## Success Criteria

### Must Have
- Process at least 3 test types (FIO, STREAM, Linpack)
- Export to OpenSearch successfully
- Unified JSON schema documented
- Metadata enrichment working (hardware + cloud + test config)
- Error handling (graceful failures)
- Can process historical results
### Should Have
- Export to Horreum
- 5-7 test processors implemented
- Configuration file support
- Batch processing mode
- Unit test coverage >70%
### Nice to Have
- All 18 test processors
- Integration into burden script
- Sample Kibana dashboards
- Performance optimization
- Resumability (skip already-exported)
## Non-Goals (Out of Scope)
- ❌ Modifying test wrappers
- ❌ Real-time streaming during test execution
- ❌ Building custom visualization UI
- ❌ Changing Zathras core functionality
- ❌ Result validation/correctness checking
- ❌ Test execution scheduling
## Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Test output format changes break parsers | Medium | Keep raw output; version detection; graceful fallback |
| Data warehouse unavailable during export | Low | Retry logic; queue exports; offline mode |
| Schema evolution over time | Medium | Flexible schema; version field; backward compatibility |
| Performance overhead | Low | Async export; make optional; optimize critical paths |
| Adoption resistance | Medium | Make optional; show value first; gather feedback |
## Open Questions
- **Authentication:** how should credentials be managed? (env vars, config file, secrets manager?)
- **Data retention:** who manages data warehouse retention policies?
- **Schema governance:** who approves schema changes?
- **Priority tests:** which 3-5 tests should we implement first?
- **Infrastructure:** is OpenSearch/Horreum already deployed, or does it need setup?
- **Access control:** who can export results? Any restrictions?
## References
- Zathras repository: https://github.com/redhat-performance/zathras
- OpenSearch documentation: https://opensearch.org/docs/latest/
- Horreum documentation: https://horreum.hyperfoil.io/
- Test wrappers: https://github.com/redhat-performance/ (various repos)
## Labels

`enhancement`, `feature`, `observability`, `data-export`