
Enable Post-Processing and Export of Benchmark Results to External Data Warehouses #116

@acalhounRH

Description

Summary

Add post-processing capabilities to Zathras that convert benchmark results into structured JSON documents and export them to external data warehouses (OpenSearch, Horreum) for long-term storage, querying, and analysis.


Problem Statement / Current State

Currently, Zathras:

  • ✅ Successfully orchestrates benchmark execution across cloud and bare metal systems
  • ✅ Collects extensive metadata (hardware config, system config, cloud metadata)
  • ✅ Retrieves test results as compressed tarballs (results_<test>.zip)
  • ✅ Stores results in directory structures: results_prefix/os_vendor/cloud_type/instance_type_N/

However:

  • ❌ Results are stored as unstructured tarballs on the controller filesystem
  • ❌ No centralized database for historical result queries
  • ❌ Difficult to perform trend analysis across multiple test runs
  • ❌ No automated regression detection capabilities
  • ❌ Limited visibility into performance trends over time
  • ❌ Results are isolated per test run with no cross-run correlation
  • ❌ Executive reporting requires manual data extraction

Impact:

  • Performance engineers must manually extract and analyze results
  • Historical comparisons require custom scripts
  • No dashboards or visualization of trends
  • Difficult to answer questions like:
    • "How has STREAM performance on m5.xlarge changed over the last 6 months?"
    • "Which tuned profile performs best for linpack across instance types?"
    • "Are we seeing performance regressions after OS updates?"

Proposed Solution

Implement a post-processing and export pipeline that:

  1. Extracts results from Zathras archive directories
  2. Transforms test outputs into structured JSON documents
  3. Enriches with metadata (hardware, system config, cloud details)
  4. Validates data integrity and schema compliance
  5. Exports to configurable data warehouse targets
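The five steps above can be sketched as a single orchestration loop. Every function body here is a stand-in for the planned modules (archive_handler, processors, exporters); the names and the hard-coded metadata are illustrative assumptions, not the final API:

```python
import json


def extract(archive_path: str) -> str:
    # 1. Unpack results_<test>.zip; stubbed here as returning a raw-output string.
    return '{"triad_mb_s": 41000}'


def transform(raw: str) -> dict:
    # 2. Parse raw test output into the unified structure, keeping the raw text.
    return {"results": {"metrics": json.loads(raw), "raw_output": raw}}


def enrich(doc: dict, metadata: dict) -> dict:
    # 3. Attach infrastructure/hardware context from Zathras metadata files.
    doc["infrastructure"] = metadata
    return doc


def validate(doc: dict) -> bool:
    # 4. Minimal integrity check: results and infrastructure must be present.
    return "results" in doc and "infrastructure" in doc


def process_run(archive_path: str, exporter) -> dict:
    # Stubbed metadata; the real pipeline would read hw_config.yml etc.
    doc = enrich(transform(extract(archive_path)), {"type": "aws"})
    if not validate(doc):
        raise ValueError("document failed validation")
    exporter(doc)  # 5. push to the configured warehouse target
    return doc
```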

Architecture

post_processing/
├── main.py                          # Orchestrator script
├── processors/
│   ├── base_processor.py           # Abstract base class
│   ├── fio_processor.py            # FIO-specific parser
│   ├── streams_processor.py        # STREAM-specific parser
│   └── ...                          # One per test type
├── exporters/
│   ├── opensearch_exporter.py      # OpenSearch integration
│   └── horreum_exporter.py         # Horreum integration
└── utils/
    ├── metadata_extractor.py       # Parse Zathras metadata
    └── archive_handler.py          # Handle zip/tar extraction
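The processor hierarchy above could look like the following sketch. The class and method names are assumptions about the eventual interface; the FIO field names (`jobname`, `read.bw`) follow fio's JSON output format:

```python
import json
from abc import ABC, abstractmethod


class BaseProcessor(ABC):
    """Abstract base for per-test result parsers (hypothetical interface)."""

    #: test name this processor handles, e.g. "fio" or "streams"
    test_name: str = ""

    @abstractmethod
    def parse(self, raw_output: str) -> dict:
        """Turn raw wrapper output into a test-specific metrics dict."""

    def process(self, raw_output: str) -> dict:
        # Wrap test-specific metrics in the shared result envelope,
        # keeping the raw output so results can be reprocessed later.
        return {
            "test": {"name": self.test_name},
            "results": {"metrics": self.parse(raw_output), "raw_output": raw_output},
        }


class FioProcessor(BaseProcessor):
    """FIO already emits JSON, so parsing is mostly a passthrough."""

    test_name = "fio"

    def parse(self, raw_output: str) -> dict:
        data = json.loads(raw_output)
        # Keep only job-level read bandwidth for the unified schema (illustrative).
        return {
            "jobs": [
                {"name": job["jobname"], "read_bw_kib": job["read"]["bw"]}
                for job in data.get("jobs", [])
            ]
        }
```

Adding a new test type then means subclassing `BaseProcessor` and implementing `parse`, with no changes to the orchestrator or exporters.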

Unified JSON Schema

{
  "test_run": {
    "id": "uuid",
    "timestamp": "ISO-8601",
    "zathras_version": "3.2"
  },
  "infrastructure": {
    "type": "aws|azure|gcp|local",
    "instance_type": "m5.xlarge",
    "region": "us-east-1",
    "os": { "vendor": "rhel", "version": "9.3" }
  },
  "hardware": {
    "cpu": { "model": "...", "cores": 16, ... },
    "memory": { "total_gb": 32, ... }
  },
  "test": {
    "name": "streams",
    "version": "v1.0",
    "status": "passed|failed",
    "duration_seconds": 235
  },
  "results": {
    "metrics": { /* test-specific */ },
    "raw_output": "..."
  }
}
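A document builder for this schema, paired with the "flexible schema with core required fields" validation described under the design decisions, might look like this sketch (field names follow the schema above; the required-field list is an assumption):

```python
import uuid
from datetime import datetime, timezone

# Core fields every exported document must carry; everything else is optional.
REQUIRED_PATHS = [("test_run", "id"), ("test_run", "timestamp"), ("test", "name")]


def build_document(test_name: str, metrics: dict, infrastructure: dict) -> dict:
    """Assemble a unified result document following the schema sketch."""
    return {
        "test_run": {
            "id": str(uuid.uuid4()),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
        "infrastructure": infrastructure,
        "test": {"name": test_name},
        "results": {"metrics": metrics},
    }


def validate(doc: dict) -> bool:
    """Check only the core required fields; extra fields are always allowed."""
    return all(
        section in doc and key in doc[section] for section, key in REQUIRED_PATHS
    )
```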

Justification

Why This Approach?

Option 1 Considered: Modify all test wrappers to output JSON

  • ❌ Requires coordinating changes across 18+ separate repositories
  • ❌ Each wrapper maintained by different teams
  • ❌ No standardization guarantee
  • ❌ Months of coordination overhead
  • ❌ Breaks backward compatibility

Option 2 (Proposed): Post-processing at Zathras level

  • ✅ Single point of implementation - change only Zathras
  • ✅ Wrapper independence - no changes to external test repos
  • ✅ Standardized schema - consistent structure across all tests
  • ✅ Metadata enrichment - combine test results with infrastructure context
  • ✅ Faster deployment - weeks instead of months
  • ✅ Backward compatible - doesn't break existing workflows
  • ✅ Historical data support - can reprocess old results

Why Not Store in Zathras Database?

  • Performance testing requires time-series analysis capabilities
  • Need sophisticated querying across multiple dimensions
  • Require visualization/dashboard integration
  • Benefit from existing data warehouse infrastructure
  • Scale to millions of data points over time

Benefits

For Performance Engineers

  • 📊 Real-time dashboards showing current and historical trends
  • 🔍 Ad-hoc queries across any dimension (test, OS, instance type, date range)
  • 📈 Trend visualization to spot performance changes over time
  • 🚨 Regression detection via automated alerts
  • 📝 Automated reporting for stakeholders

For Engineering Managers

  • 📉 Executive dashboards with performance KPIs
  • 💰 Cost analysis (test runtime × cloud pricing = spend tracking)
  • 🎯 Goal tracking (performance targets vs actual results)
  • 📊 Team productivity metrics (tests run, pass rates)

For CI/CD Integration

  • 🔄 Automated performance gates (fail CI if regression detected)
  • 📧 Notifications on performance changes
  • 🔗 Integration with existing monitoring systems
  • 📦 Data portability (JSON export to other tools)

For Research & Analysis

  • 🔬 Cross-test correlation (does better STREAM predict better HammerDB?)
  • 🌡️ Environmental impact (how do tuned profiles affect results?)
  • 📐 Statistical analysis (standard deviation, percentiles)
  • 🗂️ Dataset creation for ML models

Data Warehouse Options

Option A: OpenSearch (Elasticsearch fork)

Best for: General-purpose search, analytics, and visualization

Pros:

  • ✅ Powerful query language (SQL + DSL)
  • ✅ Kibana dashboards for visualization
  • ✅ Time-series analysis built-in
  • ✅ Widely adopted, large community
  • ✅ Real-time indexing and search
  • ✅ Flexible schema (add fields without migration)
  • ✅ REST API for easy integration

Cons:

  • ⚠️ Not performance-testing-specific
  • ⚠️ Requires infrastructure setup
  • ⚠️ May need custom dashboards

Use Cases:

  • Real-time monitoring during test runs
  • Ad-hoc queries across result history
  • Executive dashboards
  • Log correlation with performance data

Option B: Horreum (Performance Test Results Repository)

Best for: Performance regression tracking and historical comparison

Pros:

  • ✅ Built specifically for performance testing
  • ✅ Automatic regression detection
  • ✅ Change point analysis (detects when performance shifts)
  • ✅ Comparison views (before/after, baseline/candidate)
  • ✅ Schema validation for test results
  • ✅ Native understanding of performance metrics
  • ✅ Integration with CI/CD pipelines
  • ✅ Run comparison and annotation

Cons:

  • ⚠️ More specialized, smaller community
  • ⚠️ Less flexible for non-performance queries
  • ⚠️ Steeper learning curve

Use Cases:

  • Performance regression tracking in CI/CD
  • Baseline management (golden results)
  • Automated alerting on regressions
  • Historical performance comparison

Option C: Both (Recommended)

Strategy: Support multiple exporters, let users configure based on needs

# In scenario file
global:
  results_export:
    enabled: true
    targets:
      - opensearch:
          url: "https://opensearch.example.com"
          index: "zathras-results"
      - horreum:
          url: "https://horreum.example.com"
          test: "zathras-benchmark-suite"
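Dispatching to multiple configured targets could be as simple as the sketch below. The config dict mirrors the scenario-file structure above after YAML parsing; `export_all` and the exporter-callable convention are assumptions:

```python
def export_all(config: dict, document: dict, exporters: dict) -> list:
    """Send one document to each enabled target; returns the target names used.

    `config` is the parsed `results_export` block; `exporters` maps target
    names ("opensearch", "horreum") to callables taking (settings, document).
    """
    sent = []
    if not config.get("enabled", False):
        return sent
    for target in config.get("targets", []):
        # Each target entry is a one-key mapping: {"opensearch": {...}} etc.
        (name, settings), = target.items()
        exporters[name](settings, document)
        sent.append(name)
    return sent
```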

Benefits:

  • OpenSearch for real-time monitoring and dashboards
  • Horreum for regression detection and CI/CD gates
  • Users choose based on infrastructure and needs
  • Not mutually exclusive

Option D: Other Data Warehouses

The architecture supports adding exporters for:

  • InfluxDB (time-series focused)
  • PostgreSQL/TimescaleDB (SQL queries)
  • Prometheus (metrics and alerting)
  • Custom REST APIs (organization-specific systems)

Implementation Approach

Phase 1: Foundation

  • Create post_processing/ directory structure
  • Implement base processor interface
  • Build metadata extractor (parse hw_config.yml, ansible_vars.yml)
  • Implement archive handler (zip/tar extraction)
  • Create one processor (FIO - already outputs JSON)
  • Build OpenSearch exporter
  • Create main.py orchestrator
  • Test end-to-end with sample FIO results

Deliverable: Working prototype that processes FIO results to OpenSearch
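The archive handler from this phase could start as a thin wrapper over the standard library; the function name and return value are assumptions (tar support would be added the same way via `tarfile`):

```python
import zipfile
from pathlib import Path


def extract_results(archive: Path, dest: Path) -> list:
    """Unpack a results_<test>.zip archive and return the extracted member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()
```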


Phase 2: Expansion

  • Add 3-5 more processors (streams, linpack, coremark, uperf)
  • Implement Horreum exporter
  • Add configuration file support
  • Error handling and retry logic
  • Logging and debugging capabilities
  • CLI improvements (dry-run, batch mode)

Deliverable: Multi-test support with dual exporter capability
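The retry logic planned for this phase could follow a standard exponential-backoff pattern; the wrapper below is a sketch (function name, attempt count, and delays are assumptions):

```python
import time


def export_with_retry(send, document, attempts=3, base_delay=1.0):
    """Call send(document); retry with exponential backoff on failure.

    `send` is any exporter callable; with the defaults, delays between
    attempts are 1s then 2s. The last failure is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return send(document)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```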


Phase 3: Production Ready

  • Complete remaining processors
  • Integration tests with real scenarios
  • Documentation (README, usage examples)
  • Schema documentation
  • Performance optimization
  • Optional integration into burden script

Deliverable: Production-ready, documented solution


Key Design Decisions

1. Standalone vs Integrated

Decision: Build as standalone tool first, integrate later

Rationale:

  • Allows independent development and testing
  • Can process historical data
  • Users can run manually or in cron jobs
  • Optional integration into burden when stable

2. Push vs Pull

Decision: Push model (Zathras pushes to data warehouse)

Rationale:

  • Simpler architecture
  • Real-time availability
  • Data warehouse needs no inbound access to Zathras
  • Standard pattern for observability

3. Synchronous vs Asynchronous

Decision: Synchronous initially, async later if needed

Rationale:

  • Simpler implementation
  • Export time is small relative to test runtime
  • Can optimize later if bottleneck

4. Schema Strict vs Flexible

Decision: Flexible schema with core required fields

Rationale:

  • Tests evolve over time
  • New metadata may be added
  • Graceful degradation if parsing fails
  • OpenSearch handles schema evolution well

Success Criteria

Must Have

  • Process at least 3 test types (FIO, STREAM, Linpack)
  • Export to OpenSearch successfully
  • Unified JSON schema documented
  • Metadata enrichment working (hardware + cloud + test config)
  • Error handling (graceful failures)
  • Can process historical results

Should Have

  • Export to Horreum
  • 5-7 test processors implemented
  • Configuration file support
  • Batch processing mode
  • Unit test coverage >70%

Nice to Have

  • All 18 test processors
  • Integration into burden script
  • Sample Kibana dashboards
  • Performance optimization
  • Resumability (skip already-exported results)

Non-Goals (Out of Scope)

  • ❌ Modifying test wrappers
  • ❌ Real-time streaming during test execution
  • ❌ Building custom visualization UI
  • ❌ Changing Zathras core functionality
  • ❌ Result validation/correctness checking
  • ❌ Test execution scheduling

Risks & Mitigations

  • Test output format changes break parsers (Medium): keep raw output; version detection; graceful fallback
  • Data warehouse unavailable during export (Low): retry logic; queue exports; offline mode
  • Schema evolution over time (Medium): flexible schema; version field; backward compatibility
  • Performance overhead (Low): async export; make optional; optimize critical paths
  • Adoption resistance (Medium): make optional; show value first; gather feedback

Open Questions

  1. Authentication: How should credentials be managed? (env vars, config file, secrets manager?)
  2. Data retention: Who manages data warehouse retention policies?
  3. Schema governance: Who approves schema changes?
  4. Priority tests: Which 3-5 tests should we implement first?
  5. Infrastructure: Is OpenSearch/Horreum already deployed or needs setup?
  6. Access control: Who can export results? Any restrictions?

Labels

enhancement, feature, observability, data-export
