- Project: benchkit
- Version Target: 1.0.0
- Date: 2025-08-08
- Status: ACTIVE
Benchkit is a toolkit, not a framework, for practical benchmarking with markdown-first reporting. It provides flexible building blocks that developers can combine into custom benchmarking solutions tailored to their specific needs.
- Toolkit over Framework: Provide composable functions rather than monolithic workflows
- Markdown-First Reporting: Treat markdown as first-class output format
- Zero-Copy Where Possible: Minimize allocations during measurement
- Statistical Rigor: Provide proper statistical analysis with confidence intervals
Timeline: Week 1-2
Justification: Essential for any benchmarking work
- Basic Timing & Measurement (`enabled` feature)
  - Simple timing functions for arbitrary code blocks (see the sketch after this list)
  - Nested timing for hierarchical analysis
  - Statistical measures (mean, median, min, max, percentiles)
  - Custom metrics support beyond timing
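To make the timing surface concrete, here is a minimal sketch of what a `bench_block`-style helper could look like. The `BenchResult` shape and the explicit iteration count are assumptions for illustration, not committed API.

```rust
use std::time::{Duration, Instant};

/// Assumed result shape: raw samples from which statistics are derived.
pub struct BenchResult
{
  pub samples: Vec<Duration>,
}

/// Time a closure over `iterations` runs, one sample per run.
/// A real implementation would add warm-up runs and use
/// `std::hint::black_box` so the closure is not optimized away.
pub fn bench_block<F: FnMut()>(mut f: F, iterations: usize) -> BenchResult
{
  let mut samples = Vec::with_capacity(iterations);
  for _ in 0..iterations
  {
    let start = Instant::now();
    f();
    samples.push(start.elapsed());
  }
  BenchResult { samples }
}
```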
- Markdown Report Generation (`markdown_reports` feature)
  - Generate markdown tables and sections for benchmark results
  - Update specific sections of existing markdown files (see the sketch below)
  - Preserve non-benchmark content when updating documents
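A minimal sketch of section-preserving updates, assuming a section runs from its heading line to the next heading; a real implementation would parse properly (e.g. with pulldown-cmark, per the feature table) rather than scanning lines.

```rust
/// Replace the body under `heading` (e.g. "## Performance Results"),
/// leaving every other line of the document untouched.
fn update_markdown_section(document: &str, heading: &str, new_body: &str) -> String
{
  let mut out = Vec::new();
  let mut lines = document.lines().peekable();
  let mut found = false;
  while let Some(line) = lines.next()
  {
    out.push(line.to_string());
    if line.trim() == heading
    {
      found = true;
      out.push(new_body.to_string());
      // Drop the old body: everything up to the next heading line.
      while let Some(next) = lines.peek()
      {
        if next.starts_with('#') { break; }
        lines.next();
      }
    }
  }
  if !found
  {
    // Section absent: append it so the first report run still works.
    out.push(heading.to_string());
    out.push(new_body.to_string());
  }
  out.join("\n")
}
```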
- Standard Data Generators (`data_generators` feature)
  - Lists of varying sizes (small: 10, medium: 100, large: 1000, huge: 10000)
  - Maps with configurable key-value distributions
  - Strings with controlled length and character sets
  - Consistent seeding for reproducible benchmarks (see the sketch below)
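Reproducibility hinges on fixed seeding. A sketch, assuming `rand`'s `StdRng` and a `u64` element type (both illustrative choices):

```rust
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// Fixed seed: every run benchmarks identical input data.
const DEFAULT_SEED: u64 = 42;

/// Generate `size` pseudo-random integers, reproducible across runs.
fn generate_list_data(size: usize) -> Vec<u64>
{
  let mut rng = StdRng::seed_from_u64(DEFAULT_SEED);
  (0..size).map(|_| rng.gen()).collect()
}
```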
- New users can run first benchmark in <5 minutes
- Integration requires <10 lines of code
- Measurement overhead <1% for operations >1ms
- All core features work independently
- Project Structure
  - Cargo.toml with proper feature flags
  - lib.rs with mod_interface pattern
  - Core modules: timing, generators, reports
- Core APIs (assumed shapes sketched below)
  - `BenchmarkSuite` for organizing benchmarks
  - `bench_block` for timing arbitrary code
  - `MetricCollector` for extensible metrics
  - `generate_list_data`, `generate_map_data` generators
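The shape below is an assumption inferred from the names above, not a settled signature; it shows how a suite could collect samples for the reporting layer.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Named benchmarks registered as closures; samples are kept per name
/// so the reporting layer can render them later.
pub struct BenchmarkSuite
{
  name: String,
  results: HashMap<String, Vec<Duration>>,
}

impl BenchmarkSuite
{
  pub fn new(name: &str) -> Self
  {
    Self { name: name.to_string(), results: HashMap::new() }
  }

  /// Run `f` a fixed number of times and record one sample per run.
  pub fn benchmark<F: FnMut()>(&mut self, label: &str, mut f: F)
  {
    let samples = (0..100)
      .map(|_| {
        let start = Instant::now();
        f();
        start.elapsed()
      })
      .collect();
    self.results.insert(label.to_string(), samples);
  }
}
```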
- Testing Infrastructure
  - Comprehensive test suite in the `tests/` directory
  - Test matrix covering all core functionality
  - Integration tests with real markdown files
Timeline: Week 3-4
Justification: Needed for optimization decision-making
- Comparative Analysis (`comparative_analysis` feature)
  - Before/after performance comparisons
  - A/B testing capabilities for algorithm variants
  - Comparative reports highlighting differences
- Statistical Analysis (`statistical_analysis` feature)
  - Standard statistical measures for benchmark results
  - Outlier detection and confidence intervals (see the sketch below)
  - Multiple sampling strategies
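A sketch of the intended statistics, assuming samples converted to nanoseconds as `f64`: the mean, a normal-approximation 95% confidence interval, and Tukey-fence outlier bounds.

```rust
fn mean(samples: &[f64]) -> f64
{
  samples.iter().sum::<f64>() / samples.len() as f64
}

/// 95% confidence interval for the mean (normal approximation, z = 1.96).
fn confidence_interval_95(samples: &[f64]) -> (f64, f64)
{
  let m = mean(samples);
  let variance = samples.iter().map(|x| (x - m).powi(2)).sum::<f64>()
    / (samples.len() - 1) as f64;
  let half_width = 1.96 * (variance / samples.len() as f64).sqrt();
  (m - half_width, m + half_width)
}

/// Tukey's fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are outliers.
/// `sorted` must be in ascending order.
fn outlier_bounds(sorted: &[f64]) -> (f64, f64)
{
  let q1 = sorted[sorted.len() / 4];
  let q3 = sorted[(3 * sorted.len()) / 4];
  let iqr = q3 - q1;
  (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
}
```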
- Baseline Management
  - Save and compare against performance baselines
  - Automatic regression detection (see the sketch below)
  - Percentage improvement/degradation calculations
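Regression detection reduces to a percentage delta against the stored baseline; a sketch, with the threshold from the success criteria passed in as configuration:

```rust
/// Returns the percentage change and whether it exceeds the threshold.
/// Positive change means the code got slower.
fn detect_regression(baseline_ns: f64, current_ns: f64, threshold_pct: f64) -> (f64, bool)
{
  let change_pct = (current_ns - baseline_ns) / baseline_ns * 100.0;
  (change_pct, change_pct > threshold_pct)
}
```

For example, `detect_regression(100.0, 103.5, 1.0)` yields `(3.5, true)`: a 3.5% slowdown, flagged.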
- Performance regressions detected within 1% accuracy
- Statistical confidence intervals provided
- Comparative reports show clear optimization guidance
Timeline: Week 5-6
Justification: Nice-to-have for comprehensive analysis
- HTML Reports (`html_reports` feature)
  - HTML report generation with customizable templates
  - Chart and visualization embedding
  - Interactive performance dashboards
- JSON Reports (`json_reports` feature)
  - Machine-readable JSON output format (see the sketch below)
  - API integration support
  - Custom data processing pipelines
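A sketch of the machine-readable path via serde_json; the record fields are illustrative, not a committed schema.

```rust
use serde::Serialize;

/// Illustrative record shape for downstream pipelines.
#[derive(Serialize)]
struct BenchmarkRecord
{
  name: String,
  mean_ns: f64,
  p95_ns: f64,
  samples: usize,
}

fn to_json(records: &[BenchmarkRecord]) -> serde_json::Result<String>
{
  serde_json::to_string_pretty(records)
}
```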
- Criterion Compatibility (`criterion_compat` feature)
  - Compatibility layer with existing criterion benchmarks
  - Migration tools from criterion to benchkit (see the sketch below)
  - Hybrid usage patterns
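The migration story is "same closure, different harness". The criterion side below uses criterion's actual API; the benchkit side assumes the `BenchmarkSuite` surface from the usage examples further down.

```rust
// criterion today:
//
//   use criterion::{criterion_group, criterion_main, Criterion};
//
//   fn bench(c: &mut Criterion)
//   {
//     c.bench_function("parse", |b| b.iter(|| parse_input("a,b,c")));
//   }
//   criterion_group!(benches, bench);
//   criterion_main!(benches);

// benchkit after migration: the closure body is unchanged.
use benchkit::prelude::*;

fn parse_input(s: &str) -> usize { s.split(',').count() }

fn main()
{
  let mut suite = BenchmarkSuite::new("migration_demo");
  suite.benchmark("parse", || { parse_input("a,b,c"); });
}
```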
- Optimization Hints (`optimization_hints` feature)
  - Analyze results to suggest optimization opportunities
  - Identify performance scaling characteristics (see the sketch below)
  - Actionable recommendations based on measurement patterns
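One way to identify scaling characteristics is a least-squares fit of log(time) against log(size); the slope approximates the scaling exponent (≈1 linear, ≈2 quadratic). A sketch:

```rust
/// Fit log(time) = a + b*log(size) by least squares and return the slope b.
/// Requires at least two distinct sizes; all inputs must be positive.
fn scaling_exponent(sizes: &[f64], times: &[f64]) -> f64
{
  let n = sizes.len() as f64;
  let xs: Vec<f64> = sizes.iter().map(|s| s.ln()).collect();
  let ys: Vec<f64> = times.iter().map(|t| t.ln()).collect();
  let mean_x = xs.iter().sum::<f64>() / n;
  let mean_y = ys.iter().sum::<f64>() / n;
  let cov: f64 = xs.iter().zip(&ys).map(|(x, y)| (x - mean_x) * (y - mean_y)).sum();
  let var_x: f64 = xs.iter().map(|x| (x - mean_x).powi(2)).sum();
  cov / var_x
}
```

An exponent near 2.0 across the standard list sizes (10 through 10000) would back a hint such as "consider replacing the quadratic scan with a map lookup".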
- Compatible with existing criterion benchmarks
- Multiple output formats work seamlessly
- Optimization hints provide actionable guidance
Timeline: Week 7-8
Justification: Long-term adoption and CI/CD integration
- CI/CD Tooling
  - Automated performance monitoring in CI pipelines (see the sketch below)
  - Performance regression alerts
  - Integration with GitHub Actions, GitLab CI
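A sketch of the CI gate as a small Rust binary: compare the current run against a committed baseline and fail the job on regression. The hard-coded numbers stand in for values that would be loaded from files.

```rust
use std::process::exit;

fn main()
{
  let baseline_ns = 1_250.0; // would be read from a committed baseline file
  let current_ns = 1_310.0;  // would come from the current benchmark run
  let change_pct = (current_ns - baseline_ns) / baseline_ns * 100.0;
  if change_pct > 1.0
  {
    eprintln!("performance regression: {change_pct:+.1}% over baseline");
    exit(1); // non-zero exit fails the CI job
  }
  println!("within budget: {change_pct:+.1}%");
}
```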
- IDE Integration
  - Editor extensions for VS Code, IntelliJ
  - Inline performance annotations
  - Real-time benchmark execution
- Monitoring & Alerting
  - Long-term performance trend tracking
  - Performance degradation notifications
  - Historical performance analysis
| Feature | Description | Default | Dependencies |
|---|---|---|---|
| `enabled` | Core benchmarking functionality | ✓ | - |
| `markdown_reports` | Markdown report generation | ✓ | pulldown-cmark |
| `data_generators` | Common data generation patterns | ✓ | rand |
| `criterion_compat` | Compatibility layer with criterion | ✓ | criterion |
| `html_reports` | HTML report generation | - | tera |
| `json_reports` | JSON report output | - | serde_json |
| `statistical_analysis` | Advanced statistical analysis | - | statistical |
| `comparative_analysis` | A/B testing and comparisons | - | - |
| `optimization_hints` | Performance optimization suggestions | - | statistical_analysis |
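In lib.rs, the table would translate into cfg-gated modules, so a disabled feature compiles to nothing. A sketch (empty module bodies stand in for the real code):

```rust
#[cfg(feature = "markdown_reports")]
pub mod markdown_reports { /* table and section rendering */ }

#[cfg(feature = "statistical_analysis")]
pub mod statistical_analysis { /* confidence intervals, outliers */ }

// Cargo enforces the dependency row above: in Cargo.toml,
// optimization_hints = ["statistical_analysis"].
#[cfg(feature = "optimization_hints")]
pub mod optimization_hints { /* scaling analysis, recommendations */ }
```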
- Performance
  - Measurement overhead <1% for operations >1ms
  - Data generation must not significantly impact timing
  - Report generation <10 seconds for typical suites
- Usability
  - Integration requires <10 lines of code
  - Sensible defaults for common scenarios
  - Incremental adoption alongside existing tools
- Reliability
  - Consistent results across runs (±5% variance)
  - Deterministic seeding for reproducible data
  - Statistical confidence measures for system noise
- Compatibility
  - Primary: std environments
  - Secondary: no_std compatibility for core timing
  - Platforms: Linux, macOS, Windows
- Test-Driven Development
  - Write tests before implementation
  - Test matrix for comprehensive coverage
  - Integration tests with real use cases
- Incremental Implementation
  - Complete one feature before starting the next
  - Each feature must work independently
  - Regular verification against success criteria
- Documentation-Driven
  - Update documentation with each feature
  - Real examples in all documentation
  - Performance characteristics documented
```text
benchkit/
├── Cargo.toml          # Feature flags and dependencies
├── src/
│   ├── lib.rs          # Public API and mod_interface
│   ├── timing/         # Core timing and measurement
│   ├── generators/     # Data generation utilities
│   ├── reports/        # Output format generation
│   └── analysis/       # Statistical and comparative analysis
├── tests/              # All tests (no tests in src/)
│   ├── timing_tests.rs
│   ├── generators_tests.rs
│   ├── reports_tests.rs
│   └── integration_tests.rs
├── benchmarks/         # Internal performance benchmarks
└── examples/           # Usage demonstrations
```
```rust
use benchkit::prelude::*;

fn benchmark_my_function()
{
  let mut suite = BenchmarkSuite::new("my_function_performance");
  suite.benchmark("small_input", || {
    let data = generate_list_data(10);
    bench_block(|| my_function(&data))
  });
  suite.generate_markdown_report("performance.md", "## Performance Results");
}
```

```rust
use benchkit::prelude::*;

fn compare_algorithms()
{
  let data = generate_list_data(1000);
  let comparison = ComparativeAnalysis::new()
    .algorithm("original", || original_algorithm(&data))
    .algorithm("optimized", || optimized_algorithm(&data))
    .with_data_sizes(&[10, 100, 1000, 10000]);
  let report = comparison.run_comparison();
  report.update_markdown_section("README.md", "## Algorithm Comparison");
}
```
- Measurement Accuracy
  - Risk: System noise affecting benchmark reliability
  - Mitigation: Statistical analysis, multiple sampling, outlier detection
- Integration Complexity
  - Risk: Difficult integration with existing projects
  - Mitigation: Simple APIs, comprehensive examples, incremental adoption
- Performance Overhead
  - Risk: Benchmarking tools slowing down measurements
  - Mitigation: Zero-copy design, minimal allocations, performance testing
- Feature Creep
  - Risk: Adding too many features, losing focus
  - Mitigation: Strict phase-based development, clear success criteria
- User Adoption
  - Risk: Users preferring existing tools (criterion)
  - Mitigation: Compatibility layer, clear value proposition, migration tools
- Time to first benchmark: <5 minutes
- Integration effort: <10 lines of code
- Documentation automation: Zero manual copying
- Regression detection accuracy: >99%
- Measurement overhead: <1%
- Feature independence: 100%
- Platform compatibility: Linux, macOS, Windows
- Memory efficiency: O(n) scaling with data size
- Immediate (This Week)
  - Set up project structure with Cargo.toml
  - Implement core timing module
  - Create basic data generators
  - Set up testing infrastructure
- Short-term (Next 2 Weeks)
  - Complete Phase 1 MVP implementation
  - Comprehensive test coverage
  - Basic markdown report generation
  - Documentation and examples
- Medium-term (Month 2)
  - Phase 2 analysis tools
  - Statistical rigor improvements
  - Comparative analysis features
  - Performance optimization
- spec.md - Complete functional requirements and technical specifications
- usage.md - Lessons learned from unilang/strs_tools benchmarking
- Design Rulebook - Architectural principles and development procedures
- Codestyle Rulebook - Code formatting and structural patterns