feat: include hypothesis-test HTML report in book (#759)

Copilot · tschm · web-flow · commit 9bee22be51b6 · 2026-02-21T12:10:30.000+04:00
* Initial plan

* Initial plan for hypothesis-test report in book

Co-authored-by: tschm &lt;2046079+tschm@users.noreply.github.com&gt;

* Add hypothesis-test report to book

Co-authored-by: tschm &lt;2046079+tschm@users.noreply.github.com&gt;

---------

Co-authored-by: copilot-swe-agent[bot] &lt;198982749+Copilot@users.noreply.github.com&gt;
Co-authored-by: tschm &lt;2046079+tschm@users.noreply.github.com&gt;
Co-authored-by: Thomas Schmelzer &lt;thomas.schmelzer@gmail.com&gt;
diff --git a/.rhiza/make.d/book.mk b/.rhiza/make.d/book.mk
@@ -42,6 +42,7 @@ BOOK_SECTIONS := \
   "Test Report|_tests/html-report/report.html|tests/html-report/report.html|_tests/html-report|tests/html-report" \
   "Benchmarks|_tests/benchmarks/report.html|tests/benchmarks/report.html|_tests/benchmarks|tests/benchmarks" \
   "Stress Tests|_tests/stress/report.html|tests/stress/report.html|_tests/stress|tests/stress" \
+  "Hypothesis Tests|_tests/hypothesis/report.html|tests/hypothesis/report.html|_tests/hypothesis|tests/hypothesis" \
   "Notebooks|_marimushka/index.html|marimushka/index.html|_marimushka|marimushka" \
   "Official Documentation|_mkdocs/index.html|docs/index.html|_mkdocs|docs"
 
@@ -51,7 +52,7 @@ BOOK_SECTIONS := \
 # 1. Aggregates API docs, coverage, test reports, notebooks, and MkDocs site into _book.
 # 2. Generates links.json to define the book structure.
 # 3. Uses 'minibook' to compile the final HTML site.
-book:: test benchmark stress docs marimushka mkdocs-build ## compile the companion book
+book:: test benchmark stress hypothesis-test docs marimushka mkdocs-build ## compile the companion book
 	@printf "${BLUE}[INFO] Building combined documentation...${RESET}\n"
 	@rm -rf _book && mkdir -p _book
 
diff --git a/REPOSITORY_ANALYSIS.md b/REPOSITORY_ANALYSIS.md
@@ -841,3 +841,167 @@ def setup_rhiza_git_repo()
    - Document autouse fixtures and their impact
    - Add architecture diagram showing fixture dependencies
    - Clarify stress test acceptance criteria
+
+---
+
+## 2026-02-21 — Comprehensive Analysis Entry
+
+### Summary
+
+The **Rhiza** repository is a sophisticated Python project template system ("living templates") that enables continuous synchronization of configuration, CI/CD workflows, and development tooling across projects. The codebase demonstrates production-grade engineering with well-structured Makefiles, comprehensive testing (property-based, stress, benchmark, integration), dual CI/CD support (GitHub Actions + GitLab CI), and a modular book/documentation generation system using Marimo notebooks. The architecture is mature, with clear separation of concerns between template management (`.rhiza/`), project tests (`tests/`), and documentation (`book/`, `docs/`).
+
+**Core Value Proposition:** Unlike one-time project generators (cookiecutter, copier), Rhiza provides continuous template synchronization via `.rhiza/template.yml` configuration, allowing projects to selectively pull updates over time while maintaining control.
+
+### Strengths
+
+#### Architecture & Organization
+- **Clean separation**: `.rhiza/` contains 16 modular `.mk` files (test.mk, book.mk, marimo.mk, docs.mk, etc.), each focused on a single concern
+- **Template bundles**: `template-bundles.yml` defines composable sets (core, tests, github, docker) for flexible template selection
+- **Multi-platform CI**: Parallel GitHub Actions (`.github/workflows/`) and GitLab CI (`.gitlab-ci.yml`) configurations with consistent job definitions
+- **Makefile design**: Double-colon rules allow hook extension (`pre-sync::`, `post-sync::`); color-coded output; help generation from inline comments
+
+#### Testing Infrastructure (7/10 - see detailed breakdown in prior entry)
+- **Property-based tests**: Hypothesis integration (`.rhiza/requirements/tests.txt` specifies `hypothesis>=6.150.0`) with dedicated `make hypothesis-test` target
+- **Stress testing**: `tests/stress/` directory with separate marker (`@pytest.mark.stress`) and `make stress` target
+- **Benchmark tests**: `tests/benchmarks/` using pytest-benchmark with HTML report generation (`make benchmark`)
+- **Test organization**: Structured into `tests/{property,stress,benchmarks,integration,api,structure,security,sync,deps,utils}` directories
+- **Comprehensive coverage**: HTML coverage reports (`_tests/html-coverage`), test reports (`_tests/html-report/report.html`), hypothesis reports (`_tests/hypothesis/report.html`)
+- **Fixture hierarchy**: `.rhiza/tests/conftest.py` provides `git_repo` and `setup_rhiza_git_repo()` for isolated test environments
+
+#### Hypothesis Testing Implementation
+- **Dedicated target**: `.rhiza/make.d/test.mk:103-118` defines `hypothesis-test` target with:
+  - `--hypothesis-show-statistics` flag for detailed output
+  - `--hypothesis-seed=0` for reproducibility
+  - `-m "hypothesis or property"` marker filtering
+  - HTML report generation to `_tests/hypothesis/report.html`
+- **Example test**: `tests/property/test_makefile_properties.py` demonstrates property-based testing with `@given(st.lists(st.integers() | st.floats()))`
+- **Documentation**: `docs/TESTS.md` (298 lines) provides comprehensive guidance on writing property-based tests, including best practices and troubleshooting
+- **CI integration**: Hypothesis tests run as part of `make test` (line 23-47 in test.mk) during GitHub Actions `rhiza_ci.yml` workflow
+
+#### Book & Documentation System
+- **Marimo notebooks**: `book/marimo/notebooks/rhiza.py` showcases interactive reactive notebooks with inline PEP 723 dependencies
+- **Marimushka export**: `.rhiza/make.d/marimo.mk:45-67` exports Marimo notebooks to static HTML using `marimushka>=0.3.3`
+- **Book compilation**: `.rhiza/make.d/book.mk:54-105` aggregates 7 sections (API, Coverage, Test Report, Benchmarks, Stress Tests, Notebooks, Official Documentation) into unified `_book/` via `minibook`
+- **Declarative sections**: `BOOK_SECTIONS` variable (lines 39-46) defines structure as pipe-delimited tuples with source/target paths
+- **MkDocs integration**: `.rhiza/make.d/docs.mk:72-82` builds MkDocs sites with Material theme (`mkdocs-material<10.0`)
+- **API docs**: `pdoc` integration with docformat detection (google/numpy/sphinx) from `ruff.toml` or fallback to google
+- **CI workflow**: `.github/workflows/rhiza_marimo.yml` discovers and validates all notebooks in parallel matrix jobs
+
+#### Test Report & HTML Generation
+- **HTML test reports**: `pytest-html>=4.0` dependency generates `_tests/html-report/report.html` (test.mk line 40, 47)
+- **Coverage HTML**: `--cov-report=html:_tests/html-coverage` generates browsable coverage reports (test.mk line 37)
+- **Benchmark reports**: Custom `rhiza-tools>=0.2.3 analyze-benchmarks` generates `_tests/benchmarks/report.html` (test.mk line 83)
+- **Hypothesis reports**: `--html=_tests/hypothesis/report.html` generates property-based test reports (test.mk line 117)
+- **Stress reports**: `--html=_tests/stress/report.html` for stress test results (test.mk line 148)
+- **Book aggregation**: All reports copied to `_book/tests/` structure for unified documentation site
+
+#### Dependency Management
+- **UV-first**: `uv>=0.10.4` used throughout; `UV_BIN` and `UVX_BIN` variables for consistency
+- **Version matrix**: `make version-matrix` uses `rhiza-tools>=0.2.2` to extract Python versions from `pyproject.toml` for CI matrix generation
+- **Constrained versions**: `pyproject.toml` dev dependencies use ranges (e.g., `marimo>=0.18.0,<1.0`) with rationale comments
+- **Security scanning**: `make security` runs `pip-audit` and `bandit` (test.mk lines 63-67)
+
+#### CI/CD Configuration
+- **GitHub Actions workflows**:
+  - `rhiza_ci.yml`: Matrix testing across Python 3.11-3.14, runs `make test`
+  - `rhiza_marimo.yml`: Discovers and validates Marimo notebooks in parallel
+  - `rhiza_validate.yml`: Validates Rhiza configuration
+  - `rhiza_release.yml`: Release automation
+- **GitLab CI**: `.gitlab-ci.yml` includes parallel workflows from `.gitlab/workflows/` with 4 stages (`.pre`, `build`, `test`, `deploy`)
+- **No hypothesis-specific workflow**: Property-based tests run as part of standard `make test` in CI (appropriate design decision)
+
+### Weaknesses
+
+#### Hypothesis Testing Gaps
+- **Missing dedicated CI workflow**: No separate GitHub Actions workflow for hypothesis testing (unlike benchmarks); runs only via `make test`
+- **Limited example coverage**: Only one example test (`test_sort_correctness_using_properties`) demonstrating hypothesis usage
+- **No project-specific property tests**: `tests/property/test_makefile_properties.py` tests generic Python behavior (list sorting), not Makefile/template-specific invariants
+- **Unused hypothesis-test target**: `make hypothesis-test` target exists but not invoked in CI workflows or documentation as standalone command
+- **No hypothesis configuration**: Missing `[tool.hypothesis]` section in `pyproject.toml` for project-wide settings (seed, max_examples, deadline)
+
+#### Book/Documentation Concerns
+- **Single notebook**: Only `book/marimo/notebooks/rhiza.py` exists; limited demonstration of Marimo capabilities for a showcase
+- **Complex book build**: `.rhiza/make.d/book.mk:54-105` has 52 lines of shell logic for section aggregation; could be refactored to Python script
+- **Missing sections handling**: `make book` silently skips missing sections with `[WARN]` messages; no validation that minimum required sections exist
+- **Template complexity**: `minibook` invocation (lines 97-102) requires Python one-liner to convert JSON, fragile if links.json malformed
+- **No book validation**: No tests verify book structure, links.json validity, or that all sections are generated correctly
+
+#### Testing & Report Generation
+- **HTML report dependencies**: `_tests/` directory not committed; reports lost between CI runs (expected, but no artifact upload in `rhiza_ci.yml`)
+- **Coverage threshold**: `COVERAGE_FAIL_UNDER ?= 90` but no source folder defined in `.rhiza/make.d/test.mk` (relies on `SOURCE_FOLDER` from parent Makefile)
+- **Benchmark CI missing**: No GitHub Actions workflow for benchmarks despite comprehensive `make benchmark` target and documentation
+- **Test isolation**: `tests/stress/` and `tests/benchmarks/` directories exist at project level but unclear if they test template or project code
+- **Duplicate test structure**: Both `tests/` and `.rhiza/tests/` hierarchies; `.rhiza/tests/` has 70+ files while `tests/` has only 5 files
+
+#### Directory Structure Confusion
+- **Overlapping test directories**: 
+  - `tests/property/` (project-level, 1 file)
+  - `tests/benchmarks/` (project-level, 2 files)
+  - `tests/stress/` (project-level, 2 files)
+  - `.rhiza/tests/api/`, `.rhiza/tests/integration/`, `.rhiza/tests/stress/`, etc. (template-level, 60+ files)
+- **Unclear ownership**: Documentation doesn't clearly explain when tests should go in `tests/` vs `.rhiza/tests/`
+- **Book structure**: `book/marimo/notebooks/` (only 1 notebook) suggests more planned but not present
+
+#### Makefile & Configuration
+- **Variable dependencies**: `SOURCE_FOLDER` referenced but not defined in `.rhiza/rhiza.mk` or `.rhiza/make.d/test.mk`; assumed to come from parent Makefile
+- **MARIMO_FOLDER undefined**: `marimo.mk` uses `${MARIMO_FOLDER}` but no default defined (assumed `marimo` from `rhiza_marimo.yml:45`)
+- **BOOK_TITLE/SUBTITLE undefined**: `book.mk:98-99` uses these variables without defaults or documentation
+- **Color code duplication**: Color variables (`BLUE`, `RED`, `GREEN`, etc.) defined in `.rhiza/rhiza.mk` lines 8-14 but potentially redefined elsewhere
+
+### Risks / Technical Debt
+
+#### Maintainability
+1. **Double test hierarchy complexity**: Maintaining parallel `tests/` and `.rhiza/tests/` increases cognitive load; new contributors unclear where to add tests
+2. **Shell complexity in Makefiles**: Book building, Marimushka export, docs generation all use complex shell scripting; difficult to debug and test
+3. **Implicit variable contracts**: Many Makefiles assume variables set elsewhere (SOURCE_FOLDER, MARIMO_FOLDER, BOOK_TITLE); brittle if parent Makefile changes
+4. **No Makefile tests**: Despite `tests/api/test_makefile_api.py` in `.rhiza/tests/`, no tests for `make book`, `make marimushka`, or `make hypothesis-test`
+
+#### CI/CD
+1. **No artifact preservation**: HTML reports generated but not uploaded as GitHub Actions artifacts; debugging CI failures requires local reproduction
+2. **Matrix explosion risk**: `rhiza_ci.yml` tests 4 Python versions; if more Make targets added, CI time will increase linearly
+3. **GitLab/GitHub drift**: Maintaining parallel CI definitions risks divergence; no validation they stay in sync
+4. **Hardcoded versions**: `uv@0.10.4` in workflows; if `.rhiza/.rhiza-version` changes, workflows may use stale version
+
+#### Documentation
+1. **Hypothesis testing underutilized**: Extensive infrastructure (`make hypothesis-test`, docs, dependencies) but minimal actual usage (1 example test)
+2. **Book system over-engineered**: 7 section types, custom minibook templates, complex aggregation—all for a repository with minimal source code
+3. **Missing getting-started**: `docs/TESTS.md` is comprehensive but lacks simple "run your first hypothesis test" tutorial
+4. **Stale documentation risk**: `docs/TESTS.md` documents hypothesis features extensively; if implementation changes, docs may not update
+
+#### Security & Quality
+1. **Mocked UV in tests**: `.rhiza/tests/conftest.py` likely mocks UV binary; integration tests may miss real UV behavior
+2. **Bandit exclusions**: `make security` runs bandit with `-c pyproject.toml` but no `[tool.bandit]` section visible in provided `pyproject.toml`
+3. **No dependency scanning**: `pip-audit` runs but no automated PR generation for vulnerabilities (unlike Renovate for version updates)
+4. **Template sync conflicts**: `make sync` uses `--force` flag (rhiza.mk:101); could overwrite local changes if not careful
+
+### Score
+
+**7/10** — Solid, production-ready template system with minor gaps in hypothesis testing utilization and documentation structure
+
+**Rationale:**
+- **+3**: Exceptional Makefile architecture, comprehensive CI/CD dual support, modular template system
+- **+2**: Strong testing infrastructure with property/stress/benchmark separation, detailed test documentation
+- **+1**: Innovative book compilation system, Marimo notebook integration, UV-first dependency management
+- **+1**: Security scanning, code quality enforcement, mature fixture design
+- **-1**: Hypothesis testing infrastructure underutilized (1 example test, no project-specific property tests)
+- **-1**: Overlapping test directory structure causes confusion (tests/ vs .rhiza/tests/)
+- **-1**: Shell complexity in Makefiles increases maintenance burden, no artifact preservation in CI
+- **-1**: Book system over-engineered for current content volume (1 notebook, minimal source code)
+
+**Would be 8-9 if:**
+- Project-specific hypothesis tests existed (e.g., property-based tests for template synchronization logic, Makefile target parsing)
+- Clear documentation explaining `tests/` vs `.rhiza/tests/` ownership model
+- Shell logic in book.mk, marimo.mk refactored to Python scripts with unit tests
+- GitHub Actions workflows upload HTML reports as artifacts
+- Hypothesis configuration defined in `pyproject.toml` with project-wide settings
+- Additional Marimo notebooks demonstrate real-world template usage patterns
+
+**Would be 9-10 if:**
+- Hypothesis tests coverage >30 tests across template logic, Makefile parsing, sync operations
+- Book system simplified or justified with substantial documentation content
+- Automated validation ensures GitHub/GitLab CI stay in sync
+- Integration tests use real UV rather than mocks (or document mock limitations)
+- `make book` validates minimum required sections before compilation
+
+**Current state appropriate for:** A mature template repository that prioritizes modularity, multi-platform support, and comprehensive documentation infrastructure. The hypothesis testing foundation is excellent but underutilized; adding 10-20 property-based tests for core template logic would significantly increase confidence. The dual test hierarchy (tests/ vs .rhiza/tests/) needs architectural documentation to clarify intent.
+