Redesign BacktestReport — Unified Single & Multi-Strategy HTML Dashboard #401

@MDUYN

Summary

Replace the current Plotly/Jinja2-based BacktestReport with a self-contained HTML dashboard that dynamically adapts to the data:

  • 1 strategy with N runs → single-strategy deep-dive (equity, drawdown, trades, risk, heatmap)
  • N strategies with N runs → comparison view (ranking table, metric bars, per-strategy drill-down)

The new API should work seamlessly with both live Backtest objects (from run_vector_backtests) and disk-loaded results.


Motivation

Current state (v3.7.3)

| Limitation | Detail |
| --- | --- |
| Single-run only | _create_html_report() reads self.backtests[0]; the rest are ignored |
| No comparison view | No way to visualise N strategies side-by-side |
| No multi-run awareness | No concept of a strategy → runs hierarchy |
| External deps | Plotly + Jinja2 bloat the HTML (~3 MB per chart) and require a CDN |
| Flat storage model | _is_backtest() expects results.json + metrics.json at a single level; it doesn't walk {strategy}/runs/{run_name}/ |
| No run_vector_backtests (plural) | Only run_vector_backtest (singular) exists; there is no batch API |

New storage format (already in use)

backtest_results/
└── batch_1/                          # ← BacktestReport.open() entry point
    ├── 04a159e7/                     # strategy (hash or name)
    │   ├── algorithm_id.json         # {"algorithm_id": "..."}
    │   ├── summary.json              # aggregated metrics across runs
    │   ├── risk_free_rate.json
    │   ├── metadata.json             # optional strategy metadata
    │   └── runs/
    │       ├── backtest_EUR_20240101_20241231/
    │       │   ├── metrics.json      # BacktestMetrics serialised
    │       │   └── run.json          # BacktestResult serialised (trades, etc.)
    │       ├── backtest_EUR_20240331_20250331/
    │       │   ├── metrics.json
    │       │   └── run.json
    │       └── ...
    ├── 1a9ecb38/
    │   └── ...
    └── ...

This issue proposes changes across three areas:

  1. New batch API: app.run_vector_backtests() (plural)
  2. Redesigned BacktestReport — new constructor, __getitem__, unified HTML generator
  3. Self-contained HTML dashboard — zero external deps, canvas-based charts, dark/light theme

Proposed API

1. From live objects (after run_vector_backtests)

from investing_algorithm_framework import BacktestReport

backtests = app.run_vector_backtests(
    strategies=strategies,
    backtest_date_ranges=backtest_windows,
    initial_amount=1000,
)
# Returns: Dict[str, List[Backtest]]
#   key   = strategy identifier (algorithm_id or hash)
#   value = list of Backtest objects (one per date range)

# ── Single strategy (with all its runs) ──
report = BacktestReport(backtests["04a159e7"])
report.show()                          # inline in Jupyter
report.show(browser=True)              # opens in default browser
report.save("single_strategy.html")    # save to file

# ── Compare multiple strategies ──
report = BacktestReport(backtests)     # pass the full dict
report.show()                          # comparison dashboard

2. From disk (reload in a new session)

# Load all strategies from a batch → comparison view
report = BacktestReport.open("./backtest_results/batch_1/")
report.show()

# Load a single strategy → single-strategy view
report = BacktestReport.open("./backtest_results/batch_1/04a159e7/")
report.show()

3. Drill-down from comparison to single

report = BacktestReport.open("./backtest_results/batch_1/")
report.show()                          # comparison of all 12 strategies

single = report["04a159e7"]            # select one strategy
single.show()                          # single-strategy dashboard

Implementation Plan

A. New method: app.run_vector_backtests() (plural)

# app/app.py

def run_vector_backtests(
    self,
    strategies: List[TradingStrategy],
    backtest_date_ranges: List[BacktestDateRange],
    initial_amount: float = 1000,
    snapshot_interval: SnapshotInterval = SnapshotInterval.DAILY,
    risk_free_rate: Optional[float] = None,
    output_directory: Optional[str] = None,
) -> Dict[str, List[Backtest]]:
    """
    Run vectorised backtests for multiple strategies across multiple
    date ranges.

    Returns a dict keyed by strategy identifier, where each value is
    a list of Backtest objects (one per date range).

    If output_directory is provided, results are persisted to disk in
    the hierarchical format:
        {output_directory}/{strategy_id}/runs/{run_name}/
    """
    results: Dict[str, List[Backtest]] = {}

    for strategy in strategies:
        strategy_id = strategy.algorithm_id  # or hash
        strategy_backtests = []

        for date_range in backtest_date_ranges:
            backtest = self.run_vector_backtest(
                strategy=strategy,
                backtest_date_range=date_range,
                initial_amount=initial_amount,
                snapshot_interval=snapshot_interval,
                risk_free_rate=risk_free_rate,
            )
            strategy_backtests.append(backtest)

        results[strategy_id] = strategy_backtests

        if output_directory:
            self._save_strategy_backtests(
                strategy_id, strategy_backtests, output_directory
            )

    return results
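
The loop above calls self._save_strategy_backtests(), which this issue does not define. As a rough sketch of what such a helper might do, the function below writes one strategy to the hierarchical layout. The name save_strategy_runs, the plain-dict run shape ("name", "metrics", "results") and the placeholder content of summary.json are all assumptions made for illustration, not the framework's API.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_strategy_runs(
    strategy_id: str,
    runs: List[Dict],
    output_directory: str,
) -> Path:
    """Write runs to {output_directory}/{strategy_id}/runs/{run_name}/.

    Hypothetical stand-in for _save_strategy_backtests: each run is assumed
    to be a dict with "name", "metrics" and "results" keys.
    """
    strategy_dir = Path(output_directory) / strategy_id
    strategy_dir.mkdir(parents=True, exist_ok=True)

    (strategy_dir / "algorithm_id.json").write_text(
        json.dumps({"algorithm_id": strategy_id}), encoding="utf-8"
    )
    # Placeholder summary (assumption): the real aggregated metrics are
    # not specified here, so only the run count is recorded.
    (strategy_dir / "summary.json").write_text(
        json.dumps({"number_of_runs": len(runs)}), encoding="utf-8"
    )

    for run in runs:
        run_dir = strategy_dir / "runs" / run["name"]
        run_dir.mkdir(parents=True, exist_ok=True)
        (run_dir / "metrics.json").write_text(
            json.dumps(run["metrics"]), encoding="utf-8"
        )
        (run_dir / "run.json").write_text(
            json.dumps(run["results"]), encoding="utf-8"
        )

    return strategy_dir
```

Writing summary.json alongside the runs matters because BacktestReport.open() uses that file to recognise a strategy directory.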

B. Redesigned BacktestReport

# app/reporting/backtest_report.py

class BacktestReport:
    """
    Unified backtest report that adapts to the data:
      - 1 strategy  → single-strategy dashboard (runs as pages)
      - N strategies → comparison dashboard (strategies as pages)
    """

    def __init__(self, backtests):
        """
        Accept multiple input shapes:
          - List[Backtest]         → single strategy, multiple runs
          - Dict[str, List[Backtest]] → multiple strategies
        """
        # Internal: dict of {strategy_id: {"summary": dict, "runs": [...]}}
        # Initialised per instance so reports never share state. (With a
        # hand-written __init__, @dataclass would not set these fields.)
        self._strategies: Dict[str, dict] = {}
        self._html: Optional[str] = None

        if isinstance(backtests, list):
            # Single strategy: infer the ID from the first backtest's metadata
            strategy_id = self._infer_strategy_id(backtests)
            self._strategies[strategy_id] = self._build_strategy_entry(backtests)

        elif isinstance(backtests, dict):
            for strategy_id, bt_list in backtests.items():
                self._strategies[strategy_id] = self._build_strategy_entry(bt_list)

        else:
            raise TypeError(
                "backtests must be a List[Backtest] or a Dict[str, List[Backtest]]"
            )

    @staticmethod
    def open(directory_path: str) -> "BacktestReport":
        """
        Load from the hierarchical disk format.

        If directory_path points to a single strategy (has summary.json),
        load a single-strategy report.

        If it contains subdirectories with summary.json, load all as a
        comparison report.
        """
        strategies = {}

        if os.path.isfile(os.path.join(directory_path, "summary.json")):
            # Single strategy directory
            strategy_id, entry = BacktestReport._load_strategy_dir(directory_path)
            strategies[strategy_id] = entry
        else:
            # Batch directory — scan for strategy subdirectories
            for name in sorted(os.listdir(directory_path)):
                subdir = os.path.join(directory_path, name)
                if os.path.isdir(subdir) and os.path.isfile(
                    os.path.join(subdir, "summary.json")
                ):
                    strategy_id, entry = BacktestReport._load_strategy_dir(subdir)
                    strategies[strategy_id] = entry

        if not strategies:
            raise OperationalException(
                f"No valid backtest data found in {directory_path}"
            )

        report = BacktestReport.__new__(BacktestReport)
        report._strategies = strategies
        report._html = None
        return report

    def __getitem__(self, strategy_id: str) -> "BacktestReport":
        """
        Drill-down: select a single strategy from a comparison report.
        Returns a new BacktestReport with just that strategy.
        """
        if strategy_id not in self._strategies:
            raise KeyError(f"Strategy '{strategy_id}' not found. "
                           f"Available: {list(self._strategies.keys())}")
        report = BacktestReport.__new__(BacktestReport)
        report._strategies = {strategy_id: self._strategies[strategy_id]}
        report._html = None
        return report

    @property
    def is_single_strategy(self) -> bool:
        return len(self._strategies) == 1

    @property
    def strategy_ids(self) -> list:
        return list(self._strategies.keys())

    def show(self, browser: bool = False):
        """Display the dashboard inline (Jupyter) or in the browser."""
        if not self._html:
            self._html = self._generate_html()

        if self._in_jupyter():
            from IPython.display import display, HTML
            display(HTML(self._html))
        else:
            browser = True

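        if browser:
            import tempfile, webbrowser
            from pathlib import Path
            # Write UTF-8 explicitly; the dashboard contains symbols such as "€".
            path = Path(tempfile.gettempdir()) / "backtest_report.html"
            path.write_text(self._html, encoding="utf-8")
            # as_uri() produces a valid file:// URL on Windows as well.
            webbrowser.open(path.as_uri())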

    def save(self, path: str):
        """Save the HTML dashboard to a file."""
        if not self._html:
            self._html = self._generate_html()
        # UTF-8 keeps non-ASCII symbols (e.g. "€") intact on every platform.
        with open(path, "w", encoding="utf-8") as f:
            f.write(self._html)

    def _generate_html(self) -> str:
        """
        Generate the self-contained HTML dashboard.
        Uses canvas-based charts — zero external dependencies.
        Adapts layout based on len(self._strategies).
        """
        # → calls the unified HTML generator (see section C)
        ...

    @staticmethod
    def _build_strategy_entry(backtests: List[Backtest]) -> dict:
        """Convert a list of Backtest objects into the internal format."""
        runs = []
        for bt in backtests:
            runs.append({
                "name": _derive_run_name(bt),
                "metrics": bt.backtest_metrics.to_dict(),
                "results": bt.backtest_results.to_dict(),
            })
        return {
            "summary": _aggregate_summary(runs),
            "runs": runs,
        }

    @staticmethod
    def _load_strategy_dir(directory_path: str):
        """Load a strategy from disk, returning (strategy_id, entry)."""
        with open(os.path.join(directory_path, "summary.json")) as f:
            summary = json.load(f)

        algo_id_path = os.path.join(directory_path, "algorithm_id.json")
        if os.path.isfile(algo_id_path):
            with open(algo_id_path) as f:
                strategy_id = json.load(f).get("algorithm_id", os.path.basename(directory_path))
        else:
            strategy_id = os.path.basename(directory_path)

        runs = []
        runs_dir = os.path.join(directory_path, "runs")
        if os.path.isdir(runs_dir):
            for run_name in sorted(os.listdir(runs_dir)):
                run_path = os.path.join(runs_dir, run_name)
                metrics_path = os.path.join(run_path, "metrics.json")
                results_path = os.path.join(run_path, "run.json")
                if os.path.isfile(metrics_path):
                    with open(metrics_path) as f:
                        metrics = json.load(f)
                    results = {}
                    if os.path.isfile(results_path):
                        with open(results_path) as f:
                            results = json.load(f)
                    runs.append({
                        "name": run_name,
                        "metrics": metrics,
                        "results": results,
                    })

        return strategy_id, {"summary": summary, "runs": runs}

    @staticmethod
    def _in_jupyter() -> bool:
        try:
            return get_ipython().__class__.__name__ == "ZMQInteractiveShell"
        except (NameError, ImportError):
            return False
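
_build_strategy_entry() relies on two helpers, _derive_run_name and _aggregate_summary, that are not shown in this issue. The sketch below is one possible shape for them, written as standalone functions under stated assumptions: the run name follows the backtest_EUR_20240101_20241231 pattern from the storage tree (with the middle part treated as a symbol passed in by the caller), and the summary is a plain mean of scalar metrics. Both the signatures and the aggregation rule are hypothetical.

```python
from datetime import date
from statistics import mean
from typing import Dict, List


def derive_run_name(symbol: str, start: date, end: date) -> str:
    """Build a name such as backtest_EUR_20240101_20241231."""
    return f"backtest_{symbol}_{start:%Y%m%d}_{end:%Y%m%d}"


def aggregate_summary(runs: List[Dict]) -> Dict[str, float]:
    """Average every scalar numeric metric that appears in all runs.

    Assumption: a plain mean is the desired aggregation. Series-valued
    metrics (lists, dicts) and metrics missing from any run are skipped.
    """
    if not runs:
        return {}

    # Only metrics present in every run can be averaged fairly.
    shared_keys = set.intersection(*(set(run["metrics"]) for run in runs))

    summary: Dict[str, float] = {}
    for key in sorted(shared_keys):
        values = [run["metrics"][key] for run in runs]
        is_scalar = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_scalar:
            summary[key] = mean(values)

    summary["number_of_runs"] = len(runs)
    return summary
```

A mean is only one option; a median or a worst-case value per metric may suit a ranking table better, and that choice is left open here.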

C. Self-contained HTML dashboard

The HTML generator is already implemented as a working prototype in _gen_unified_dashboard.py. It produces a zero-dependency, self-contained HTML file with:

| Feature | Single-Strategy Mode | Multi-Strategy Mode |
| --- | --- | --- |
| Sidebar | Overview + individual runs | Overview + strategy names |
| Overview | Summary KPIs, runs table, equity overlay (€) | Ranking table with run-view dropdown, normalized equity overlay (%), 2×2 metric bars |
| Detail pages | 4 tabs: Overview · Performance · Trades · Risk | 3 tabs: Summary · Runs · Performance |
| Trades | Sortable table, donut by symbol, P&L bar | Summary metrics only |
| Risk | Rolling Sharpe (252d), underwater equity | |
| Charts | Canvas-based, no external JS libs | Canvas-based, no external JS libs |
| Theme | Dark/light toggle | Dark/light toggle |
| Finterion | Sponsor page | Sponsor page |

The HTML adapts dynamically at render time, switching to single-strategy mode when STRATEGIES.length === 1.
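
To make the "data inlined, layout chosen in the browser" idea concrete, here is a minimal sketch of a pure-Python generator. It is not the prototype in _gen_unified_dashboard.py; the function name build_dashboard_html and the page skeleton are assumptions. It shows only how the strategy data could be embedded as a STRATEGIES constant and how the mode could be derived from its length.

```python
import json
from typing import Dict


def build_dashboard_html(strategies: Dict[str, dict]) -> str:
    """Return a self-contained HTML page with the data inlined as JSON."""
    payload = [{"id": sid, **entry} for sid, entry in strategies.items()]

    # Escape "</" so a string inside the data cannot close the <script>
    # element early ("\/" is a valid escape for "/" in JS strings).
    data_json = json.dumps(payload).replace("</", "<\\/")

    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Backtest report</title></head>
<body>
<div id="root"></div>
<script>
const STRATEGIES = {data_json};
const mode = STRATEGIES.length === 1 ? "single" : "comparison";
document.getElementById("root").dataset.mode = mode;
</script>
</body>
</html>"""
```

Any further JavaScript added to such an f-string template needs its literal braces doubled ({{ and }}), which is one reason a port may prefer string concatenation or string.Template for the larger script sections.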


Changes to Existing Code

Files to modify

| File | Change |
| --- | --- |
| app/app.py | Add run_vector_backtests() method |
| app/reporting/backtest_report.py | Rewrite class (new constructor, __getitem__, unified HTML gen) |
| app/reporting/templates/ | Remove Jinja2 templates (no longer needed) |
| domain/backtesting/backtest.py | No changes required |
| domain/backtesting/backtest_metrics.py | No changes required |

Files to add

| File | Purpose |
| --- | --- |
| app/reporting/html_generator.py | Port of _gen_unified_dashboard.py: a pure-Python HTML string builder |

Dependencies to remove

| Package | Reason |
| --- | --- |
| plotly | Replaced by canvas-based charts |
| jinja2 | Replaced by Python f-string template |

Backward compatibility

| Concern | Mitigation |
| --- | --- |
| BacktestReport(backtests=[...]) (current kwarg API) | Support for 1 release via deprecation warning; migrate callers to positional arg |
| BacktestReport.open(backtests=[], directory_path=None) | Keep directory_path kwarg; drop backtests kwarg (use constructor instead) |
| _is_backtest() checks for results.json | Update to also accept run.json |
| _create_html_report() | Replaced by _generate_html() |
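
One way the one-release deprecation window for the backtests= keyword could look is sketched below. ReportArgs is a hypothetical stand-in that shows only the argument handling, not the real class; the positional-only parameter (Python 3.8+) is what lets the constructor tell the old keyword call apart from the new positional one.

```python
import warnings


class ReportArgs:
    """Hypothetical stand-in showing only the constructor's argument handling."""

    def __init__(self, data=None, /, *, backtests=None):
        if backtests is not None:
            if data is not None:
                raise TypeError(
                    "Pass the backtests positionally or via 'backtests', not both"
                )
            # Old keyword form: still accepted, but flagged for removal.
            warnings.warn(
                "BacktestReport(backtests=...) is deprecated; "
                "pass the value as the first positional argument",
                DeprecationWarning,
                stacklevel=2,
            )
            data = backtests

        if data is None:
            raise TypeError("A list or dict of backtests is required")

        self.data = data
```

With stacklevel=2 the warning points at the caller's line rather than at the constructor, which makes the remaining keyword call sites easier to find.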

Storage Format Validation

The _is_backtest check needs updating. Currently:

# Current — only works with flat format
@staticmethod
def _is_backtest(path):
    return (
        os.path.isfile(os.path.join(path, "results.json"))
        and os.path.isfile(os.path.join(path, "metrics.json"))
    )

Proposed detection logic:

@staticmethod
def _is_strategy_dir(path):
    """A strategy dir has summary.json; a runs/ subdirectory is expected but not checked."""
    return (
        os.path.isdir(path)
        and os.path.isfile(os.path.join(path, "summary.json"))
    )

@staticmethod
def _is_run_dir(path):
    """A run dir has metrics.json (and optionally run.json or results.json)."""
    return (
        os.path.isdir(path)
        and os.path.isfile(os.path.join(path, "metrics.json"))
    )
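
The two predicates can be combined to decide what kind of path BacktestReport.open() was handed. The following is a small sketch under the same file-name assumptions as the checks above; classify_path and its return labels are hypothetical names chosen for illustration.

```python
import os


def classify_path(path: str) -> str:
    """Return "strategy", "run", "batch" or "unknown" for a results path."""
    if not os.path.isdir(path):
        return "unknown"

    # Same markers as the detection logic: summary.json marks a strategy,
    # metrics.json marks a single run.
    if os.path.isfile(os.path.join(path, "summary.json")):
        return "strategy"
    if os.path.isfile(os.path.join(path, "metrics.json")):
        return "run"

    # A batch is any directory with at least one strategy subdirectory.
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name, "summary.json")):
            return "batch"

    return "unknown"
```

Returning "unknown" instead of raising lets the caller decide whether to raise OperationalException or fall back to the legacy flat format.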

Field Name Normalisation

Two naming conventions exist in serialised metrics.json files:

| Metric | Convention A (event backtest) | Convention B (vector backtest) |
| --- | --- | --- |
| Equity curve | equity_curve | equity |
| Drawdown series | drawdown_series | drawdown |
| Cumulative return | cumulative_return_series | cumulative_return |
| Monthly returns | monthly_returns | monthly_return |

The HTML generator should normalise on load:

eq = metrics.get("equity_curve") or metrics.get("equity", [])
dd = metrics.get("drawdown_series") or metrics.get("drawdown", [])

Long-term, BacktestMetrics.to_dict() should standardise the keys.
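
The two-line snippet above covers only the equity and drawdown keys. A sketch that applies the same idea to all four series is given below; the helper name normalise_metrics and the choice of Convention A as the canonical form are assumptions, while the key pairs come from the naming table in this section.

```python
from typing import Any, Dict

# Canonical (Convention A) key -> alternative (Convention B) key.
KEY_ALIASES = {
    "equity_curve": "equity",
    "drawdown_series": "drawdown",
    "cumulative_return_series": "cumulative_return",
    "monthly_returns": "monthly_return",
}


def normalise_metrics(metrics: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of metrics that uses Convention A keys for the four series."""
    normalised = dict(metrics)
    for canonical, alias in KEY_ALIASES.items():
        # Rename the Convention B key only when the canonical one is absent.
        if canonical not in normalised and alias in normalised:
            normalised[canonical] = normalised.pop(alias)
        # Missing series default to an empty list so charts can render.
        normalised.setdefault(canonical, [])
    return normalised
```

Unlike the `or` form, this version tests for key presence rather than truthiness, so an explicitly stored empty series under the canonical key is kept instead of being replaced by the alias.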


Acceptance Criteria

  • BacktestReport(list_of_backtests) produces a single-strategy dashboard
  • BacktestReport(dict_of_backtests) produces a comparison dashboard
  • BacktestReport.open(strategy_dir) loads a single strategy from disk
  • BacktestReport.open(batch_dir) loads N strategies from disk
  • report["strategy_id"] returns a new single-strategy report
  • report.show() renders inline in Jupyter
  • report.show(browser=True) opens in the default browser
  • report.save("path.html") writes a self-contained HTML file
  • HTML has zero external dependencies (no Plotly, no CDN, no Jinja2)
  • HTML adapts layout dynamically for 1 vs N strategies
  • Dark/light theme toggle works
  • app.run_vector_backtests() returns Dict[str, List[Backtest]]
  • app.run_vector_backtests(output_directory=...) persists to disk
  • Both field name conventions (equity_curve / equity) are handled
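
The "zero external dependencies" criterion could be checked automatically with a heuristic along these lines. This is a sketch only: find_external_references is a hypothetical helper, and the regular expressions look just for remotely loaded scripts, images, iframes and stylesheets, so ordinary hyperlinks (for example on the sponsor page) are deliberately not flagged.

```python
import re
from typing import List

# Resources fetched from http(s) or protocol-relative URLs.
_PATTERNS = [
    re.compile(
        r"""<(?:script|img|iframe)\b[^>]*?\bsrc\s*=\s*["']?((?:https?:)?//[^"'\s>]+)""",
        re.IGNORECASE,
    ),
    re.compile(
        r"""<link\b[^>]*?\bhref\s*=\s*["']?((?:https?:)?//[^"'\s>]+)""",
        re.IGNORECASE,
    ),
]


def find_external_references(html: str) -> List[str]:
    """Return the URLs of externally loaded resources found in html."""
    found: List[str] = []
    for pattern in _PATTERNS:
        found.extend(pattern.findall(html))
    return found
```

In a test suite this could back an assertion such as `assert find_external_references(report_html) == []`; it does not detect resources requested from JavaScript at runtime.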
