
Bug: Drawdown metrics do not match portfolio snapshot data #407

@MDUYN

Summary

The backtesting framework reports drawdown metrics that are inconsistent with the total_value series in the portfolio_snapshots it produces. Independently computing max drawdown from the snapshot total_value field yields significantly different results.

Expected values (from framework)

{
  "max_drawdown": 0.1384753616852372,
  "max_drawdown_absolute": 406.69151474500177,
  "max_daily_drawdown": 0.1384753616852372,
  "max_drawdown_duration": 240
}

Actual values (computed from portfolio_snapshots[].total_value)

| Metric | Framework reports | Computed from snapshots | Delta (framework minus computed) |
| --- | --- | --- | --- |
| max_drawdown | 13.85% | 9.19% | +4.66pp |
| max_drawdown_absolute | 406.69 | 230.50 | +176.19 |
| max_daily_drawdown | 13.85% | 7.84% | +6.01pp |
| max_drawdown_duration | 240 days | 241 days | -1 day |

Analysis

1. Framework uses a different equity curve than total_value

The framework's absolute drawdown (406.69) divided by its fractional drawdown (0.1385) implies a peak equity of ~2,937. However, the actual peak total_value in the snapshots is ~2,508. This means the framework is computing drawdown from a different equity series than the one stored in total_value.
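The implied-peak arithmetic can be checked directly. This is a minimal sketch that assumes the framework's absolute and fractional maxima come from the same drawdown episode, so that peak = absolute / fractional. The variable names are illustrative, and the observed peak is the approximate figure quoted in this issue.

```python
# Values copied from the framework's reported metrics in this issue.
max_drawdown = 0.1384753616852372
max_drawdown_absolute = 406.69151474500177

# If drawdown = (peak - trough) / peak and absolute = peak - trough,
# then peak = absolute / drawdown (assuming both describe the same episode).
implied_peak = max_drawdown_absolute / max_drawdown

# Approximate peak total_value seen in the snapshots, as stated in this issue.
observed_peak = 2508.0

print(f"implied peak:  {implied_peak:.2f}")
print(f"observed peak: {observed_peak:.2f}")
print(f"gap:           {implied_peak - observed_peak:.2f}")
```

A gap of several hundred units between the two peaks is far larger than any rounding effect, which supports the reading that two different equity series are involved.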

Possible causes:

  • The framework may be summing fields differently (e.g. unallocated + pending_value + unrealized instead of using the pre-computed total_value).
  • The framework may be revaluing positions at current market prices independently of the snapshot, producing a different equity curve.
  • There may be a mismatch between the equity curve used internally for metrics and the one serialized to portfolio_snapshots.
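To narrow down the first cause, one could test whether total_value equals a sum of component fields in each snapshot. The sketch below is only an illustration: the helper `find_value_mismatches` is hypothetical, the sample snapshots are made up, and the field names are taken from the example above and may not match the real snapshot schema or how the framework defines total_value.

```python
def find_value_mismatches(snapshots, component_fields, tol=1e-6):
    """Return (created_at, total_value, recomputed) for every snapshot whose
    component fields do not sum to its stored total_value."""
    mismatches = []
    for s in snapshots:
        # Missing fields are treated as 0.0 (an assumption of this sketch).
        recomputed = sum(s.get(field, 0.0) for field in component_fields)
        if abs(recomputed - s["total_value"]) > tol:
            mismatches.append((s.get("created_at"), s["total_value"], recomputed))
    return mismatches


# Made-up snapshots for illustration only; not taken from backtest_run_three.json.
sample = [
    {"created_at": "2023-01-01", "total_value": 1000.0,
     "unallocated": 400.0, "unrealized": 600.0},
    {"created_at": "2023-01-02", "total_value": 1010.0,
     "unallocated": 400.0, "unrealized": 650.0},
]

mismatches = find_value_mismatches(sample, ["unallocated", "unrealized"])
for created_at, stored, recomputed in mismatches:
    print(f"{created_at}: stored={stored} recomputed={recomputed}")
```

Running the same check against the real snapshots with different field combinations would show which combination, if any, reproduces the equity curve implied by the framework's metrics.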

2. max_daily_drawdown equals max_drawdown — likely a bug

The framework reports max_daily_drawdown = 0.1384753616852372, which is identical to max_drawdown. This is almost certainly wrong:

  • max_daily_drawdown should represent the largest single-period (day-to-day) decline, which is typically much smaller than the peak-to-trough drawdown.
  • From the snapshot data, the largest single-day drop is 7.84%, not 13.85%.
  • If max_daily_drawdown truly equals max_drawdown, it would mean the entire 13.85% drawdown happened in a single snapshot interval — contradicting the reported max_drawdown_duration of 240 days.

Likely cause: max_daily_drawdown is being assigned the same value as max_drawdown instead of being computed independently as the worst single-period return.
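The relationship between the two metrics can be illustrated on a small synthetic series. For positive equity values, the worst single-period decline can never exceed the peak-to-trough drawdown, and the two are equal only when the whole drawdown happens in one step from the running peak. The function names and the equity values below are hypothetical, not the framework's own.

```python
def max_drawdown(values):
    """Largest fractional decline from a running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        if peak > 0:
            worst = max(worst, (peak - v) / peak)
    return worst


def max_single_period_drop(values):
    """Largest fractional decline between two consecutive observations."""
    worst = 0.0
    for prev, cur in zip(values, values[1:]):
        if prev > 0:
            worst = max(worst, (prev - cur) / prev)
    return worst


# Synthetic equity curve: peak at 110, trough at 95, reached over several steps.
equity = [100.0, 110.0, 104.0, 99.0, 101.0, 95.0, 108.0]

dd = max_drawdown(equity)
daily = max_single_period_drop(equity)
print(f"max_drawdown:       {dd:.4f}")
print(f"max_daily_drawdown: {daily:.4f}")

# Holds for any series of positive values.
assert daily <= dd
```

An assertion like the last line, plus a check that the two values differ whenever the drawdown spans more than one period, could serve as a regression test for this bug.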

3. max_drawdown_duration is close but off by 1 day

The two durations (240 vs 241 days) differ by exactly one day, which is consistent with an off-by-one in how the framework counts the start/end day (inclusive vs exclusive). This is minor compared with the other two discrepancies.
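The inclusive/exclusive distinction is easy to reproduce with plain date arithmetic. The dates and the helper `duration_days` below are hypothetical and only illustrate how the same drawdown window can be counted as 240 or 241 days. How the framework actually defines the window's start and end is not confirmed here.

```python
from datetime import date, timedelta


def duration_days(start, end, inclusive=False):
    """Days between two dates, optionally counting both endpoints."""
    days = (end - start).days
    return days + 1 if inclusive else days


# Hypothetical drawdown window, chosen so the endpoints are 240 days apart.
start_day = date(2023, 1, 1)
end_day = start_day + timedelta(days=240)

exclusive = duration_days(start_day, end_day)
inclusive = duration_days(start_day, end_day, inclusive=True)
print(f"exclusive count: {exclusive} days")
print(f"inclusive count: {inclusive} days")
```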

How to reproduce

Using backtest_run_three.json:

import json

with open("backtest_run_three.json") as f:
    data = json.load(f)

snaps = sorted(data["portfolio_snapshots"], key=lambda s: s["created_at"])

# Max drawdown from total_value
peak = 0
max_dd = 0
max_dd_abs = 0
for s in snaps:
    tv = s["total_value"]
    if tv > peak:
        peak = tv
    if peak > 0:
        dd = (peak - tv) / peak
        if dd > max_dd:
            max_dd = dd
            max_dd_abs = peak - tv

print(f"max_drawdown:          {max_dd}")       # 0.0919 — NOT 0.1385
print(f"max_drawdown_absolute: {max_dd_abs}")    # 230.50 — NOT 406.69

Suggested fix

  1. Verify that the equity curve used for drawdown calculation matches the total_value written to portfolio_snapshots. If they diverge, either fix the metric calculation or fix the snapshot serialization.
  2. Fix max_daily_drawdown to compute the worst single-period return independently:
    max_daily_dd = max(
        (snaps[i-1]["total_value"] - snaps[i]["total_value"]) / snaps[i-1]["total_value"]
        for i in range(1, len(snaps))
        if snaps[i-1]["total_value"] > 0
    )
  3. Review the off-by-one in max_drawdown_duration (inclusive vs exclusive day counting).
