Commit 59e2977

Author: Tonny@Home

Fix prediction/backtest inconsistencies in ensemble scripts through improved data alignment and normalization logic, and update multilingual documentation.

1 parent 4deabf5 · commit 59e2977

7 files changed: +54 −11 lines

docs/02_BRUTE_FORCE_GUIDE.md

Lines changed: 8 additions & 0 deletions

@@ -51,6 +51,14 @@ python quantpits/scripts/brute_force_ensemble.py --use-groups --group-config con
 - **Weight Optimization**: compares Max Sharpe / Risk Parity optimization on the Top 10 single models
 - **Comprehensive Report**: automatically outputs the best combos and the MVP core models
 
+> [!NOTE]
+> **On metric discrepancies between single-model performance and ensemble backtests**
+>
+> When the fusion and brute-force scripts evaluate model performance, they apply strict **Z-Score Normalization** and **Data Alignment**. Together with TopK truncation, this means a single model's backtest results here may show small, expected differences from the raw-score backtest results seen during training via `run_analysis.py`:
+> 1. **Isolated normalization**: each model's prediction scores are first Z-Score normalized per day using only that model's own non-null predicted stock universe. This keeps scoring scales uniform across models, and one model's missing data cannot contaminate another model's distribution before normalization.
+> 2. **Deferred intersection alignment**: only when computing the mean or weighted score of a specific combo does the system intersect the models in that combo (i.e. run `dropna(how='any')`), so missing data in unrelated models cannot improperly shrink the combo's evaluation universe.
+> 3. **Aligned benchmark ranking**: all reference benchmark data (such as the single-model leaderboard backtest metrics) is dynamically sliced to the time window actually covered by the current evaluation matrix, giving you a consistent same-period comparison.
 
 ## Full Parameter List
 
 | Parameter | Default | Description |
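Point 1 of the note (isolated per-model normalization) can be sketched in a few lines of pandas. This is an illustrative sketch, not the repository's code: `zscore_norm` here is a hypothetical stand-in (assumed to be a per-day cross-sectional Z-score with population standard deviation), and the data is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scripts' zscore_norm helper:
# per-day cross-sectional Z-score (ddof=0 is an assumption).
def zscore_norm(s: pd.Series) -> pd.Series:
    return s.groupby(level="datetime").transform(
        lambda x: (x - x.mean()) / x.std(ddof=0)
    )

idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02"]), ["A", "B", "C"]],
    names=["datetime", "instrument"],
)
preds = pd.DataFrame(
    {"m1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     "m2": [0.1, np.nan, 0.3, 0.2, np.nan, 0.6]},  # m2 has coverage gaps
    index=idx,
)

norm = pd.DataFrame(index=preds.index)
for col in preds.columns:
    # Each model is normalized on its own non-null universe only,
    # so m2's gaps never distort m1's distribution.
    norm[col] = zscore_norm(preds[col].dropna())

print(norm["m1"].round(3).tolist())
# → [-1.225, 0.0, 1.225, -1.225, 0.0, 1.225]
```

Because the Series is `dropna()`-ed before normalization, m2's Z-scores are computed over its two available names per day, and index alignment on assignment restores NaN where m2 had no prediction.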

docs/03_ENSEMBLE_FUSION_GUIDE.md

Lines changed: 8 additions & 0 deletions

@@ -193,6 +193,14 @@ output/
 > [!TIP]
 > The default combo additionally saves an `ensemble_{date}.csv` without the combo name, keeping backward compatibility with downstream scripts such as `order_gen.py`.
 
+> [!NOTE]
+> **On metric discrepancies between single-model performance and ensemble backtests**
+>
+> When the fusion and brute-force scripts evaluate model performance, they apply strict **Z-Score Normalization** and **Data Alignment**. Together with TopK truncation, this means a single model's backtest results here may show small, expected differences from the raw-score backtest results seen during training via `run_analysis.py`:
+> 1. **Isolated normalization**: each model's prediction scores are first Z-Score normalized per day using only that model's own non-null predicted stock universe. This keeps scoring scales uniform across models, and one model's missing data cannot contaminate another model's distribution before normalization.
+> 2. **Deferred intersection alignment**: only when computing the mean or weighted score of a specific combo does the system intersect the models in that combo (i.e. run `dropna(how='any')`), so missing data in unrelated models cannot improperly shrink the combo's evaluation universe.
+> 3. **Aligned benchmark ranking**: all reference benchmark data (such as the single-model leaderboard backtest metrics) is dynamically sliced to the time window actually covered by the current evaluation matrix, giving you a consistent same-period comparison.
 
 ## Typical Workflow
 
 ```bash

docs/en/02_BRUTE_FORCE_GUIDE.md

Lines changed: 8 additions & 0 deletions

@@ -51,6 +51,14 @@ python quantpits/scripts/brute_force_ensemble.py --use-groups --group-config con
 - **Weight Optimization**: Comparative trials on Top 10 single models simulating Max Sharpe / Risk Parity optimization mappings.
 - **Comprehensive Reporting**: Generates autonomous summaries of MVP models and superior fusions.
 
+> [!NOTE]
+> **Understanding Metric Discrepancies: Single Models vs. Ensemble Backtests**
+>
+> When evaluating model performance within fusion and brute-force architectures, strict **Z-Score Normalization** and **Data Alignment** processing govern the engine. Therefore, because of TopK position bounding, backtest results of a single model here may exhibit reasonable, micro-level disparities from the raw metrics evaluated naturally post-training (e.g. via `run_analysis.py`):
+> 1. **Isolated Normalization**: Each model calculates its daily cross-sectional Z-scores purely on its *own* non-null predicted universe. Scaling remains mathematically uniform, and a single model's signal scale cannot be skewed by other models' data coverage gaps prior to scoring.
+> 2. **Delayed Intersection**: Strict intersection dropping (`dropna(how='any')`) is executed only at the combo scoring phase and is limited to the subset of models within that specific combo iteration. This guarantees irrelevant sub-models don't unilaterally shrink the evaluated combination universe.
+> 3. **Benchmarking Alignment**: The sub-model evaluation leaderboard dynamically slices historical records to match the precise temporal boundaries established by the current ensemble matrix index, giving an "apples-to-apples" same-period comparison.
 
 ## Full Parameter List
 
 | Parameter | Default | Description |
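Point 2 of the note (deferred intersection) can be demonstrated with toy data. This is an illustrative sketch, not the repository's code; the model names and values are invented.

```python
import numpy as np
import pandas as pd

# Invented normalized-score matrix: m3 is a sparse, unrelated model.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01"]), list("ABCDE")],
    names=["datetime", "instrument"],
)
norm_df = pd.DataFrame(
    {"m1": [1, 2, 3, 4, 5],
     "m2": [5, 4, 3, 2, 1],
     "m3": [np.nan] * 4 + [9.0]},
    index=idx, dtype=float,
)

# Global dropna (the pre-fix behavior): m3's gaps shrink everyone to 1 row.
global_rows = len(norm_df.dropna(how="any"))

# Deferred intersection: dropna only on the combo actually being scored.
combo = ["m1", "m2"]
combo_score = norm_df[combo].dropna(how="any").mean(axis=1)

print(global_rows, len(combo_score))
# → 1 5
```

The combo's evaluation universe keeps all five instruments, because the sparse `m3` is never part of the intersection when it is not in the combo.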

docs/en/03_ENSEMBLE_FUSION_GUIDE.md

Lines changed: 8 additions & 0 deletions

@@ -193,6 +193,14 @@ output/
 > [!TIP]
 > The `default` combo will additionally output a nameless `ensemble_{date}.csv` artifact, guaranteeing zero-modification compatibility for downstream utilities like `order_gen.py`.
 
+> [!NOTE]
+> **Understanding Metric Discrepancies: Single Models vs. Ensemble Backtests**
+>
+> When evaluating model performance within fusion and brute-force architectures, strict **Z-Score Normalization** and **Data Alignment** processing govern the engine. Therefore, because of TopK position bounding, backtest results of a single model here may exhibit reasonable, micro-level disparities from the raw metrics evaluated naturally post-training (e.g. via `run_analysis.py`):
+> 1. **Isolated Normalization**: Each model calculates its daily cross-sectional Z-scores purely on its *own* non-null predicted universe. Scaling remains mathematically uniform, and a single model's signal scale cannot be skewed by other models' data coverage gaps prior to scoring.
+> 2. **Delayed Intersection**: Strict intersection dropping (`dropna(how='any')`) is executed only at the combo scoring phase and is limited to the subset of models within that specific combo iteration. This guarantees irrelevant sub-models don't unilaterally shrink the evaluated combination universe.
+> 3. **Benchmarking Alignment**: The sub-model evaluation leaderboard dynamically slices historical records to match the precise temporal boundaries established by the current ensemble matrix index, giving an "apples-to-apples" same-period comparison.
 
 ## Typical Operations Sequence
 
 ```bash

quantpits/scripts/brute_force_ensemble.py

Lines changed: 6 additions & 5 deletions

@@ -193,13 +193,14 @@ def load_predictions(train_records):
     if not all_preds:
         raise ValueError("No prediction data was loaded!")
 
-    # Merge & Z-Score normalize
-    merged_df = pd.concat(all_preds, axis=1).dropna()
+    # Merge & Z-Score normalize (note: do not dropna here, so universe differences between models cannot shrink each other's samples)
+    merged_df = pd.concat(all_preds, axis=1)
     print(f"Merged data shape: {merged_df.shape}")
 
     norm_df = pd.DataFrame(index=merged_df.index)
     for col in merged_df.columns:
-        norm_df[col] = zscore_norm(merged_df[col])
+        # Z-Score each model independently, on its own non-null range
+        norm_df[col] = zscore_norm(merged_df[col].dropna())
 
     return norm_df, model_metrics
 
@@ -386,8 +387,8 @@ def run_single_backtest(
     if bt_config is None:
         bt_config = strategy.get_backtest_config(st_config)
 
-    # 1. Composite signal (equal-weight mean of normalized scores)
-    combo_score = norm_df[list(combo_models)].mean(axis=1)
+    # 1. Composite signal (equal-weight mean of normalized scores); intersect (dropna) on the current combo subset only
+    combo_score = norm_df[list(combo_models)].dropna(how='any').mean(axis=1)
 
     # 2. Prepare components
     # Note: Account must be created fresh each time, never reused (its state accumulates)

quantpits/scripts/ensemble_fusion.py

Lines changed: 12 additions & 5 deletions

@@ -229,13 +229,13 @@ def load_selected_predictions(train_records, selected_models):
     if not all_preds:
         raise ValueError("No prediction data was loaded!")
 
-    # Merge & Z-Score normalize
-    merged_df = pd.concat(all_preds, axis=1).dropna()
+    # Merge & Z-Score normalize (note: do not dropna early here, so one model's universe changes cannot affect the others)
+    merged_df = pd.concat(all_preds, axis=1)
     print(f"Merged data shape: {merged_df.shape}")
 
     norm_df = pd.DataFrame(index=merged_df.index)
     for col in merged_df.columns:
-        norm_df[col] = zscore_norm(merged_df[col])
+        norm_df[col] = zscore_norm(merged_df[col].dropna())
 
     return norm_df, model_metrics, loaded_models
 
@@ -1000,16 +1000,23 @@ def risk_analysis_and_leaderboard(report_df, norm_df, train_records,
     freq_suffix = '1week' if freq_val == 'week' else '1day'
     report_filename = f"portfolio_analysis/report_normal_{freq_suffix}.pkl"
 
+    # Take the evaluation window from the current combo_norm_df so sub-model evaluation is aligned
+    eval_start = str(norm_df.index.get_level_values('datetime').min().date())
+    eval_end = str(norm_df.index.get_level_values('datetime').max().date())
+
     for model_name in loaded_models:
         record_id = models.get(model_name)
         if not record_id:
             continue
         try:
             recorder = R.get_recorder(recorder_id=record_id, experiment_name=experiment_name)
             hist_report = recorder.load_object(report_filename)
+
+            # Clip the historical report to the current evaluation window
+            hist_report = hist_report[(hist_report.index >= pd.to_datetime(eval_start)) & (hist_report.index <= pd.to_datetime(eval_end))]
             all_reports[model_name] = hist_report
 
-            if 'return' in hist_report.columns:
+            if 'return' in hist_report.columns and not hist_report.empty:
                 # Up-sample sub-model report for consistent metric calculation
                 sub_da_df = pd.DataFrame(index=hist_report.index)
                 sub_da_df['收盘价值'] = hist_report['account']
@@ -1294,7 +1301,7 @@ def run_single_combo(combo_name, selected_models, method, manual_weights_str,
         print(f"Warning: combo {combo_name} has no valid models, skipping")
         return None
 
-    combo_norm_df = norm_df[combo_models]
+    combo_norm_df = norm_df[combo_models].dropna(how='any')
     combo_metrics = {m: model_metrics.get(m, 0) for m in combo_models}
 
     # ---- Stage 2: Correlation analysis ----
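The evaluation-window clipping added above can be shown standalone. This is a sketch with invented data: in the real script `hist_report` is loaded from a qlib recorder, and only the `get_level_values`/slicing pattern below mirrors the diff.

```python
import pandas as pd

# Invented normalized-score matrix covering two trading days.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-03-02", "2020-03-03"]), ["A"]],
    names=["datetime", "instrument"],
)
norm_df = pd.DataFrame({"m1": [0.1, 0.2]}, index=idx)

# Evaluation window = the span actually covered by norm_df.
eval_start = str(norm_df.index.get_level_values("datetime").min().date())
eval_end = str(norm_df.index.get_level_values("datetime").max().date())

# Invented sub-model history that extends beyond the window.
hist_report = pd.DataFrame(
    {"return": [0.01, 0.02, 0.03, 0.04],
     "account": [100.0, 101.0, 103.0, 106.0]},
    index=pd.date_range("2020-03-01", periods=4),
)

# Keep only rows inside [eval_start, eval_end] for a same-period comparison.
mask = (hist_report.index >= pd.to_datetime(eval_start)) & (
    hist_report.index <= pd.to_datetime(eval_end)
)
hist_report = hist_report[mask]
print(len(hist_report))
# → 2
```

Without this clipping, a sub-model's leaderboard metrics would be computed over a longer history than the ensemble's, making the comparison unfair.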

tests/quantpits/scripts/test_ensemble_fusion.py

Lines changed: 4 additions & 1 deletion

@@ -419,10 +419,13 @@ def test_risk_analysis_and_leaderboard(mock_R, mock_env, tmp_path):
     mock_recorder.load_object.return_value = report_df
     mock_R.get_recorder.return_value = mock_recorder
 
+    idx = pd.MultiIndex.from_tuples([(pd.Timestamp("2020-01-01"), "A"), (pd.Timestamp("2020-01-02"), "A")], names=["datetime", "instrument"])
+    norm_df = pd.DataFrame({"M1": [0.5, 0.6]}, index=idx)
+
     with patch('quantpits.scripts.ensemble_fusion.calculate_safe_risk') as mock_risk:
         mock_risk.return_value = {"annualized_return": 0.5}
         reports, lb = ef.risk_analysis_and_leaderboard(
-            report_df, None, train_records, ["M1"], "day", str(out_dir), "2020-01-01"
+            report_df, norm_df, train_records, ["M1"], "day", str(out_dir), "2020-01-01"
         )
 
     assert "Ensemble" in reports
