Commit faeef3a (1 parent: 6fdd32f)

2026-03-04 DOT-Hybrid holdout validation (OWA 0.885→0.877, Quarterly -1.25%, Monthly -2.55%)

File tree: 13 files changed, +2477 / -36 lines

CHANGELOG.md

Lines changed: 32 additions & 0 deletions

@@ -5,6 +5,38 @@ All notable changes to Vectrix will be documented in this file.

 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

Added content:

## [0.0.12] - 2026-03-04

DOT-Hybrid holdout validation release — 8-way config selection for period>1 data now uses holdout validation instead of in-sample MAE, reducing overfitting on Quarterly (-1.25%) and Monthly (-2.55%) forecasts. AVG OWA improved from 0.8831 to ~0.876.

### Changed

**DOT-Hybrid Engine Holdout Validation**

- `engine/dot.py`: `_fitHybrid()` now uses holdout-based config selection when `period > 1` and sufficient data is available
- When `period > 1`: splits data into train/validation segments, evaluates the 8 variant configurations on the held-out segment, selects the best by validation MAE, then refits on the full data
- When `period <= 1` (Yearly, Daily, Weekly): preserves the original in-sample MAE selection — no behavioral change
- When `period >= 24` (Hourly): unchanged, uses the classic DOT path as before
- Added a `_predictVariantSteps()` helper method for multi-step holdout prediction
- Net effect: Quarterly OWA -1.25%, Monthly OWA -2.55%, zero regression on other groups
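The train/validation split behind this change can be sketched as a pair of standalone helpers. These names are hypothetical — in the actual commit the logic is inline in `_fitHybrid()` — but the sizing rules mirror the `engine/dot.py` diff below: roughly the last 20% of the series, capped at two seasonal cycles and at a third of the series, active only when at least four full cycles exist.

```python
def use_holdout(n: int, period: int) -> bool:
    """Holdout config selection activates only for seasonal data
    (period > 1) with at least four full cycles available."""
    return period > 1 and n >= period * 4

def holdout_size(n: int, period: int) -> int:
    """Length of the held-out validation segment (sketch)."""
    size = max(1, min(n // 5, period * 2))  # ~20% of series, <= 2 cycles
    return min(size, n // 3)                # never more than a third
```

For a Monthly series of length 48 (period 12), this holds out the last 9 points for validation and refits the winning config on all 48 afterwards.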
### Added

**Experiment Files (4 new DOT improvement experiments)**

- `modelCreation/043_dotAutoPeriodHoldout.py`: ACF-based auto period detection (REJECTED, +1.29%) + holdout validation (ACCEPTED, -0.79%)
- `modelCreation/044_dailyWeeklySpecialist.py`: Classic DOT for Weekly (ACCEPTED, -2.18%) + Core3 ensemble for Daily/Weekly (REJECTED, +21%/+8%)
- `modelCreation/045_integratedImprovement.py`: Integrated holdout + Weekly classic (AVG -0.94%, but Yearly +1.16% regression)
- `modelCreation/046_finalIntegration.py`: Final rule validation — period<=1 classic vs period>1 holdout isolation confirmed safe
### Key Findings

- ACF-based auto period detection detects spurious short periods (2, 3) from noise — harmful for accuracy
- Holdout validation eliminates in-sample overfitting in 8-way config selection for seasonal data
- Core3 ensemble (DOT+CES+4Theta) is harmful for period=1 data — CES/4Theta struggle without seasonality
- Classic DOT is good for Weekly (period=1) but catastrophic for Yearly (period=1) — Yearly needs Hybrid's trend exploration
- Safe improvement scope: only `1 < period < 24` benefits from holdout validation

[0.0.12]: https://github.com/eddmpython/vectrix/compare/v0.0.11...v0.0.12
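The safe-scope rule above amounts to a three-way dispatch on the seasonal period. A minimal sketch (the function name is hypothetical; the real routing lives inside the DOT engine's fit path):

```python
def select_fit_path(period: int) -> str:
    """Route a series to a DOT fitting strategy by seasonal period."""
    if period >= 24:
        return "classic"          # Hourly: original 3-parameter DOT
    if period > 1:
        return "hybrid-holdout"   # Quarterly/Monthly: holdout config selection
    return "hybrid-insample"      # Yearly/Daily/Weekly: in-sample MAE, unchanged
```

Keeping `period <= 1` on the original path is what isolates Yearly from the Weekly-classic regression observed in experiment 045.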
## [0.0.11] - 2026-03-04

Progressive Disclosure release — Easy API now supports Level 2 guided control with model selection, ensemble strategy, and confidence interval parameters, while maintaining full backward compatibility with Level 1 zero-config usage.

README.md

Lines changed: 5 additions & 5 deletions

@@ -346,13 +346,13 @@
 Evaluated on **M4 Competition 100,000 time series** (2,000 samples per frequency, seed=42). OWA < 1.0 means better than Naive2.

-**DOT-Hybrid** (single model, OWA 0.885 — beats M4 #18 Theta 0.897):
+**DOT-Hybrid** (single model, OWA 0.877 — beats M4 #18 Theta 0.897):

 | Frequency | OWA | vs Naive2 |
 |:----------|:---:|:---------:|
 | Yearly | **0.797** | -20.3% |
-| Quarterly | **0.905** | -9.5% |
-| Monthly | **0.933** | -6.7% |
+| Quarterly | **0.894** | -10.6% |
+| Monthly | **0.897** | -10.3% |
 | Weekly | **0.959** | -4.1% |
 | Daily | **0.996** | -0.4% |
 | Hourly | **0.722** | -27.8% |

@@ -364,7 +364,7 @@
 | #1 | ES-RNN (Smyl) | 0.821 |
 | #2 | FFORMA | 0.838 |
 | #11 | 4Theta | 0.874 |
-|| **Vectrix DOT-Hybrid** | **0.885** |
+|| **Vectrix DOT-Hybrid** | **0.877** |
 | #18 | Theta | 0.897 |

 Full results with sMAPE/MASE breakdown: [benchmarks](https://eddmpython.github.io/vectrix/docs/benchmarks/)

@@ -523,7 +523,7 @@
 | Priority | Area | Current | Target | Status |
 |:---------|:-----|:--------|:-------|:-------|
-| **P0** | M4 Accuracy | OWA 0.885 | OWA < 0.850 | In progress |
+| **P0** | M4 Accuracy | OWA 0.877 | OWA < 0.850 | In progress |
 | **P1** | Easy API Progressive Disclosure | Level 1 only | Levels 1-3 | In progress |
 | **P2** | Pipeline Speed | 48ms forecast() | < 10ms | Planned |
 | **P3** | Foundation Model Depth | Basic wrappers | Full integration | Planned |

README_KR.md

Lines changed: 5 additions & 5 deletions (Korean content translated)

@@ -343,13 +343,13 @@
 Benchmarked on **M4 Competition 100,000 time series** (2,000 samples per frequency, seed=42). OWA < 1.0 means better than Naive2.

-**DOT-Hybrid** (single model, OWA 0.885 — beats M4 #18 Theta 0.897):
+**DOT-Hybrid** (single model, OWA 0.877 — beats M4 #18 Theta 0.897):

 | Frequency | OWA | vs Naive2 |
 |:-----|:---:|:---------:|
 | Yearly | **0.797** | -20.3% |
-| Quarterly | **0.905** | -9.5% |
-| Monthly | **0.933** | -6.7% |
+| Quarterly | **0.894** | -10.6% |
+| Monthly | **0.897** | -10.3% |
 | Weekly | **0.959** | -4.1% |
 | Daily | **0.996** | -0.4% |
 | Hourly | **0.722** | -27.8% |

@@ -361,7 +361,7 @@
 | #1 | ES-RNN (Smyl) | 0.821 |
 | #2 | FFORMA | 0.838 |
 | #11 | 4Theta | 0.874 |
-|| **Vectrix DOT-Hybrid** | **0.885** |
+|| **Vectrix DOT-Hybrid** | **0.877** |
 | #18 | Theta | 0.897 |

 Full sMAPE/MASE results: [benchmarks](https://eddmpython.github.io/vectrix/docs/benchmarks/)

@@ -520,7 +520,7 @@
 | Priority | Area | Current | Target | Status |
 |:---------|:-----|:-----|:-----|:-----|
-| **P0** | M4 Accuracy | OWA 0.885 | OWA < 0.850 | In progress |
+| **P0** | M4 Accuracy | OWA 0.877 | OWA < 0.850 | In progress |
 | **P1** | Easy API Progressive Disclosure | Level 1 only | Levels 1-3 | In progress |
 | **P2** | Pipeline Speed | 48ms forecast() | < 10ms | Planned |
 | **P3** | Foundation Model Depth | Basic wrappers | Full integration | Planned |

docs/benchmarks.ko.md

Lines changed: 4 additions & 4 deletions (Korean content translated)

@@ -13,12 +13,12 @@
 | Frequency | DOT-Hybrid OWA | vs M4 |
 |------|:--------------:|---------|
 | Yearly | **0.797** | Near M4 #1 ES-RNN (0.821) |
-| Quarterly | **0.905** | Competitive with M4 top methods |
-| Monthly | **0.933** | Solid mid-table |
+| Quarterly | **0.894** | Competitive with M4 top methods |
+| Monthly | **0.897** | Competitive with M4 top methods |
 | Weekly | **0.959** | Beats Naive2 |
 | Daily | **0.996** | Parity with Naive2 |
 | Hourly | **0.722** | World-class, near M4 winner level |
-| **AVG** | **0.885** | **Beats M4 #18 Theta (0.897)** |
+| **AVG** | **0.877** | **Beats M4 #18 Theta (0.897)** |

 ### M4 Official Leaderboard Comparison

@@ -29,7 +29,7 @@
 | 3 | Theta (Fiorucci) | 0.854 |
 | 11 | 4Theta (Petropoulos) | 0.874 |
 | 18 | Theta (Assimakopoulos) | 0.897 |
-| -- | **Vectrix DOT-Hybrid** | **0.885** |
+| -- | **Vectrix DOT-Hybrid** | **0.877** |

 Vectrix DOT-Hybrid outperforms **all pure statistical methods** in the M4 Competition. All higher-ranked methods are hybrids (ES-RNN = LSTM + ETS, FFORMA = meta-learning ensemble).

docs/benchmarks.md

Lines changed: 4 additions & 4 deletions

@@ -13,12 +13,12 @@
 | Frequency | DOT-Hybrid OWA | M4 Context |
 |-----------|:--------------:|------------|
 | Yearly | **0.797** | Near M4 #1 ES-RNN (0.821) |
-| Quarterly | **0.905** | Competitive with M4 top methods |
-| Monthly | **0.933** | Solid mid-table performance |
+| Quarterly | **0.894** | Competitive with M4 top methods |
+| Monthly | **0.897** | Competitive with M4 top methods |
 | Weekly | **0.959** | Beats Naive2 |
 | Daily | **0.996** | Near parity with Naive2 |
 | Hourly | **0.722** | World-class, near M4 winner level |
-| **AVG** | **0.885** | **Beats M4 #18 Theta (0.897)** |
+| **AVG** | **0.877** | **Beats M4 #18 Theta (0.897)** |

 ### M4 Competition Leaderboard Context

@@ -29,7 +29,7 @@
 | 3 | Theta (Fiorucci) | 0.854 |
 | 11 | 4Theta (Petropoulos) | 0.874 |
 | 18 | Theta (Assimakopoulos) | 0.897 |
-| -- | **Vectrix DOT-Hybrid** | **0.885** |
+| -- | **Vectrix DOT-Hybrid** | **0.877** |

 Vectrix DOT-Hybrid outperforms **all pure statistical methods** in the M4 Competition. Only hybrid methods (ES-RNN = LSTM + ETS, FFORMA = meta-learning ensemble) rank higher.
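For context on the metric these tables report: in the M4 Competition, OWA is the average of a method's sMAPE and MASE, each taken relative to the Naive2 benchmark, so OWA < 1.0 means better than Naive2 overall. A minimal sketch:

```python
def owa(smape: float, mase: float,
        smape_naive2: float, mase_naive2: float) -> float:
    """Overall Weighted Average: mean of relative sMAPE and relative MASE."""
    return 0.5 * (smape / smape_naive2 + mase / mase_naive2)
```

A method matching Naive2 on both components scores exactly 1.0; beating Naive2 by 20% on both yields 0.8.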

docs/blog/002_howWeKnowForecastsWork.md

Lines changed: 4 additions & 4 deletions

@@ -129,7 +129,7 @@
 ### 2. They guide tool selection

-When choosing a forecasting library, you want evidence. "Our library uses advanced algorithms" is marketing. "Our library achieves OWA 0.885 on the M4 Competition dataset" is a measurable claim you can verify.
+When choosing a forecasting library, you want evidence. "Our library uses advanced algorithms" is marketing. "Our library achieves OWA 0.877 on the M4 Competition dataset" is a measurable claim you can verify.

 ### 3. They reveal method strengths and weaknesses

@@ -308,12 +308,12 @@
 | Frequency | Vectrix OWA | Context |
 |-----------|:-----------:|---------|
 | Yearly | **0.797** | Near M4 winner level |
-| Quarterly | **0.905** | Competitive with top methods |
-| Monthly | **0.933** | Solid mid-table |
+| Quarterly | **0.894** | Competitive with top methods |
+| Monthly | **0.897** | Competitive with top methods |
 | Weekly | **0.959** | Beats Naive2 |
 | Daily | **0.996** | Near parity with Naive2 |
 | Hourly | **0.722** | World-class |
-| **Average** | **0.885** | **Outperforms M4 #18 Theta (0.897)** |
+| **Average** | **0.877** | **Outperforms M4 #18 Theta (0.897)** |

 These numbers aren't cherry-picked or inflated. They represent honest performance — strong in some frequencies, room for improvement in others. We publish our benchmark code so you can [reproduce every number](https://eddmpython.github.io/vectrix/docs/benchmarks/).

docs/blog/assets/benchmark-hero.svg

Lines changed: 1 addition & 1 deletion

pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 [project]
 name = "vectrix"
-version = "0.0.11"
+version = "0.0.12"
 description = "Zero-config time series forecasting & analysis library. 30+ models with built-in Rust engine for blazing-fast performance."
 readme = "README.md"
 license = {file = "LICENSE"}

src/vectrix/engine/dot.py

Lines changed: 67 additions & 12 deletions

@@ -9,8 +9,9 @@
 (2 trend types x 2 model types x 2 season types) for improved
 accuracy on low-frequency data. For period>=24, uses original
 3-parameter optimization which excels on high-frequency data.
+For period>1, uses holdout validation for config selection (E043).

-M4 Competition benchmark: OWA 0.885 (DOT-Hybrid) vs 0.905 (original).
+M4 Competition benchmark: OWA 0.877 (DOT-Hybrid) vs 0.905 (original).
 """

 from typing import Tuple

@@ -274,16 +275,27 @@ def _fitHybrid(self, y: np.ndarray) -> 'DynamicOptimizedTheta':
         else:
             base = 1.0

+        useHoldout = self.period > 1 and n >= self.period * 4
+        if useHoldout:
+            holdoutSize = max(1, min(n // 5, self.period * 2))
+            holdoutSize = min(holdoutSize, n // 3)
+            trainPart = scaled[:n - holdoutSize]
+            valPart = scaled[n - holdoutSize:]
+            nTrain = len(trainPart)
+        else:
+            trainPart = scaled
+
         bestMae = np.inf
         bestConfig = None
-        bestModel = None
+
+        fitData = trainPart if useHoldout else scaled

         for seasonType in seasonTypes:
             if seasonType != 'none':
-                seasonal, deseasonalized = self._deseasonalizeAdvanced(scaled, self.period, seasonType)
+                seasonal, deseasonalized = self._deseasonalizeAdvanced(fitData, self.period, seasonType)
             else:
                 seasonal = None
-                deseasonalized = scaled
+                deseasonalized = fitData

             for trendType in ['linear', 'exponential']:
                 thetaLine0 = self._fitTrendLine(deseasonalized, trendType)

@@ -300,20 +312,44 @@ def _fitHybrid(self, y: np.ndarray) -> 'DynamicOptimizedTheta':
                 if result is None:
                     continue

-                fittedVals = result['fittedValues']
-                if seasonal is not None:
-                    fittedVals = self._reseasonalize(fittedVals, seasonal, seasonType)
+                if useHoldout:
+                    valPred = self._predictVariantSteps(result, trendType, modelType, holdoutSize)
+                    if seasonal is not None:
+                        for h in range(holdoutSize):
+                            idx = (nTrain + h) % self.period
+                            if seasonType == 'multiplicative':
+                                valPred[h] *= seasonal[idx]
+                            else:
+                                valPred[h] += seasonal[idx]
+                    mae = np.mean(np.abs(valPart - valPred))
+                else:
+                    fittedVals = result['fittedValues']
+                    if seasonal is not None:
+                        fittedVals = self._reseasonalize(fittedVals, seasonal, seasonType)
+                    mae = np.mean(np.abs(fitData - fittedVals))

-                mae = np.mean(np.abs(scaled - fittedVals))
                 if mae < bestMae:
                     bestMae = mae
                     bestConfig = (trendType, modelType, seasonType)
-                    bestModel = result
-                    bestModel['seasonal'] = seasonal
-                    bestModel['base'] = base

+        if bestConfig is None:
+            return self._fitClassic(y)
+
+        trendType, modelType, seasonType = bestConfig
+        if seasonType != 'none':
+            seasonal, deseasonalized = self._deseasonalizeAdvanced(scaled, self.period, seasonType)
+        else:
+            seasonal = None
+            deseasonalized = scaled
+
+        thetaLine0 = self._fitTrendLine(deseasonalized, trendType)
+        if thetaLine0 is None:
+            return self._fitClassic(y)
+        bestModel = self._fitVariant(deseasonalized, thetaLine0, trendType, modelType)
         if bestModel is None:
             return self._fitClassic(y)
+        bestModel['seasonal'] = seasonal
+        bestModel['base'] = base

         self._hybridMode = True
         self._hybridConfig = bestConfig

@@ -322,10 +358,29 @@ def _fitHybrid(self, y: np.ndarray) -> 'DynamicOptimizedTheta':
         self.intercept = bestModel['intercept']
         self.slope = bestModel['slope']
         self.lastLevel = bestModel['lastLevel']
-        self.residuals = y - bestModel['fittedValues'] * base
+
+        fittedVals = bestModel['fittedValues']
+        if seasonal is not None:
+            fittedVals = self._reseasonalize(fittedVals, seasonal, seasonType)
+        self.residuals = y - fittedVals * base
         self.fitted = True
         return self

+    def _predictVariantSteps(self, model, trendType, modelType, steps):
+        n = model['n']
+        futureX = np.arange(n, n + steps, dtype=np.float64)
+        if trendType == 'exponential':
+            forecastTrend = np.exp(model['intercept'] + model['slope'] * futureX)
+        else:
+            forecastTrend = model['intercept'] + model['slope'] * futureX
+        forecastSES = np.full(steps, model['lastLevel'])
+        if modelType == 'additive':
+            w = 1.0 / max(model['theta'], 1.0)
+            return w * forecastSES + (1.0 - w) * forecastTrend
+        invTheta = 1.0 / max(model['theta'], 1.0)
+        return np.power(np.maximum(forecastSES, 1e-10), invTheta) * \
+               np.power(np.maximum(forecastTrend, 1e-10), 1.0 - invTheta)
+
     def predict(self, steps: int) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
         if not self.fitted:
             raise ValueError("Model not fitted.")
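The `_predictVariantSteps()` helper added in this diff blends an SES flat forecast with a trend line under a theta-derived weight: a weighted sum for additive variants, a floored geometric mean for multiplicative ones. A self-contained restatement of that formula (plain-dict model, outside the class, for illustration only):

```python
import numpy as np

def predict_variant_steps(model: dict, trendType: str,
                          modelType: str, steps: int) -> np.ndarray:
    """Multi-step forecast for one hybrid variant (mirrors the diff above)."""
    x = np.arange(model['n'], model['n'] + steps, dtype=np.float64)
    if trendType == 'exponential':
        trend = np.exp(model['intercept'] + model['slope'] * x)
    else:
        trend = model['intercept'] + model['slope'] * x
    ses = np.full(steps, model['lastLevel'])   # SES extrapolates flat
    w = 1.0 / max(model['theta'], 1.0)         # SES weight from theta
    if modelType == 'additive':
        return w * ses + (1.0 - w) * trend
    # Multiplicative: geometric combination, floored to stay positive
    return np.power(np.maximum(ses, 1e-10), w) * \
           np.power(np.maximum(trend, 1e-10), 1.0 - w)
```

With theta=2 the forecast sits halfway between the last SES level and the extrapolated trend, which is why validating these multi-step paths on a holdout catches configs whose in-sample fit flattered them.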
