# docs/benchmarks.md
- **OWA < 1.0** → better than Naive2
- **OWA = 1.0** → same as Naive2
- **OWA > 1.0** → worse than Naive2

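As a quick illustration of the interpretation rule above (the values passed in are examples, not a full benchmark run):

```python
def interpret_owa(owa: float) -> str:
    """Map an OWA score to its meaning versus the Naive2 baseline."""
    if owa < 1.0:
        return "better than Naive2"
    if owa == 1.0:
        return "same as Naive2"
    return "worse than Naive2"

print(interpret_owa(0.797))  # better than Naive2
print(interpret_owa(1.0))    # same as Naive2
print(interpret_owa(1.2))    # worse than Naive2
```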
## M4 Competition Results — DOT-Hybrid Engine
The [M4 Competition](https://www.sciencedirect.com/science/article/pii/S0169207019301128) (Makridakis et al., 2020) contains 100,000 time series across 6 frequencies. Results below are from **DOT-Hybrid** (DynamicOptimizedTheta with 8-way auto-select), evaluated on 2,000 randomly sampled series per frequency (seed=42):
| Frequency | DOT-Hybrid OWA | M4 Context |
|-----------|:--------------:|------------|
| Yearly    | **0.797**      | Near M4 #1 ES-RNN (0.821) |
| Quarterly | **0.905**      | Competitive with M4 top methods |
Vectrix DOT-Hybrid outperforms **all pure statistical methods** in the M4 Competition. Only hybrid methods (ES-RNN = LSTM + ETS, FFORMA = meta-learning ensemble) rank higher.
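The evaluation protocol above (2,000 randomly sampled series per frequency, seed=42) can be sketched as follows. The helper name and the use of NumPy's `default_rng` are illustrative assumptions, not Vectrix internals:

```python
import numpy as np

def sample_series_ids(series_ids, n=2000, seed=42):
    """Draw a reproducible random subset of series IDs, without replacement."""
    rng = np.random.default_rng(seed)
    n = min(n, len(series_ids))
    idx = rng.choice(len(series_ids), size=n, replace=False)
    return [series_ids[i] for i in sorted(idx)]

# M4 Yearly has 23,000 series; sample 2,000 of them reproducibly.
ids = [f"Y{i}" for i in range(1, 23001)]
subset = sample_series_ids(ids)
assert len(subset) == 2000
# The same seed always yields the same subset.
assert subset == sample_series_ids(ids)
```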
## M3 Competition Results
The [M3 Competition](https://forecasters.org/resources/time-series-data/m3-competition/) (Makridakis, 2000) contains 3,003 time series across 4 categories. First 100 series per category:
Vectrix outperforms Naive2 on **4 out of 4** M3 categories, with M3 Monthly achieving OWA = 0.758.
Results are saved to `benchmarks/m3Results.csv` and `benchmarks/m4Results.csv`.
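A minimal sketch for inspecting those CSVs with pandas. The column names `frequency` and `owa` are hypothetical, so check the actual headers first:

```python
import pandas as pd

def summarize_owa(path) -> pd.Series:
    """Average OWA per frequency from a benchmark results CSV.

    Assumes (hypothetically) columns named 'frequency' and 'owa';
    adjust to the real headers in m3Results.csv / m4Results.csv.
    """
    df = pd.read_csv(path)
    return df.groupby("frequency")["owa"].mean()

# Usage: summarize_owa("benchmarks/m4Results.csv")
```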
M4 benchmark experiments are located in `src/vectrix/experiments/modelCreation/019_dotHybridEngine.py`.
### Notes
- All models are **deterministic** (no random seed required). Given the same data and parameters, Vectrix produces identical results across runs.
- The built-in Rust engine does not affect accuracy — only speed. Results are numerically identical with or without Rust acceleration.
- M4 data files can be downloaded from the [M4 Competition repository](https://github.com/Mcompetitions/M4-methods).
# landing/src/content/en/benchmarks.md
Vectrix is benchmarked against the M3 and M4 Competition datasets, the gold standard for time series forecasting evaluation. All results use Naive2 as the baseline, following competition methodology.
## M4 Competition Results — DOT-Hybrid Engine
The [M4 Competition](https://www.sciencedirect.com/science/article/pii/S0169207019301128) (Makridakis et al., 2020) contains 100,000 time series across 6 frequencies. Results from **DOT-Hybrid** (DynamicOptimizedTheta with 8-way auto-select), evaluated on 2,000 randomly sampled series per frequency (seed=42):
| Frequency | DOT-Hybrid OWA | M4 Context |
|-----------|:--------------:|------------|
| Yearly    | **0.797**      | Near M4 #1 ES-RNN (0.821) |
| Quarterly | **0.905**      | Competitive with M4 top methods |
## Metrics
| Metric | Description |
|--------|-------------|
| **sMAPE** | Symmetric Mean Absolute Percentage Error. Scale-independent accuracy measure, bounded between 0% and 200%. |
| **MASE** | Mean Absolute Scaled Error. Compares forecast errors against a naive seasonal benchmark. Values below 1.0 indicate the model outperforms the naive method. |
| **OWA** | Overall Weighted Average. Combines sMAPE and MASE relative to Naive2: `OWA = 0.5 × (sMAPE/sMAPE_naive2) + 0.5 × (MASE/MASE_naive2)`. **OWA below 1.0 means the model beats Naive2.** |
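The three metrics can be sketched directly from these definitions. This is a minimal NumPy version; the official competition code also handles edge cases such as zero denominators and uses a seasonal period per frequency:

```python
import numpy as np

def smape(y, yhat):
    """Symmetric MAPE in percent, bounded between 0 and 200."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 200.0 * np.mean(np.abs(y - yhat) / (np.abs(y) + np.abs(yhat)))

def mase(y_train, y_test, yhat, m=1):
    """Forecast MAE scaled by the in-sample seasonal-naive MAE (period m)."""
    y_train = np.asarray(y_train, float)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    err = np.mean(np.abs(np.asarray(y_test, float) - np.asarray(yhat, float)))
    return err / scale

def owa(smape_model, mase_model, smape_naive2, mase_naive2):
    """OWA = 0.5 * (sMAPE/sMAPE_naive2) + 0.5 * (MASE/MASE_naive2)."""
    return 0.5 * (smape_model / smape_naive2) + 0.5 * (mase_model / mase_naive2)
```

For example, a model with both ratios at 0.8 gets `owa(8.0, 0.8, 10.0, 1.0) == 0.8`, i.e. it beats Naive2.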
## Reproducing Results
```bash
pip install vectrix==0.0.7
```
> **Tip:** For faster M4 data loading, download the CSV files directly from the [M4 Competition repository](https://github.com/Mcompetitions/M4-methods) rather than using `M4.load()`, which can be slow due to wide-to-long data transformation.
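A sketch of that direct-download route, assuming the repository's wide CSV layout (an ID column followed by one column per observation, padded with blanks); verify the layout against the downloaded files before relying on it:

```python
import pandas as pd

def load_m4_wide(path) -> dict:
    """Read an M4-style wide CSV into {series_id: numpy array of values}.

    Assumed layout: first column is the series ID, remaining columns are
    observations, with trailing blanks for shorter series.
    """
    df = pd.read_csv(path)
    id_col = df.columns[0]
    out = {}
    for _, row in df.iterrows():
        values = row.drop(id_col).dropna().astype(float).to_numpy()
        out[row[id_col]] = values
    return out

# Usage: load_m4_wide("Yearly-train.csv")
```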