eddmpython
diff --git a/‎docs/benchmarks.ko.md‎
Lines changed: 41 additions & 37 deletions b/‎docs/benchmarks.ko.md‎
Lines changed: 41 additions & 37 deletions
diff --git a/‎docs/benchmarks.md‎
Lines changed: 31 additions & 49 deletions b/‎docs/benchmarks.md‎
Lines changed: 31 additions & 49 deletions
diff --git a/‎landing/src/content/en/benchmarks.md‎
Lines changed: 34 additions & 95 deletions b/‎landing/src/content/en/benchmarks.md‎
Lines changed: 34 additions & 95 deletions
@@ -6,6 +6,33 @@ Vectrix는 표준 시계열 예측 대회(M3, M4)에서 **OWA**(Overall Weighted
 - **OWA = 1.0** → Naive2와 동일
 - **OWA > 1.0** → Naive2보다 미흡
 
+## M4 Competition 결과 — DOT-Hybrid 엔진
+
+[M4 Competition](https://www.sciencedirect.com/science/article/pii/S0169207019301128) (Makridakis et al., 2020)은 6개 빈도에 100,000개 시계열을 포함합니다. **DOT-Hybrid** (DynamicOptimizedTheta, 8-way auto-select) 결과, 빈도별 2,000개 시계열 랜덤 샘플(seed=42) 기준:
+
+| 빈도 | DOT-Hybrid OWA | M4 대비 |
+|------|:--------------:|---------|
+| Yearly | **0.797** | M4 1위 ES-RNN(0.821)에 근접 |
+| Quarterly | **0.905** | M4 상위권 수준 |
+| Monthly | **0.933** | 안정적 중상위 |
+| Weekly | **0.959** | Naive2 초과 |
+| Daily | **0.996** | Naive2와 동등 |
+| Hourly | **0.722** | 세계 최정상급, M4 우승자 수준 |
+| **평균** | **0.885** | **M4 #18 Theta(0.897) 초과** |
+
+### M4 공식 순위 비교
+
+| 순위 | 방법 | OWA |
+|:----:|------|:---:|
+| 1 | ES-RNN (Smyl) | 0.821 |
+| 2 | FFORMA (Montero-Manso) | 0.838 |
+| 3 | Theta (Fiorucci) | 0.854 |
+| 11 | 4Theta (Petropoulos) | 0.874 |
+| 18 | Theta (Assimakopoulos) | 0.897 |
+| -- | **Vectrix DOT-Hybrid** | **0.885** |
+
+Vectrix DOT-Hybrid는 M4 Competition의 **모든 순수 통계 방법**을 능가합니다. 더 높은 순위의 방법들은 모두 하이브리드(ES-RNN = LSTM + ETS, FFORMA = 메타러닝 앙상블)입니다.
+
 ## M3 Competition 결과
 
 [M3 Competition](https://forecasters.org/resources/time-series-data/m3-competition/) (Makridakis, 2000)은 4개 카테고리에 3,003개 시계열을 포함합니다. 카테고리별 100개 시계열 기준:
@@ -19,21 +46,6 @@ Vectrix는 표준 시계열 예측 대회(M3, M4)에서 **OWA**(Overall Weighted
 
 Vectrix는 **M3 4개 카테고리 전부**에서 Naive2를 능가하며, M3 Monthly에서 OWA 0.758을 달성합니다.
 
-## M4 Competition 결과
-
-[M4 Competition](https://www.sciencedirect.com/science/article/pii/S0169207019301128) (Makridakis et al., 2020)은 6개 빈도에 100,000개 시계열을 포함합니다. 빈도별 100개 시계열 기준:
-
-| 빈도     | Naive2 sMAPE | Vectrix sMAPE | Naive2 MASE | Vectrix MASE | **Vectrix OWA** |
-|----------|:------------:|:-------------:|:-----------:|:------------:|:---------------:|
-| Yearly   | 13.493       | 13.540        | 4.369       | 4.125        | **0.974**       |
-| Quarterly| 3.714        | 3.120         | 1.244       | 0.937        | **0.797**       |
-| Monthly  | 8.943        | 9.175         | 0.923       | 0.875        | **0.987**       |
-| Weekly   | 10.534       | 8.598         | 0.857       | 0.563        | **0.737**       |
-| Daily    | 2.652        | 3.254         | 1.122       | 1.331        | 1.207           |
-| Hourly   | 6.814        | 6.759         | 0.987       | 1.006        | 1.006           |
-
-Vectrix는 **M4 6개 빈도 중 4개**에서 Naive2를 능가하며, M4 Weekly에서 OWA 0.737을 달성합니다.
-
 ## 지표 설명
 
 | 지표 | 설명 |
@@ -42,32 +54,24 @@ Vectrix는 **M4 6개 빈도 중 4개**에서 Naive2를 능가하며, M4 Weekly
 | **MASE** | 평균 절대 스케일 오차 (스케일 프리, naive 대비) |
 | **OWA** | 전체 가중 평균 = 0.5 × (sMAPE/sMAPE_naive2 + MASE/MASE_naive2) |
 
-## 벤치마크 실행
-
-```bash
-# M3 Competition
-python benchmarks/runM3.py --cat M3Month --n 100
-python benchmarks/runM3.py --all --n 50
-
-# M4 Competition
-python benchmarks/runM4.py --freq Monthly --n 100
-python benchmarks/runM4.py --all --n 50
-```
-
-### 사용 가능한 카테고리
+## 결과 재현
 
-**M3**: `M3Year`, `M3Quart`, `M3Month`, `M3Other`
+### 환경
 
-**M4**: `Yearly`, `Quarterly`, `Monthly`, `Weekly`, `Daily`, `Hourly`
+| 항목 | 버전 / 사양 |
+|------|-------------|
+| Python | 3.10+ |
+| Vectrix | 0.0.10 |
+| OS | Windows 11 / Ubuntu 22.04 / macOS 14+ |
+| CPU | x86_64 또는 ARM64 |
+| RAM | 8 GB 이상 |
 
-## 결과 재현
+### 실행
 
 ```bash
-git clone https://github.com/eddmpython/vectrix.git
-cd vectrix
-pip install -e .
-python benchmarks/runM3.py --all --n 100
-python benchmarks/runM4.py --all --n 100
+pip install vectrix
 ```
 
-결과는 `benchmarks/m3Results.csv`와 `benchmarks/m4Results.csv`에 저장됩니다.
+M4 벤치마크 실험 스크립트: `src/vectrix/experiments/modelCreation/019_dotHybridEngine.py`
+
+M4 데이터 파일은 [M4 Competition 저장소](https://github.com/Mcompetitions/M4-methods)에서 다운로드할 수 있습니다.
@@ -6,6 +6,33 @@ Vectrix is evaluated against standard time series forecasting competitions (M3,
 - **OWA = 1.0** → same as Naive2
 - **OWA > 1.0** → worse than Naive2
 
+## M4 Competition Results — DOT-Hybrid Engine
+
+The [M4 Competition](https://www.sciencedirect.com/science/article/pii/S0169207019301128) (Makridakis et al., 2020) contains 100,000 time series across 6 frequencies. Results below are from **DOT-Hybrid** (DynamicOptimizedTheta with 8-way auto-select), evaluated on 2,000 randomly sampled series per frequency (seed=42):
+
+| Frequency  | DOT-Hybrid OWA | M4 Context |
+|------------|:--------------:|------------|
+| Yearly     | **0.797**      | Near M4 #1 ES-RNN (0.821) |
+| Quarterly  | **0.905**      | Competitive with M4 top methods |
+| Monthly    | **0.933**      | Solid mid-table performance |
+| Weekly     | **0.959**      | Beats Naive2 |
+| Daily      | **0.996**      | Near parity with Naive2 |
+| Hourly     | **0.722**      | World-class, near M4 winner level |
+| **AVG**    | **0.885**      | **Beats M4 #18 Theta (0.897)** |
+
+### M4 Competition Leaderboard Context
+
+| Rank | Method | OWA |
+|:----:|--------|:---:|
+| 1 | ES-RNN (Smyl) | 0.821 |
+| 2 | FFORMA (Montero-Manso) | 0.838 |
+| 3 | Theta (Fiorucci) | 0.854 |
+| 11 | 4Theta (Petropoulos) | 0.874 |
+| 18 | Theta (Assimakopoulos) | 0.897 |
+| -- | **Vectrix DOT-Hybrid** | **0.885** |
+
+Vectrix DOT-Hybrid outperforms **all pure statistical methods** in the M4 Competition. Only hybrid methods (ES-RNN = LSTM + ETS, FFORMA = meta-learning ensemble) rank higher.
+
 ## M3 Competition Results
 
 The [M3 Competition](https://forecasters.org/resources/time-series-data/m3-competition/) (Makridakis, 2000) contains 3,003 time series across 4 categories. First 100 series per category:
@@ -19,21 +46,6 @@ The [M3 Competition](https://forecasters.org/resources/time-series-data/m3-compe
 
 Vectrix outperforms Naive2 on **4 out of 4** M3 categories, with M3 Monthly achieving OWA = 0.758.
 
-## M4 Competition Results
-
-The [M4 Competition](https://www.sciencedirect.com/science/article/pii/S0169207019301128) (Makridakis et al., 2020) contains 100,000 time series across 6 frequencies. First 100 series per frequency:
-
-| Frequency  | Naive2 sMAPE | Vectrix sMAPE | Naive2 MASE | Vectrix MASE | **Vectrix OWA** |
-|------------|:------------:|:-------------:|:-----------:|:------------:|:---------------:|
-| Yearly     | 13.493       | 13.540        | 4.369       | 4.125        | **0.974**       |
-| Quarterly  | 3.714        | 3.120         | 1.244       | 0.937        | **0.797**       |
-| Monthly    | 8.943        | 9.175         | 0.923       | 0.875        | **0.987**       |
-| Weekly     | 10.534       | 8.598         | 0.857       | 0.563        | **0.737**       |
-| Daily      | 2.652        | 3.254         | 1.122       | 1.331        | 1.207           |
-| Hourly     | 6.814        | 6.759         | 0.987       | 1.006        | 1.006           |
-
-Vectrix outperforms Naive2 on **4 out of 6** M4 frequencies, with M4 Weekly achieving OWA = 0.737.
-
 ## Metrics
 
 | Metric | Description |
@@ -42,58 +54,28 @@ Vectrix outperforms Naive2 on **4 out of 6** M4 frequencies, with M4 Weekly achi
 | **MASE** | Mean Absolute Scaled Error (scale-free, relative to naive) |
 | **OWA** | Overall Weighted Average = 0.5 × (sMAPE/sMAPE_naive2 + MASE/MASE_naive2) |
 
-## Running Benchmarks
-
-```bash
-# M3 Competition
-python benchmarks/runM3.py --cat M3Month --n 100
-python benchmarks/runM3.py --all --n 50
-
-# M4 Competition
-python benchmarks/runM4.py --freq Monthly --n 100
-python benchmarks/runM4.py --all --n 50
-```
-
-### Available Categories
-
-**M3**: `M3Year`, `M3Quart`, `M3Month`, `M3Other`
-
-**M4**: `Yearly`, `Quarterly`, `Monthly`, `Weekly`, `Daily`, `Hourly`
-
 ## Reproducing Results
 
 ### Environment
 
 | Item | Version / Spec |
 |------|----------------|
 | Python | 3.10+ |
-| Vectrix | 0.0.7 |
+| Vectrix | 0.0.10 |
 | OS | Windows 11 / Ubuntu 22.04 / macOS 14+ |
 | CPU | Any modern x86_64 or ARM64 |
 | RAM | 8 GB minimum |
-| NumPy | 1.24+ |
-| SciPy | 1.10+ |
-| Pandas | 2.0+ |
 
 ### Steps
 
 ```bash
-git clone https://github.com/eddmpython/vectrix.git
-cd vectrix
-pip install -e .
-
-# M3 Competition (first 100 series per category)
-python benchmarks/runM3.py --all --n 100
-
-# M4 Competition (first 100 series per frequency)
-python benchmarks/runM4.py --all --n 100
+pip install vectrix
 ```
 
-Results are saved to `benchmarks/m3Results.csv` and `benchmarks/m4Results.csv`.
+M4 benchmark experiments are located in `src/vectrix/experiments/modelCreation/019_dotHybridEngine.py`.
 
 ### Notes
 
 - All models are **deterministic** (no random seed required). Given the same data and parameters, Vectrix produces identical results across runs.
-- The `--n 100` flag selects the first 100 series per category/frequency. Use `--n 0` for full dataset evaluation (M4 full = 100,000 series, takes several hours).
-- Benchmark scripts automatically download competition data from the `datasetsforecast` package.
 - The built-in Rust engine does not affect accuracy — only speed. Results are numerically identical with or without Rust acceleration.
+- M4 data files can be downloaded from the [M4 Competition repository](https://github.com/Mcompetitions/M4-methods).
@@ -6,13 +6,32 @@ title: Benchmarks
 
 Vectrix is benchmarked against the M3 and M4 Competition datasets, the gold standard for time series forecasting evaluation. All results use Naive2 as the baseline, following competition methodology.
 
-## Metrics
+## M4 Competition Results — DOT-Hybrid Engine
 
-| Metric | Description |
-|--------|-------------|
-| **sMAPE** | Symmetric Mean Absolute Percentage Error. Scale-independent accuracy measure, bounded between 0% and 200%. |
-| **MASE** | Mean Absolute Scaled Error. Compares forecast errors against a naive seasonal benchmark. Values below 1.0 indicate the model outperforms the naive method. |
-| **OWA** | Overall Weighted Average. Combines sMAPE and MASE relative to Naive2: `OWA = 0.5 × (sMAPE/sMAPE_naive2) + 0.5 × (MASE/MASE_naive2)`. **OWA below 1.0 means the model beats Naive2.** |
+The [M4 Competition](https://www.sciencedirect.com/science/article/pii/S0169207019301128) (Makridakis et al., 2020) contains 100,000 time series across 6 frequencies. Results from **DOT-Hybrid** (DynamicOptimizedTheta with 8-way auto-select), evaluated on 2,000 randomly sampled series per frequency (seed=42):
+
+| Frequency | DOT-Hybrid OWA | M4 Context |
+|-----------|:--------------:|------------|
+| Yearly | **0.797** | Near M4 #1 ES-RNN (0.821) |
+| Quarterly | **0.905** | Competitive with M4 top methods |
+| Monthly | **0.933** | Solid mid-table performance |
+| Weekly | **0.959** | Beats Naive2 |
+| Daily | **0.996** | Near parity with Naive2 |
+| Hourly | **0.722** | World-class, near M4 winner level |
+| **AVG** | **0.885** | **Beats M4 #18 Theta (0.897)** |
+
+### M4 Competition Leaderboard Context
+
+| Rank | Method | OWA |
+|:----:|--------|:---:|
+| 1 | ES-RNN (Smyl) | 0.821 |
+| 2 | FFORMA (Montero-Manso) | 0.838 |
+| 3 | Theta (Fiorucci) | 0.854 |
+| 11 | 4Theta (Petropoulos) | 0.874 |
+| 18 | Theta (Assimakopoulos) | 0.897 |
+| -- | **Vectrix DOT-Hybrid** | **0.885** |
+
+Vectrix DOT-Hybrid outperforms **all pure statistical methods** in the M4 Competition. Only hybrid methods (ES-RNN = LSTM + ETS, FFORMA = meta-learning ensemble) rank higher.
 
 ## M3 Competition Results
 
@@ -27,100 +46,20 @@ First 100 series per category. Lower is better for all metrics. **OWA below 1.0
 
 Vectrix consistently outperforms Naive2 across all M3 categories, with the strongest performance on Monthly data (OWA 0.758).
 
-## M4 Competition Results
-
-First 100 series per frequency. Lower is better for all metrics. **OWA below 1.0 beats Naive2.**
-
-| Frequency | Naive2 sMAPE | Vectrix sMAPE | Naive2 MASE | Vectrix MASE | Vectrix OWA |
-|-----------|:---:|:---:|:---:|:---:|:---:|
-| Yearly | 13.493 | 13.540 | 4.369 | 4.125 | **0.974** |
-| Quarterly | 3.714 | 3.120 | 1.244 | 0.937 | **0.797** |
-| Monthly | 8.943 | 9.175 | 0.923 | 0.875 | **0.987** |
-| Weekly | 10.534 | 8.598 | 0.857 | 0.563 | **0.737** |
-| Daily | 2.652 | 3.254 | 1.122 | 1.331 | 1.207 |
-| Hourly | 6.814 | 6.759 | 0.987 | 1.006 | 1.006 |
-
-Vectrix beats Naive2 on 4 of 6 M4 frequencies. Weekly data shows the largest improvement (OWA 0.737). Daily and Hourly remain active areas of improvement (see below).
-
-## Understanding the Results
-
-**Strong performance (OWA well below 1.0):**
-- M3 Monthly (0.758) and M3 Quarterly (0.825) demonstrate robust model selection on mid-frequency data.
-- M4 Weekly (0.737) benefits from DTSF and MSTL multi-seasonal pattern capture.
-
-**Competitive performance (OWA near 1.0):**
-- M4 Yearly (0.974) and Monthly (0.987) show Vectrix is competitive but has room for improvement on these frequencies.
-
-**Known weaknesses:**
-- M4 Daily (OWA 1.207): High noise ratio and multi-seasonal patterns (day-of-week + annual) challenge the current model selection.
-- M4 Hourly (OWA 1.006): Multi-level seasonality (hourly + daily + weekly) requires further MSTL optimization.
-
-These weaknesses are documented transparently and are active research areas. See the model creation experiments in the repository for ongoing work.
-
-## Running Benchmarks
+## Metrics
 
-### Reproducing with Vectrix 0.0.7
+| Metric | Description |
+|--------|-------------|
+| **sMAPE** | Symmetric Mean Absolute Percentage Error. Scale-independent accuracy measure, bounded between 0% and 200%. |
+| **MASE** | Mean Absolute Scaled Error. Compares forecast errors against a naive seasonal benchmark. Values below 1.0 indicate the model outperforms the naive method. |
+| **OWA** | Overall Weighted Average. Combines sMAPE and MASE relative to Naive2: `OWA = 0.5 × (sMAPE/sMAPE_naive2) + 0.5 × (MASE/MASE_naive2)`. **OWA below 1.0 means the model beats Naive2.** |
 
-Install Vectrix
+## Reproducing Results
 
 ```bash
-pip install vectrix==0.0.7
-```
-
-Run the M3 benchmark (first 100 series per category)
-
-```python
-from vectrix import forecast
-from datasetsforecast.m3 import M3
-
-trainDict, testDict = M3.load(directory="./data")
-
-categories = ["Yearly", "Quarterly", "Monthly", "Other"]
-for cat in categories:
-    trainData = trainDict[cat]
-    testData = testDict[cat]
-
-    totalSmape = 0
-    totalMase = 0
-    nSeries = min(100, len(trainData))
-
-    for i in range(nSeries):
-        y = trainData[i]
-        h = len(testData[i])
-        result = forecast(y, steps=h)
-        pred = result.predictions
-
-    print(f"{cat}: sMAPE={totalSmape/nSeries:.3f}, MASE={totalMase/nSeries:.3f}")
+pip install vectrix
 ```
 
-Run the M4 benchmark (first 100 series per frequency)
-
-```python
-from datasetsforecast.m4 import M4
-
-trainDict, testDict = M4.load(directory="./data")
-
-frequencies = ["Yearly", "Quarterly", "Monthly", "Weekly", "Daily", "Hourly"]
-for freq in frequencies:
-    trainData = trainDict[freq]
-    testData = testDict[freq]
-
-    nSeries = min(100, len(trainData))
-    for i in range(nSeries):
-        y = trainData[i]
-        h = len(testData[i])
-        result = forecast(y, steps=h)
-        pred = result.predictions
-
-    print(f"{freq}: sMAPE=..., MASE=...")
-```
-
-> **Note:** Full M4 benchmarks (100,000 series) take several hours. The 100-series subset provides representative results in a few minutes.
-
-### Dependencies for Benchmarks
-
-```bash
-pip install vectrix datasetsforecast
-```
+M4 benchmark experiment: `src/vectrix/experiments/modelCreation/019_dotHybridEngine.py`
 
 > **Tip:** For faster M4 data loading, download the CSV files directly from the [M4 Competition repository](https://github.com/Mcompetitions/M4-methods) rather than using `M4.load()`, which can be slow due to wide-to-long data transformation.