<!-- TITLE -->
# Positional Encoding Benchmark for Time Series Classification

[![arXiv](https://img.shields.io/badge/arXiv-2502.12370-b31b1b.svg)](https://arxiv.org/abs/2502.12370)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/downloads/release/python-3100/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.4.1-ee4c2c.svg)](https://pytorch.org/)

This repository provides a comprehensive evaluation framework for positional encoding methods in transformer-based time series models, along with implementations and benchmarking results.

Our work is available on arXiv: [Positional Encoding in Transformer-Based Time Series Models: A Survey](https://arxiv.org/abs/2502.12370)
We present a systematic analysis of positional encoding methods evaluated on two transformer architectures:

1. Time Series Transformer with Batch Normalization
2. Time Series Transformer with Patch Embedding

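For readers unfamiliar with the second architecture, patch embedding tokenizes a series into fixed-length windows and projects each window to the model dimension before the transformer encoder. The sketch below is a generic PyTorch illustration of that idea; the `PatchEmbedding` name, patch length, and stride are illustrative assumptions, not this repository's configuration.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Generic patch embedding: (batch, channels, length) -> (batch, n_patches, d_model)."""

    def __init__(self, in_channels: int, d_model: int, patch_len: int = 8, stride: int = 8):
        super().__init__()
        # A strided 1-D convolution is equivalent to slicing patches and applying a shared linear layer.
        self.proj = nn.Conv1d(in_channels, d_model, kernel_size=patch_len, stride=stride)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x).transpose(1, 2)  # (batch, n_patches, d_model)

tokens = PatchEmbedding(in_channels=6, d_model=64)(torch.randn(2, 6, 96))
print(tokens.shape)  # torch.Size([2, 12, 64])
```

With an assumed patch length of 8 on a length-96 series, each series becomes 12 tokens, and it is these tokens that the positional encoding methods below operate on.
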
### Positional Encoding Methods
We implement and evaluate ten positional encoding methods:

| Method | Type | Injection | Learnable | Params | Memory | Complexity |
|--------|------|-----------|-----------|--------|--------|------------|
| Sinusoidal PE | Abs | Add | F | 0 | O(Ld) | O(Ld) |
| Learnable PE | Abs | Add | L | Ld | O(Ld) | O(Ld) |
| RPE | Rel | Att | F | (2L−1)dl | O(L²d) | O(L²d) |
| tAPE | Abs | Add | F | 0 | O(Ld) | O(Ld) |
| RoPE | Hyb | Att | F | 0 | O(Ld) | O(L²d) |
| eRPE | Rel | Att | L | 2L−1 | O(L²+L) | O(L²) |
| TUPE | Hyb | Att | L | 2dl | O(Ld+d²) | O(Ld+d²) |
| ConvSPE | Rel | Att | L | 3Kdh+dl | O(LKR) | O(LKR) |
| T-PE | Hyb | Comb | M | 2d²l/h+(2L+2l)d | O(L²d) | O(L²d) |
| ALiBi | Rel | Att | F | 0 | O(L²h) | O(L²h) |

**Legend:**
- Type: Abs = absolute, Rel = relative, Hyb = hybrid
- Injection: Add = additive, Att = attention-level, Comb = combined
- Learnable: F = fixed, L = learnable, M = mixed
- Symbols: L = sequence length, d = embedding dimension, h = number of attention heads, K = kernel size, l = number of layers
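To make the "Injection" column concrete, the snippet below sketches the simplest entry in the table: a fixed sinusoidal encoding added to the input embeddings before attention. It is a minimal PyTorch illustration of the standard formulation; the `SinusoidalPE` class name and `max_len` default are placeholders, not necessarily this repository's implementation. Attention-level methods (RPE, eRPE, ConvSPE, ALiBi) instead inject position information inside the attention computation rather than into the embeddings.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPE(nn.Module):
    """Fixed, additive positional encoding (no learnable parameters)."""

    def __init__(self, d_model: int, max_len: int = 1024):
        super().__init__()
        # Assumes an even d_model, as in the standard sinusoidal formulation.
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32)
            * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)  # buffer, not a parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the first seq_len encodings
        return x + self.pe[: x.size(1)]
```

Because the encoding is stored as a buffer rather than a parameter, this corresponds to the `0` in the Params column above.
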

## Dataset Characteristics

| Dataset | Train Size | Test Size | Length | Classes | Channels | Type |
|---------|------------|-----------|--------|---------|----------|------|
| Sleep | 478,785 | 90,315 | 178 | 5 | 1 | EEG |
| ElectricDevices | 8,926 | 7,711 | 96 | 7 | 1 | Device |
| FaceDetection | 5,890 | 3,524 | 62 | 2 | 144 | EEG |
| MelbournePedestrian | 1,194 | 2,439 | 24 | 10 | 1 | Traffic |
| SharePriceIncrease | 965 | 965 | 60 | 2 | 1 | Financial |
| LSST | 2,459 | 2,466 | 36 | 14 | 6 | Other |
| RacketSports | 151 | 152 | 30 | 4 | 6 | HAR |
| SelfRegulationSCP1 | 268 | 293 | 896 | 2 | 6 | EEG |
| UniMiB-SHAR | 4,601 | 1,524 | 151 | 9 | 3 | HAR |
| RoomOccupancy | 8,103 | 2,026 | 30 | 4 | 18 | Sensor |
| EMGGestures | 1,800 | 450 | 30 | 8 | 9 | EMG |

## Dependencies
- Python 3.10

## Results

Our experimental evaluation encompasses ten distinct positional encoding methods.

### Key Findings

#### 📊 Sequence Length Impact
- **Long sequences** (>100 steps): 5-6% accuracy improvement with advanced methods
- **Medium sequences** (50-100 steps): 3-4% improvement
- **Short sequences** (<50 steps): 2-3% improvement

#### ⚙️ Architecture Performance
- **TST**: more distinct performance gaps between methods
- **Patch Embedding**: more balanced performance among top methods

#### 🏆 Average Rankings
- **SPE**: 1.727 (batch norm), 2.090 (patch embed)
- **TUPE**: 1.909 (batch norm), 2.272 (patch embed)
- **T-PE**: 2.636 (batch norm), 2.363 (patch embed)

#### Additional Observations
- TUPE maintains competitive accuracy
- Relative encoding methods show improved local pattern recognition

### Computational Efficiency Analysis

Training time measurements on the MelbournePedestrian dataset (100 epochs):

| Method | Time (s) | Ratio (vs. Sinusoidal) | Accuracy |
|--------|----------|------------------------|----------|
| Sinusoidal PE | 48.2 | 1.00 | 66.8% |
| Learnable PE | 60.1 | 1.25 | 70.2% |
| RPE | 128.4 | 2.66 | 72.4% |
| tAPE | 54.0 | 1.12 | 68.2% |
| RoPE | 67.8 | 1.41 | 69.0% |
| eRPE | 142.8 | 2.96 | 73.3% |
| TUPE | 118.3 | 2.45 | 74.5% |
| ConvSPE | 101.6 | 2.11 | **75.3%** |
| T-PE | 134.7 | 2.79 | 74.2% |
| ALiBi | 93.8 | 1.94 | 67.2% |

**ConvSPE emerges as the efficiency-frontier leader**, achieving the highest accuracy (75.3%) with reasonable computational overhead (2.11× the sinusoidal baseline).
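For context on how such numbers are typically obtained, the sketch below wraps a plain training loop in a wall-clock timer. It is a generic illustration rather than this repository's benchmarking code: `model`, `loader`, `optimizer`, `criterion`, and the helper name are placeholders. The Ratio column is simply each method's time divided by the sinusoidal baseline (e.g. 101.6 / 48.2 ≈ 2.11 for ConvSPE).

```python
import time
import torch

def measure_training_time(model, loader, optimizer, criterion, epochs=100, device=None):
    """Wall-clock training time for a fixed number of epochs (hypothetical helper)."""
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device).train()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # don't let queued GPU work skew the timer
    start = time.perf_counter()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.perf_counter() - start
```
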

### Method Selection Guidelines

#### Sequence Length-Based Recommendations
- **Short sequences (L ≤ 50)**: Learnable PE or tAPE (the small gains from heavier methods don't justify their computational overhead)
- **Medium sequences (50 < L ≤ 100)**: SPE or eRPE (3-4% accuracy improvements)
- **Long sequences (L > 100)**: TUPE for complex patterns, SPE for regular data, ConvSPE for linear complexity (see the selection sketch after these guidelines)

#### Domain-Specific Guidelines
- **Biomedical signals**: TUPE > SPE > T-PE (best at handling physiological complexity)
- **Environmental sensors**: SPE > eRPE (regular sampling patterns)
- **High-dimensional data (d > 5)**: advanced methods consistently outperform simple approaches

#### Computational Resource Framework
- **Limited resources**: Sinusoidal PE or tAPE (O(Ld) complexity)
- **Balanced scenarios**: SPE or TUPE (strong accuracy-efficiency trade-off)
- **Performance-critical**: TUPE or SPE, regardless of computational cost

#### Architecture-Specific Considerations
- **Time Series Transformers**: prioritize content-position separation (TUPE) and relative positioning (eRPE, SPE)
- **Patch Embedding Transformers**: multi-scale approaches (T-PE, ConvSPE) handle hierarchical processing more effectively
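As a rough illustration of how these heuristics compose, the hypothetical helper below maps sequence length, domain, and a resource flag to one of the recommended methods. The function name, argument names, and thresholds simply mirror the lists above; it is not part of this repository.

```python
def recommend_encoding(seq_len: int, domain: str = "other", limited_resources: bool = False) -> str:
    """Hypothetical helper mirroring the selection guidelines above."""
    if limited_resources:
        return "Sinusoidal PE"  # O(Ld) cost, no learned parameters
    if domain == "biomedical":
        return "TUPE"           # content-position separation suits physiological signals
    if domain == "environmental":
        return "SPE"            # regular sampling patterns
    if seq_len <= 50:
        return "Learnable PE"   # heavier methods rarely pay off on short series
    if seq_len <= 100:
        return "SPE"            # 3-4% accuracy gains in the mid range
    return "TUPE"               # long, complex sequences

print(recommend_encoding(seq_len=178, domain="biomedical"))  # -> "TUPE"
```
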

<!-- CONTRIBUTING -->
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.