app/projects/mtbench/page.mdx: 18 additions & 2 deletions
@@ -31,9 +31,25 @@ As shown in Figure 2, MTBench enables a range of complex reasoning tasks beyond
-The news-driven QA task includes two sub-tasks: correlation prediction and multi-choice QA. As shown in Figure 3, this task requires models to analyze both text and time-series data, understanding the news content while predicting its potential impact on future trends based on historical time-series.
+### Time-Series Forecasting
-
+This task is to forecast future time-series values from historical data, optionally incorporating news articles. We assess both short- and long-term forecasting: the finance setting uses 30 days of historical data as input, while the weather setting uses 14 days of history to predict the next 3 days, reflecting its shorter memory dynamics.
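The history/horizon split described above can be sketched as a sliding window. This is a minimal illustration, not the benchmark's own preprocessing: the helper name `split_windows` is hypothetical, and it assumes one sample per day, which the text does not state. Only the weather setting (14 days in, 3 days out) comes from the description; the finance horizon is not given there, so it is left to the caller.

```python
from typing import List, Sequence, Tuple


def split_windows(
    series: Sequence[float], history: int, horizon: int
) -> List[Tuple[List[float], List[float]]]:
    """Slide over `series` and return (input, target) pairs.

    Hypothetical helper: `history` past points form the model input,
    the following `horizon` points form the forecasting target.
    """
    if history <= 0 or horizon <= 0:
        raise ValueError("history and horizon must be positive")
    pairs = []
    # The last valid start index leaves room for a full input and target.
    for start in range(len(series) - history - horizon + 1):
        mid = start + history
        pairs.append((list(series[start:mid]), list(series[mid:mid + horizon])))
    return pairs


# Weather setting from the text: 14 days of history, next 3 days as target.
# Assumes one value per day (an assumption, not stated in the source).
weather = [float(day) for day in range(20)]
pairs = split_windows(weather, history=14, horizon=3)
```

With 20 daily values this yields four overlapping pairs; a series shorter than `history + horizon` yields none.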
+
+### Semantic Trend Analysis
+
+This task analyzes time-series trends by computing the percentage change between the input and output data and mapping the result to a discrete trend label (see the example in Figure 3). It evaluates how accurately a model captures directional movement.
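The labeling step described above can be sketched as follows. This is an illustration under stated assumptions rather than the benchmark's actual rule: the text does not say which input and output points are compared, so the sketch uses the last value of each, and both the `flat_band` threshold and the three label names are hypothetical.

```python
from typing import Sequence


def percentage_change(inputs: Sequence[float], outputs: Sequence[float]) -> float:
    """Percent change from the last input value to the last output value.

    Assumption: the source does not specify which points are compared.
    """
    start, end = inputs[-1], outputs[-1]
    if start == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    return (end - start) / abs(start) * 100.0


def trend_label(pct: float, flat_band: float = 2.0) -> str:
    """Map a percent change to a discrete label.

    The 2% band and the label names are hypothetical placeholders.
    """
    if pct > flat_band:
        return "up"
    if pct < -flat_band:
        return "down"
    return "flat"


change = percentage_change([98.0, 100.0], [104.0, 110.0])
label = trend_label(change)
```

A finer-grained label set would only add thresholds to `trend_label`; the percentage-change computation stays the same.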
+
+
+
+### Technical Indicator Prediction
+
+This task evaluates a model’s ability to forecast key financial and weather indicators derived from the output time series, providing insight beyond basic price or temperature predictions.
+
+### News-driven Question Answering
+
+The news-driven QA task includes two sub-tasks: correlation prediction and multiple-choice QA. As shown in Figure 4, it requires models to analyze both text and time-series data: they must understand the news content and predict its potential impact on future trends given the historical time series.
+
+
Various state-of-the-art large language models (LLMs) were evaluated on MTBench to measure their ability to link news with time-series trends (see **Leaderboard**). The results reveal key challenges: models struggle to recognize long-term patterns, to reason about cause and effect, and to combine insights from text and numbers.