Commit 0c9d44d

MTBench: Added subsections for each task and their descriptions
1 parent 21fb408 commit 0c9d44d

3 files changed: +19 -3 lines changed

app/projects/mtbench/page.mdx

Lines changed: 18 additions & 2 deletions
@@ -31,9 +31,25 @@ As shown in Figure 2, MTBench enables a range of complex reasoning tasks beyond
 
 ![Figure 2. An overview of tasks in MTBench |scale=0.4](./assets/diagram.png)
 
-The news-driven QA task includes two sub-tasks: correlation prediction and multi-choice QA. As shown in Figure 3, this task requires models to analyze both text and time-series data, understanding the news content while predicting its potential impact on future trends based on historical time-series.
+### Time-Series Forecasting
 
-![Figure 3. An Example of Multi-choice QA and Correlation Prediction on Finance Dataset |scale=0.8](./assets/QA_sample.png)
+This task aims to forecast time-series values from historical data, optionally incorporating news articles. We assess short- and long-term forecasting: finance uses 30 days of historical data, while weather forecasting relies on 14 days to predict the next 3, reflecting shorter memory dynamics.
+
+### Semantic Trend Analysis
+
+For this task we analyze time-series trends by computing the percentage change between input and output data, categorizing results into discrete trend labels (see example in Figure 3). This helps evaluate directional movement and model accuracy.
+
+![Figure 3. An Example of Stock Trend Prediction |scale=0.6](./assets/trend_prediction.png)
+
+### Technical Indicator Prediction
+
+This task evaluates the model’s ability to predict financial and weather metrics by forecasting key indicators from the output time-series, providing deeper insights beyond basic price or temperature predictions.
+
+### News-driven Question Answering
+
+The news-driven QA task includes two sub-tasks: correlation prediction and multi-choice QA. As shown in Figure 4, this task requires models to analyze both text and time-series data, understanding the news content while predicting its potential impact on future trends based on historical time-series.
+
+![Figure 4. An Example of Multi-choice QA and Correlation Prediction on Finance Dataset |scale=0.8](./assets/QA_sample.png)
 
 Various state-of-the-art large language models (LLMs) were evaluated on MTBench to measure their ability to link news with time-series trends (see **Leaderboard**). The results reveal key challenges—models struggle with long-term pattern recognition, cause-and-effect relationships, and seamlessly combining insights from text and numbers.
 
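Note on the Semantic Trend Analysis description: the added text says trends are labeled by computing the percentage change between input and output data and binning it into discrete labels, but it gives neither the comparison points nor the bin edges. The sketch below is one possible reading, not MTBench's implementation. It assumes the change is measured from the last input value to the last output value, and the `TrendLabel` names, the function names, and the 2% flat band are all hypothetical.

```typescript
// Hypothetical label set; the commit does not list the actual trend categories.
type TrendLabel = "down" | "flat" | "up";

// Percentage change from the last input value to the last output value.
// Which points MTBench actually compares is not stated in the commit;
// this is one plausible choice made for illustration.
function percentChange(input: number[], output: number[]): number {
  if (input.length === 0 || output.length === 0) {
    throw new Error("input and output series must be non-empty");
  }
  const start = input[input.length - 1];
  const end = output[output.length - 1];
  if (start === 0) {
    throw new Error("percentage change is undefined for a zero baseline");
  }
  return ((end - start) / Math.abs(start)) * 100;
}

// Map a percentage change to a discrete label.
// The default 2% band around zero is an assumed threshold, not a documented one.
function trendLabel(pct: number, flatBand: number = 2): TrendLabel {
  if (pct > flatBand) return "up";
  if (pct < -flatBand) return "down";
  return "flat";
}
```

For example, `trendLabel(percentChange(history, forecast))` would label a forecast window relative to its history under these assumptions. If the page documented the real bin edges, readers could reproduce the labels shown in Figure 3.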

components/sortable-table.tsx

Lines changed: 1 addition & 1 deletion
@@ -351,4 +351,4 @@ const SortableTable4 = createSortableTable(headers4);
 
 // export default SortableTable;
 // Export all tables
-export { SortableTable ,SortableTable1, SortableTable2, SortableTable3, SortableTable4 };
+export { SortableTable, SortableTable1, SortableTable2, SortableTable3, SortableTable4 };
