- Even top models struggle on advanced templates and multi-hop symbolic chains
- Frontier models lead ChainEval yet still struggle on advanced, compositional templates
- Finance-tuned and math-enhanced 7B models (FinR1, Mathstral) approach frontier performance under ChainEval
- Domain-wise analysis shows frontier systems remain balanced, while fine-tuned models excel in their target areas (e.g., FinR1 in reporting/risk, Mathstral in quantitative domains)
- Accuracy drops across all model families from basic to advanced templates, highlighting persistent gaps in symbolic financial reasoning

## 🚀 Quick Start

Explore templates:

```bash
ls data/templates/
```
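If you want a per-folder summary instead of a flat listing, the minimal Python sketch below groups template scripts by subdirectory. It is illustrative only: the helper name `index_templates` is hypothetical, and it assumes (not confirmed here) that templates are `.py` files under `data/templates/`, possibly organized with one subfolder per domain.

```python
from collections import defaultdict
from pathlib import Path


def index_templates(root: str = "data/templates") -> dict[str, list[str]]:
    """Group template script names by the folder they live in.

    Hypothetical helper: assumes templates are Python files somewhere
    under `root`; the actual repository layout may differ.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    base = Path(root)
    for path in sorted(base.rglob("*.py")):
        # Folder relative to `root` is the group key ("." for the top level).
        key = path.parent.relative_to(base).as_posix()
        groups[key].append(path.stem)  # file name without the .py suffix
    return dict(groups)


if __name__ == "__main__":
    for group, names in index_templates().items():
        print(f"{group}: {len(names)} template(s)")
```

Run from the repository root, this prints one line per folder with its template count; pass a different `root` if the templates live elsewhere.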

Generate sample problems (each template script exposes a `main()` helper):

Aggregate metrics across domains, subtopics, and difficulty levels:
```bash
python chaineval/aggregate.py
```
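To make "aggregate across domains, subtopics, and difficulty levels" concrete, here is a minimal Python sketch that averages per-example scores along each of those axes. It is not the logic of `chaineval/aggregate.py`, whose input format is not described here: the function name `aggregate` and the record fields `domain`, `subtopic`, `level`, and `score` are hypothetical, and the demo records are toy values.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, keys=("domain", "subtopic", "level")):
    """Average per-example scores along each grouping key.

    Hypothetical record format: dicts with "domain", "subtopic", "level",
    and a numeric "score" for one evaluated problem.
    """
    report = {}
    for key in keys:
        buckets = defaultdict(list)
        for rec in records:
            buckets[rec[key]].append(rec["score"])
        # Mean score for every value of this key, e.g. {"basic": 0.5, ...}
        report[key] = {value: mean(scores) for value, scores in buckets.items()}
    return report


if __name__ == "__main__":
    # Toy records for illustration only (not FinChain results).
    demo = [
        {"domain": "domain_a", "subtopic": "sub_1", "level": "basic", "score": 1.0},
        {"domain": "domain_a", "subtopic": "sub_2", "level": "advanced", "score": 0.5},
        {"domain": "domain_b", "subtopic": "sub_1", "level": "basic", "score": 0.0},
    ]
    for key, table in aggregate(demo).items():
        print(key, table)
```

Each axis is grouped independently to keep the sketch short; a (domain, level) cross-tabulation would only need a tuple of fields as the bucket key.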

## 📘 Documentation

- Detailed methodology, data pipeline, and evaluation discussion are available in the accompanying paper (`paper.pdf`).

## 💬 Feedback & Contributions

**FinChain is an ongoing project**, and we're continuously working to expand its coverage, refine evaluation metrics, and improve data quality. We **welcome feedback, suggestions, and community contributions**, whether about financial domains we missed, new evaluation ideas, or ways to improve symbolic template diversity. If you're interested in collaborating or contributing, feel free to open an issue or contact us directly.

If you find **FinChain** useful in your research, please consider citing our paper:

```bibtex
@article{xie2025finchain,
  title={FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning},
  author={Xie, Zhuohan and Orel, Daniil and Thareja, Rushil and Sahnan, Dhruv and Madmoun, Hachem and Zhang, Fan and Banerjee, Debopriyo and Georgiev, Georgi and Peng, Xueqing and Qian, Lingfei and Huang, Jimin and Su, Jinyan and Singh, Aaryamonvikram and Xing, Rui and Elbadry, Rania and Xu, Chen and Li, Haonan and Koto, Fajri and Koychev, Ivan and Chakraborty, Tanmoy and Wang, Yuxia and Lahlou, Salem and Stoyanov, Veselin and Ananiadou, Sophia and Nakov, Preslav},
  journal={arXiv preprint arXiv:2506.02515},
  year={2025}
}
```

FinChain is developed by:

142
-
Zhuohan Xie, Dhruv Sahnan, Debopriyo Banerjee, Georgi Georgiev,
143
-
Rushil Thareja, Hachem Madmoun, Jinyan Su, Aaryamonvikram Singh,
144
-
Yuxia Wang, Rui Xing, Fajri Koto, Haonan Li, Ivan Koychev,
145
-
Tanmoy Chakraborty, Salem Lahlou, Veselin Stoyanov, Preslav Nakov
Fan Zhang, Debopriyo Banerjee, Georgi Georgiev, Xueqing Peng, Lingfei Qian,
163
+
Jimin Huang, Jinyan Su, Aaryamonvikram Singh, Rui Xing, Rania Elbadry,
164
+
Chen Xu, Haonan Li, Fajri Koto, Ivan Koychev, Tanmoy Chakraborty,
165
+
Yuxia Wang, Salem Lahlou, Veselin Stoyanov, Sophia Ananiadou, Preslav Nakov

Affiliations: MBZUAI, Syllogia, The University of Tokyo, Sofia University "St. Kliment Ohridski", The Fin AI, Cornell University, The University of Melbourne, IIT Delhi, INSAIT, The University of Manchester

For questions or collaborations, contact: **zhuohan.xie@mbzuai.ac.ae**