@@ -89,44 +89,63 @@ Before comparison, text is normalized:
 
 ## Validation Results
 
-### Current Performance
+### Current Performance (v0.1.5)
 
-**Overall**: 10/10 (100%) passing
+**Overall**: 19/20 (95%) passing
 
 | Poster ID | Word | ROUGE-L | Numbers | Fields | Source | Status |
 |-----------|------|---------|---------|--------|--------|--------|
-| 10890106 | 0.98 | 0.85 | 1.00 | 0.89 | pdfalto | ✅ |
-| 15963941 | 0.98 | 0.93 | 1.00 | 0.84 | pdfalto | ✅ |
-| 16083265 | 0.90 | 0.90 | 0.82 | 0.92 | pdfalto | ✅ |
-| 17268692 | 1.00 | 0.83 | 1.00 | 1.70 | pdfalto | ✅ |
-| 42 | 0.99 | 0.88 | 1.00 | 0.85 | pdfalto | ✅ |
-| 4737132 | 0.94 | 0.79 | 0.96 | 1.22 | qwen_vision | ✅ |
-| 5128504 | 0.99 | 1.00 | 1.00 | 1.04 | pdfalto | ✅ |
-| 6724771 | 0.89 | 0.95 | 0.85 | 0.96 | pdfalto | ✅ |
-| 8228476 | 0.94 | 0.87 | 0.89 | 0.91 | pdfalto | ✅ |
-| 8228568 | 0.99 | 0.91 | 0.82 | 0.79 | pdfalto | ✅ |
+| 10890106 | 0.94 | 0.75 | 1.00 | 0.80 | pdfalto | ✅ |
+| 15963941 | 0.95 | 0.91 | 0.97 | 0.76 | pdfalto | ✅ |
+| 16083265 | 0.90 | 0.87 | 0.96 | 0.71 | pdfalto | ✅ |
+| 17268692 | 0.97 | 0.99 | 0.91 | 0.83 | pdfalto | ✅ |
+| 42 | 0.97 | 0.87 | 0.97 | 0.77 | pdfalto | ✅ |
+| 4446908 | 0.95 | 0.91 | 0.90 | 0.98 | pdfalto | ✅ |
+| 4448680 | 0.79 | 0.81 | 0.69 | 0.97 | pdfalto | ❌ |
+| 4519718 | 0.98 | 0.99 | 0.89 | 0.78 | pdfalto | ✅ |
+| 4552067 | 0.94 | 0.92 | 1.00 | 0.75 | pdfalto | ✅ |
+| 4560930 | 0.96 | 0.91 | 0.96 | 0.92 | pdfalto | ✅ |
+| 4564017 | 0.94 | 0.97 | 0.85 | 0.83 | pdfalto | ✅ |
+| 4607450 | 0.95 | 0.93 | 0.93 | 0.89 | pdfalto | ✅ |
+| 4737132 | 0.91 | 0.81 | 0.93 | 0.83 | qwen_vision | ✅ |
+| 5128504 | 0.97 | 0.99 | 0.92 | 0.88 | pdfalto | ✅ |
+| 6724771 | 0.93 | 0.95 | 0.82 | 0.91 | pdfalto | ✅ |
+| 8228476 | 0.94 | 0.88 | 0.90 | 0.75 | pdfalto | ✅ |
+| 8228568 | 0.97 | 0.82 | 0.92 | 0.68 | pdfalto | ✅ |
+| AISec2025-poster | 0.92 | 0.80 | 0.89 | 1.98 | pdfalto | ✅ |
+| aysaekanger | 0.95 | 0.85 | 0.80 | 1.09 | pdfalto | ✅ |
+| isporeu2023ee359130949 | 0.96 | 0.79 | 0.98 | 1.45 | pdfalto | ✅ |
 
 ### Aggregate Metrics
 
 | Metric | Average Score |
 |--------|---------------|
-| Word Capture | 0.96 |
+| Word Capture | 0.94 |
 | ROUGE-L | 0.89 |
-| Number Capture | 0.93 |
-| Field Proportion | 0.99 |
+| Number Capture | 0.91 |
+| Field Proportion | 0.93 |
+
+### Failure Analysis
+
+| Poster ID | Failing Metric | Score | Root Cause |
+|-----------|----------------|-------|------------|
+| 4448680 | Number Capture | 0.69 | Model misses numeric data from the Systems subsection of this multi-component SOFC poster |
 
 ## Test Set
 
-The validation set includes 10 manually annotated scientific posters:
+The validation set includes 20 manually annotated scientific posters:
 
-- **9 PDF posters**: Processed via pdfalto
+- **19 PDF posters**: Processed via pdfalto
 - **1 image poster**: Processed via Qwen2-VL
 
-Posters cover diverse formats:
+Posters cover diverse domains and formats:
+- Biomedical informatics, astronomy, astrophysics, bioinformatics, genetics
+- Altmetrics, research data management, research infrastructure, cybersecurity
+- Oncology, health economics, fuel cell manufacturing
 - Single and multi-column layouts
 - Various font sizes and styles
 - Tables, figures, and charts
-- Multiple languages
+- Multiple languages (English, German)
 
 ## Running Validation
 
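The updated Aggregate Metrics values can be cross-checked against the per-poster rows added in this hunk. The snippet below is a minimal sketch, assuming each aggregate is a plain arithmetic mean over the 20 posters, rounded to two decimals; the diff does not state how the project actually computes them, and the names `ROWS`, `METRICS`, and `aggregate` are illustrative only.

```python
from statistics import mean

# Scores copied from the added (+) table rows:
# (poster_id, word, rouge_l, numbers, fields)
ROWS = [
    ("10890106", 0.94, 0.75, 1.00, 0.80),
    ("15963941", 0.95, 0.91, 0.97, 0.76),
    ("16083265", 0.90, 0.87, 0.96, 0.71),
    ("17268692", 0.97, 0.99, 0.91, 0.83),
    ("42", 0.97, 0.87, 0.97, 0.77),
    ("4446908", 0.95, 0.91, 0.90, 0.98),
    ("4448680", 0.79, 0.81, 0.69, 0.97),
    ("4519718", 0.98, 0.99, 0.89, 0.78),
    ("4552067", 0.94, 0.92, 1.00, 0.75),
    ("4560930", 0.96, 0.91, 0.96, 0.92),
    ("4564017", 0.94, 0.97, 0.85, 0.83),
    ("4607450", 0.95, 0.93, 0.93, 0.89),
    ("4737132", 0.91, 0.81, 0.93, 0.83),
    ("5128504", 0.97, 0.99, 0.92, 0.88),
    ("6724771", 0.93, 0.95, 0.82, 0.91),
    ("8228476", 0.94, 0.88, 0.90, 0.75),
    ("8228568", 0.97, 0.82, 0.92, 0.68),
    ("AISec2025-poster", 0.92, 0.80, 0.89, 1.98),
    ("aysaekanger", 0.95, 0.85, 0.80, 1.09),
    ("isporeu2023ee359130949", 0.96, 0.79, 0.98, 1.45),
]

# Metric names as they appear in the Aggregate Metrics table,
# in the same order as the score columns above.
METRICS = ("Word Capture", "ROUGE-L", "Number Capture", "Field Proportion")


def aggregate(rows):
    """Return {metric name: mean score rounded to two decimals}."""
    # Drop the poster ID, then transpose rows into per-metric columns.
    columns = zip(*(row[1:] for row in rows))
    return {name: round(mean(col), 2) for name, col in zip(METRICS, columns)}


if __name__ == "__main__":
    for name, value in aggregate(ROWS).items():
        print(f"{name}: {value:.2f}")
```

Under that assumption, the recomputed means agree with the four values in the updated Aggregate Metrics table.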