Commit 2ff6bfa

add new annotation validation metrics
1 parent 95567d3 commit 2ff6bfa

File tree

1 file changed

+38
-19
lines changed

docs/evaluation.md

Lines changed: 38 additions & 19 deletions
@@ -89,44 +89,63 @@ Before comparison, text is normalized:
 
 ## Validation Results
 
-### Current Performance
+### Current Performance (v0.1.5)
 
-**Overall**: 10/10 (100%) passing
+**Overall**: 19/20 (95%) passing
 
 | Poster ID | Word | ROUGE-L | Numbers | Fields | Source | Status |
 |-----------|------|---------|---------|--------|--------|--------|
-| 10890106 | 0.98 | 0.85 | 1.00 | 0.89 | pdfalto ||
-| 15963941 | 0.98 | 0.93 | 1.00 | 0.84 | pdfalto ||
-| 16083265 | 0.90 | 0.90 | 0.82 | 0.92 | pdfalto ||
-| 17268692 | 1.00 | 0.83 | 1.00 | 1.70 | pdfalto ||
-| 42 | 0.99 | 0.88 | 1.00 | 0.85 | pdfalto ||
-| 4737132 | 0.94 | 0.79 | 0.96 | 1.22 | qwen_vision ||
-| 5128504 | 0.99 | 1.00 | 1.00 | 1.04 | pdfalto ||
-| 6724771 | 0.89 | 0.95 | 0.85 | 0.96 | pdfalto ||
-| 8228476 | 0.94 | 0.87 | 0.89 | 0.91 | pdfalto ||
-| 8228568 | 0.99 | 0.91 | 0.82 | 0.79 | pdfalto ||
+| 10890106 | 0.94 | 0.75 | 1.00 | 0.80 | pdfalto ||
+| 15963941 | 0.95 | 0.91 | 0.97 | 0.76 | pdfalto ||
+| 16083265 | 0.90 | 0.87 | 0.96 | 0.71 | pdfalto ||
+| 17268692 | 0.97 | 0.99 | 0.91 | 0.83 | pdfalto ||
+| 42 | 0.97 | 0.87 | 0.97 | 0.77 | pdfalto ||
+| 4446908 | 0.95 | 0.91 | 0.90 | 0.98 | pdfalto ||
+| 4448680 | 0.79 | 0.81 | 0.69 | 0.97 | pdfalto ||
+| 4519718 | 0.98 | 0.99 | 0.89 | 0.78 | pdfalto ||
+| 4552067 | 0.94 | 0.92 | 1.00 | 0.75 | pdfalto ||
+| 4560930 | 0.96 | 0.91 | 0.96 | 0.92 | pdfalto ||
+| 4564017 | 0.94 | 0.97 | 0.85 | 0.83 | pdfalto ||
+| 4607450 | 0.95 | 0.93 | 0.93 | 0.89 | pdfalto ||
+| 4737132 | 0.91 | 0.81 | 0.93 | 0.83 | qwen_vision ||
+| 5128504 | 0.97 | 0.99 | 0.92 | 0.88 | pdfalto ||
+| 6724771 | 0.93 | 0.95 | 0.82 | 0.91 | pdfalto ||
+| 8228476 | 0.94 | 0.88 | 0.90 | 0.75 | pdfalto ||
+| 8228568 | 0.97 | 0.82 | 0.92 | 0.68 | pdfalto ||
+| AISec2025-poster | 0.92 | 0.80 | 0.89 | 1.98 | pdfalto ||
+| aysaekanger | 0.95 | 0.85 | 0.80 | 1.09 | pdfalto ||
+| isporeu2023ee359130949 | 0.96 | 0.79 | 0.98 | 1.45 | pdfalto ||
 
 ### Aggregate Metrics
 
 | Metric | Average Score |
 |--------|---------------|
-| Word Capture | 0.96 |
+| Word Capture | 0.94 |
 | ROUGE-L | 0.89 |
-| Number Capture | 0.93 |
-| Field Proportion | 0.99 |
+| Number Capture | 0.91 |
+| Field Proportion | 0.93 |
+
+### Failure Analysis
+
+| Poster ID | Failing Metric | Score | Root Cause |
+|-----------|----------------|-------|------------|
+| 4448680 | Number Capture | 0.69 | Model misses numeric data from the Systems subsection of this multi-component SOFC poster |
 
 ## Test Set
 
-The validation set includes 10 manually annotated scientific posters:
+The validation set includes 20 manually annotated scientific posters:
 
-- **9 PDF posters**: Processed via pdfalto
+- **19 PDF posters**: Processed via pdfalto
 - **1 image poster**: Processed via Qwen2-VL
 
-Posters cover diverse formats:
+Posters cover diverse domains and formats:
+- Biomedical informatics, astronomy, astrophysics, bioinformatics, genetics
+- Altmetrics, research data management, research infrastructure, cybersecurity
+- Oncology, health economics, fuel cell manufacturing
 - Single and multi-column layouts
 - Various font sizes and styles
 - Tables, figures, and charts
-- Multiple languages
+- Multiple languages (English, German)
 
 ## Running Validation

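One way to sanity-check the updated Aggregate Metrics and the 19/20 figure is to recompute them from the 20 per-poster rows added in this commit. The sketch below copies those rows by hand and assumes each aggregate is a plain arithmetic mean rounded to two decimals. The names `ROWS`, `METRICS`, `column_means`, and `FAILING` are illustrative and not part of the project's validation code.

```python
# Per-poster scores copied from the added rows of the diff:
# (poster_id, word, rouge_l, numbers, fields)
ROWS = [
    ("10890106", 0.94, 0.75, 1.00, 0.80),
    ("15963941", 0.95, 0.91, 0.97, 0.76),
    ("16083265", 0.90, 0.87, 0.96, 0.71),
    ("17268692", 0.97, 0.99, 0.91, 0.83),
    ("42", 0.97, 0.87, 0.97, 0.77),
    ("4446908", 0.95, 0.91, 0.90, 0.98),
    ("4448680", 0.79, 0.81, 0.69, 0.97),
    ("4519718", 0.98, 0.99, 0.89, 0.78),
    ("4552067", 0.94, 0.92, 1.00, 0.75),
    ("4560930", 0.96, 0.91, 0.96, 0.92),
    ("4564017", 0.94, 0.97, 0.85, 0.83),
    ("4607450", 0.95, 0.93, 0.93, 0.89),
    ("4737132", 0.91, 0.81, 0.93, 0.83),
    ("5128504", 0.97, 0.99, 0.92, 0.88),
    ("6724771", 0.93, 0.95, 0.82, 0.91),
    ("8228476", 0.94, 0.88, 0.90, 0.75),
    ("8228568", 0.97, 0.82, 0.92, 0.68),
    ("AISec2025-poster", 0.92, 0.80, 0.89, 1.98),
    ("aysaekanger", 0.95, 0.85, 0.80, 1.09),
    ("isporeu2023ee359130949", 0.96, 0.79, 0.98, 1.45),
]

# Metric names in the same order as the score columns above.
METRICS = ("Word Capture", "ROUGE-L", "Number Capture", "Field Proportion")


def column_means(rows):
    """Return the arithmetic mean of each metric column, rounded to 2 decimals."""
    n = len(rows)
    return {
        name: round(sum(row[i + 1] for row in rows) / n, 2)
        for i, name in enumerate(METRICS)
    }


averages = column_means(ROWS)

# Assumption: the Failure Analysis table lists every failing poster.
FAILING = {"4448680"}
passing = sum(1 for row in ROWS if row[0] not in FAILING)
pass_rate = passing / len(ROWS)

for name, value in averages.items():
    print(f"{name}: {value:.2f}")
print(f"Overall: {passing}/{len(ROWS)} ({pass_rate:.0%}) passing")
```

Under these assumptions the recomputed means match the values in the updated Aggregate Metrics table, and the pass rate matches the 19/20 (95%) figure.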