Commit a76d51f

chore: update readme instructions
1 parent 5224d16 commit a76d51f

2 files changed: +2280 -1291 lines changed

README.md

Lines changed: 39 additions & 23 deletions
@@ -58,6 +58,7 @@ Convert scientific posters (PDF/images) to structured JSON metadata using Large
 **poster2json** extracts structured metadata from scientific conference posters (PDF or image format) into machine-actionable JSON conforming to the [poster-json-schema](https://github.com/fairdataihub/poster-json-schema).
 
 The pipeline uses:
+
 - **Llama 3.1 8B** (fine-tuned) for JSON structuring
 - **Qwen2-VL-7B** for vision-based OCR of image posters
 - **pdfalto** for layout-aware PDF text extraction
@@ -104,49 +105,56 @@ Output conforms to the [poster-json-schema](https://github.com/fairdataihub/post
 {
   "$schema": "https://posters.science/schema/v0.1/poster_schema.json",
   "creators": [
-    {"name": "Garcia, Sofia", "givenName": "Sofia", "familyName": "Garcia", "affiliation": ["University"]}
+    {
+      "name": "Garcia, Sofia",
+      "givenName": "Sofia",
+      "familyName": "Garcia",
+      "affiliation": ["University"]
+    }
+  ],
+  "titles": [
+    { "title": "Machine Learning Approaches to Diabetic Retinopathy Detection" }
   ],
-  "titles": [{"title": "Machine Learning Approaches to Diabetic Retinopathy Detection"}],
   "posterContent": {
     "sections": [
-      {"sectionTitle": "Abstract", "sectionContent": "..."},
-      {"sectionTitle": "Methods", "sectionContent": "..."},
-      {"sectionTitle": "Results", "sectionContent": "..."}
+      { "sectionTitle": "Abstract", "sectionContent": "..." },
+      { "sectionTitle": "Methods", "sectionContent": "..." },
+      { "sectionTitle": "Results", "sectionContent": "..." }
     ]
   },
-  "imageCaptions": [{"captions": ["Figure 1.", "ROC curves showing..."]}],
-  "tableCaptions": [{"captions": ["Table 1.", "Performance metrics"]}]
+  "imageCaptions": [{ "captions": ["Figure 1.", "ROC curves showing..."] }],
+  "tableCaptions": [{ "captions": ["Table 1.", "Performance metrics"] }]
 }
 ```
 
 ## System Requirements
 
-| Requirement | Specification |
-|-------------|---------------|
-| GPU | NVIDIA CUDA-capable, ≥16GB VRAM |
-| RAM | ≥32GB recommended |
-| Python | 3.10+ |
-| OS | Linux, macOS, Windows (via WSL2) |
+| Requirement | Specification                    |
+| ----------- | -------------------------------- |
+| GPU         | NVIDIA CUDA-capable, ≥16GB VRAM  |
+| RAM         | ≥32GB recommended                |
+| Python      | 3.10+                            |
+| OS          | Linux, macOS, Windows (via WSL2) |
 
 ## Performance
 
 Validated on 10 manually annotated scientific posters:
 
-| Metric | Score | Threshold |
-|--------|-------|-----------|
-| Word Capture | 0.96 | ≥0.75 |
-| ROUGE-L | 0.89 | ≥0.75 |
-| Number Capture | 0.93 | ≥0.75 |
-| Field Proportion | 0.99 | 0.30–2.50 |
+| Metric           | Score | Threshold |
+| ---------------- | ----- | --------- |
+| Word Capture     | 0.96  | ≥0.75     |
+| ROUGE-L          | 0.89  | ≥0.75     |
+| Number Capture   | 0.93  | ≥0.75     |
+| Field Proportion | 0.99  | 0.30–2.50 |
 
 **Pass Rate**: 10/10 (100%)
 
 ## Documentation
 
-| Document | Description |
-|----------|-------------|
+| Document                             | Description                     |
+| ------------------------------------ | ------------------------------- |
 | [Architecture](docs/architecture.md) | Technical details & methodology |
-| [Evaluation](docs/evaluation.md) | Validation metrics & results |
+| [Evaluation](docs/evaluation.md)     | Validation metrics & results    |
 
 ## Development Setup
 
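Review note: the reformatted example output in this hunk can be sanity-checked with a few lines of Python. This is only a minimal sketch. The `EXPECTED_KEYS` tuple and the helpers `missing_keys` and `section_titles` are hypothetical names introduced here, and the key list is taken from the README example rather than from the poster-json-schema itself, which may require more or fewer fields.

```python
import json

# Abbreviated copy of the example output from the README hunk.
EXAMPLE = """
{
  "$schema": "https://posters.science/schema/v0.1/poster_schema.json",
  "creators": [
    {
      "name": "Garcia, Sofia",
      "givenName": "Sofia",
      "familyName": "Garcia",
      "affiliation": ["University"]
    }
  ],
  "titles": [
    { "title": "Machine Learning Approaches to Diabetic Retinopathy Detection" }
  ],
  "posterContent": {
    "sections": [
      { "sectionTitle": "Abstract", "sectionContent": "..." }
    ]
  },
  "imageCaptions": [{ "captions": ["Figure 1.", "ROC curves showing..."] }],
  "tableCaptions": [{ "captions": ["Table 1.", "Performance metrics"] }]
}
"""

# Assumption: these are the top-level keys visible in the README example;
# the real schema may define a different set of required fields.
EXPECTED_KEYS = ("creators", "titles", "posterContent", "imageCaptions", "tableCaptions")


def missing_keys(record: dict) -> list[str]:
    """Return the expected top-level keys that are absent from record."""
    return [key for key in EXPECTED_KEYS if key not in record]


def section_titles(record: dict) -> list[str]:
    """Collect sectionTitle values from posterContent.sections, in order."""
    sections = record.get("posterContent", {}).get("sections", [])
    return [section.get("sectionTitle", "") for section in sections]


record = json.loads(EXAMPLE)
print(missing_keys(record))
print(section_titles(record))
```

For real validation, checking the output against the published schema file with a JSON Schema validator would be the more robust choice; the key check above only confirms the example is well-formed JSON with the fields the README shows.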
@@ -155,8 +163,16 @@ Validated on 10 manually annotated scientific posters:
 git clone https://github.com/fairdataihub/poster2json.git
 cd poster2json
 
-# Install with Poetry
+# Create a virtual environment
+python -m venv .venv
+
+# Activate the virtual environment
+source .venv/bin/activate # On Windows: .venv\Scripts\activate
+
+# Install poetry
 pip install poetry
+
+# Install dependencies
 poetry install
 
 # Run tests
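
Review note: the Performance table touched by this commit reports a ROUGE-L score against a ≥0.75 threshold. For readers unfamiliar with the metric, the following is a rough sketch of ROUGE-L as an F1 score over the longest common subsequence of tokens. It assumes lowercase whitespace tokenization and equal weighting of precision and recall; the project's actual evaluation code is not shown in this diff and may differ. The function names `lcs_length` and `rouge_l_f1` are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                # Tokens match: extend the best subsequence ending before both.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 between two strings (assumed whitespace tokenization)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


THRESHOLD = 0.75  # taken from the README Performance table

score = rouge_l_f1(
    "deep learning detects diabetic retinopathy in fundus images",
    "deep learning detects retinopathy in fundus images",
)
print(round(score, 3), score >= THRESHOLD)
```

Because the longest common subsequence preserves word order without requiring contiguity, the score tolerates dropped words (as in the example, where "diabetic" is missing) while still penalizing reordered text.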
