
Commit 0a27664

Updating the script to use the sticklerservice.
1 parent 45edea3 commit 0a27664

File tree

3 files changed: +509 −77 lines changed


config_library/pattern-2/fcc-invoices/README.md

Lines changed: 28 additions & 9 deletions
@@ -14,12 +14,14 @@ This example demonstrates:
 
 ```
 config_library/pattern-2/fcc-invoices/
-├── README.md                            # This file
-├── config.yaml                          # Base IDP configuration
-├── fcc_configured.yaml                  # Deployed stack configuration
-├── stickler_config.json                 # Stickler evaluation rules
-├── bulk_evaluate_fcc_invoices.py        # Evaluation script
-└── sr_refactor_labels_5_5_25.csv        # Ground truth labels (full dataset)
+├── README.md                              # This file
+├── config.yaml                            # Base IDP configuration
+├── fcc_configured.yaml                    # Deployed stack configuration
+├── stickler_config.json                   # Stickler evaluation rules
+├── bulk_evaluate_fcc_invoices.py          # Legacy evaluation script (complex)
+├── bulk_evaluate_fcc_invoices_simple.py   # Simplified evaluation script (recommended)
+├── sample_labels_3.csv                    # Ground truth for 3 sample documents
+└── sr_refactor_labels_5_5_25.csv          # Ground truth labels (full dataset)
 ```
 
 ## Sample Data
@@ -139,27 +141,44 @@ idp-cli download-results \
 
 ## Step 4: Run Evaluation
 
-Evaluate the extraction results against ground truth:
+Evaluate the extraction results against ground truth using the **simplified evaluation script** (recommended):
 
 ```bash
 cd config_library/pattern-2/fcc-invoices
 
+python bulk_evaluate_fcc_invoices_simple.py \
+    --results-dir ../../../fcc_results/cli-batch-20251017-190516 \
+    --csv-path sample_labels_3.csv \
+    --config-path stickler_config.json \
+    --output-dir evaluation_output
+```
+
+**Alternative**: Use the legacy script (more complex, same results):
+```bash
 python bulk_evaluate_fcc_invoices.py \
     --results-dir ../../../fcc_results/cli-batch-20251017-190516 \
     --csv-path sample_labels_3.csv \
+    --config-path stickler_config.json \
     --output-dir evaluation_output
 ```
 
-**Note**: The `sample_labels_3.csv` contains ground truth for only 1 of the 3 sample documents. For full dataset evaluation, use `sr_refactor_labels_5_5_25.csv`.
+**Note**: The `sample_labels_3.csv` contains ground truth for 3 sample documents. For full dataset evaluation, use `sr_refactor_labels_5_5_25.csv`.
 
 **What this does:**
 - Loads ground truth labels from CSV
 - Matches documents by doc_id
-- Performs doc-by-doc comparison using Stickler
+- Performs doc-by-doc comparison using SticklerEvaluationService
 - Saves individual comparison results
 - Aggregates metrics across all documents
 - Generates comprehensive evaluation report
 
+**Why use the simplified script?**
+- 260 lines vs 671 lines (61% less code)
+- Easier to understand and modify
+- No temporary file overhead
+- Direct integration with SticklerEvaluationService
+- Same accurate results
+
 **Expected output:**
 ```
 ================================================================================
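
For readers tracing the commit, the "What this does" list in the hunk above corresponds roughly to the loop sketched below. This is a minimal, hypothetical sketch, not the shipped script: the `SticklerEvaluationService` constructor and `compare` call are assumptions (left commented out), as are the `doc_id` CSV column name and the one-JSON-file-per-document layout of the results directory.

```python
# Hypothetical sketch of the bulk-evaluation flow described in the README diff.
# Assumptions: results_dir holds one <doc_id>.json per document, the ground-truth
# CSV has a doc_id column, and SticklerEvaluationService exposes compare().
import csv
import json
from pathlib import Path


def load_ground_truth(csv_path: Path) -> dict:
    """Key ground-truth rows by doc_id so matching is a dict lookup (column name assumed)."""
    with csv_path.open(newline="") as f:
        return {row["doc_id"]: row for row in csv.DictReader(f)}


def evaluate(results_dir: Path, csv_path: Path, config_path: Path, output_dir: Path) -> None:
    truth = load_ground_truth(csv_path)
    config = json.loads(config_path.read_text())  # stickler_config.json rules
    output_dir.mkdir(parents=True, exist_ok=True)

    # service = SticklerEvaluationService(config)  # actual constructor unknown
    comparisons = []
    for result_file in sorted(results_dir.glob("*.json")):
        doc_id = result_file.stem
        if doc_id not in truth:
            continue  # no ground truth for this document
        extracted = json.loads(result_file.read_text())
        # result = service.compare(extracted, truth[doc_id])  # hypothetical API
        result = {"doc_id": doc_id, "fields_extracted": len(extracted)}  # placeholder
        # Save the individual comparison result for this document.
        (output_dir / f"{doc_id}_comparison.json").write_text(json.dumps(result, indent=2))
        comparisons.append(result)

    # Aggregate per-document results into a single evaluation report.
    report = {"documents_evaluated": len(comparisons), "results": comparisons}
    (output_dir / "evaluation_report.json").write_text(json.dumps(report, indent=2))


if __name__ == "__main__":
    # Mirrors the CLI invocation documented in the README diff above.
    evaluate(
        Path("../../../fcc_results/cli-batch-20251017-190516"),
        Path("sample_labels_3.csv"),
        Path("stickler_config.json"),
        Path("evaluation_output"),
    )
```

Keying the ground truth by doc_id up front is what makes matching a single dictionary lookup per result file, and writing a comparison JSON per document before aggregating matches the "saves individual comparison results" step in the list.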
