A flexible pipeline for evaluating CVAT annotations that converts CVAT XML files to DoclingDocument format and runs layout and document structure evaluations.
- Convert CVAT XML annotations to DoclingDocument JSON format
- Create ground truth datasets from CVAT annotations
- Create prediction datasets for evaluation
- Run layout and document structure evaluations
- Support for step-by-step or end-to-end execution
- Configurable evaluation modalities
The utility requires the following inputs:
- Images Directory: Directory containing PNG image files
- Ground Truth XML: CVAT XML file with ground truth annotations
- Prediction XML: CVAT XML file with prediction annotations (different from ground truth)
- Output Directory: Directory where all pipeline outputs will be saved
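
A pre-flight check on these inputs could look like the sketch below. It is illustrative only and is not taken from the pipeline's source: the function name `validate_inputs` and its messages are hypothetical, and the output directory is left out on the assumption that the pipeline creates it.

```python
from pathlib import Path


def validate_inputs(images_dir, gt_xml=None, pred_xml=None):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    images_dir = Path(images_dir)
    if not images_dir.is_dir():
        problems.append(f"images directory not found: {images_dir}")
    elif not any(images_dir.glob("*.png")):
        problems.append(f"no PNG files in: {images_dir}")

    # Each XML file is optional here because single steps need only one of them.
    for label, xml_path in (("ground truth", gt_xml), ("prediction", pred_xml)):
        if xml_path is not None and not Path(xml_path).is_file():
            problems.append(f"{label} XML not found: {xml_path}")

    # The prediction XML is expected to be a different file from the ground truth.
    if gt_xml is not None and pred_xml is not None:
        if Path(gt_xml).resolve() == Path(pred_xml).resolve():
            problems.append("prediction XML must differ from ground truth XML")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.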
python cvat_evaluation_pipeline.py <images_dir> <output_dir> [OPTIONS]

Arguments:
- images_dir: Directory containing PNG image files
- output_dir: Output directory for pipeline results

Options:
- --gt-xml PATH: Path to ground truth CVAT XML file
- --pred-xml PATH: Path to prediction CVAT XML file
- --step {gt,pred,eval,full}: Pipeline step to run (default: full)
- --modalities {layout,document_structure}: Evaluation modalities to run (default: both)
- --strict: Strict mode; require all images to have annotations in XML files (default: allow partial annotation batches)
- --verbose, -v: Enable verbose logging
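
For readers who want to see how this command line maps onto a parser, the following is a minimal `argparse` sketch reconstructed from the arguments and options documented above. It is not the script's actual parser. In particular, accepting `--modalities` as a space-separated list is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented interface (a reconstruction, not the real one)."""
    parser = argparse.ArgumentParser(prog="cvat_evaluation_pipeline.py")
    parser.add_argument("images_dir", help="Directory containing PNG image files")
    parser.add_argument("output_dir", help="Output directory for pipeline results")
    parser.add_argument("--gt-xml", metavar="PATH", help="Ground truth CVAT XML file")
    parser.add_argument("--pred-xml", metavar="PATH", help="Prediction CVAT XML file")
    parser.add_argument("--step", choices=["gt", "pred", "eval", "full"], default="full")
    # Assumption: one or more modalities may be listed; both run by default.
    parser.add_argument(
        "--modalities",
        nargs="+",
        choices=["layout", "document_structure"],
        default=["layout", "document_structure"],
    )
    parser.add_argument("--strict", action="store_true")
    parser.add_argument("--verbose", "-v", action="store_true")
    return parser


args = build_parser().parse_args(
    ["/path/to/images", "/path/to/output", "--gt-xml", "gt.xml", "--step", "gt"]
)
```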
Convert both ground truth and prediction CVAT XMLs, create datasets, and run evaluations:
python cvat_evaluation_pipeline.py \
/path/to/images \
/path/to/output \
--gt-xml /path/to/ground_truth.xml \
--pred-xml /path/to/predictions.xml

Step 1: Create Ground Truth Dataset
python cvat_evaluation_pipeline.py \
/path/to/images \
/path/to/output \
--gt-xml /path/to/ground_truth.xml \
--step gt

Step 2: Create Prediction Dataset
python cvat_evaluation_pipeline.py \
/path/to/images \
/path/to/output \
--pred-xml /path/to/predictions.xml \
--step pred

Step 3: Run Evaluation
python cvat_evaluation_pipeline.py \
/path/to/images \
/path/to/output \
--step eval

Run only layout evaluation:
python cvat_evaluation_pipeline.py \
/path/to/images \
/path/to/output \
--gt-xml /path/to/ground_truth.xml \
--pred-xml /path/to/predictions.xml \
--modalities layout

Run only document structure evaluation:
python cvat_evaluation_pipeline.py \
/path/to/images \
/path/to/output \
--gt-xml /path/to/ground_truth.xml \
--pred-xml /path/to/predictions.xml \
--modalities document_structure

By default, the pipeline allows partial annotation batches, where not all images need to have annotations in the XML file. This is useful when you have a large set of images but only a subset has been annotated.
To enforce that ALL images must have annotations, use the --strict flag:
python cvat_evaluation_pipeline.py \
/path/to/images \
/path/to/output \
--gt-xml /path/to/complete_annotations.xml \
--strict

In strict mode:
- The pipeline will fail with an error if any image lacks annotations
- Useful for validating complete annotation batches
- Helps catch missing annotations early in the process
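
The difference between the default and strict behavior can be illustrated with a small coverage check. This is a sketch under stated assumptions, not the pipeline's implementation: it assumes the "CVAT for images" XML layout with one `<image name="...">` element per image, and it treats an image as annotated only if that element has at least one child shape. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

SAMPLE_XML = """<annotations>
  <image id="0" name="image1.png" width="100" height="100">
    <box label="text" xtl="1" ytl="1" xbr="50" ybr="20"/>
  </image>
  <image id="1" name="image2.png" width="100" height="100"/>
</annotations>"""


def annotated_image_names(xml_text):
    """Collect file names of <image> elements that contain at least one shape."""
    root = ET.fromstring(xml_text)
    return {
        Path(image.get("name", "")).name
        for image in root.iter("image")
        if len(image) > 0  # assumption: no child elements means no annotations
    }


def check_coverage(image_names, xml_text, strict=False):
    """Return images without annotations; in strict mode, raise if there are any."""
    annotated = annotated_image_names(xml_text)
    missing = sorted(name for name in image_names if name not in annotated)
    if missing and strict:
        raise ValueError(f"{len(missing)} image(s) lack annotations: {missing}")
    return missing


missing = check_coverage(["image1.png", "image2.png", "image3.png"], SAMPLE_XML)
```

With `strict=False` the caller receives the list of skipped images and can continue. With `strict=True` the same situation stops the run.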
The pipeline creates the following directory structure in the output directory:
output_dir/
├── ground_truth_json/              # Ground truth DoclingDocument JSON files
│   ├── gt_image1.json
│   └── gt_image2.json
├── predictions_json/               # Prediction DoclingDocument JSON files
│   ├── pred_image1.json
│   └── pred_image2.json
├── gt_dataset/                     # Ground truth dataset
│   ├── test/
│   └── visualizations/
├── eval_dataset/                   # Evaluation dataset
│   ├── test/
│   └── visualizations/
└── evaluation_results/             # Evaluation results
    ├── layout_evaluation/
    └── document_structure_evaluation/
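
When post-processing results from a script, it can help to derive these locations from the output directory in one place. The helper below is a hypothetical convenience, not part of the utility. It only encodes the directory names shown in the tree above.

```python
from pathlib import Path


def output_layout(output_dir):
    """Map short keys to the directories the pipeline is documented to create."""
    root = Path(output_dir)
    results = root / "evaluation_results"
    return {
        "gt_json": root / "ground_truth_json",
        "pred_json": root / "predictions_json",
        "gt_dataset": root / "gt_dataset",
        "eval_dataset": root / "eval_dataset",
        "layout_eval": results / "layout_evaluation",
        "doc_structure_eval": results / "document_structure_evaluation",
    }


def existing_outputs(output_dir):
    """Return the keys whose directories already exist, e.g. to see which steps have run."""
    return sorted(key for key, path in output_layout(output_dir).items() if path.is_dir())
```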
Step 1: Create Ground Truth Dataset
- Converts ground truth CVAT XML to DoclingDocument JSON format
- Creates a ground truth dataset using FileDatasetBuilder
- Generates visualizations for quality inspection

Step 2: Create Prediction Dataset
- Converts prediction CVAT XML to DoclingDocument JSON format
- Creates a prediction dataset using FilePredictionProvider
- Links predictions to the ground truth dataset for evaluation

Step 3: Run Evaluation
- Runs layout evaluation (mean Average Precision metrics)
- Runs document structure evaluation (edit distance metrics)
- Saves detailed evaluation results and visualizations
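
For intuition about the two metric families named above, the sketch below shows their usual building blocks: intersection over union (IoU) between two boxes, which mean Average Precision uses to match predictions to ground truth, and a Levenshtein edit distance over label sequences. These are textbook versions for illustration only. The exact computations inside docling-eval may differ.

```python
def iou(box_a, box_b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def edit_distance(seq_a, seq_b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(seq_b) + 1))
    for i, item_a in enumerate(seq_a, start=1):
        current = [i]
        for j, item_b in enumerate(seq_b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete item_a
                    current[j - 1] + 1,  # insert item_b
                    previous[j - 1] + (item_a != item_b),  # substitute or match
                )
            )
        previous = current
    return previous[-1]


overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
distance = edit_distance(["title", "text", "table"], ["title", "table"])
```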
The utility includes comprehensive error handling:
- Validates input paths and file existence
- Provides clear error messages for missing requirements
- Continues processing other files if individual conversions fail
- Logs warnings for failed conversions without stopping the pipeline
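
The last two points describe a common pattern: convert each file inside its own `try` block, log a warning on failure, and continue. A minimal sketch of that pattern follows. The names `convert_all` and `convert_one` are placeholders, not functions from the utility.

```python
import logging

logger = logging.getLogger("cvat_pipeline_sketch")


def convert_all(items, convert_one):
    """Apply convert_one to every item, collecting failures instead of stopping."""
    converted, failed = [], []
    for item in items:
        try:
            converted.append(convert_one(item))
        except Exception as exc:  # broad on purpose: one bad file must not stop the batch
            logger.warning("Conversion failed for %s: %s", item, exc)
            failed.append(item)
    return converted, failed


def fake_convert(name):
    """Stand-in converter that fails for one specific input."""
    if name == "broken.png":
        raise ValueError("no annotations found")
    return name.replace(".png", ".json")


converted, failed = convert_all(["a.png", "broken.png", "b.png"], fake_convert)
```

Returning the failed items alongside the successes lets the caller decide whether a partial batch is acceptable.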
The utility provides detailed logging with timestamps:
- INFO level: Progress updates and results
- WARNING level: Non-critical issues (e.g., failed conversions)
- ERROR level: Critical errors that stop execution
- Use the --verbose flag for DEBUG level logging
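
A logging setup with this behavior can be written with the standard `logging` module. The sketch below is one plausible configuration, not the utility's own: the exact format string is an assumption, and `force=True` requires Python 3.8 or later.

```python
import logging


def configure_logging(verbose=False):
    """Configure timestamped root logging; DEBUG when verbose, INFO otherwise."""
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s - %(levelname)s - %(message)s",  # assumed format
        force=True,  # replace any handlers configured earlier
    )
    return level


configure_logging(verbose=True)
logging.getLogger(__name__).debug("Visible only with verbose logging enabled")
```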
This utility is designed to work with the existing docling-eval framework and uses:
- docling_eval.cvat_tools.cvat_to_docling for CVAT conversion
- docling_eval.dataset_builders.file_dataset_builder for dataset creation
- docling_eval.prediction_providers.file_provider for prediction datasets
- docling_eval.cli.main.evaluate for running evaluations
- Image Naming: Ensure PNG files have consistent naming that matches the CVAT annotations
- XML Validation: Verify that both ground truth and prediction XML files are valid CVAT exports
- Output Space: Ensure sufficient disk space for intermediate JSON files and datasets
- Step-by-Step: For large datasets, consider running steps separately for better resource management
- Visualization: Check the generated visualizations to verify conversion quality
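
To follow the step-by-step advice from a script, the three steps can be run as separate processes. The wrapper below is a hypothetical sketch that uses only the flags documented above. It assumes `cvat_evaluation_pipeline.py` is in the current working directory.

```python
import subprocess
import sys


def build_step_command(step, images_dir, output_dir, gt_xml=None, pred_xml=None):
    """Build the command line for a single pipeline step."""
    command = [
        sys.executable,
        "cvat_evaluation_pipeline.py",  # assumption: script is in the working directory
        images_dir,
        output_dir,
        "--step",
        step,
    ]
    if gt_xml:
        command += ["--gt-xml", gt_xml]
    if pred_xml:
        command += ["--pred-xml", pred_xml]
    return command


def run_steps(images_dir, output_dir, gt_xml, pred_xml):
    """Run gt, pred, and eval in order, stopping at the first failing step."""
    commands = [
        build_step_command("gt", images_dir, output_dir, gt_xml=gt_xml),
        build_step_command("pred", images_dir, output_dir, pred_xml=pred_xml),
        build_step_command("eval", images_dir, output_dir),
    ]
    for command in commands:
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
```

Calling `run_steps("/path/to/images", "/path/to/output", "/path/to/ground_truth.xml", "/path/to/predictions.xml")` would then reproduce the three step-by-step commands shown earlier.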