CVAT Evaluation Pipeline Utility

A flexible pipeline for evaluating CVAT annotations that converts CVAT XML files to DoclingDocument format and runs layout and document structure evaluations.

Features

Convert CVAT XML annotations to DoclingDocument JSON format
Create ground truth datasets from CVAT annotations
Create prediction datasets for evaluation
Run layout and document structure evaluations
Support for step-by-step or end-to-end execution
Configurable evaluation modalities

Requirements

The utility requires the following inputs:

Images Directory: Directory containing PNG image files
Ground Truth XML: CVAT XML file with ground truth annotations
Prediction XML: CVAT XML file with prediction annotations (a separate file from the ground truth XML)
Output Directory: Directory where all pipeline outputs will be saved

Usage

Command Line Interface

python cvat_evaluation_pipeline.py <images_dir> <output_dir> [OPTIONS]

Required Arguments

images_dir: Directory containing PNG image files
output_dir: Output directory for pipeline results

Optional Arguments

--gt-xml PATH: Path to ground truth CVAT XML file
--pred-xml PATH: Path to prediction CVAT XML file
--step {gt,pred,eval,full}: Pipeline step to run (default: full)
--modalities {layout,document_structure}: Evaluation modalities to run (default: both)
--strict: Require every image to have annotations in the XML files (default: partial annotation batches are allowed)
--verbose, -v: Enable verbose logging
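
The arguments listed above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual parser: the function name build_parser, the use of nargs="+" for --modalities, and the Path conversions are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(prog="cvat_evaluation_pipeline.py")
    parser.add_argument("images_dir", type=Path,
                        help="Directory containing PNG image files")
    parser.add_argument("output_dir", type=Path,
                        help="Output directory for pipeline results")
    parser.add_argument("--gt-xml", type=Path,
                        help="Path to ground truth CVAT XML file")
    parser.add_argument("--pred-xml", type=Path,
                        help="Path to prediction CVAT XML file")
    parser.add_argument("--step", choices=["gt", "pred", "eval", "full"],
                        default="full", help="Pipeline step to run")
    # Assumption: one or more modalities may be given; the default is both.
    parser.add_argument("--modalities", nargs="+",
                        choices=["layout", "document_structure"],
                        default=["layout", "document_structure"],
                        help="Evaluation modalities to run")
    parser.add_argument("--strict", action="store_true",
                        help="Require all images to have annotations")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable verbose logging")
    return parser
```

Calling build_parser().parse_args() with only the two positional arguments yields step "full", both modalities, and strict mode off, which matches the documented defaults.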

Examples

1. Run Full Pipeline

Convert both ground truth and prediction CVAT XMLs, create datasets, and run evaluations:

python cvat_evaluation_pipeline.py \
    /path/to/images \
    /path/to/output \
    --gt-xml /path/to/ground_truth.xml \
    --pred-xml /path/to/predictions.xml

2. Run Step by Step

Step 1: Create Ground Truth Dataset

python cvat_evaluation_pipeline.py \
    /path/to/images \
    /path/to/output \
    --gt-xml /path/to/ground_truth.xml \
    --step gt

Step 2: Create Prediction Dataset

python cvat_evaluation_pipeline.py \
    /path/to/images \
    /path/to/output \
    --pred-xml /path/to/predictions.xml \
    --step pred

Step 3: Run Evaluation

python cvat_evaluation_pipeline.py \
    /path/to/images \
    /path/to/output \
    --step eval
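
When the three steps are driven from another Python script, the command lines above can be assembled programmatically. The helper below is a sketch under stated assumptions: build_command is a hypothetical name, and it only builds the argument list using the flags documented in this README.

```python
import sys
from typing import List, Optional

SCRIPT = "cvat_evaluation_pipeline.py"


def build_command(step: str, images_dir: str, output_dir: str,
                  gt_xml: Optional[str] = None,
                  pred_xml: Optional[str] = None) -> List[str]:
    """Build the argument list for one pipeline step (hypothetical helper)."""
    cmd = [sys.executable, SCRIPT, str(images_dir), str(output_dir),
           "--step", step]
    # Only pass the XML flags that the caller supplied for this step.
    if gt_xml is not None:
        cmd += ["--gt-xml", str(gt_xml)]
    if pred_xml is not None:
        cmd += ["--pred-xml", str(pred_xml)]
    return cmd
```

Each returned list can be passed to subprocess.run(cmd, check=True), so that a failing step raises an exception and stops the sequence before the next step starts.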

3. Run Specific Evaluation Modalities

Run only layout evaluation:

python cvat_evaluation_pipeline.py \
    /path/to/images \
    /path/to/output \
    --gt-xml /path/to/ground_truth.xml \
    --pred-xml /path/to/predictions.xml \
    --modalities layout

Run only document structure evaluation:

python cvat_evaluation_pipeline.py \
    /path/to/images \
    /path/to/output \
    --gt-xml /path/to/ground_truth.xml \
    --pred-xml /path/to/predictions.xml \
    --modalities document_structure

4. Strict Mode

By default, the pipeline allows partial annotation batches where not all images need to have annotations in the XML file. This is useful when you have a large set of images but only a subset has been annotated.

To enforce that ALL images must have annotations, use the --strict flag:

python cvat_evaluation_pipeline.py \
    /path/to/images \
    /path/to/output \
    --gt-xml /path/to/complete_annotations.xml \
    --strict

In strict mode:

The pipeline will fail with an error if any image lacks annotations
Useful for validating complete annotation batches
Helps catch missing annotations early in the process
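
The difference between the default and strict behaviour can be sketched as a check of the PNG files against the image entries of the CVAT XML. This is an illustration, not the pipeline's own code: the function names are hypothetical, and matching by base file name against the name attribute of each image element is an assumption.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def find_unannotated_images(images_dir: Path, cvat_xml: Path) -> List[str]:
    """Return PNG file names that have no <image> entry in the CVAT XML."""
    root = ET.parse(cvat_xml).getroot()
    # Assumption: each annotated image appears as <image name="...">;
    # the name may contain a subdirectory, so compare base names only.
    annotated = {Path(el.get("name", "")).name for el in root.iter("image")}
    return sorted(p.name for p in images_dir.glob("*.png")
                  if p.name not in annotated)


def check_annotations(images_dir: Path, cvat_xml: Path,
                      strict: bool = False) -> List[str]:
    """Fail in strict mode if any image is unannotated; otherwise report."""
    missing = find_unannotated_images(images_dir, cvat_xml)
    if strict and missing:
        raise ValueError(
            f"{len(missing)} image(s) lack annotations: {missing}")
    return missing
```

In the default mode the caller receives the list of unannotated images and can skip them. With strict=True the same situation raises an error before any conversion work is done.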

Output Structure

The pipeline creates the following directory structure in the output directory:

output_dir/
├── ground_truth_json/          # Ground truth DoclingDocument JSON files
│   ├── gt_image1.json
│   └── gt_image2.json
├── predictions_json/           # Prediction DoclingDocument JSON files
│   ├── pred_image1.json
│   └── pred_image2.json
├── gt_dataset/                # Ground truth dataset
│   ├── test/
│   └── visualizations/
├── eval_dataset/              # Evaluation dataset
│   ├── test/
│   └── visualizations/
└── evaluation_results/        # Evaluation results
    ├── layout_evaluation/
    └── document_structure_evaluation/
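
After a run, the layout above can be checked with a few lines of pathlib. The sketch below only uses the directory names shown in the tree. The helper name missing_outputs is hypothetical, and it assumes a full run with both modalities enabled.

```python
from pathlib import Path
from typing import List

# Sub-directories named in the tree above, relative to output_dir.
EXPECTED_SUBDIRS = (
    "ground_truth_json",
    "predictions_json",
    "gt_dataset/test",
    "gt_dataset/visualizations",
    "eval_dataset/test",
    "eval_dataset/visualizations",
    "evaluation_results/layout_evaluation",
    "evaluation_results/document_structure_evaluation",
)


def missing_outputs(output_dir: Path) -> List[str]:
    """List the expected sub-directories that do not exist yet."""
    return [sub for sub in EXPECTED_SUBDIRS
            if not (output_dir / sub).is_dir()]
```

An empty result means every directory in the tree is present. After a partial run (for example --step gt only), the remaining entries show which steps have not produced output yet.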

Pipeline Steps Explained

Step 1: Ground Truth Dataset Creation

Converts ground truth CVAT XML to DoclingDocument JSON format
Creates a ground truth dataset using FileDatasetBuilder
Generates visualizations for quality inspection

Step 2: Prediction Dataset Creation

Converts prediction CVAT XML to DoclingDocument JSON format
Creates a prediction dataset using FilePredictionProvider
Links predictions to the ground truth dataset for evaluation

Step 3: Evaluation

Runs layout evaluation (mean Average Precision metrics)
Runs document structure evaluation (edit distance metrics)
Saves detailed evaluation results and visualizations
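
To give a sense of the quantities behind these two metric families, the sketch below shows textbook definitions of intersection over union (the box-overlap measure that mean Average Precision is built on) and of edit distance between two label sequences. It is not the docling-eval implementation, whose exact formulation may differ. The function names and the (x0, y0, x1, y1) box convention are assumptions.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed convention


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

For example, edit_distance(["title", "text", "table"], ["title", "table"]) is 1, because a single deletion turns the first sequence into the second.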

Error Handling

The utility handles errors as follows:

Validates input paths and file existence
Provides clear error messages for missing requirements
Continues processing other files if individual conversions fail
Logs warnings for failed conversions without stopping the pipeline
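
The "continue on failure" behaviour described above follows a common pattern: wrap each conversion in its own try/except, log a warning, and move on. The sketch below illustrates that pattern only. convert_all and the convert_one callback are hypothetical names, not functions from the utility.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("cvat_evaluation_pipeline")


def convert_all(items: Iterable[str],
                convert_one: Callable[[str], str]
                ) -> Tuple[List[str], List[str]]:
    """Convert every item, collecting failures instead of stopping."""
    converted: List[str] = []
    failed: List[str] = []
    for item in items:
        try:
            converted.append(convert_one(item))
        except Exception as exc:  # one bad file must not stop the batch
            logger.warning("Conversion failed for %s: %s", item, exc)
            failed.append(item)
    return converted, failed
```

Returning the failed items alongside the results lets the caller decide whether the batch is still usable, for instance by raising an error when nothing could be converted at all.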

Logging

The utility provides detailed logging with timestamps:

INFO level: Progress updates and results
WARNING level: Non-critical issues (e.g., failed conversions)
ERROR level: Critical errors that stop execution
Use the --verbose flag for DEBUG level logging
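
A logging setup with timestamps and a verbose switch can be written with the standard library as sketched below. The function name configure_logging and the exact format string are assumptions. The utility's real format may differ.

```python
import logging


def configure_logging(verbose: bool = False) -> None:
    """Timestamped logging at INFO level, or DEBUG when verbose is set."""
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.INFO,
        format="%(asctime)s - %(levelname)s - %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
```

Calling configure_logging(verbose=True) corresponds to passing --verbose on the command line. Without it, DEBUG messages are suppressed and only INFO, WARNING, and ERROR records are shown.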

Integration with Existing Codebase

This utility is designed to work with the existing docling-eval framework and uses:

docling_eval.cvat_tools.cvat_to_docling for CVAT conversion
docling_eval.dataset_builders.file_dataset_builder for dataset creation
docling_eval.prediction_providers.file_provider for prediction datasets
docling_eval.cli.main.evaluate for running evaluations

Tips for Best Results

Image Naming: Ensure PNG files have consistent naming that matches the CVAT annotations
XML Validation: Verify that both ground truth and prediction XML files are valid CVAT exports
Output Space: Ensure sufficient disk space for intermediate JSON files and datasets
Step-by-Step: For large datasets, consider running steps separately for better resource management
Visualization: Check the generated visualizations to verify conversion quality
