OpenElections Utils

General data processing utilities for OpenElections projects.

Precinct Results Generator

A Python utility for consolidating county-level precinct CSV files into statewide precinct results files.

Overview

This module provides functionality to merge multiple county-level precinct result CSV files into a single statewide file. It automatically introspects vote type columns (such as early_voting, election_day, mail, absentee, provisional, etc.) and handles varying data structures across counties.

Based on: openelections-data-tx statewide_generator.py

Features

Automatic column introspection: Discovers vote type columns by excluding standard columns (county, precinct, office, district, party, candidate)
Flexible data handling: Works with any combination of vote type columns across different counties
Office filtering: Optionally include only specific offices
Custom file patterns: Support for different file naming conventions
Reporting: Generate reports showing which vote columns are present in each county
Reusable: Designed to be imported into any OpenElections state repository

Installation

Copy precinct_results.py to your OpenElections data repository:

# From your openelections-data-XX repository
curl -O https://raw.githubusercontent.com/openelections/utils/main/precinct_results.py

Quick Start

from precinct_results import generate_statewide_precinct_file

generate_statewide_precinct_file(
    state_abbr='tx',
    election_date='20201103',
    source_directory='2020/counties',
    output_file='20201103__tx__general__precinct.csv'
)

Usage Examples

With Office Filtering

offices = ['President', 'U.S. Senate', 'U.S. House', 'Governor']

generate_statewide_precinct_file(
    state_abbr='tx',
    election_date='20201103',
    source_directory='2020/counties',
    output_file='20201103__tx__general__precinct.csv',
    offices_filter=offices
)

Custom File Pattern

generate_statewide_precinct_file(
    state_abbr='ca',
    election_date='20201103',
    source_directory='2020/counties',
    output_file='20201103__ca__general__precinct.csv',
    file_pattern='*_precinct_results.csv'
)

Generate Vote Columns Report

from precinct_results import generate_vote_columns_report

generate_vote_columns_report(
    source_directory='2020/counties',
    file_pattern='20201103*precinct.csv',
    output_file='vote_columns_report.csv'
)

See example_usage.py for more comprehensive examples.

API Reference

`generate_statewide_precinct_file()`

Generate a consolidated statewide precinct results file from county CSV files.

Parameters:

state_abbr (str): Two-letter state abbreviation (e.g., 'tx', 'ca')
election_date (str): Election date in YYYYMMDD format (e.g., '20201103')
source_directory (str): Directory containing county precinct CSV files
output_file (str): Path for the output consolidated CSV file
file_pattern (str, optional): Glob pattern for matching files. Defaults to '{election_date}*precinct.csv'
offices_filter (list, optional): List of office names to include. If None, includes all offices
verbose (bool, optional): If True, print progress messages. Defaults to True

`generate_vote_columns_report()`

Generate a report showing which vote type columns are present in each county file.

Parameters:

source_directory (str): Directory containing county precinct CSV files
file_pattern (str): Glob pattern for matching files
output_file (str): Path for the output report CSV file
verbose (bool, optional): If True, print progress messages. Defaults to True

How It Works

Column Discovery: Scans all matching CSV files and identifies vote type columns by excluding standard columns
Column Unification: Collects all unique vote type columns across all counties
Data Consolidation: Reads each county file and consolidates into the output with consistent column ordering
Missing Data Handling: Fills None for vote type columns not present in a particular county

Input Format

County-level CSV files should have:

Required columns:

county, precinct, office, district, candidate, party

Vote type columns (any combination):

votes, early_voting, election_day, mail, absentee, provisional, etc.

Output Format

Standard columns first: county, precinct, office, district, candidate, party
All discovered vote type columns in sorted order
One row per precinct/office/candidate combination

Party Variation Checker

Utilities to detect inconsistent party naming conventions in CSV files.

Overview

These functions help identify variations in party values (e.g., "DEM", "Democratic", "Democrat") that likely refer to the same political party. This is useful for:

Data quality assurance
Identifying data entry inconsistencies
Standardizing party names before processing

Features

Single file analysis: Check party variations in a single CSV file
Directory-wide analysis: Check party variations across multiple files
Similarity detection: Uses string similarity to identify potential variations
Frequency reporting: Shows occurrence counts for each party value
Optional CSV reports: Export detailed analysis to CSV files

Quick Start

Check a Single CSV File

from precinct_results import check_party_variations

results = check_party_variations('20201103__tx__general__precinct.csv')
print(results['unique_parties'])
print(results['potential_variations'])

Check Multiple Files in a Directory

from precinct_results import check_party_variations_directory

results = check_party_variations_directory(
    source_directory='2020/counties',
    file_pattern='20201103*precinct.csv',
    output_file='party_variations_report.csv'
)

API Reference

`check_party_variations()`

Check for variations in party values within a single CSV file.

Parameters:

csv_path (str): Path to the CSV file to analyze
similarity_threshold (float, optional): Minimum similarity (0.0-1.0) for variation detection. Default: 0.7
output_file (str, optional): Path to write detailed variation report
verbose (bool, optional): If True, print summary and variations found. Default: True

Returns:

Dictionary with:

unique_parties: Set of all unique party values found
total_count: Total number of party occurrences
party_counts: Dict mapping party value to occurrence count
potential_variations: List of (party1, party2, similarity) tuples
empty_count: Number of rows with empty/missing party values

`check_party_variations_directory()`

Check for party value variations across multiple CSV files in a directory.

Parameters:

source_directory (str): Directory containing CSV files
file_pattern (str): Glob pattern for matching files (e.g., '20201103*precinct.csv')
similarity_threshold (float, optional): Minimum similarity (0.0-1.0) for variation detection. Default: 0.7
output_file (str, optional): Path to write detailed variation report
verbose (bool, optional): If True, print summary and variations found. Default: True

Returns:

Dictionary with:

all_parties: Set of all unique party values across all files
by_file: Dict mapping filename to set of party values in that file
potential_variations: List of (party1, party2, similarity) tuples
file_count: Number of files processed

Example Output

Party Value Analysis:
  Total rows with party values: 125000
  Empty/missing party values: 150
  Unique party values found: 8

Party value frequency:
    DEM: 55000
    REP: 52000
    Democratic: 8500
    Republican: 7200
    LIB: 1800
    GRN: 450
    IND: 50

Potential variations detected: 2
    'DEM' (n=55000) ↔ 'Democratic' (n=8500) - similarity: 0.82
    'REP' (n=52000) ↔ 'Republican' (n=7200) - similarity: 0.79

Precinct Name Comparison

See the compare_precinct_names() function for comparing precinct names between elections.

CSV File Comparison

A comprehensive utility for comparing two CSV election files to identify and report differences.

Overview

The compare_csv_files() function compares two CSV files from the same state, county, and election to detect:

Missing rows (rows in File A but not in File B)
Extra rows (rows in File B but not in File A)
Value mismatches in matching rows
Structural differences (missing/extra columns)
Vote total discrepancies

This is useful for:

Data validation and quality assurance
Verifying corrections were applied correctly
Comparing original source data vs. processed data
Migration testing and format validation

Features

Flexible row identification: Auto-detects key columns or accepts custom column specification
Multiple output formats: Terminal (CLI), HTML web report, or both
CSV export: Export detailed differences for further analysis
Numeric tolerance: Optionally ignore small numeric differences
Column filtering: Exclude specific columns from comparison
Vote total analysis: Automatic calculation and comparison of vote totals
Color-coded output: Easy-to-read terminal output with ANSI colors
Detailed reporting: Summary statistics plus detailed difference listings

Quick Start

Basic Comparison

from precinct_results import compare_csv_files

# Compare two files
results = compare_csv_files(
    'original.csv',
    'corrected.csv',
    verbose=True
)

# Check if files match
if results['summary']['percentage_match'] == 100.0:
    print("Files are identical!")

Generate Web Report

compare_csv_files(
    'file_a.csv',
    'file_b.csv',
    output_format='web',
    output_file='comparison_report.html'
)
# Open comparison_report.html in a browser

Export Differences to CSV

compare_csv_files(
    'file_a.csv',
    'file_b.csv',
    csv_export='differences.csv',
    verbose=True
)
# Opens in Excel or can be imported for analysis

API Reference

`compare_csv_files()`

Compare two CSV files and report differences.

Parameters:

file_a (str): Path to first CSV file
file_b (str): Path to second CSV file
key_columns (list, optional): Columns to use as row identifier. If None, auto-detects from standard election columns (county, precinct, office, district, candidate, party)
ignore_columns (list, optional): Columns to exclude from comparison
tolerance (float, optional): Numeric comparison tolerance (default: 0.0)
max_differences (int, optional): Stop after N differences (default: unlimited)
output_format (str, optional): Output format - 'cli', 'web', or 'both' (default: 'cli')
output_file (str, optional): Output file path (HTML for web, text for cli)
csv_export (str, optional): Export detailed differences to CSV file
verbose (bool, optional): Print detailed output (default: True)
color (bool, optional): Use colored terminal output (default: True)

Returns:

Dictionary with comparison results containing:

metadata: File information and timestamps
summary: High-level statistics (match percentage, difference counts)
column_differences: Missing/extra columns
row_differences: Missing/extra rows
value_differences: Value mismatches with details
vote_totals: Vote totals and differences by vote type

Usage Examples

Compare with Custom Key Columns

# If your CSV has different structure, specify key columns
results = compare_csv_files(
    'file_a.csv',
    'file_b.csv',
    key_columns=['county', 'precinct', 'office', 'candidate']
)

Ignore Specific Columns

# Exclude columns you know will differ
compare_csv_files(
    'file_a.csv',
    'file_b.csv',
    ignore_columns=['provisional', 'absentee']
)

Use Numeric Tolerance

# Ignore differences of 5 votes or less
compare_csv_files(
    'file_a.csv',
    'file_b.csv',
    tolerance=5.0
)

Quiet Mode for Automation

# For use in scripts/CI pipelines
results = compare_csv_files(
    'file_a.csv',
    'file_b.csv',
    verbose=False,
    color=False
)

if results['summary']['total_differences'] > 0:
    print(f"FAIL: {results['summary']['total_differences']} differences")
    exit(1)
else:
    print("PASS: Files match")
    exit(0)

Analyze Specific Difference Types

results = compare_csv_files('file_a.csv', 'file_b.csv', verbose=False)

# Analyze vote total discrepancies
for vote_type, diff in results['vote_totals']['differences'].items():
    if diff != 0:
        print(f"{vote_type}: {diff:+,} vote difference")

# Count mismatches by column
mismatches_by_column = {}
for diff in results['value_differences']:
    col = diff['column']
    mismatches_by_column[col] = mismatches_by_column.get(col, 0) + 1

for col, count in sorted(mismatches_by_column.items()):
    print(f"{col}: {count} mismatch(es)")

Data Quality Validation Workflow

# Complete validation workflow
results = compare_csv_files(
    'original.csv',
    'processed.csv',
    csv_export='validation_differences.csv',
    verbose=False
)

acceptable_error_rate = 1.0  # 1%
error_rate = 100.0 - results['summary']['percentage_match']

if error_rate <= acceptable_error_rate:
    print("✓ PASS: Error rate within acceptable threshold")
else:
    print("✗ FAIL: Error rate exceeds threshold")
    # Generate detailed HTML report for review
    compare_csv_files(
        'original.csv',
        'processed.csv',
        output_format='web',
        output_file='validation_report.html',
        verbose=False
    )

Output Examples

Terminal Output (CLI)

CSV Comparison Report
======================================================================

Files:
  File A: original.csv
  File B: corrected.csv

Structure:
  Rows:    1,234 vs 1,235 (+1)
  Columns: 12 vs 12 (identical)

Results:
  ✓ Identical rows:      1,180 (95.6%)
  ✗ Missing rows:        10   (0.8%)
  ✗ Extra rows:          11   (0.9%)
  ✗ Value mismatches:    34   (2.7%)

Vote Totals:
  early_voting         File A: 125,432  File B: 125,450  (Δ +18)
  election_day         File A: 234,567  File B: 234,567  (✓ Match)
  mail                 File A: 45,678   File B: 45,680   (Δ +2)

Overall Match: 95.6%

Web Report

The HTML report includes:

Summary dashboard with match rate and difference counts
Interactive tables showing all differences
Vote total comparison charts
Sortable and filterable difference listings
Responsive design for desktop/mobile

CSV Export

Exported CSV includes columns:

type: Type of difference (missing_row, extra_row, value_mismatch)
county, precinct, office, district, candidate, party: Row identification
column: Column with difference (for value mismatches)
value_a, value_b: Values from each file
difference: Numeric difference (if applicable)
notes: Additional information

Integration with Existing Workflows

# Example: Compare before and after correction
from precinct_results import compare_csv_files

# Step 1: Make corrections
# ... your correction code ...

# Step 2: Validate corrections
results = compare_csv_files(
    'before_correction.csv',
    'after_correction.csv',
    csv_export='corrections_applied.csv',
    output_format='web',
    output_file='correction_report.html'
)

# Step 3: Verify expected changes
print(f"Corrections applied: {results['summary']['value_mismatches']}")
print(f"Match rate: {results['summary']['percentage_match']:.1f}%")

See example_usage.py for more comprehensive examples.

County vs Precinct Totals Comparison

A specialized utility for comparing county-level summary files to aggregated precinct-level totals.

Overview

The compare_county_to_precinct_totals() function verifies that county-level election summary files match the aggregated totals from their corresponding precinct-level files. This is essential for data quality assurance when you have both county-level summaries and detailed precinct breakdowns.

Use case: When given a county name (e.g., "adams") and election prefix (e.g., "20251104__pa__general"), this tool:

Reads the county summary file: 20251104__pa__general__adams__county.csv
Reads the precinct file: 20251104__pa__general__adams__precinct.csv
Aggregates all precinct-level data by office/candidate/party
Compares the aggregated precinct totals to the county summary
Reports any differences

Features

Automatic aggregation: Sums precinct-level data across all precincts
Flexible matching: Uses office, district, candidate, and party to match rows
Vote column detection: Automatically identifies all vote type columns
Multiple output formats: Terminal (CLI) or HTML web report
Numeric tolerance: Optionally ignore small rounding differences
Detailed reporting: Shows exactly which candidates have mismatches
Batch processing: Easy to check multiple counties

Quick Start

Basic Comparison

from precinct_results import compare_county_to_precinct_totals

results = compare_county_to_precinct_totals(
    election_prefix='20251104__pa__general',
    county_name='adams',
    directory='data/2025'
)

if results['summary']['total_differences'] == 0:
    print("County and precinct totals match!")

Command-Line Usage

# Basic comparison
python compare_county_precinct.py 20251104__pa__general adams

# With custom directory
python compare_county_precinct.py 20251104__pa__general adams --directory data/2025

# Generate HTML report
python compare_county_precinct.py 20251104__pa__general adams --web --output report.html

# Check multiple counties
for county in adams allegheny beaver; do
    python compare_county_precinct.py 20251104__pa__general $county
done

API Reference

`compare_county_to_precinct_totals()`

Compare county-level summary totals to aggregated precinct-level totals.

Parameters:

election_prefix (str): Election prefix (e.g., "20251104__pa__general")
county_name (str): County name (e.g., "adams")
directory (str, optional): Directory containing CSV files (default: '.')
tolerance (float, optional): Numeric comparison tolerance (default: 0.0)
verbose (bool, optional): Print detailed output (default: True)
output_format (str, optional): Output format - 'cli', 'web', or 'both' (default: 'cli')
output_file (str, optional): Optional output file path

Returns:

Dictionary with comparison results containing:

metadata: File information and comparison details
summary: High-level statistics (total differences, match percentage)
differences: List of rows with value mismatches
missing_in_precinct: Rows in county file but not in precinct data
missing_in_county: Rows in precinct data but not in county file

Raises:

FileNotFoundError: If either file doesn't exist
ValueError: If files are invalid or incompatible

Usage Examples

Generate Web Report

compare_county_to_precinct_totals(
    election_prefix='20251104__pa__general',
    county_name='adams',
    directory='data/2025',
    output_format='web',
    output_file='adams_county_check.html'
)

With Tolerance for Rounding

# Ignore differences of 5 votes or less
results = compare_county_to_precinct_totals(
    election_prefix='20251104__pa__general',
    county_name='adams',
    directory='data/2025',
    tolerance=5.0
)

Batch Check Multiple Counties

counties = ['adams', 'allegheny', 'beaver', 'berks', 'blair']
election_prefix = '20251104__pa__general'

issues_found = []
for county in counties:
    results = compare_county_to_precinct_totals(
        election_prefix=election_prefix,
        county_name=county,
        directory='data/2025',
        verbose=False
    )

    if results['summary']['total_differences'] > 0:
        issues_found.append({
            'county': county,
            'differences': results['summary']['total_differences']
        })

if issues_found:
    print(f"Counties with issues: {len(issues_found)}")
    for item in issues_found:
        print(f"  {item['county']}: {item['differences']} differences")
else:
    print("All counties match perfectly!")

Command-Line Interface

The compare_county_precinct.py script provides easy command-line access:

# Basic usage
python compare_county_precinct.py ELECTION_PREFIX COUNTY_NAME

# Options
  -d, --directory DIR    Directory containing CSV files (default: current)
  -t, --tolerance NUM    Numeric comparison tolerance (default: 0.0)
  --web                  Generate HTML web report
  -o, --output FILE      Output file path
  --quiet                Suppress verbose output

# Exit codes
  0: Files match
  1: Differences found
  2: File not found
  3: Other error

Output Examples

Terminal Output

County vs Precinct Totals Comparison
======================================================================
Election: 20251104__pa__general
County: adams

County file rows: 45
Precinct file rows: 523
Aggregated unique rows: 45

✓ MATCH: County and precinct totals match perfectly!

When differences exist:

✗ MISMATCH: Found 3 issue(s)

  Value mismatches: 3

Value Mismatches:
----------------------------------------------------------------------

1. office: President | candidate: Jane Smith | party: DEM
   Vote type: election_day
   County total:      12,345
   Precinct total:    12,350
   Difference:            +5

2. office: U.S. Senate | candidate: John Doe | party: REP
   Vote type: mail
   County total:       8,234
   Precinct total:     8,230
   Difference:            -4

Web Report

The HTML report includes:

Summary cards showing match status and total issues
Detailed table of all value mismatches
Vote type breakdown showing county vs precinct totals
Difference highlighting with color coding

How It Works

Load Files: Reads both county and precinct CSV files
Identify Vote Columns: Automatically detects vote type columns (excluding standard columns like county, precinct, office, etc.)
Aggregate Precinct Data: Groups precinct data by office, district, candidate, and party, summing all vote totals
Compare: Matches aggregated precinct totals to county summary totals
Report: Shows any discrepancies found

Expected File Structure

County file: {election_prefix}__{county}__county.csv

One row per office/candidate/party
Contains aggregated vote totals for the entire county

Precinct file: {election_prefix}__{county}__precinct.csv

Multiple rows per office/candidate/party (one per precinct)
Contains vote totals broken down by precinct

Both files should have:

Standard columns: county, precinct (precinct file only), office, district, candidate, party
Vote columns: Any combination like votes, early_voting, election_day, mail, absentee, provisional

Integration Example

# Validate all counties after data processing
import os
from precinct_results import compare_county_to_precinct_totals

election_prefix = '20251104__pa__general'
directory = 'data/2025'

# Find all county files
county_files = [f for f in os.listdir(directory)
                if f.startswith(election_prefix) and f.endswith('__county.csv')]

counties = [f.replace(f'{election_prefix}__', '').replace('__county.csv', '')
            for f in county_files]

# Validate each county
for county in counties:
    print(f"Validating {county}...")
    results = compare_county_to_precinct_totals(
        election_prefix=election_prefix,
        county_name=county,
        directory=directory,
        verbose=False
    )

    if results['summary']['total_differences'] == 0:
        print(f"  ✓ {county}: Match")
    else:
        print(f"  ✗ {county}: {results['summary']['total_differences']} differences")
        # Generate detailed report
        compare_county_to_precinct_totals(
            election_prefix=election_prefix,
            county_name=county,
            directory=directory,
            output_format='web',
            output_file=f'{county}_discrepancies.html',
            verbose=False
        )

See example_usage.py for more comprehensive examples.

License

See LICENSE file for details.

Contributing

Submit issues and pull requests to this repository.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
CSV_COMPARISON_PLAN.md		CSV_COMPARISON_PLAN.md
LICENSE		LICENSE
README.md		README.md
compare_county_precinct.py		compare_county_precinct.py
example_aggregate_comparison.py		example_aggregate_comparison.py
example_usage.py		example_usage.py
precinct_results.py		precinct_results.py
test_html_report.py		test_html_report.py

License

openelections/utils

Folders and files

Latest commit

History

Repository files navigation

OpenElections Utils

Precinct Results Generator

Overview

Features

Installation

Quick Start

Usage Examples

With Office Filtering

Custom File Pattern

Generate Vote Columns Report

API Reference

generate_statewide_precinct_file()

generate_vote_columns_report()

How It Works

Input Format

Output Format

Party Variation Checker

Overview

Features

Quick Start

Check a Single CSV File

Check Multiple Files in a Directory

API Reference

check_party_variations()

check_party_variations_directory()

Example Output

Precinct Name Comparison

CSV File Comparison

Overview

Features

Quick Start

Basic Comparison

Generate Web Report

Export Differences to CSV

API Reference

compare_csv_files()

Usage Examples

Compare with Custom Key Columns

Ignore Specific Columns

Use Numeric Tolerance

Quiet Mode for Automation

Analyze Specific Difference Types

Data Quality Validation Workflow

Output Examples

Terminal Output (CLI)

Web Report

CSV Export

Integration with Existing Workflows

County vs Precinct Totals Comparison

Overview

Features

Quick Start

Basic Comparison

Command-Line Usage

API Reference

compare_county_to_precinct_totals()

Usage Examples

Generate Web Report

With Tolerance for Rounding

Batch Check Multiple Counties

Command-Line Interface

Output Examples

Terminal Output

Web Report

How It Works

Expected File Structure

Integration Example

License

Contributing

About

Resources

License

Uh oh!

Stars

`generate_statewide_precinct_file()`

`generate_vote_columns_report()`

`check_party_variations()`

`check_party_variations_directory()`

`compare_csv_files()`

`compare_county_to_precinct_totals()`

Packages