This guide provides in-depth technical information about the architecture and extensibility of the GeoTIFF ToolKit. It is intended for developers who wish to contribute to the project or customize it for specific workflows.
The toolkit uses a Builder pattern to separate report content (what to include) from report formatting (how to present).
- Data Models (`utils/data_models.py`)
  - Strongly-typed dataclasses for all report data
  - Examples: `FileComparison`, `IfdTableData`, `StatisticsData`
  - Ensures type safety and clear contracts between components
- Data Fetchers (`utils/data_fetchers.py`)
  - Extract data from GeoTIFF files
  - Return dataclass instances
  - Examples: `fetch_tags_data()`, `fetch_statistics_data()`
- Report Builders (`utils/report_builders.py`)
  - Determine WHAT sections to include in reports
  - Classes: `MetadataReportBuilder`, `ComparisonReportBuilder`
  - Usage: `builder.add_standard_sections(['tags', 'statistics'])`
- Section Renderers (`utils/section_renderers.py`)
  - Render individual sections to markdown
  - Base class: `MarkdownRenderer`
  - Extensible for custom rendering logic
- Report Formatters (`utils/report_formatters.py`)
  - Format complete reports for output (HTML or Markdown)
  - Classes: `HtmlReportFormatter`, `MarkdownReportFormatter`
  - Handle document structure, CSS, navigation, and table of contents
```python
from utils.report_context import build_context_from_file
from utils.report_builders import MetadataReportBuilder
from utils.report_formatters import HtmlReportFormatter

# Build context from a GeoTIFF file
context = build_context_from_file('input.tif')

# Build report structure
builder = MetadataReportBuilder(context)
builder.add_standard_sections(['tags', 'statistics', 'cog'])

# Format as HTML
formatter = HtmlReportFormatter(context)
formatter.sections = builder.sections
html_report = formatter.generate()

# Write to file
with open('report.html', 'w') as f:
    f.write(html_report)
```

```python
from osgeo import gdal

from utils.report_builders import ComparisonReportBuilder
from utils.report_formatters import HtmlReportFormatter
from tools.compare_compression import build_differences_data

# Open datasets
base_ds = gdal.Open('baseline.tif')
comp_ds = gdal.Open('optimized.tif')

# Build differences data ('args' holds the parsed command-line arguments)
differences = build_differences_data(
    base_ds, comp_ds, args, 'Baseline', 'Optimized'
)

# Build report sections
builder = ComparisonReportBuilder(base_ds, comp_ds, 'Baseline', 'Optimized')
builder.add_differences_section(differences)
builder.add_ifd_sections()
builder.add_statistics_sections()
builder.add_histogram_sections()
builder.add_cog_sections()

# Generate HTML output
context = {'input_filename': 'optimized.tif'}
formatter = HtmlReportFormatter(context)
formatter.sections = builder.sections
html_report = formatter.generate()

# Write report
with open('comparison.html', 'w') as f:
    f.write(html_report)
```

To add a new section type:
1. Create a dataclass in `data_models.py`:

   ```python
   @dataclass
   class CustomSectionData:
       title: str
       data: Dict[str, Any]
   ```

2. Add a fetcher function in `data_fetchers.py`:

   ```python
   def fetch_custom_data(context: Dict[str, Any]) -> Optional[CustomSectionData]:
       # Extract and return data
       return CustomSectionData(title="Custom", data={...})
   ```

3. Add a renderer method in `section_renderers.py`:

   ```python
   def render_custom_section(self, data: CustomSectionData) -> str:
       # Generate markdown
       return f"### {data.title}\n..."
   ```

4. Use the builder to add your section:

   ```python
   builder.add_section('custom', 'Custom Section', 'Custom', custom_data)
   ```
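Put together, the four steps above might look like the following self-contained sketch. The builder stand-in and the fetched data are simplified assumptions for illustration; the toolkit's real `add_section` takes a key, heading, and label as shown in step 4.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# 1. Dataclass for the section's data (as in data_models.py)
@dataclass
class CustomSectionData:
    title: str
    data: Dict[str, Any]

# 2. Fetcher returning a dataclass instance (as in data_fetchers.py)
def fetch_custom_data(context: Dict[str, Any]) -> Optional[CustomSectionData]:
    return CustomSectionData(title="Custom",
                             data={"source": context.get("input_filename")})

# 3. Renderer producing markdown (as in section_renderers.py)
def render_custom_section(data: CustomSectionData) -> str:
    body = "\n".join(f"- {k}: {v}" for k, v in data.data.items())
    return f"### {data.title}\n{body}"

# 4. A minimal builder stand-in that collects rendered sections
#    (hypothetical; the real builders have a richer interface)
@dataclass
class SketchBuilder:
    sections: List[str] = field(default_factory=list)

    def add_section(self, renderer, data) -> None:
        self.sections.append(renderer(data))

builder = SketchBuilder()
builder.add_section(render_custom_section,
                    fetch_custom_data({"input_filename": "input.tif"}))
print(builder.sections[0])
```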
- Separation of Concerns: Content selection, data fetching, rendering, and formatting are independent
- Extensibility: Easy to add new report types, output formats, or section types
- Testability: Each component can be tested in isolation
- Reusability: Builders and formatters can be mixed and matched
- Type Safety: Strong typing with dataclasses prevents runtime errors
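The testability benefit follows directly from the layering: because renderers accept plain dataclasses, they can be exercised with hand-built data and no GDAL dependency. A minimal sketch, using illustrative names rather than the toolkit's actual classes:

```python
from dataclasses import dataclass

# Hypothetical stand-ins mirroring the toolkit's layering
@dataclass
class SectionData:
    title: str
    rows: dict

def render_section(data: SectionData) -> str:
    # Renders to markdown without touching GDAL or the filesystem
    lines = [f"### {data.title}"] + [f"- {k}: {v}" for k, v in data.rows.items()]
    return "\n".join(lines)

# The renderer is tested in isolation with hand-built data:
md = render_section(SectionData("Tags", {"Compression": "LZW"}))
assert "### Tags" in md and "- Compression: LZW" in md
```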
When running within ArcGIS Pro, the toolkit uses an isolation strategy to ensure compatibility and stability.
- Challenge: ArcGIS Pro uses a specific, often older or modified, internal Python environment (`arcpy`). Although its `gdal` module is up-to-date, many legacy configurations and creation options reside in Esri's `gdal_e.dll`, which notably is NOT kept in sync with GDAL's `gdal.dll` at each release. The outdated settings particularly affect the creation of IFDs, internal masks, and metadata, as well as SRS handling, since it lacks PROJ to maintain compliance with the EPSG Registry.
- Solution: The `optimize-arc` tool acts as a bridge:
  - It runs within the ArcGIS Pro Python environment to handle the GUI and argument parsing.
  - It then constructs a payload of GDAL commands.
  - It executes a standalone `gdal_runner.py` script in a separate, fully-featured OSGeo4W environment (configured in `config.toml`).
  - This ensures that the heavy lifting (compression, COG creation) is done by a modern, standard GDAL stack, while the user interface remains integrated with ArcGIS Pro.
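The hand-off between the two environments can be sketched in a few lines. This is a hypothetical simplification: the real tool launches `gdal_runner.py` with the OSGeo4W interpreter, whereas the demo below uses the current interpreter as a stand-in.

```python
import subprocess
import sys

def run_in_external_env(python_exe, code):
    """Run a snippet in a separate interpreter and capture its stdout.

    Simplified sketch of the optimize-arc -> gdal_runner.py hand-off:
    the command payload is built here, executed over there.
    """
    result = subprocess.run(
        [python_exe, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Demo: the current interpreter stands in for the OSGeo4W Python.
print(run_in_external_env(sys.executable, "print('gdal payload executed')"))
```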
- Dependencies: To use the isolated environment capability, OSGeo4W must be installed on the system. It is commonly installed alongside QGIS but can also be installed independently.
- Download Installer: OSGeo4W Network Installer
- Required Libraries:
  - The `gdal_runner.py` script relies on a standard OSGeo4W installation.
  - Ensure the `gdal`, `python3-gdal`, `numpy`, and `python3-numpy` packages are selected during installation (typically included in the "Express Desktop" install).
  - The path to the OSGeo4W root directory (e.g., `C:\OSGeo4W`) must be correctly set in `config.toml`.
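A `config.toml` entry of roughly this shape would point the toolkit at the OSGeo4W root; the table and key names here are an assumption, so check the shipped `config.toml` for the authoritative schema:

```toml
# Hypothetical example; the actual key names may differ.
[osgeo4w]
root = 'C:\OSGeo4W'   # TOML literal string, so backslashes need no escaping
```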
The `gttk optimize` tool uses a sophisticated, multi-step pipeline to process your data. All steps are performed in-memory using GDAL's virtual file system, meaning no temporary files are written to disk.
1. Initial Read & Analysis: Opens the input file and gathers key metadata (resolution, data type, spatial reference system)
2. SRS Handling: Checks for and parses compound SRS; creates a new compound SRS if `--vertical-srs` is provided
3. Resampling/Reprojection (if needed): Uses `gdal.Warp` to create a new in-memory dataset if resolution or SRS changes
4. Alpha-to-Mask Conversion (for images): Converts the alpha channel to an internal mask for better COG compatibility and compression
5. Rounding (for floats): Performs block-based rounding for large floating-point rasters, allowing efficient processing of files too large for RAM
6. Final Compression and COG Creation: The processed in-memory dataset is passed to the COG driver for compression and writing; overviews are generated at this stage.
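The block-based rounding in step 5 can be illustrated with a simplified pure-Python sketch: instead of rounding the whole raster at once, values are processed in fixed-size blocks so only one block needs to be in memory at a time. The real implementation operates on GDAL raster blocks, not Python lists.

```python
def round_in_blocks(values, decimals, block_size):
    """Round a flat sequence of floats block by block.

    Simplified stand-in for the toolkit's block-based raster rounding:
    each iteration touches only one block's worth of data.
    """
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]  # one block in memory
        out.extend(round(v, decimals) for v in block)
    return out

print(round_in_blocks([1.23456, 2.71828, 3.14159], 2, 2))  # → [1.23, 2.72, 3.14]
```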
The toolkit includes a built-in lookup table that maps Esri-specific CRS names to their corresponding EPSG codes. This feature automatically standardizes GeoTIFFs that are missing an EPSG authority code in their CRS definition, which is common for files generated by Esri software.
The lookup table is stored as a JSON file at `resources/esri/esri_epsg_name_lookup.json`. This file is packaged with the toolkit and is used by default for all SRS standardization operations.
The lookup table is generated from Esri's projection-engine-db-doc GitHub repository. To update the local version to the latest data, run:

```shell
python tools/build_esri_epsg_lookup.py
```

This will fetch the latest CRS definitions from the repository and overwrite the existing JSON file with the updated data.
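Conceptually, standardization is a name-to-code lookup. The sketch below assumes the JSON maps Esri CRS names to EPSG codes; the actual file's structure may differ, and the two entries shown are merely illustrative (the name/code pairings themselves are real EPSG assignments):

```python
import json

# Assumed structure: a flat mapping of Esri CRS names to EPSG codes.
lookup_json = """{
    "WGS_1984_Web_Mercator_Auxiliary_Sphere": 3857,
    "NAD_1983_UTM_Zone_11N": 26911
}"""
lookup = json.loads(lookup_json)

def resolve_epsg(esri_name):
    """Return the EPSG code for an Esri CRS name, or None if unknown."""
    return lookup.get(esri_name)

print(resolve_epsg("WGS_1984_Web_Mercator_Auxiliary_Sphere"))  # → 3857
```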
This project includes code from the following external source:
- GDAL: validate_cloud_optimized_geotiff.py
Project: GDAL - Open Source Geospatial Foundation
Copyright (c) 2017, Even Rouault
Licensed under the MIT License
Original source: validate_cloud_optimized_geotiff.py