Usage instructions for the GEO Benchmark framework.
- Generate mesh (global coordinate grid)
- Run LLM benchmark (query LLMs for climate data)
- Analysis pipeline (spatial RMSE, population, bathymetry)
- Visualization (temperature maps, clustering, statistical plots)
Create geographic coordinate grids with land/ocean detection.
# Generate 10-degree resolution mesh
python geo_mesh_processor.py 10
# Generate 1-degree high-resolution mesh
python geo_mesh_processor.py 1
# Generate 20-degree coarse mesh
python geo_mesh_processor.py 20

Output files:
- meshes/mesh_data_{resolution}deg.json: Mesh data with land points
- meshes/mesh_data_{resolution}deg.csv: CSV format for analysis
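As a rough illustration of what a global coordinate grid at a given resolution looks like, here is a minimal sketch. It is not the code in geo_mesh_processor.py: the function name `generate_grid` is hypothetical, the choice of starting at -90/-180 and including both poles is an assumption, and land/ocean detection (which needs the Natural Earth shapefiles) is left out.

```python
def generate_grid(resolution_deg: float) -> list[tuple[float, float]]:
    """Return (lat, lon) pairs on a regular global grid.

    Assumption: points start at lat -90 / lon -180 and step by
    resolution_deg. The real script may use cell centers or other bounds.
    """
    if resolution_deg <= 0:
        raise ValueError("resolution_deg must be positive")
    step = float(resolution_deg)
    n_lat = int(round(180 / step)) + 1  # include both poles
    n_lon = int(round(360 / step))      # lon 180 would duplicate lon -180
    return [
        (-90.0 + i * step, -180.0 + j * step)
        for i in range(n_lat)
        for j in range(n_lon)
    ]


if __name__ == "__main__":
    for res in (20, 10, 1):
        print(f"{res} deg grid: {len(generate_grid(res))} points")
```

Under these assumptions the point count grows quickly as the resolution gets finer, which is why the coarse mesh is the cheaper choice for quick tests.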
# Plot mesh with land boundaries
python plot_mesh.py meshes/mesh_data_10.0deg.json

Query LLMs for temperature data using configuration files. Supports multiple providers.
Edit config.yaml to set up your benchmark parameters:
# Basic benchmark settings
benchmark:
  mesh_file: "meshes/mesh_data_10.0deg.json"
  num_repeats: 10
  simple_mode: true
  month: "July"
  use_batch: true
  disable_tracing: false
  resume: false

# Model configuration
model:
  provider: "openai"  # openai, anthropic, google, ollama
  name: "gpt-5-nano"
  temperature: 0
  max_tokens: 300
  max_retries: 3
  timeout: 30

# Use default config.yaml
python climate_llm_benchmark.py
# Use custom config file
python climate_llm_benchmark.py my_config.yaml

export OPENAI_API_KEY="your-api-key"

model:
  provider: "openai"
  name: "gpt-5-nano"  # or gpt-4o, gpt-4o-mini, gpt-3.5-turbo

pip install langchain-anthropic
export ANTHROPIC_API_KEY="your-api-key"

model:
  provider: "anthropic"
  name: "claude-3-5-sonnet-20241022"  # or claude-3-5-haiku-20241022

pip install langchain-google-genai
export GOOGLE_API_KEY="your-api-key"

model:
  provider: "google"
  name: "gemini-1.5-pro"  # or gemini-1.5-flash

pip install langchain-community
ollama serve
ollama pull llama3.1:8b

model:
  provider: "ollama"
  name: "llama3.1:8b"  # or mistral:7b, qwen2.5:14b

# Configure in config.yaml, then run
python climate_llm_benchmark.py

# In config.yaml
benchmark:
  use_batch: true
  disable_tracing: true
  num_repeats: 20

# In config.yaml
benchmark:
  resume: true

# For January with Claude
model:
  provider: "anthropic"
  name: "claude-3-5-haiku-20241022"
benchmark:
  month: "January"

# For local Ollama model
model:
  provider: "ollama"
  name: "mistral:7b"
benchmark:
  use_batch: false  # Recommended for local models

Output files:
- results/climate_results_{resolution}deg_r{repeats}_{model}_simple.json: Final results
- results/climate_results_intermediate_{n}_{model}_simple.json: Intermediate saves
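Because later commands take the results file as an argument, it can help to work out the expected filename from the configuration. The sketch below does that under stated assumptions: `expected_results_path` is a hypothetical helper, not part of the framework; it reads the resolution from the mesh filename, uses the model name verbatim (the benchmark script may sanitize characters such as ":" in Ollama model names), and always appends the `_simple` suffix shown in the pattern above.

```python
import re

# Providers listed in the config.yaml comment above.
ALLOWED_PROVIDERS = {"openai", "anthropic", "google", "ollama"}


def expected_results_path(config: dict) -> str:
    """Build the final results path from an already-parsed config dict.

    Assumption: follows the documented pattern
    results/climate_results_{resolution}deg_r{repeats}_{model}_simple.json
    """
    bench = config["benchmark"]
    model = config["model"]
    if model["provider"] not in ALLOWED_PROVIDERS:
        raise ValueError(f"unknown provider: {model['provider']!r}")
    # Extract e.g. "20.0" from "meshes/mesh_data_20.0deg.json".
    match = re.search(r"mesh_data_(.+)deg\.json$", bench["mesh_file"])
    if match is None:
        raise ValueError(f"unexpected mesh file name: {bench['mesh_file']!r}")
    resolution = match.group(1)
    return (
        f"results/climate_results_{resolution}deg"
        f"_r{bench['num_repeats']}_{model['name']}_simple.json"
    )


if __name__ == "__main__":
    quick_test = {
        "benchmark": {"mesh_file": "meshes/mesh_data_20.0deg.json", "num_repeats": 3},
        "model": {"provider": "openai", "name": "gpt-5-nano"},
    }
    print(expected_results_path(quick_test))
```

The dict is assumed to be the parsed content of config.yaml; loading the YAML itself is left out to keep the sketch free of third-party dependencies.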
Process ERA5 NetCDF data to create monthly climatology reference.
# Process ERA5 data to create climatology
python process_era5_climatology.py data/era5_raw_data.nc

Output: data/t2m_climatology_1991-2020.nc (monthly climatology, 1991-2020)
- NetCDF format with 2m temperature (t2m) variable
- Time series covering 1991-2020 period
- Global coverage with regular grid
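A monthly climatology is the mean of each calendar month over the reference period. The sketch below shows that averaging for a single grid cell using plain Python. It is an illustration only: `monthly_climatology` is a hypothetical name, and process_era5_climatology.py operates on NetCDF data rather than on tuples like these.

```python
from collections import defaultdict


def monthly_climatology(records, start_year=1991, end_year=2020):
    """Average values per calendar month over the reference period.

    records: iterable of (year, month, value) for ONE grid cell.
    Returns {month: mean value} for months that have data.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, month, value in records:
        if start_year <= year <= end_year:  # ignore years outside the period
            sums[month] += value
            counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}


if __name__ == "__main__":
    sample = [
        (1990, 7, 300.0),  # outside 1991-2020, skipped
        (1991, 7, 290.0),
        (1992, 7, 292.0),
        (1991, 1, 270.0),
    ]
    print(monthly_climatology(sample))
```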
Create temperature maps from LLM results.
# Create temperature maps from LLM results
python plot_temperature_results.py meshes/mesh_data_10.0deg.json results/climate_results_10.0deg_r10_simple.json

Output files:
- png/temperature_map_{resolution}deg_mean.png: Mean temperature
- png/temperature_map_{resolution}deg_series_{n}.png: Individual request series
- png/temperature_map_{resolution}deg_std.png: Standard deviation
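The mean and standard-deviation maps summarize the repeated requests made for each mesh point. A minimal sketch of that per-point summary follows. The function name `summarize_point`, the handling of failed requests as `None`, and the use of the sample standard deviation are all assumptions, not confirmed behavior of plot_temperature_results.py.

```python
import statistics


def summarize_point(samples):
    """Summarize repeated temperature answers for one mesh point.

    samples: list of floats, with None for failed or unparsable requests
    (hypothetical convention). Returns None if no valid samples remain.
    """
    values = [s for s in samples if s is not None]
    if not values:
        return None
    mean = statistics.fmean(values)
    # Sample standard deviation needs at least two values.
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"mean": mean, "std": std, "n": len(values)}


if __name__ == "__main__":
    print(summarize_point([20.0, 22.0, None, 24.0]))
```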
Compare LLM predictions against ERA5 climatology with comprehensive analysis.
# Full comparison with maps and statistics
python compare_llm_era5.py meshes/mesh_data_10.0deg.json results/climate_results_10.0deg_r10_simple.json data/t2m_climatology_1991-2020.nc

Output files:
- results/climate_results_{resolution}deg_r{repeats}_simple_era5.json: Combined data
- png/llm_era5_comparison_{resolution}deg.png: Scatter plot with error bars
- png/llm_temperature_map_{resolution}deg.png: LLM temperature map
- png/era5_temperature_map_{resolution}deg.png: ERA5 temperature map
- png/temperature_difference_map_{resolution}deg.png: Difference map

Features:
- Error bars: ERA5 uncertainty (horizontal) + LLM variability (vertical)
- Statistics: RMSE, MAE, bias, correlation, request counts
- Consistent scaling: Both maps use ERA5 temperature range
- Difference visualization: Blue=LLM<ERA5, red=LLM>ERA5
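For reference, the comparison statistics listed above can be computed as in the sketch below. It assumes the difference is taken as LLM minus ERA5 (so a positive bias means the LLM runs warm, matching the red=LLM>ERA5 convention) and uses the Pearson correlation. `comparison_stats` is a hypothetical name, and compare_llm_era5.py may weight or filter points differently.

```python
import math


def comparison_stats(llm, era5):
    """Return RMSE, MAE, bias and Pearson correlation for paired values.

    llm, era5: equal-length sequences of temperatures at the same points.
    Assumption: differences are computed as llm - era5.
    """
    if len(llm) != len(era5) or len(llm) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    n = len(llm)
    diffs = [p - r for p, r in zip(llm, era5)]
    bias = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    mean_llm = sum(llm) / n
    mean_era5 = sum(era5) / n
    cov = sum((p - mean_llm) * (r - mean_era5) for p, r in zip(llm, era5))
    var_llm = sum((p - mean_llm) ** 2 for p in llm)
    var_era5 = sum((r - mean_era5) ** 2 for r in era5)
    denom = math.sqrt(var_llm * var_era5)
    # Correlation is undefined when either series is constant.
    corr = cov / denom if denom > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "bias": bias, "correlation": corr}


if __name__ == "__main__":
    print(comparison_stats([10.0, 20.0, 30.0], [12.0, 20.0, 28.0]))
```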
# 1. Generate coarse mesh
python geo_mesh_processor.py 20
# 2. Configure for quick test
# Edit config.yaml:
# mesh_file: "meshes/mesh_data_20.0deg.json"
# num_repeats: 3
# 3. Run benchmark
python climate_llm_benchmark.py
# 4. Compare with ERA5
python compare_llm_era5.py meshes/mesh_data_20.0deg.json results/climate_results_20.0deg_r3_gpt-5-nano_simple.json data/t2m_climatology_1991-2020.nc

# 1. Generate fine mesh
python geo_mesh_processor.py 1
# 2. Configure for production
# Edit config.yaml:
# mesh_file: "meshes/mesh_data_1.0deg.json"
# num_repeats: 10
# disable_tracing: true
# resume: false
# 3. Run comprehensive benchmark
python climate_llm_benchmark.py
# 4. If interrupted, enable resume and re-run
# Edit config.yaml: resume: true
python climate_llm_benchmark.py
# 5. Run complete analysis pipeline
python run_complete_analysis_pipeline.py results/climate_results_1.0deg_r10_gpt-5-nano_simple.json

# Create configs for different months
for month in January April July October; do
    # Set month: "$month" under benchmark: in config.yaml before this run
    # (this step is not automated in the loop as written)
    python climate_llm_benchmark.py
    python compare_llm_era5.py meshes/mesh_data_10.0deg.json results/climate_results_10.0deg_r10_gpt-5-nano_simple.json data/t2m_climatology_1991-2020.nc
done

# 1. Test OpenAI GPT
# config.yaml: provider: "openai", name: "gpt-4o"
python climate_llm_benchmark.py
# 2. Test Anthropic Claude
# config.yaml: provider: "anthropic", name: "claude-3-5-sonnet-20241022"
python climate_llm_benchmark.py
# 3. Test Google Gemini
# config.yaml: provider: "google", name: "gemini-1.5-pro"
python climate_llm_benchmark.py
# 4. Test local Ollama model
# config.yaml: provider: "ollama", name: "llama3.1:8b", use_batch: false
python climate_llm_benchmark.py

- Use batch processing (default)
- Disable LangSmith tracing (disable_tracing: true)
- Use coarser resolution for testing (20°)
- Enable resume for long runs
- Increase repeats per point (10-20)
- Use higher resolution mesh (1-5°)
- Validate with multiple months/seasons
- Compare different LLM models
# Complete analysis from raw results to all plots
python run_complete_analysis_pipeline.py results/climate_results_1.0deg_r10_simple.json

Pipeline steps:
- Spatial RMSE calculations
- Bathymetry/elevation data integration
- Population density data integration
- Spatial analysis plots
- Temperature comparison plots
- Elevation clustering analysis
- Population clustering plots
- Bathymetry maps and comparisons
- Population maps and correlations
- Filtered spatial analysis (pop≥5/km², elev≤2000m)
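The filtered analysis in the last step keeps only populated, low-elevation points. A minimal sketch of such a filter is shown below. The thresholds come from the list above, but the function name `filter_points` and the field names `population_density` and `elevation` are hypothetical, since the actual keys in the results JSON are not documented here.

```python
def filter_points(points, min_population=5.0, max_elevation=2000.0):
    """Keep points with population >= 5/km² and elevation <= 2000 m.

    points: list of dicts. The keys "population_density" (per km²) and
    "elevation" (m) are assumed names, not confirmed by the framework.
    Points missing either value are dropped.
    """
    kept = []
    for point in points:
        population = point.get("population_density")
        elevation = point.get("elevation")
        if population is None or elevation is None:
            continue
        if population >= min_population and elevation <= max_elevation:
            kept.append(point)
    return kept


if __name__ == "__main__":
    sample = [
        {"population_density": 10.0, "elevation": 500.0},   # kept
        {"population_density": 2.0, "elevation": 100.0},    # too sparse
        {"population_density": 50.0, "elevation": 2500.0},  # too high
    ]
    print(len(filter_points(sample)))
```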
# Add neighborhood analysis to existing results
python extend_results_with_spatial_rmse.py results/climate_results_1.0deg_r10_simple.json

# Add population density data
python add_population_to_results.py results/climate_results_1.0deg_r10_simple_spatial_rmse.json

# 1. Aggregate GEBCO data to 1° grid (one-time setup)
python aggregate_bathymetry.py
# 2. Add elevation parameters
python add_bathymetry_to_results.py results/climate_results_1.0deg_r10_simple_spatial_rmse_population.json

# Enhanced spatial analysis with density plots
python plot_spatial_analysis.py results/climate_results_1.0deg_r10_simple_spatial_rmse_bathymetry_population.json
# Colored comparison plots (elevation, population, roughness)
python plot_temperature_comparison_colored.py results/climate_results_1.0deg_r10_simple_spatial_rmse_bathymetry_population.json
# Elevation-based clustering (3×3 grid)
python plot_elevation_clusters.py results/climate_results_1.0deg_r10_simple_spatial_rmse_bathymetry_population.json
# Population-based clustering (3×3 grid)
python plot_population_clusters.py results/climate_results_1.0deg_r10_simple_spatial_rmse_bathymetry_population.json
# Bathymetry maps and correlations
python plot_bathymetry_map.py results/climate_results_1.0deg_r10_simple_spatial_rmse_bathymetry_population.json
# Population maps and comparisons
python plot_population_map.py results/climate_results_1.0deg_r10_simple_spatial_rmse_bathymetry_population.json
# Filtered analysis (populated, low elevation areas)
python plot_spatial_analysis_filtered.py results/climate_results_1.0deg_r10_simple_spatial_rmse_bathymetry_population.json

# Statistical modeling
python multivariate_rmse_analysis.py results/climate_results_1.0deg_r10_simple_spatial_rmse_bathymetry_population.json
# Outputs: distributions, correlations, GAM/XGBoost, spatial CV

geo_benchmark/
├── data/
│   ├── land/                             # Natural Earth shapefiles
│   ├── t2m_climatology_*.nc              # ERA5 climatology
│   └── bathymetry_1deg_aggregated.nc     # GEBCO elevation data
├── meshes/
│   └── mesh_data_*.json                  # Generated meshes
├── results/
│   ├── climate_results_*.json                                     # Basic LLM results
│   ├── climate_results_*_spatial_rmse.json                        # + spatial analysis
│   ├── climate_results_*_spatial_rmse_bathymetry.json             # + elevation data
│   └── climate_results_*_spatial_rmse_bathymetry_population.json  # Complete enhanced data
├── reports/
│   └── multivariate_rmse_report.txt      # Statistical analysis
└── png/
    └── {results_filename}/               # Organized by results file
        ├── spatial_analysis_*.png        # Spatial maps
        ├── llm_era5_comparison_*.png     # Comparison plots
        ├── temperature_comparison_*.png  # Colored scatter plots
        ├── elevation_clusters_*.png      # Elevation clustering
        ├── population_clusters_*.png     # Population clustering
        ├── bathymetry_*.png              # Elevation/roughness maps
        ├── population_*.png              # Population analysis
        ├── filtered_*.png                # Filtered analysis
        └── multivariate_*.png            # Statistical plots