
NSAPH-Data-Processing/pm25_components_randall_raster2polygon


pm25_components__raster2polygon

Code to produce spatial aggregations of PM2.5 component estimates generated by the Atmospheric Composition Analysis Group. The aggregations map satellite-derived PM2.5 component estimates from grids/rasters (NetCDF) to polygons (shapefiles).

This pipeline processes 8 PM2.5 components: NO3, SO4, Sea Salt (SS), NH4, Dust, Black Carbon (BC), Organic Matter (OM), and Organic Matter with H2O (OM_H2O).
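To illustrate what aggregating a raster to polygons means, the sketch below computes a zonal mean: the average of all grid-cell values whose cell centers fall inside a polygon. It is a minimal, dependency-free illustration that assumes a regular lon/lat grid and a single polygon ring without holes. The repository's aggregation script may use a different rule (for example, area weighting or a dedicated zonal-statistics library), and the function names here are hypothetical.

```python
import math


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_mean(values, lons, lats, ring):
    """Mean of grid values whose cell centers fall inside `ring`.

    values[i][j] is the concentration at cell center (lons[j], lats[i]).
    NaN cells (missing data) are skipped; returns NaN if no cell qualifies.
    """
    total, count = 0.0, 0
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            v = values[i][j]
            if not math.isnan(v) and point_in_polygon(lon, lat, ring):
                total += v
                count += 1
    return total / count if count else math.nan


if __name__ == "__main__":
    lons = [0.5, 1.5, 2.5]
    lats = [0.5, 1.5, 2.5]
    grid = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0],
            [7.0, 8.0, 9.0]]
    square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    print(zonal_mean(grid, lons, lats, square))  # cells 1, 2, 4, 5 -> 3.0
```

Assigning each cell wholly by its center is the simplest convention; it can misrepresent small polygons such as ZCTAs that contain few or no cell centers, which is why area-weighted methods are often preferred in practice.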


Washington University PM2.5 Components

The Atmospheric Composition Analysis Group combines satellite observations, ground-based monitors, and simulations to estimate PM2.5 and its chemical components. The estimates are stored in NetCDF files and made publicly available. Several versions of the estimates exist.

Version V5.NA.05.02 provides mean PM2.5 component concentrations (μg/m³) with the following characteristics:

  • Temporal frequency: Annual and monthly
  • Grid resolution: High resolution for North America
  • Geographic region: North America only
  • Components: NO3, SO4, Sea Salt (SS), NH4, Dust, Black Carbon (BC), Organic Matter (OM), and Organic Matter with H2O (OM_H2O)

In this repository, we specifically aggregate the V5.NA.05.02 component files for North America, processing all 8 components simultaneously. The temporal frequency can be modified via configuration parameters.

The file name convention varies by component, for example:

  • NO3: V5NA05.02.HybridNO3-NO3.NorthAmerica.yyyyjjj-yyyyjjj.nc
  • SO4: V5NA05.02.HybridSO4-SO4.NorthAmerica.yyyyjjj-yyyyjjj.nc
  • BC: V5NA05.02.HybridBC-BC.NorthAmerica.yyyyjjj-yyyyjjj.nc

Here yyyy is the four-digit year and jjj is the three-digit day of year (001–366, often called the Julian day).
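Because the date range is encoded in the file name, it can be recovered programmatically. The following sketch assumes the pattern in the examples above holds for every component (this has not been checked for OM_H2O or the other products) and relies on Python's `%j` directive, which parses a zero-padded day of year. The function name and the example file name are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Tuple

# Pattern inferred from the NO3/SO4/BC examples; other versions may differ.
FILENAME_RE = re.compile(
    r"^V5NA05\.02\.Hybrid(?P<product>[A-Za-z0-9_]+)-(?P<component>[A-Za-z0-9_]+)"
    r"\.NorthAmerica\.(?P<start>\d{7})-(?P<end>\d{7})\.nc$"
)


def parse_component_filename(name: str) -> Tuple[str, date, date]:
    """Return (component, start_date, end_date) from a V5NA05.02 file name."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name}")
    # %Y = four-digit year, %j = three-digit day of year (001-366)
    start = datetime.strptime(m["start"], "%Y%j").date()
    end = datetime.strptime(m["end"], "%Y%j").date()
    return m["component"], start, end


if __name__ == "__main__":
    comp, start, end = parse_component_filename(
        "V5NA05.02.HybridNO3-NO3.NorthAmerica.2020001-2020366.nc"  # hypothetical
    )
    print(comp, start, end, (end - start).days + 1, "days")
```

Parsing with `strptime` rather than manual arithmetic handles leap years automatically (day 366 is only valid in leap years such as 2020).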

References:

Aaron van Donkelaar, Melanie S. Hammer, Liam Bindle, Michael Brauer, Jeffery R. Brook, Michael J. Garay, N. Christina Hsu, Olga V. Kalashnikova, Ralph A. Kahn, Colin Lee, Robert C. Levy, Alexei Lyapustin, Andrew M. Sayer and Randall V. Martin (2021). Monthly Global Estimates of Fine Particulate Matter and Their Uncertainty. Environmental Science & Technology. doi:10.1021/acs.est.1c05309.


Codebook

Dataset Columns:

The output parquet files contain PM2.5 component concentrations aggregated to geographic polygons. Each file includes:

  • Spatial identifier: county or zcta - Geographic unit identifier
  • year: Year of the data
  • month: Month of the data (monthly files only)
  • Component concentrations (μg/m³):
    • no3: Nitrate component
    • so4: Sulfate component
    • ss: Sea salt component
    • nh4: Ammonium component
    • dust: Dust component
    • bc: Black carbon component
    • om: Organic matter component
    • om_h2o: Organic matter with water component

Output files are in Parquet format for efficient storage and processing.
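As a quick check of an output file against this codebook, the hypothetical helpers below list the expected columns and report any that are missing. The column order and the exact name of the spatial identifier column (`county` or `zcta`) are assumptions based on the list above.

```python
# Component column names as listed in the codebook
COMPONENT_COLUMNS = ["no3", "so4", "ss", "nh4", "dust", "bc", "om", "om_h2o"]


def expected_columns(polygon_name, temporal_freq):
    """Columns an output file should contain according to the codebook."""
    if polygon_name not in ("county", "zcta"):
        raise ValueError(f"unknown polygon_name: {polygon_name!r}")
    if temporal_freq not in ("yearly", "monthly"):
        raise ValueError(f"unknown temporal_freq: {temporal_freq!r}")
    cols = [polygon_name, "year"]
    if temporal_freq == "monthly":
        cols.append("month")  # only monthly files carry a month column
    return cols + COMPONENT_COLUMNS


def missing_columns(columns, polygon_name, temporal_freq):
    """Expected columns that are absent from `columns`, in codebook order."""
    present = set(columns)
    return [c for c in expected_columns(polygon_name, temporal_freq)
            if c not in present]
```

With pandas and a Parquet engine such as pyarrow installed, an output file could be checked with `missing_columns(pd.read_parquet(path).columns, "zcta", "yearly")`, where `path` points to one of your output files; an empty list means every codebook column is present.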


Configuration files

The configuration structure within the /conf folder allows you to modify the input parameters for the following steps:

  • create directory paths: utils/create_dir_paths.py
  • download components: src/download_components.py
  • download shapefiles: src/download_shapefile.py
  • aggregate components: src/aggregate_all_components.py

Key parameters:

  • temporal_freq: Determines whether the original annual (yearly) or monthly data will be aggregated. Options are: yearly and monthly.
  • polygon_name: Determines into which polygons the component grids will be aggregated. Options are: zcta and county.
  • components: List of PM2.5 components to process. Current components: no3, so4, ss, nh4, dust, bc, om, om_h2o.
  • shapefile_year: Years of shapefiles to download for polygon boundaries.
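The options above can be checked before launching a run. This is a minimal sketch that assumes the configuration has already been loaded into a plain Python dict with these four keys; the actual layout of conf/snakemake.yaml may nest them differently, and `validate_config` is a hypothetical helper, not part of the repository.

```python
# Allowed values taken from the key-parameter descriptions above
ALLOWED = {
    "temporal_freq": {"yearly", "monthly"},
    "polygon_name": {"zcta", "county"},
}
KNOWN_COMPONENTS = {"no3", "so4", "ss", "nh4", "dust", "bc", "om", "om_h2o"}


def validate_config(cfg):
    """Return a list of problems found in a config dict (empty if valid)."""
    errors = []
    for key, options in ALLOWED.items():
        if cfg.get(key) not in options:
            errors.append(f"{key} must be one of {sorted(options)}, got {cfg.get(key)!r}")
    unknown = [c for c in cfg.get("components", []) if c not in KNOWN_COMPONENTS]
    if unknown:
        errors.append(f"unknown components: {unknown}")
    # Accept a single year (as in shapefile_year=2020) or a list of years
    years = cfg.get("shapefile_year", [])
    if isinstance(years, int):
        years = [years]
    if not all(isinstance(y, int) for y in years):
        errors.append("shapefile_year must contain integer years")
    return errors
```

Collecting all problems into a list, instead of raising at the first one, lets a single check report every misconfigured key at once.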

Configuration files:

  • conf/snakemake.yaml: Main pipeline configuration
  • conf/satellite_component/us_components.yaml: Component-specific URLs and file patterns
  • conf/shapefiles/shapefiles.yaml: Shapefile sources and parameters

Run

Conda environment

Clone the repository and create a conda environment.

git clone https://github.com/<user>/<repo>
cd <repo>

conda env create -f requirements.yml
conda activate <env_name>  # environment name as defined in requirements.yml

Alternatively, use mamba:

mamba env create -f requirements.yml
mamba activate <env_name>

Input and output paths

Run the following to create the input and output directories:

python utils/create_dir_paths.py
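For orientation, the following sketch shows the kind of folder layout involved. It is hypothetical: the paths are inferred from the container mount points used in the Dockerized Pipeline section (`data/input/pm25_components__randall/<freq>` and `data/output/pm25_components__randall`), and the real `utils/create_dir_paths.py` may create more or different folders. The helper name `make_data_dirs` is made up for illustration.

```python
import tempfile
from pathlib import Path


def make_data_dirs(base, temporal_freqs=("yearly", "monthly")):
    """Hypothetical helper: create input/output folders under `base`, return them."""
    base = Path(base)
    dirs = [base / "data" / "input" / "pm25_components__randall" / freq
            for freq in temporal_freqs]
    dirs.append(base / "data" / "output" / "pm25_components__randall")
    for d in dirs:
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        d.mkdir(parents=True, exist_ok=True)
    return dirs


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is left behind
    with tempfile.TemporaryDirectory() as tmp:
        for d in make_data_dirs(tmp):
            print(d.relative_to(tmp))
```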

Pipeline

You can run the pipeline steps manually or run the snakemake pipeline described in the Snakefile.

Run pipeline steps manually

python src/download_shapefile.py polygon_name=zcta shapefile_year=2020
python src/download_components.py component=no3 ++temporal_freq=yearly
export PYTHONPATH=.
python src/aggregate_all_components.py polygon_name=zcta ++temporal_freq=yearly ++year=2020
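To repeat the manual steps for all 8 components rather than only `no3`, the commands can be generated in a loop. This sketch assumes that `src/download_components.py` accepts every component name listed under `components` in the same way as `component=no3`; `manual_commands` and `run_all` are hypothetical helpers.

```python
import os
import shlex
import subprocess

COMPONENTS = ["no3", "so4", "ss", "nh4", "dust", "bc", "om", "om_h2o"]


def manual_commands(polygon_name, temporal_freq, year, shapefile_year,
                    components=COMPONENTS):
    """Return the manual pipeline steps as argv lists, in run order."""
    cmds = [["python", "src/download_shapefile.py",
             f"polygon_name={polygon_name}", f"shapefile_year={shapefile_year}"]]
    # One download per component, mirroring the single no3 example
    for comp in components:
        cmds.append(["python", "src/download_components.py",
                     f"component={comp}", f"++temporal_freq={temporal_freq}"])
    cmds.append(["python", "src/aggregate_all_components.py",
                 f"polygon_name={polygon_name}",
                 f"++temporal_freq={temporal_freq}", f"++year={year}"])
    return cmds


def run_all(cmds):
    """Run each step with PYTHONPATH=. (as exported above); stop on first failure."""
    env = {**os.environ, "PYTHONPATH": "."}
    for cmd in cmds:
        subprocess.run(cmd, check=True, env=env)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them
    for cmd in manual_commands("zcta", "yearly", 2020, 2020):
        print(shlex.join(cmd))
```

Calling `run_all(manual_commands("zcta", "yearly", 2020, 2020))` from the repository root would execute the steps in order; `check=True` raises `CalledProcessError` as soon as a step exits with a non-zero status.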

Run the snakemake pipeline

The pipeline processes all configured components simultaneously:

snakemake --cores 4

For SLURM environments, use the provided batch script:

sbatch snakefile.sbatch

Modify the configuration in conf/snakemake.yaml to change polygon_name, temporal_freq, and components as needed.

Dockerized Pipeline

Note: The Docker configuration may need updates to reflect the new component-based pipeline.

Create the folder where you would like to store the output dataset.

mkdir <path>/satellite_pm25_components_raster2polygon

Pull and run:

docker pull nsaph/satellite_pm25_components_raster2polygon
docker run -v <path>:/app/data/input/pm25_components__randall/yearly -v <path>/satellite_pm25_components_raster2polygon/:/app/data/output/pm25_components__randall nsaph/satellite_pm25_components_raster2polygon
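Each bind mount needs its own `-v host_path:container_path` flag. The helper below builds the `docker run` argument list that way; it is a hypothetical convenience, and the host paths in the example are placeholders.

```python
import shlex

IMAGE = "nsaph/satellite_pm25_components_raster2polygon"


def docker_run_args(mounts, image=IMAGE):
    """Build a `docker run` argv with one -v flag per (host, container) pair."""
    args = ["docker", "run"]
    for host, container in mounts:
        args += ["-v", f"{host}:{container}"]
    args.append(image)  # the image name must come after all options
    return args


if __name__ == "__main__":
    base = "/data/example"  # hypothetical host path
    print(shlex.join(docker_run_args([
        (f"{base}/input", "/app/data/input/pm25_components__randall/yearly"),
        (f"{base}/satellite_pm25_components_raster2polygon/",
         "/app/data/output/pm25_components__randall"),
    ])))
```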

To also keep the raw input and intermediate data, mount the whole data folder instead:

docker run -v <path>/satellite_pm25_components_raster2polygon/:/app/data/ nsaph/satellite_pm25_components_raster2polygon

To build your own image, use:

docker build -t <image_name> .

For a multi-platform build, use:

docker buildx build --platform linux/amd64,linux/arm64 -t <username>/<image_name>:<tag> . --push
