pm25_components__raster2polygon

Code to produce spatial aggregations of PM2.5 component estimates generated by the Atmospheric Composition Analysis Group. The satellite-derived PM2.5 components are aggregated from a grid/raster (NetCDF) to polygons (shp).

This pipeline processes 8 PM2.5 components: NO3, SO4, Sea Salt (SS), NH4, Dust, Black Carbon (BC), Organic Matter (OM), and Organic Matter with H2O (OM_H2O).
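The raster-to-polygon step can be pictured with a toy example. The sketch below is not the repository's implementation, which this README does not describe. It assumes the simplest possible rule, averaging the grid cells whose centers fall inside a polygon, and uses only the Python standard library. The function names, the 3x3 grid, and the square polygon are all made up for illustration.

```python
import math
from statistics import fmean


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_mean(grid, lons, lats, ring):
    """Mean of the cells whose centers fall inside the ring, or None if none do.

    grid[i][j] holds the value at (lats[i], lons[j]); NaN cells are skipped.
    """
    values = [
        grid[i][j]
        for i, lat in enumerate(lats)
        for j, lon in enumerate(lons)
        if not math.isnan(grid[i][j]) and point_in_polygon(lon, lat, ring)
    ]
    return fmean(values) if values else None


# Toy 3x3 raster (cell centers at 0.5, 1.5, 2.5) and a 2x2 square polygon.
lons = [0.5, 1.5, 2.5]
lats = [0.5, 1.5, 2.5]
grid = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

print(polygon_mean(grid, lons, lats, square))  # 3.0, the mean of 1, 2, 4 and 5
```

A production aggregation would normally rely on a geospatial library and may weight cells by the area of overlap instead of testing cell centers.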

Washington University PM2.5 Components

The Atmospheric Composition Analysis Group uses a combination of satellite images, monitors and simulation to generate estimates of PM2.5 and its chemical components. Estimates are stored in NetCDF files and made publicly available. There are several versions of the estimates.

The version V5.NA.05.02 consists of mean PM2.5 component concentrations (μg/m³) available at:

Temporal frequency: Annual and monthly
Grid resolution: High resolution for North America
Geographic region: North America only
Components: NO3, SO4, Sea Salt (SS), NH4, Dust, Black Carbon (BC), Organic Matter (OM), and Organic Matter with H2O (OM_H2O)

In this repository, we specifically aggregate the V5.NA.05.02 component files for North America, processing all 8 components simultaneously. The temporal frequency can be modified via configuration parameters.

The file name convention varies by component, for example:

NO3: V5NA05.02.HybridNO3-NO3.NorthAmerica.yyyyjjj-yyyyjjj.nc
SO4: V5NA05.02.HybridSO4-SO4.NorthAmerica.yyyyjjj-yyyyjjj.nc
BC: V5NA05.02.HybridBC-BC.NorthAmerica.yyyyjjj-yyyyjjj.nc

Where yyyy represents the four-digit year and jjj represents the day of the year (001 to 366), often called the Julian day.
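As a minimal sketch of how this naming convention could be read programmatically, the snippet below parses a file name into its component and date range. The regular expression is inferred from the three examples above only, so other components may not match it. The function name and the sample file name (a hypothetical 2020 annual file) are assumptions made for illustration.

```python
import re
from datetime import date, datetime

# Pattern inferred from the NO3, SO4 and BC examples; an assumption, not a spec.
FILENAME_RE = re.compile(
    r"^V5NA05\.02\.Hybrid(?P<prefix>\w+)-(?P<component>\w+)\.NorthAmerica\."
    r"(?P<start>\d{7})-(?P<end>\d{7})\.nc$"
)


def parse_component_filename(name: str) -> tuple[str, date, date]:
    """Return (component, start_date, end_date) for a yyyyjjj-yyyyjjj file name."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    # %Y%j reads a four-digit year followed by the day of the year.
    start = datetime.strptime(match["start"], "%Y%j").date()
    end = datetime.strptime(match["end"], "%Y%j").date()
    return match["component"], start, end


# Hypothetical file name that follows the convention above.
example = "V5NA05.02.HybridNO3-NO3.NorthAmerica.2020001-2020366.nc"
print(parse_component_filename(example))
```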

References:

Aaron van Donkelaar, Melanie S. Hammer, Liam Bindle, Michael Brauer, Jeffery R. Brook, Michael J. Garay, N. Christina Hsu, Olga V. Kalashnikova, Ralph A. Kahn, Colin Lee, Robert C. Levy, Alexei Lyapustin, Andrew M. Sayer and Randall V. Martin (2021). Monthly Global Estimates of Fine Particulate Matter and Their Uncertainty. Environmental Science & Technology. doi:10.1021/acs.est.1c05309.

Codebook

Dataset Columns:

The output parquet files contain PM2.5 component concentrations aggregated to geographic polygons. Each file includes:

Spatial identifier: county or zcta - Geographic unit identifier
year: Year of the data
month: Month of the data (monthly files only)
Component concentrations (μg/m³):
- no3: Nitrate component
- so4: Sulfate component
- ss: Sea salt component
- nh4: Ammonium component
- dust: Dust component
- bc: Black carbon component
- om: Organic matter component
- om_h2o: Organic matter with water component

Output files are in Parquet format for efficient storage and processing.
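The codebook can be turned into a quick schema check. The sketch below lists the columns one would expect for a given polygon type and temporal frequency, and reports any that are absent. The helper names and the column order are assumptions. The column names themselves come from the codebook above.

```python
COMPONENT_COLUMNS = ["no3", "so4", "ss", "nh4", "dust", "bc", "om", "om_h2o"]


def expected_columns(polygon_name: str, temporal_freq: str) -> list[str]:
    """Columns listed in the codebook for one output file."""
    columns = [polygon_name, "year"]
    if temporal_freq == "monthly":
        # The month column only appears in monthly files.
        columns.append("month")
    return columns + COMPONENT_COLUMNS


def missing_columns(actual, polygon_name: str, temporal_freq: str) -> list[str]:
    """Expected columns that are absent from `actual` (any iterable of names)."""
    present = set(actual)
    return [c for c in expected_columns(polygon_name, temporal_freq) if c not in present]


print(expected_columns("zcta", "monthly"))
print(missing_columns(["county", "year", "no3"], "county", "yearly"))
```

With pandas installed, the columns of a real file could be checked by passing `pd.read_parquet(path).columns` as `actual`, where `path` points to one of the output files.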

Configuration files

The configuration structure within the /conf folder allows you to modify the input parameters for the following steps:

create directory paths: utils/create_dir_paths.py
download components: src/download_components.py
download shapefiles: src/download_shapefile.py
aggregate components: src/aggregate_all_components.py

Key parameters:

temporal_freq: Determines whether the original annual (yearly) or monthly data will be aggregated. Options are: yearly and monthly.
polygon_name: Determines into which polygons the component grids will be aggregated. Options are: zcta and county.
components: List of PM2.5 components to process. Current components: no3, so4, ss, nh4, dust, bc, om, om_h2o.
shapefile_year: Years of shapefiles to download for polygon boundaries.
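The allowed values above lend themselves to an early sanity check before a long run starts. The following is a hypothetical sketch, not code from this repository: the `PipelineConfig` class and its fields simply mirror the parameter names listed above. `shapefile_year` is left out because its allowed values are not listed here.

```python
from dataclasses import dataclass

VALID_TEMPORAL_FREQ = ("yearly", "monthly")
VALID_POLYGON_NAMES = ("zcta", "county")
KNOWN_COMPONENTS = ("no3", "so4", "ss", "nh4", "dust", "bc", "om", "om_h2o")


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical container for the key parameters; validates on creation."""

    temporal_freq: str
    polygon_name: str
    components: tuple

    def __post_init__(self):
        if self.temporal_freq not in VALID_TEMPORAL_FREQ:
            raise ValueError(f"temporal_freq must be one of {VALID_TEMPORAL_FREQ}")
        if self.polygon_name not in VALID_POLYGON_NAMES:
            raise ValueError(f"polygon_name must be one of {VALID_POLYGON_NAMES}")
        unknown = [c for c in self.components if c not in KNOWN_COMPONENTS]
        if unknown:
            raise ValueError(f"unknown components: {unknown}")


config = PipelineConfig("yearly", "zcta", ("no3", "so4", "bc"))
print(config)
```

Failing at construction time keeps a typo such as `temporal_freq=annual` from surfacing only after the downloads have finished.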

Configuration files:

conf/snakemake.yaml: Main pipeline configuration
conf/satellite_component/us_components.yaml: Component-specific URLs and file patterns
conf/shapefiles/shapefiles.yaml: Shapefile sources and parameters

Run

Conda environment

Clone the repository and create a conda environment.

git clone https://github.com/<user>/<repo>
cd <repo>

conda env create -f environment.yaml
conda activate <env_name> # environment name as found in environment.yaml

It is also possible to use mamba.

mamba env create -f environment.yaml
mamba activate <env_name>

Input and output paths

Run the following script to create the input and output directory paths:

python utils/create_dir_paths.py

Pipeline

You can run the pipeline steps manually or run the snakemake pipeline described in the Snakefile.

run pipeline steps manually

python src/download_shapefile.py polygon_name=zcta shapefile_year=2020
python src/download_components.py component=no3 ++temporal_freq=yearly
export PYTHONPATH=.
python src/aggregate_all_components.py polygon_name=zcta ++temporal_freq=yearly ++year=2020
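The manual commands above download a single component (no3). As a rough sketch, the Python helper below builds the same command sequence for every configured component. It assumes that src/download_components.py accepts each component name the same way it accepts no3, which this README does not confirm. The `manual_commands` function is hypothetical and only prints the commands.

```python
import shlex

COMPONENTS = ["no3", "so4", "ss", "nh4", "dust", "bc", "om", "om_h2o"]


def manual_commands(polygon_name, temporal_freq, year, shapefile_year, components=COMPONENTS):
    """Build the manual pipeline steps as argument lists, in execution order."""
    commands = [
        ["python", "src/download_shapefile.py",
         f"polygon_name={polygon_name}", f"shapefile_year={shapefile_year}"],
    ]
    # One download per component (assumed to work like the no3 example).
    for component in components:
        commands.append(
            ["python", "src/download_components.py",
             f"component={component}", f"++temporal_freq={temporal_freq}"]
        )
    commands.append(
        ["python", "src/aggregate_all_components.py",
         f"polygon_name={polygon_name}", f"++temporal_freq={temporal_freq}", f"++year={year}"]
    )
    return commands


for command in manual_commands("zcta", "yearly", 2020, 2020):
    print(shlex.join(command))
```

To execute the commands instead of printing them, each list could be passed to `subprocess.run(command, check=True)` from the repository root, with `PYTHONPATH=.` set in the environment as in the manual steps.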

run snakemake pipeline

The pipeline processes all configured components simultaneously:

snakemake --cores 4

For SLURM environments, use the provided batch script:

sbatch snakefile.sbatch

Modify the configuration in conf/snakemake.yaml to change polygon_name, temporal_freq, and components as needed.

Dockerized Pipeline

Note: The Docker configuration may need updates to reflect the new component-based pipeline.

Create the folder where you would like to store the output dataset.

mkdir <path>/satellite_pm25_components_raster2polygon

Pull and Run:

docker pull nsaph/satellite_pm25_components_raster2polygon
docker run -v <path>:/app/data/input/pm25_components__randall/yearly -v <path>/satellite_pm25_components_raster2polygon/:/app/data/output/pm25_components__randall nsaph/satellite_pm25_components_raster2polygon

To also store the raw input and intermediate data, run:

docker run -v <path>/satellite_pm25_components_raster2polygon/:/app/data/ nsaph/satellite_pm25_components_raster2polygon

To build your own image, use:

docker build -t <image_name> .

For a multi-platform build, use:

docker buildx build --platform linux/amd64,linux/arm64 -t <username>/<image_name>:<tag> . --push
