Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
64 commits
Select commit Hold shift + click to select a range
d2873ac
Add python scripts to process obsfcstana outputs and to plot statisti…
gmao-qliu Apr 1, 2025
89ea115
fix nc4 datatype error.
gmao-qliu Apr 2, 2025
310f2c3
renamed directory (python_calc_plot_ObsFcstAna -> ObsFcstAna_stats)
gmao-rreichle Apr 7, 2025
0722776
minor update and rename easev2 script
gmao-qliu Apr 7, 2025
b35d19c
move compute_monthly_stats.py to main directory
gmao-qliu Apr 7, 2025
962c114
removed __pycache__/ files
gmao-rreichle Apr 8, 2025
c54ecbb
further reorg file locations
gmao-qliu Apr 8, 2025
c0f9bf8
fix: correct syntax error in compute_monthly_stats.py
amfox37 Apr 8, 2025
d54ab35
Merge branch 'develop' into feature/qliu/add_postproc_scripts
gmao-rreichle Apr 10, 2025
737c1eb
update and reorg functions
gmao-qliu Apr 14, 2025
3517295
add multi-experiments options
gmao-qliu Apr 16, 2025
c951034
correct definition of tile_idx
gmao-qliu Apr 17, 2025
6b4a33d
Add functions to write monthly OmF/OmA statistics to NetCDF files
amfox37 May 6, 2025
030259d
Merge branch 'feature/qliu/add_postproc_scripts' of github.com:GEOS-E…
amfox37 May 6, 2025
e72d156
Bugfix for datetime format when reading netCDF file
amfox37 May 7, 2025
e3d2339
bugfix creating/closing subsequent figures
amfox37 May 7, 2025
9fcdb3c
removed CTRL-M (^M) blue carriage return characters (EASEv2_ind2latlo…
gmao-rreichle May 12, 2025
f0e926e
some cleanup of postproc tool to compute ObsFcstAna stats (Main_examp…
gmao-rreichle May 12, 2025
ef52a11
additional cleanup of postproc tool for ObsFcstAna stats
gmao-rreichle May 13, 2025
ea1bfea
Merge branch 'develop' into feature/qliu/add_postproc_scripts
gmao-rreichle May 13, 2025
d5af23f
minimal edits (accidentally forgotten to add in previous commit) (wri…
gmao-rreichle May 13, 2025
4f7f84e
add multiple sample scripts and minor update
gmao-qliu May 15, 2025
8dd3504
remove old example script
gmao-qliu May 15, 2025
ad35e67
Merge branch 'develop' into feature/qliu/add_postproc_scripts
gmao-rreichle May 19, 2025
6165af8
cleaner separation of user-defined inputs and processing code (Save_m…
gmao-rreichle May 20, 2025
c9ef759
updated CHANGELOG.md
gmao-rreichle May 20, 2025
737e4e8
Merge branch 'develop' into feature/qliu/add_postproc_scripts
gmao-rreichle May 23, 2025
8baa1c4
fix minor typo
gmao-qliu May 27, 2025
8e2f204
fix minor missing info.
gmao-qliu May 27, 2025
3428f20
Merge branch 'develop' into feature/qliu/add_postproc_scripts
gmao-rreichle May 27, 2025
c2347f0
removed "executable" permissions from py scripts
gmao-rreichle May 27, 2025
9adeacd
removed obsolete functions from ObsFcstAna_stats/helper/write_nc4.py
gmao-rreichle May 27, 2025
83bb7e0
added documentation, fixed indent (GEOSldas_App/util/shared/python/pl…
gmao-rreichle May 27, 2025
84da80d
separate user definition, reorg functions, and add some comments
gmao-qliu May 28, 2025
1419800
remove unused imports
gmao-qliu May 29, 2025
18d67a8
Merge branch 'develop' into feature/qliu/add_postproc_scripts
gmao-rreichle May 29, 2025
d529477
changed variable name for clarity; white-space changes (tile2grid.py)
gmao-rreichle May 29, 2025
80289bf
renaming of function and variables for clarification (tile2grid.py)
gmao-rreichle May 29, 2025
454a99c
renamed function name to avoid confusion with GEOS "tile2grid" operat…
gmao-rreichle May 29, 2025
16f5ce2
rearranged order of inputs and edited comments for clariy (user_confi…
gmao-rreichle May 29, 2025
a05bd01
added/edited comments; white-space changes for better alignment (read…
gmao-rreichle May 29, 2025
55faa33
white space changes for better alignment (write_nc4.py)
gmao-rreichle May 29, 2025
39d1aac
cleaned up time variables in compute_monthly_sums() and save_monthly_…
gmao-rreichle May 29, 2025
7984912
more cleanup of time variables; edits of comments; white-space change…
gmao-rreichle May 29, 2025
7ae212a
edited comments (Save_monthlysums.py)
gmao-rreichle May 29, 2025
13da375
remove 'obs_from' and add 'use_obs'
gmao-qliu May 30, 2025
f02efb4
add print to verfiy that obs species match across exp.
gmao-qliu May 30, 2025
e1bb6c9
additional edits to 'use_obs' / 'obs_from' logic and comments (postpr…
gmao-rreichle Jun 2, 2025
d1852a8
added check to verify that obs species match across experiments (user…
gmao-rreichle Jun 2, 2025
fecb7b7
changed name of "ObsFcstAna_sums" files; edited comments (postproc_Ob…
gmao-rreichle Jun 2, 2025
5aefcfd
cleaned up documentation and changed file name of sample scripts (Get…
gmao-rreichle Jun 2, 2025
fe2b15a
cleaned up documentation of sample scripts (Get_ObsFcstAna_stats.py, …
gmao-rreichle Jun 2, 2025
ff5ff47
fix variable name error
gmao-qliu Jun 3, 2025
a28d129
add comments regard OmF_norm
gmao-qliu Jun 3, 2025
227a110
remove unavaialble variable from filename
gmao-qliu Jun 3, 2025
84c3b58
Merge branch 'develop' into feature/qliu/add_postproc_scripts
gmao-rreichle Jun 6, 2025
4b65032
clarified comments about stats of normalized OmFs (Plot_stats_maps.py)
gmao-rreichle Jun 8, 2025
3acc2a9
additional tweak to comments on normalized OmF stats (Plots_stats_map…
gmao-rreichle Jun 8, 2025
f2427fb
cleaned up obsolete if block (code in if and else block was identical…
gmao-rreichle Jun 8, 2025
5d42392
additional cleanup of stats_file name (Get_ObsFcstAna_stats.py, Plot_…
gmao-rreichle Jun 8, 2025
42591bc
shorten 'sums' filename and add exp. config verification
gmao-qliu Jun 10, 2025
57a2303
Merge branch 'develop' into feature/qliu/add_postproc_scripts
gmao-rreichle Jun 10, 2025
615ca1a
Merge branch 'develop' into feature/qliu/add_postproc_scripts
gmao-rreichle Jun 11, 2025
990bce2
cleaned up exptag[_list] and outid; added/clarified comments in ObsFc…
gmao-rreichle Jun 11, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- Added python package for post-processing ObsFcstAna output.

### Changed

- Specify only ntasks_model for SLURM resource request.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
#!/usr/bin/env python3

"""
Sample script for post-processing GEOSldas ObsFcstAna output into data assimilation diagnostics.
First, compute and store monthly sums and sums of squares and cross-products of raw ObsFcstAna output.
Data assimilation diagnostics ("stats") such as the mean and std-dev of the observation-minus-forecast
residuals can then be diagnosed quickly from these intermediate "sums" files.
Sample script optionally computes and plots:
- Maps of long-term data assimilation diagnostics (see also Plot_stats_maps.py).
- Monthly time series of spatially averaged data assimilation diagnostics (see also Plot_stats_timeseries.py).

Usage:
1. Edit "user_config.py" with experiments information.
2. Run this script as follows (on Discover):
$ module load python/GEOSpyD
$ ./Get_ObsFcstAna_stats.py

# Background execution:
$ nohup ./Get_ObsFcstAna_stats.py > out.log &

Authors: Q. Liu, R. Reichle, A. Fox
Last Modified: May 2025
"""

import sys; sys.path.append('../../shared/python/')
import warnings; warnings.filterwarnings("ignore")
import os

import numpy as np

from datetime import timedelta
from postproc_ObsFcstAna import postproc_ObsFcstAna
from user_config import get_config # user-defined config info

# ---
#
# If the script is run in the background, uncomment the following lines to see the redirected
# standard output in the out.log file immediately. When the lines are commented out, the redirected
# standard output will not appear in the out.log file until the job has completed.
# If the script is run in the foreground, the lines must be commented out.
#
#import io
#sys.stdout = io.TextIOWrapper(open(sys.stdout.fileno(), 'wb', 0), write_through=True)
#sys.stderr = io.TextIOWrapper(open(sys.stderr.fileno(), 'wb', 0), write_through=True)
#
# ---

def main():

config = get_config() # edit user-defined inputs in user_config.py

exp_list = config['exp_list']
start_time = config['start_time']
end_time = config['end_time']
sum_path = config['sum_path']
out_path = config['out_path']

# ------------------------------------------------------------------------------------
# Postprocess raw ObsFcstAna output data into monthly sums

# Initialize the postprocessing object
postproc = postproc_ObsFcstAna(exp_list, start_time, end_time, sum_path=sum_path)

# Compute and save monthly sums
postproc.save_monthly_sums()

# --------------------------------------------------------------------------------------
# Optionally compute long-term temporal/spatial statistics and create plots.
# The plotting scripts can also run standalone using the individual Plot_stats_*.py scripts,
# as long as the monthly sum files are available.

plot_maps = False
plot_timeseries = False

if plot_maps: # Compute long-term temporal stats and plot maps

stats_file = out_path + 'temporal_stats_'+exp_list[0]['exptag']+ '_'+start_time.strftime('%Y%m%d')+'_'+ \
(end_time+timedelta(days=-1)).strftime('%Y%m%d')+'.nc4'

# temporal_stats is a dictionary that contains all mean/variances fields for computing long-term O-F/O-A stats
# each field has the dimension [N_tile, N_species]

temporal_stats = postproc.calc_temporal_stats_from_sums(write_to_nc=True, fout_stats=stats_file)

# Example to plot some O-F maps
from Plot_stats_maps import plot_OmF_maps
plot_OmF_maps(postproc, temporal_stats, fig_path=out_path )

if plot_timeseries: # Compute spatially averaged stats and plot monthly time series of stats

from Plot_stats_timeseries import Plot_monthly_OmF_bars
Plot_monthly_OmF_bars(postproc, fig_path=out_path)

if __name__ == '__main__':
main()

# ====================== EOF =========================================================
201 changes: 201 additions & 0 deletions GEOSldas_App/util/postproc/ObsFcstAna_stats/Plot_stats_maps.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,201 @@
#!/usr/bin/env python3

"""
Sample script for plotting maps of long-term data assimilation diagnostics.
Computes Nobs-weighted avg of each metric across all species.
Requires saved files with monthly sums (see Get_ObsFcstAna_stat.py).
Stats of *normalized* OmFs are approximated!
"""

import sys; sys.path.append('../../shared/python/')
import warnings; warnings.filterwarnings("ignore")

import numpy as np
import matplotlib.pyplot as plt

from datetime import timedelta

from remap_1d_to_2d import remap_1d_to_2d
from plot import plotMap
from EASEv2 import EASEv2_ind2latlon

def plot_OmF_maps(postproc_obj, stats, fig_path='./'):

start_time = postproc_obj.start_time
end_time = postproc_obj.end_time
domain = postproc_obj.domain
tc = postproc_obj.tilecoord

# Sample of final compuation of selected diagnostic metrics

Nmin = 20

# Then compute metrics of O-F, O-A, etc. based on above computed
N_data = stats['N_data']
O_mean = stats['obs_mean']
# mean(x-y) = E[x] - E[y]
OmF_mean = stats['obs_mean'] - stats['fcst_mean']
OmA_mean = stats['obs_mean'] - stats['ana_mean']
# var(x-y) = var(x) + var(y) - 2cov(x,y)
# cov(x,y) = E[xy] - E[x]E[y]
OmF_stdv = np.sqrt(stats['obs_variance'] + stats['fcst_variance'] - \
2 * (stats['oxf_mean'] - stats['obs_mean']*stats['fcst_mean']))

OmA_stdv = np.sqrt(stats['obs_variance'] + stats['ana_variance'] - \
2 * (stats['oxa_mean'] - stats['obs_mean']*stats['ana_mean']))

# *****************************************************************************************
# The time series mean and std-dev of the *normalized* OmF computed here are APPROXIMATED!
# *****************************************************************************************
# Here, we first compute the stats of the OmF time series and then normalize using
# the time-avg "obsvar" and "fcstvar" values.
# Since "fcstvar" changes with time, the OmF values should be normalized at each time
# step (as in the older matlab scripts), and then the time series stats can be computed.
# To compute the exact stats with this python package, the sum and sum-of-squares of
# the normalized OmF values would need to be added into the sums files.
#
OmF_norm_mean = OmF_mean / np.sqrt(stats['obsvar_mean'] + stats['fcstvar_mean']) # APPROXIMATED stat!
OmF_norm_stdv = np.sqrt(OmF_stdv**2 / (stats['obsvar_mean'] + stats['fcstvar_mean']) ) # APPROXIMATED stat!

# Mask out data points with insufficent observations using the Nmin threshold
# Do NOT apply to N_data
OmF_mean[ N_data < Nmin] = np.nan
OmF_stdv[ N_data < Nmin] = np.nan
OmF_norm_mean[N_data < Nmin] = np.nan
OmF_norm_stdv[N_data < Nmin] = np.nan
OmA_mean[ N_data < Nmin] = np.nan
OmA_stdv[ N_data < Nmin] = np.nan
N_data[ N_data < Nmin] = 0

# Compute Nobs-weighted avg of each metric across all species.
# Typically used for SMAP Tb_h/h from asc and desc overpasses,
# or ASCAT soil moisture from Metop-A/B/C.
# DOES NOT MAKE SENSE IF, SAY, SPECIES HAVE DIFFERENT UNITS!
Nobs_data = np.nansum( N_data, axis=1)
OmF_mean = np.nansum(OmF_mean *N_data, axis=1)/Nobs_data
OmF_stdv = np.nansum(OmF_stdv *N_data, axis=1)/Nobs_data
OmF_norm_mean = np.nansum(OmF_norm_mean*N_data, axis=1)/Nobs_data
OmF_norm_stdv = np.nansum(OmF_norm_stdv*N_data, axis=1)/Nobs_data
OmA_mean = np.nansum(OmA_mean *N_data, axis=1)/Nobs_data
OmA_stdv = np.nansum(OmA_stdv *N_data, axis=1)/Nobs_data

# Plotting
exptag = postproc_obj.exptag

fig, axes = plt.subplots(2,2, figsize=(18,10))
plt.rcParams.update({'font.size':14})

for i in np.arange(2):
for j in np.arange(2):
units = '[k]'
if i == 0 and j == 0:
tile_data = Nobs_data
# crange is [cmin, cmax]
crange =[0, np.ceil((end_time-start_time).days/150)*300]
colormap = plt.get_cmap('jet',20)
title_txt = exptag + ' Tb Nobs '+ start_time.strftime('%Y%m')+'_'+end_time.strftime('%Y%m')
units = '[-]'
if i == 0 and j ==1:
tile_data = OmF_mean
crange =[-3, 3]
colormap = plt.get_cmap('bwr', 15)
title_txt = exptag + ' Tb O-F mean '+ start_time.strftime('%Y%m')+'_'+end_time.strftime('%Y%m')
if i == 1 and j == 0:
tile_data = OmF_stdv
crange =[0, 15]
colormap = plt.get_cmap ('jet',15)
title_txt = exptag + ' Tb O-F stdv '+ start_time.strftime('%Y%m')+'_'+end_time.strftime('%Y%m')
if i == 1 and j == 1:
tile_data = OmF_norm_stdv
crange =[0, 15]
colormap = plt.get_cmap ('jet',15)
title_txt = exptag + ' Tb normalized O-F stdv (approx!) '+ start_time.strftime('%Y%m%d')+'_'+end_time.strftime('%Y%m%d')

colormap.set_bad(color='0.9') # light grey, 0-black, 1-white

# Regrid 1d tile_data to 2d grid_data for map plots
if '_M09_' in domain: # special case
grid_data_M09 = np.zeros((1624, 3856)) + np.nan
grid_data_M09[tc['j_indg'],tc['i_indg']] = tile_data

# Reshape the data into 4x4 blocks
reshaped = grid_data_M09.reshape(1624//4, 4, 3856//4, 4)

# Combine each 4x4 M09 block into a M36 grid
#if i==0 and j==0:
# grid_data = np.sum(reshaped,axis=(1, 3))
#else:
# grid_data = np.nanmean(reshaped,axis=(1, 3))

grid_data = grid_data_M09[1::4, 2::4]

# NOT area weighted
wmean = np.nanmean(grid_data)
wabsmean = np.nanmean(np.abs(grid_data))
if 'normalized' in title_txt:
wabsmean = np.nanmean(np.abs(grid_data-1.))

lat_M36, lon_M36 = EASEv2_ind2latlon(np.arange(406), np.arange(964),'M36')
lon_2d,lat_2d = np.meshgrid(lon_M36,lat_M36)
else:
grid_data, uy, ux = remap_1d_to_2d(tile_data, lat_1d = tc['com_lat'], lon_1d = tc['com_lon'])
lon_2d,lat_2d = np.meshgrid(ux, uy)

# Aear weighted mean and mean(abs)
wmean = np.nansum( tile_data * tc['area'])/np.nansum(~np.isnan(tile_data)*tc['area'])
wabsmean = np.nansum(np.abs(tile_data) * tc['area'])/np.nansum(~np.isnan(tile_data)*tc['area'])
if 'normalized' in title_txt:
wabsmean = np.nansum(np.abs(tile_data-1.) * tc['area'])/np.nansum(~np.isnan(tile_data)*tc['area'])

if 'normalized' in title_txt:
title_txt = title_txt + '\n' + "avg=%.3f, avg(abs(nstdv-1))=%.3f" % (wmean, wabsmean)+' '+units
elif 'mean' in title_txt:
title_txt = title_txt + '\n' + "avg=%.3f, avg(abs)=%.3f" % (wmean, wabsmean)+' '+units
else:
title_txt = title_txt + '\n' + "avg=%.2f" % (wmean) +' '+units

if 'normalized' in title_txt:
grid_data = np.log10(grid_data)
crange = [-0.6, 0.45]

mm, cs = plotMap(grid_data, ax =axes[i,j], lat=lat_2d, lon=lon_2d, cRange=crange, \
title=title_txt, cmap=colormap, bounding=[-60, 80, -180,180])

plt.tight_layout()
# Save figure to file
fig.savefig(fig_path+'Map_OmF_'+ exptag +'_'+start_time.strftime('%Y%m')+'_'+\
(end_time+timedelta(days=-1)).strftime('%Y%m')+'.png')
#plt.show()
plt.close(fig)

# -----------------------------------------------------------------------------------------------

if __name__ == '__main__':

from postproc_ObsFcstAna import postproc_ObsFcstAna
from user_config import get_config

config = get_config() # edit user-defined inputs in user_config.py

exp_list = config['exp_list']
start_time = config['start_time']
end_time = config['end_time']
sum_path = config['sum_path']
out_path = config['out_path']

# ------------------------------------------------------------------------------------
# Initialize the postprocessing object
postproc = postproc_ObsFcstAna(exp_list, start_time, end_time, sum_path=sum_path)

# Compute long-term temporal stats and plot maps
stats_file = out_path + 'temporal_stats_'+exp_list[0]['exptag']+'_'+ start_time.strftime('%Y%m%d')+'_'+ \
(end_time+timedelta(days=-1)).strftime('%Y%m%d')+'.nc4'

# temporal_stats is a dictionary that contains all mean/variances fields for computing long-term O-F/O-A stats
# each field has the dimension [N_tile, N_species]

temporal_stats = postproc.calc_temporal_stats_from_sums(write_to_nc=True, fout_stats=stats_file)

plot_OmF_maps(postproc, temporal_stats, fig_path=out_path )

# ====================== EOF =========================================================
Loading