Commit 501125c

Update to two-node per population metric structure.
Updates for population level metrics
2 parents 2372274 + 5c99839 commit 501125c

File tree

7 files changed

+413
-584
lines changed

README.md

Lines changed: 30 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,7 @@ Each config entry will match this format, with the available options:
4545
~~~
4646
{user_provided_name:
4747
{
48-
'method': ['random', 'seasonal', 'age'],
48+
'method': ['random', 'seasonal'],
4949
'n_samples_year': Int,
5050
'replicates': 2,
5151
'method_params': {
@@ -55,7 +55,7 @@ Each config entry will match this format, with the available options:
5555
}
5656
~~~
5757

58-
There are three broad method options for sampling:
58+
There are two broad method options for temporal sampling:
5959
1) 'random' - will sample N infections per year, tends to match seasonality cases. Can be further directed with the following 'method_params' options:
6060
- 'population_proportion': list, N populations. Used to sample from the source or sink only, equally, etc. Within population comparisons of genetic metrics can be specified below. Confirm the total number of samples per year * proportion reflects the minimum numbers of infections desired per population.
6161
- 'monogenomic_proportion': False, or a float (< 1) to enable. Biases the sampling to include fewer or more monogenomic infections than the modeled proportion. Used to compare the effect of metrics derived from monogenomic (e.g. unique proportion) or polygenomic samples (e.g. co-transmission proportion, Rh).
@@ -64,10 +64,6 @@ There are three broad method options for sampling:
6464
2) 'seasonal': Will sample N infections per year, each in the wet or the dry season, to compare temporal sampling effects. Currently, the model is set up for consistent Sahelian seasonality and must be updated for other seasonality scenarios. If an intervention start time is provided, this sampling frame is unaffected: the simulation years and months are used to ensure sequential seasonal groupings. Can be further refined with the following 'method_params' options:
6565
- 'season': 'full' for all months in the wet or dry season or 'peak' for the 3 highest and lowest case months. Months for sampling are hardcoded for both full and peak season options.
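A minimal sketch of two sampling entries in the format described above, one per remaining method. The entry names, sample counts, and the `check_sampling_config` helper are hypothetical and not part of the repository; the `[1, 0]` value for 'population_proportion' assumes a two-population simulation in which only the first population is sampled.

```python
# Hypothetical illustration only: names and values are assumptions, not repository defaults.
ALLOWED_METHODS = {"random", "seasonal"}  # 'age' was removed in this commit

sampling_config = {
    "source_only": {  # user-provided name
        "method": "random",
        "n_samples_year": 200,
        "replicates": 2,
        "method_params": {
            "population_proportion": [1, 0],   # assumed: sample the first population only
            "monogenomic_proportion": False,   # do not bias the monogenomic fraction
        },
    },
    "wet_dry_peak": {  # user-provided name
        "method": "seasonal",
        "n_samples_year": 100,
        "replicates": 2,
        "method_params": {"season": "peak"},   # 'full' or 'peak'
    },
}


def check_sampling_config(config):
    """Return a list of human-readable problems found in a sampling config."""
    problems = []
    for name, entry in config.items():
        method = entry.get("method")
        if method not in ALLOWED_METHODS:
            problems.append(f"{name}: unsupported method {method!r}")
        n = entry.get("n_samples_year")
        if isinstance(n, bool) or not isinstance(n, int) or n <= 0:
            problems.append(f"{name}: n_samples_year must be a positive integer")
    return problems


print(check_sampling_config(sampling_config))
```

A check like this would flag configs that still request the removed 'age' method before a long simulation batch is started.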
6666

67-
3) 'age': Will sample N infections per year, but will direct which age individuals are presented most in the population regardless of age distribution specified in the model. Use for comparing sampling schemes based on age, e.g. mirror biased sampling such as school surveys. Can be further customized with following 'method_params' options:
68-
- age_bins: List with the upper bound of each group. Default: [5, 15, 100]
69-
- age_bin_labels: List containing names for each age grouping. Default: ['0-5yrs', '5-15yrs', '15+yrs']
70-
7167

7268
### Subpopulation comparisons
7369

@@ -76,7 +72,6 @@ Above options will calculate metrics for all samples in a population for each sa
7672
The subpopulation options supported include:
7773
- 'add_monthly': Provide summary statistics by month for all infections. Excludes IBx and Rh relatedness calculations to reduce computational time and memory and real data calculations are not computed at this scale. This default can be changed in the run_time_summaries
7874
function in unified_metric_calculations by using the complete nested dictionary instead of the nested dictionary ignoring the monthly groupings on infections. (May require further testing and debugging.)
79-
- 'populations': Defined by the population node in EMOD
8075
- 'polygenomic': Is polygenomic = 1, else monogenomic = 0
8176
- 'symptomatic': Is symptomatic = 1, else asymptomatic = 0
8277
- 'age_bins': Default age bins: 0-5, 5-15, 15+
@@ -121,9 +116,10 @@ This section defines with genetic metrics will be calculated for each set as boo
121116
``{sim_id}_FPG_ModelSummaries.csv``: File containing the genetic metrics across columns and the years, season, and subpopulation comparisons as columns. Addition of summary statistic columns can vary based on user options for metric calculations.
122117

123118
- 'sampling_scheme': Grouping variable for the sampling scheme applied (matches 'sampling' options in config).
124-
- 'comparison_type': Identifies with sampling scheme groupings, such as yearly or seasonal groups, or specified subpopulations (matches 'subpopulation_comparison' options in config).
125-
- 'year_group': Specifies the year (either simulation year or intervention shifted year) or the seasonal grouping bin for summary statistics.
126-
- 'sub_group': Sepcifies the additional groupings within subpopulations, e.g. whether True/False for polygenomic or symptomatic.
119+
- 'time_group': The time window used for grouping, either 'group_month', 'group_year', or 'group_season'.
120+
- 'time_value': Specifies the year (either simulation year or intervention shifted year), seasonal grouping bin, or month for summary statistics.
121+
- 'comparison_type': Identifies the grouping compared within the sampling scheme, such as 'all' infections in a time period or subpopulations such as 'polygenomic' or 'symptomatic'.
122+
- 'comparison_group': The specific group identified for 'comparison_type', e.g. whether True/False for polygenomic or symptomatic.
127123
- 'n_infections': Counts for the number of infections in each sampling scheme, for each year and subpopulation grouping specified in the observational model run. These are the actual number of infections that were available in the report by grouping and may be lower than the specified targets.
128124
- '{true/effective/genotype}_poly_coi_count': The number of infections per grouping that have a COI > 2. True is the modeled number of genomes tracked, which effective is the number of unique and detectable genomes in an infection by ancestry, and genome is the number of unique detectable genomes in an infection by bi-allelic representation.
129125
- '{true/effective/genotype_poly_coi_prop}': The proportion of infections per grouping that have a COI > 2. Calculated as '{true/effective/genotype}_poly_coi_count'/n_infections.
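The new 'time_group', 'time_value', 'comparison_type', and 'comparison_group' columns can be used to pull one slice of the summary file. The following is a sketch using only the standard library; the CSV excerpt is made up for illustration, and the literal values used for 'comparison_group' (such as `all`, `True`, `False`) are assumptions about how the file is written.

```python
import csv
import io

# Made-up excerpt of {sim_id}_FPG_ModelSummaries.csv; real files carry many more metric columns.
SAMPLE_CSV = """sampling_scheme,time_group,time_value,comparison_type,comparison_group,n_infections
random,group_year,1,all,all,100
random,group_year,1,polygenomic,True,40
random,group_year,1,polygenomic,False,60
random,group_month,3,all,all,9
"""


def select_rows(rows, time_group, comparison_type):
    """Keep rows for one time window type and one comparison type."""
    return [
        r for r in rows
        if r["time_group"] == time_group and r["comparison_type"] == comparison_type
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
yearly_poly = select_rows(rows, "group_year", "polygenomic")
for r in yearly_poly:
    print(r["time_value"], r["comparison_group"], r["n_infections"])
```

For a real run, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle on the summary CSV.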
@@ -144,21 +140,22 @@ This section defines with genetic metrics will be calculated for each set as boo
144140

145141

146142

147-
``{sim_id}-{ibd/ibs}_distributions.json``: To avoid large pairwise matrices s output, to further investigate population level IBs distributions one could use the JSON file with the IBx calculated value as the key up to two decimal places and the number of pairwise counts as a the value. It matches the output CSV in matching sampling, comparison_types, and subpopulations.
143+
``{sim_id}-{ibd/ibs}_distributions.json``: To avoid outputting large pairwise matrices, population-level IBx distributions are written to a JSON file, with the IBx value rounded to two decimal places as the key and the number of pairwise comparisons with that value as the value. It matches the output CSV in sampling scheme and comparison_types, and groups by population.
148144

149145
~~~
150-
{
146+
{"population_N": {
151147
"user_specified_name": { # "sampling_scheme
152-
"population": { # comparision_type Like group_year, season_bin, population polygenomic, etc.
148+
"symptomatic": { # comparison_type, like group_year, season_bin, population, polygenomic, etc.
153149
"(2, 0)": { # For subpopulations, the key here can be tuple, with the first item is the group_year, and the second is the group identifier. For example, this is for year 2, population 0
154150
"0.5": 54,
155151
"0.7": 41,
156152
"0.9": 7,
157153
"1": 4
158-
}
154+
}
159155
160-
}
161-
}
156+
}
157+
}
158+
}
162159
}
163160
~~~
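A sketch of how the nested structure above could be walked to summarize each distribution, assuming the four levels shown (population, sampling scheme, comparison type, group key). The `population_0` key and both helper functions are hypothetical; the counts are copied from the example.

```python
import json

# Inline stand-in for a {sim_id}-ibd_distributions.json file, mirroring the nesting shown above.
EXAMPLE = json.loads("""
{"population_0": {
    "user_specified_name": {
        "symptomatic": {
            "(2, 0)": {"0.5": 54, "0.7": 41, "0.9": 7, "1": 4}
        }
    }
}}
""")


def iter_distributions(data):
    """Yield (population, scheme, comparison_type, group_key, counts) for every leaf."""
    for population, schemes in data.items():
        for scheme, comparisons in schemes.items():
            for comparison_type, groups in comparisons.items():
                for group_key, counts in groups.items():
                    yield population, scheme, comparison_type, group_key, counts


def mean_ibx(counts):
    """Weighted mean of the binned IBx values; None if there are no pairs."""
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(float(value) * n for value, n in counts.items()) / total


for population, scheme, comparison_type, group_key, counts in iter_distributions(EXAMPLE):
    print(population, scheme, comparison_type, group_key, mean_ibx(counts))
```

Because JSON keys are strings, the binned IBx values and the tuple-like group keys both need to be parsed by the reader; here only the IBx values are converted with `float`.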
164161

@@ -170,12 +167,28 @@ In the the absence of the mapping file, one can look for the directories belongi
170167

171168
~~~
172169
# Example pull of data
173-
EXPERIMENT_NAME="/mnt/calculon2/jsuresh/output/maka fpg 10k - 6yr - strong ITNs in_20250522_195001/"
170+
EXPERIMENT_NAME="/mnt/calculon2/{user}/output/{emod_experiment_id}"
174171
OUTPUT_FILE="experiment_mapping.csv"
175172
176173
{ echo "output_name,input_dir"; find "$EXPERIMENT_NAME" -name "output" -type d | sed 's|.*/\([0-9a-f]\{8\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{12\}\)/output$|\1,"\0"|'; } > "$OUTPUT_FILE"
177174
~~~
178175

176+
## IDM Developer Notes
177+
178+
If updating the repository to run the observational model on COMPS, these are the steps to set up the Singularity image.
179+
180+
1) Update the following files with a new `1.0.0.dev{n+1}` version number:
181+
- docker/Dockerfile: Line 52
182+
- docker/Singularity.def: Line 61
183+
- pyproject.toml: Line 7
184+
185+
2) After committing/merging to EMOD-Hub branch without errors, click on Actions -> Promote package to production and match the new version name when prompted.
186+
187+
3) Actions -> Build and push Singularity image. Keep all the same information from COMPS or specify file locations as needed.
188+
189+
4) Pass the docker/ObsModel_rocky.id to emodpy-malaria files to run with new simulations.
190+
191+
179192
# Disclaimer
180193
The code in this repository was developed by IDM and other collaborators to support our joint research on flexible agent-based modeling.
181194
We've made it publicly available under the MIT License to provide others with a better understanding of our research and an opportunity to build upon it for
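Step 1 of the developer notes asks for the same version string in three files. A hedged sketch of a consistency check follows; it is not part of the repository, the helper names are hypothetical, and the `version = "..."` line format in pyproject.toml is an assumption since that file is not shown here. The example runs on in-memory strings so it is self-contained.

```python
import re

# Matches the pinned install line used in docker/Dockerfile and docker/Singularity.def.
PIN_RE = re.compile(r"fpg-observational-model==([0-9A-Za-z.]+)")
# Assumed pyproject.toml layout: a top-level line such as  version = "1.0.0.dev6"
PYPROJECT_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def find_version(pattern, text):
    """Return the first captured version string, or None if the pattern is absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


def versions_agree(dockerfile_text, singularity_text, pyproject_text):
    """Return (ok, found) where found maps each file to the version it declares."""
    found = {
        "docker/Dockerfile": find_version(PIN_RE, dockerfile_text),
        "docker/Singularity.def": find_version(PIN_RE, singularity_text),
        "pyproject.toml": find_version(PYPROJECT_RE, pyproject_text),
    }
    values = set(found.values())
    return (len(values) == 1 and None not in values), found


dockerfile = 'RUN pip3 install --no-cache-dir "fpg-observational-model==1.0.0.dev6"\n'
singularity = 'pip3 install --no-cache-dir "fpg-observational-model==1.0.0.dev6"\n'
pyproject = 'name = "fpg-observational-model"\nversion = "1.0.0.dev6"\n'

ok, found = versions_agree(dockerfile, singularity, pyproject)
print(ok, found)
```

In practice the three strings would be read from disk, for example with `pathlib.Path(...).read_text()`, before the package is promoted.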

docker/Dockerfile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -49,7 +49,7 @@ RUN pip3 install --no-cache-dir "python-snappy==0.6.1"
4949

5050
# Install the ObsModel packages
5151
# RUN bash -c "pip3 install -r /tmp/requirements.txt"
52-
RUN pip3 install --no-cache-dir "fpg-observational-model==1.0.0.dev5" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple
52+
RUN pip3 install --no-cache-dir "fpg-observational-model==1.0.0.dev6" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple
5353

5454
# Install the emod_api package
5555
RUN pip3 install --no-cache-dir "emod-api>=1.33.7,<2" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple

docker/Singularity.def

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -58,7 +58,7 @@ pip3 install --no-cache-dir "python-snappy==0.6.1"
5858

5959
# Install the ObsModel packages
6060
# RUN bash -c "pip3 install -r /tmp/requirements.txt"
61-
pip3 install --no-cache-dir "fpg-observational-model==1.0.0.dev5" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple
61+
pip3 install --no-cache-dir "fpg-observational-model==1.0.0.dev6" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple
6262

6363
# Install the emod_api package
6464
pip3 install --no-cache-dir "emod-api>=1.33.7,<2" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple

fpg_observational_model/run_observational_model.py

Lines changed: 10 additions & 58 deletions
Original file line numberDiff line numberDiff line change
@@ -44,13 +44,7 @@ def get_default_config():
4444
# 'method_params': {
4545
# 'season': 'full', # Options: full or peak; currently hardcoded to match Senegal's seasonality; update for other scenarios in unified_sampling.py
4646
# }
47-
# },
48-
# 'age': { # Example of how to set-up a sampling scheme based on age, to mirror biased sampling such as school surveys and health facility comparisons.
49-
# 'method': 'age',
50-
# 'n_samples_year': 15,
51-
# 'replicates': 1
52-
# }
53-
47+
# }
5448
},
5549
'metrics': {
5650
'cotransmission_proportion': True,
@@ -65,8 +59,7 @@ def get_default_config():
6559
'unique_genome_proportion': True # Will calculate both the proportion of unique genomes in the sampled infections to replicate phasing and from monogenomic samples with an effective COI of 1 only to match barcode limits.
6660
},
6761
'subpopulation_comparisons': { # Supported for yearly and seasonal temporal sampling schemes, not age-based sampling.
68-
'add_monthly': False, # Whether to add monthly comparisons within each year
69-
'populations': False, # Defined by the population node in EMOD
62+
'add_monthly': False, # Whether to add monthly comparisons in addition to yearly comparisons for temporal sampling schemes
7063
'polygenomic': True, # Is polygenomic = 1, else monogenomic = 0
7164
'symptomatic': False, # Is symptomatic = 1, else asymptomatic = 0
7265
'age_bins': False # Default age bins: 0-5, 5-15, 15+
@@ -385,7 +378,8 @@ def deep_merge(default_dict, override_dict):
385378
print(f"Error: {infection_df_path} not found. Loading test data.")
386379
infection_df = pd.read_csv('test_data/test_fpg_infections.csv')
387380

388-
# Run sampling model
381+
# Run sampling model
382+
print(f"Config parameters for sampling model:\n {config}")
389383
sample_df = run_sampling_model(
390384
input_df=infection_df,
391385
config=config,
@@ -405,7 +399,7 @@ def deep_merge(default_dict, override_dict):
405399
ibs_matrix = None
406400
# Optional - included if need to filter out non-variant tracked sites, i.e. immunity markers or drugR for calculating genetic metrics only on neutral variant sites.
407401
variant_indices = None
408-
#variant_indices = [0, 3, 5, 6, 8, 9, 11, 12, 14, 16, 17, 19, 20, 22, 24, 25, 27, 28, 30, 32, 33, 35, 37, 38, 40, 42, 43, 45, 47, 48, 50, 52, 53, 55, 57, 58, 60, 62, 63, 65, 67, 68, 69, 71, 73, 75, 77, 78, 79, 80, 82, 84, 86, 88, 89, 90, 91, 93, 94, 96, 98, 99, 101, 102, 103, 104, 106, 107, 109, 110, 112, 113, 115, 116, 117, 118, 119, 121, 122, 124, 125, 127, 128, 130, 131, 132, 133, 134, 135, 137, 138, 139, 141, 142, 144, 145, 146, 148, 149, 150]
402+
# variant_indices = []
409403

410404
if config['metrics']['identity_by_descent']:
411405
user_specified_ibx.append('ibd')
@@ -476,53 +470,8 @@ def deep_merge(default_dict, override_dict):
476470

477471

478472
#####################################################################################
479-
# Parallelizable wrapper function
473+
# Single run test
480474
#####################################################################################
481-
def process_file(file_row, output_summary_dir, config_path=None, verbose=False):
482-
"""
483-
Process a single file for parallel execution.
484-
485-
Parameters:
486-
file_row: pandas Series or dict with 'output_name' and 'input_dir' columns
487-
output_summary_dir: Directory to save outputs
488-
config_path: Path to config file (optional)
489-
verbose: Whether to print verbose output
490-
491-
Returns:
492-
str: Name of processed simulation
493-
"""
494-
try:
495-
# Extract information from the row
496-
sim_name = file_row['output_name']
497-
emod_output_path = file_row['input_dir']
498-
499-
# Use default config if not specified
500-
if config_path is None or not os.path.exists(config_path):
501-
config_path = "" # Will trigger default config usage
502-
503-
# Create output directory for this simulation
504-
output_path = os.path.join(output_summary_dir, sim_name)
505-
506-
# Run the observational model
507-
result = run_observational_model(
508-
sim_name=sim_name,
509-
emod_output_path=emod_output_path,
510-
config_path=config_path,
511-
output_path=output_path,
512-
verbose=verbose
513-
)
514-
515-
return f"SUCCESS: {sim_name}"
516-
517-
except Exception as e:
518-
error_msg = f"ERROR processing {file_row.get('output_name', 'unknown')}: {str(e)}"
519-
if verbose:
520-
import traceback
521-
print(f"{error_msg}\n{traceback.format_exc()}")
522-
return error_msg
523-
524-
525-
# Single file test
526475
# Single file test
527476
if __name__ == "__main__":
528477
import argparse
@@ -567,4 +516,7 @@ def process_file(file_row, output_summary_dir, config_path=None, verbose=False):
567516
print(f"\nError running model: {e}")
568517
if args.verbose:
569518
print("\nFull traceback:")
570-
print(traceback.format_exc())
519+
print(traceback.format_exc())
520+
521+
522+
