README.md
30 additions & 17 deletions
@@ -45,7 +45,7 @@ Each config entry will match this format, with the available options:
 ~~~
 {user_provided_name:
     {
-        'method': ['random', 'seasonal', 'age'],
+        'method': ['random', 'seasonal'],
         'n_samples_year': Int,
         'replicates': 2,
         'method_params': {
@@ -55,7 +55,7 @@ Each config entry will match this format, with the available options:
 }
 ~~~
 
-There are three broad method options for sampling:
+There are two broad method options for temporal sampling:
 1) 'random' - Will sample N infections per year, which tends to track the seasonality of cases. Can be further directed with the following 'method_params' options:
     - 'population_proportion': List with one entry per population (N populations). Used to sample from the source or sink only, equally, etc. Within-population comparisons of genetic metrics can be specified below. Confirm that 'n_samples_year' * proportion meets the minimum number of infections desired per population.
     - 'monogenomic_proportion': False, or a float (< 1) to enable. Will bias the sampling to include fewer or more monogenomic infections than the modeled proportion. Used to compare the effect of metrics derived from monogenomic (e.g. unique proportion) or polygenomic samples (e.g. co-transmission proportion, Rh).
@@ -64,10 +64,6 @@ There are three broad method options for sampling:
 2) 'seasonal': Will sample N infections per year, each in the wet or the dry season, to compare temporal sampling effects. Currently, the model is set up for the consistent Sahelian seasonality and must be updated for other seasonality simulation scenarios. If an intervention start time is provided, this sampling frame is unaffected; the simulation years and months are used to ensure sequential seasonal groupings. Can be further refined with the following 'method_params' options:
     - 'season': 'full' for all months in the wet or dry season or 'peak' for the 3 highest and lowest case months. Months for sampling are hardcoded for both full and peak season options.
 
-3) 'age': Will sample N infections per year, but will direct which age individuals are presented most in the population regardless of age distribution specified in the model. Use for comparing sampling schemes based on age, e.g. mirror biased sampling such as school surveys. Can be further customized with following 'method_params' options:
-    - age_bins: List with the upper bound of each group. Default: [5, 15, 100]
-    - age_bin_labels: List containing names for each age grouping. Default: ['0-5yrs', '5-15yrs', '15+yrs']
-
 
 ### Subpopulation comparisons
 
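As an illustration of the sampling options changed in the hunks above, here is a minimal sketch of a sampling config with one 'random' and one 'seasonal' entry, plus a small check that rejects the removed 'age' method. The entry names (`random_yearly`, `seasonal_peak`), the numeric values, and the helpers `validate_sampling_config` and `samples_per_population` are hypothetical and not part of the repository.

```python
# Hypothetical example: entry names, values, and helper functions are
# illustrative assumptions, not the repository's actual code.
ALLOWED_METHODS = {"random", "seasonal"}  # 'age' is removed in this diff
ALLOWED_SEASONS = {"full", "peak"}

sampling_config = {
    "random_yearly": {
        "method": "random",
        "n_samples_year": 100,
        "replicates": 2,
        "method_params": {
            "population_proportion": [0.75, 0.25],
            "monogenomic_proportion": False,
        },
    },
    "seasonal_peak": {
        "method": "seasonal",
        "n_samples_year": 100,
        "replicates": 2,
        "method_params": {"season": "peak"},
    },
}


def validate_sampling_config(config):
    """Return a list of problems found in the config (empty if none)."""
    problems = []
    for name, entry in config.items():
        method = entry.get("method")
        if method not in ALLOWED_METHODS:
            problems.append(f"{name}: unsupported method {method!r}")
        params = entry.get("method_params", {})
        if method == "seasonal" and params.get("season", "full") not in ALLOWED_SEASONS:
            problems.append(f"{name}: 'season' must be 'full' or 'peak'")
    return problems


def samples_per_population(entry):
    """Expected samples per population: n_samples_year * proportion."""
    props = entry.get("method_params", {}).get("population_proportion") or []
    return [entry["n_samples_year"] * p for p in props]


print(validate_sampling_config(sampling_config))
print(samples_per_population(sampling_config["random_yearly"]))
```

The second helper mirrors the README's advice to confirm that the samples per year multiplied by each proportion meet the minimum number of infections wanted per population.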
@@ -76,7 +72,6 @@ Above options will calculate metrics for all samples in a population for each sa
 The subpopulation options supported include:
 - 'add_monthly': Provide summary statistics by month for all infections. Excludes IBx and Rh relatedness calculations to reduce computational time and memory; real data calculations are also not computed at this scale. This default can be changed in the run_time_summaries
 function in unified_metric_calculations by using the complete nested dictionary instead of the nested dictionary that ignores the monthly groupings of infections. (May require further testing and debugging.)
-- 'populations': Defined by the population node in EMOD
 - 'polygenomic': Is polygenomic = 1, else monogenomic = 0
 - 'symptomatic': Is symptomatic = 1, else asymptomatic = 0
 - 'age_bins': Default age bins: 0-5, 5-15, 15+
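To make the default age bins listed above concrete, the following sketch assigns ages to the 0-5, 5-15, and 15+ groups and counts infections per group. It is an assumption-laden illustration: the record field name `age`, the helper `age_bin_label`, and the choice to place an age exactly on a boundary into the older bin are all hypothetical, and the repository may handle boundaries differently.

```python
from bisect import bisect_right
from collections import Counter

# Boundaries between the default bins 0-5, 5-15, 15+ (last bin is open-ended).
AGE_BIN_EDGES = [5, 15]
AGE_BIN_LABELS = ["0-5", "5-15", "15+"]


def age_bin_label(age_years):
    """Map an age in years to a bin label.

    Assumption: an age equal to a boundary (e.g. exactly 5) falls into
    the older bin.
    """
    return AGE_BIN_LABELS[bisect_right(AGE_BIN_EDGES, age_years)]


# Hypothetical infection records; the 'age' field name is illustrative.
infections = [{"age": 2.5}, {"age": 7}, {"age": 15}, {"age": 40}, {"age": 4.9}]
counts = Counter(age_bin_label(rec["age"]) for rec in infections)
print(dict(counts))
```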
@@ -121,9 +116,10 @@ This section defines with genetic metrics will be calculated for each set as boo
 ``{sim_id}_FPG_ModelSummaries.csv``: File containing the genetic metrics across columns and the years, season, and subpopulation comparisons as columns. Addition of summary statistic columns can vary based on user options for metric calculations.
 
 - 'sampling_scheme': Grouping variable for the sampling scheme applied (matches 'sampling' options in config).
-- 'comparison_type': Identifies with sampling scheme groupings, such as yearly or seasonal groups, or specified subpopulations (matches 'subpopulation_comparison' options in config).
-- 'year_group': Specifies the year (either simulation year or intervention shifted year) or the seasonal grouping bin for summary statistics.
-- 'sub_group': Sepcifies the additional groupings within subpopulations, e.g. whether True/False for polygenomic or symptomatic.
+- 'time_group': The time window used for grouping, either 'group_month', 'group_year', or 'group_season'.
+- 'time_value': Specifies the year (either simulation year or intervention shifted year), seasonal grouping bin, or month for summary statistics.
+- 'comparison_type': Identifies the grouping within each sampling scheme, such as 'all' infections in a time period, or subpopulations such as 'polygenomic' or 'symptomatic'.
+- 'comparison_group': The specific group identified for 'comparison_type', e.g. whether True/False for polygenomic or symptomatic.
 - 'n_infections': Counts for the number of infections in each sampling scheme, for each year and subpopulation grouping specified in the observational model run. These are the actual number of infections that were available in the report by grouping and may be lower than the specified targets.
 - '{true/effective/genotype}_poly_coi_count': The number of infections per grouping that have a COI > 2. True is the modeled number of genomes tracked, while effective is the number of unique and detectable genomes in an infection by ancestry, and genotype is the number of unique detectable genomes in an infection by bi-allelic representation.
 - '{true/effective/genotype}_poly_coi_prop': The proportion of infections per grouping that have a COI > 2. Calculated as '{true/effective/genotype}_poly_coi_count'/n_infections.
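The column descriptions above can be exercised with a small sketch that filters a summary table by 'comparison_type' and recomputes a proportion as count / n_infections. The rows below are made-up toy values, the empty 'comparison_group' for 'all' rows is an assumption, and `poly_coi_proportions` is a hypothetical helper rather than a function from the repository.

```python
import csv
import io

# Toy data for illustration only; real files are named
# {sim_id}_FPG_ModelSummaries.csv and contain many more columns.
SAMPLE_CSV = """sampling_scheme,time_group,time_value,comparison_type,comparison_group,n_infections,true_poly_coi_count
random_yearly,group_year,2,all,,100,40
random_yearly,group_year,2,polygenomic,True,40,40
random_yearly,group_year,3,all,,0,0
"""


def poly_coi_proportions(csv_text, comparison_type="all"):
    """Recompute true_poly_coi_count / n_infections per time grouping.

    Returns None for groupings with zero infections to avoid dividing by zero.
    """
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["comparison_type"] != comparison_type:
            continue
        n_infections = int(row["n_infections"])
        count = int(row["true_poly_coi_count"])
        key = (row["sampling_scheme"], row["time_group"], row["time_value"])
        results[key] = count / n_infections if n_infections else None
    return results


print(poly_coi_proportions(SAMPLE_CSV))
```

Because 'n_infections' may be lower than the requested target, guarding against empty groupings is worth keeping in any downstream calculation.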
@@ -144,21 +140,22 @@ This section defines with genetic metrics will be calculated for each set as boo
 
 
 
-``{sim_id}-{ibd/ibs}_distributions.json``: To avoid large pairwise matrices s output, to further investigate population level IBs distributions one could use the JSON file with the IBx calculated value as the key up to two decimal places and the number of pairwise counts as a the value. It matches the output CSV in matching sampling, comparison_types, and subpopulations.
+``{sim_id}-{ibd/ibs}_distributions.json``: To avoid outputting large pairwise matrices, population-level IBx distributions can be investigated with this JSON file, which uses the calculated IBx value (up to two decimal places) as the key and the number of pairwise comparisons as the value. Its sampling schemes and comparison_types match the output CSV, and it is grouped by population.
 
 ~~~
-{
+{"population_N": {
     "user_specified_name": { # sampling_scheme
-        "population": { # comparision_type Like group_year, season_bin, population polygenomic, etc.
+        "symptomatic": { # comparison_type, like group_year, season_bin, population, polygenomic, etc.
             "(2, 0)": { # For subpopulations, the key here can be a tuple: the first item is the group_year and the second is the group identifier. For example, this is for year 2, population 0
                 "0.5": 54,
                 "0.7": 41,
                 "0.9": 7,
                 "1": 4
-            }
+            }
 
-        }
-    }
+        }
+    }
+ }
 }
 ~~~
 
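A short sketch of how the nested distributions file shown above might be read: it walks population, sampling scheme, comparison type, and group key, then computes the total pair count and mean IBx from each histogram. The example uses the values from the README block without the `#` comments, since those are not valid JSON. The helpers `iter_histograms` and `mean_ibx` are hypothetical names, and parsing the "(2, 0)" key with `ast.literal_eval` is an assumption about how the tuple keys are serialized.

```python
import ast
import json

# Same values as the README example, as valid JSON (no inline comments).
EXAMPLE_JSON = """
{"population_N": {"user_specified_name": {"symptomatic": {
    "(2, 0)": {"0.5": 54, "0.7": 41, "0.9": 7, "1": 4}
}}}}
"""


def iter_histograms(distributions):
    """Yield (population, scheme, comparison_type, group_key, histogram)."""
    for population, schemes in distributions.items():
        for scheme, comparisons in schemes.items():
            for comparison_type, groups in comparisons.items():
                for raw_key, histogram in groups.items():
                    try:
                        # Assumption: tuple keys are stored as "(year, group)".
                        group_key = ast.literal_eval(raw_key)
                    except (ValueError, SyntaxError):
                        group_key = raw_key  # keep non-literal keys as strings
                    yield population, scheme, comparison_type, group_key, histogram


def mean_ibx(histogram):
    """Weighted mean of IBx values, weighted by pairwise counts."""
    total_pairs = sum(histogram.values())
    if total_pairs == 0:
        return None
    weighted = sum(float(value) * count for value, count in histogram.items())
    return weighted / total_pairs


for population, scheme, comparison, key, hist in iter_histograms(json.loads(EXAMPLE_JSON)):
    print(population, scheme, comparison, key, sum(hist.values()), mean_ibx(hist))
```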
@@ -170,12 +167,28 @@ In the the absence of the mapping file, one can look for the directories belongi
 { echo "output_name,input_dir"; find "$EXPERIMENT_NAME" -name "output" -type d | sed 's|.*/\([0-9a-f]\{8\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{12\}\)/output$|\1,"\0"|'; } > "$OUTPUT_FILE"
 ~~~
 
+## IDM Developer Notes
+
+If updating the repository to run the observational model on COMPS, these are the steps to set up the Singularity image.
+
+1) Update the following files with a new `1.0.0.dev{n+1}` version number:
+    - docker/Dockerfile: Line 52
+    - docker/Singularity: Line 61
+    - pyproject.toml: Line 7
+
+2) After committing/merging to the EMOD-Hub branch without errors, click on Actions -> Promote package to production and match the new version name when prompted.
+
+3) Actions -> Build and push Singularity image. Keep all the same information from COMPS or specify file locations as needed.
+
+4) Pass the docker/ObsModel_rocky.id to the emodpy-malaria files to run with new simulations.
+
+
 # Disclaimer
 The code in this repository was developed by IDM and other collaborators to support our joint research on flexible agent-based modeling.
 We've made it publicly available under the MIT License to provide others with a better understanding of our research and an opportunity to build upon it for
fpg_observational_model/run_observational_model.py
10 additions & 58 deletions
@@ -44,13 +44,7 @@ def get_default_config():
             # 'method_params': {
             #     'season': 'full', # Options: full or peak; currently hardcoded to match Senegal's seasonality; update for other scenarios in unified_sampling.py
             # }
-        # },
-        # 'age': { # Example of how to set-up a sampling scheme based on age, to mirror biased sampling such as school surveys and health facility comparisons.
-        #     'method': 'age',
-        #     'n_samples_year': 15,
-        #     'replicates': 1
-        # }
-
+        # }
     },
     'metrics': {
         'cotransmission_proportion': True,
@@ -65,8 +59,7 @@ def get_default_config():
         'unique_genome_proportion': True  # Will calculate both the proportion of unique genomes in the sampled infections to replicate phasing and from monogenomic samples with an effective COI of 1 only to match barcode limits.
     },
     'subpopulation_comparisons': { # Supported for yearly and seasonal temporal sampling schemes, not age-based sampling.
-        'add_monthly': False, # Whether to add monthly comparisons within each year
-        'populations': False, # Defined by the population node in EMOD
+        'add_monthly': False, # Whether to add monthly comparisons in addition to yearly comparisons for temporal sampling schemes
     # Optional - included if need to filter out non-variant tracked sites, i.e. immunity markers or drugR for calculating genetic metrics only on neutral variant sites.