EMOD-Hub
diff --git a/README.md b/README.md
Lines changed: 30 additions & 17 deletions
diff --git a/docker/Dockerfile b/docker/Dockerfile
Lines changed: 1 addition & 1 deletion
diff --git a/docker/Singularity.def b/docker/Singularity.def
Lines changed: 1 addition & 1 deletion
diff --git a/fpg_observational_model/run_observational_model.py b/fpg_observational_model/run_observational_model.py
Lines changed: 10 additions & 58 deletions
@@ -45,7 +45,7 @@ Each config entry will match this format, with the available options:
 ~~~
     {user_provided_name:
         {
-        'method': ['random', 'seasonal', 'age'],
+        'method': ['random', 'seasonal'],
                 'n_samples_year': Int,
                 'replicates': 2,
                 'method_params': {
@@ -55,7 +55,7 @@ Each config entry will match this format, with the available options:
     }
 ~~~
 
-There are three broad method options for sampling:
+There are two broad method options for temporal sampling:
 1) 'random' - will sample N infections per year, tends to match seasonality cases. Can be further directed with the following 'method_params' options:
     - 'population_proportion': list, N populations. Used to sample from the source or sink only, equally, etc. Within population comparisons of genetic metrics can be specified below. Confirm the total number of samples per year * proportion reflects the minimum numbers of infections desired per population.
     - 'monogenomic_proportion': False or float for true (< 1). Will bias the sampling to include fewer or more monogenomic infections than may be the Bool modeled proportion. Used to compare the effect of metrics derived from monogenomic (e.g. unique proportion) or polygenomic samples (e.g. co-transmission proportion, Rh)
@@ -64,10 +64,6 @@ There are three broad method options for sampling:
 2) 'seasonal': Will sample N infections per year, each in the wet or the dry season to compare temporal sampling effects. Currently, the model is set-up for the consistent Sahelian seasonality, must update for other seasonality simulation scenarios. If an intervention start time is provided, this sampling frame is unaffected - the simulation years and months are used to make sure sequential seasonal groupings. Can be further refined with following 'method_params' options:
     - 'season': 'full' for all months in the wet or dry season or 'peak' for the 3 highest and lowest case months. Months for sampling are hardcoded for both full and peak season options.  
 
-3) 'age': Will sample N infections per year, but will direct which age individuals are presented most in the population regardless of age distribution specified in the model. Use for comparing sampling schemes based on age, e.g. mirror biased sampling such as school surveys. Can be further customized with following 'method_params' options:
-    - age_bins: List with the upper bound of each group. Default: [5, 15, 100]
-    - age_bin_labels: List containing names for each age grouping. Default: ['0-5yrs', '5-15yrs', '15+yrs']
-
 
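As an illustration of the two remaining sampling methods, the sketch below builds one 'random' and one 'seasonal' entry in the format described above and runs a small sanity check over them. The scheme names (`random_all`, `wet_dry_peak`), all numeric values, and the `check_sampling_config` helper are hypothetical and are not part of the package.

```python
# Hypothetical sampling entries following the README format above.
# Scheme names and numeric values are made up for illustration.
sampling_config = {
    'random_all': {
        'method': 'random',
        'n_samples_year': 100,
        'replicates': 2,
        'method_params': {
            'population_proportion': [0.5, 0.5],
            'monogenomic_proportion': False,
        },
    },
    'wet_dry_peak': {
        'method': 'seasonal',
        'n_samples_year': 100,
        'replicates': 2,
        'method_params': {'season': 'peak'},
    },
}

# 'age' is no longer listed as a method after this change.
ALLOWED_METHODS = {'random', 'seasonal'}


def check_sampling_config(config):
    """Return a list of problems; an empty list means the entries look well-formed.

    This helper is an assumption for illustration, not a function of the package.
    """
    problems = []
    for name, entry in config.items():
        method = entry.get('method')
        if method not in ALLOWED_METHODS:
            problems.append(f"{name}: unsupported method {method!r}")
        if not isinstance(entry.get('n_samples_year'), int):
            problems.append(f"{name}: 'n_samples_year' must be an int")
        season = entry.get('method_params', {}).get('season')
        if season is not None and season not in ('full', 'peak'):
            problems.append(f"{name}: 'season' must be 'full' or 'peak'")
    return problems


print(check_sampling_config(sampling_config))
```

A config that still uses `'method': 'age'` would be reported by this check, which may help when migrating older configs.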
 ### Subpopulation comparisons
 
@@ -76,7 +72,6 @@ Above options will calculate metrics for all samples in a population for each sa
  The subpopulation options supported include:
 - 'add_monthly':  Provide summary statistics by month for all infections. Excludes IBx and Rh relatedness calculations to reduce computational time and memory and real data calculations are not computed at this scale. This default can be changed in the run_time_summaries
  function in unified_metric_calculations by using the complete nested dictionary instead of the nested dictionary ignoring the monthly groupings on infections. (May require further testing and debugging.)
-- 'populations':  Defined by the population node in EMOD
 - 'polygenomic':  Is polygenomic = 1, else monogenomic = 0
 - 'symptomatic':  Is symptomatic = 1, else asymptomatic = 0
 - 'age_bins':  Default age bins: 0-5, 5-15, 15+
@@ -121,9 +116,10 @@ This section defines with genetic metrics will be calculated for each set as boo
 ``{sim_id}_FPG_ModelSummaries.csv``: File containing the genetic metrics across columns and the years, season, and subpopulation comparisons as columns. Addition of summary statistic columns can vary based on user options for metric calculations. 
 
 - 'sampling_scheme': Grouping variable for the sampling scheme applied (matches 'sampling' options in config).
-- 'comparison_type': Identifies with sampling scheme groupings, such as yearly or seasonal groups, or specified subpopulations (matches 'subpopulation_comparison' options in config).
-- 'year_group': Specifies the year (either simulation year or intervention shifted year) or the seasonal grouping bin for summary statistics.
-- 'sub_group': Sepcifies the additional groupings within subpopulations, e.g. whether True/False for polygenomic or symptomatic.  
+- 'time_group': The time window used for grouping: 'group_month', 'group_year', or 'group_season'.
+- 'time_value': The value of the time window: the year (either simulation year or intervention-shifted year), the seasonal grouping bin, or the month used for summary statistics.
+- 'comparison_type': The grouping within a sampling scheme: 'all' infections in a time period, or a subpopulation comparison such as 'polygenomic' or 'symptomatic'.
+- 'comparison_group': The specific group within 'comparison_type', e.g. True/False for polygenomic or symptomatic.
 - 'n_infections': Counts for the number of infections in each sampling scheme, for each year and subpopulation grouping specified in the observational model run. These are the actual number of infections that were available in the report by grouping and may be lower than the specified targets.
 - '{true/effective/genotype}_poly_coi_count': The number of infections per grouping that have a COI > 2. True is the modeled number of genomes tracked, which effective is the number of unique and detectable genomes in an infection by ancestry, and genome is the number of unique detectable genomes in an infection by bi-allelic representation.
 - '{true/effective/genotype_poly_coi_prop}': The proportion of infections per grouping that have a COI > 2. Calculated as '{true/effective/genotype}_poly_coi_count'/n_infections.
@@ -144,21 +140,22 @@ This section defines with genetic metrics will be calculated for each set as boo
 
 
 
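To show how the renamed grouping columns might be used, here is a minimal sketch that filters summary rows by 'time_group' and 'comparison_type' using only the standard library. The rows are made-up placeholders. In particular, the 'comparison_group' value for 'all' rows and the numeric values are assumptions, and a real `{sim_id}_FPG_ModelSummaries.csv` contains many more columns.

```python
import csv
import io

# Made-up rows using the column names described above; values are placeholders only.
EXAMPLE_CSV = """sampling_scheme,time_group,time_value,comparison_type,comparison_group,n_infections
random_all,group_year,2,all,all,100
random_all,group_year,2,polygenomic,True,37
random_all,group_year,2,polygenomic,False,63
random_all,group_month,14,all,all,9
"""


def select_rows(csv_text, time_group, comparison_type):
    """Return the rows matching one time window type and one comparison type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row['time_group'] == time_group
        and row['comparison_type'] == comparison_type
    ]


yearly_poly = select_rows(EXAMPLE_CSV, 'group_year', 'polygenomic')
for row in yearly_poly:
    print(row['time_value'], row['comparison_group'], row['n_infections'])
```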
-``{sim_id}-{ibd/ibs}_distributions.json``: To avoid large pairwise matrices s output, to further investigate population level IBs distributions one could use the JSON file with the IBx calculated value as the key up to two decimal places and the number of pairwise counts as a the value. It matches the output CSV in matching sampling, comparison_types, and subpopulations. 
+``{sim_id}-{ibd/ibs}_distributions.json``: To avoid writing large pairwise matrices as output, population-level IBx distributions are stored in a JSON file in which each key is an IBx value rounded to two decimal places and each value is the number of pairwise comparisons with that value. The file mirrors the output CSV in its sampling schemes and comparison types, and is grouped by population.
 
 ~~~
-  {
+  {"population_N": {
       "user_specified_name": { # "sampling_scheme 
-          "population": { # comparision_type Like group_year, season_bin, population polygenomic, etc.
+          "symptomatic": { # comparison_type, like group_year, season_bin, population, polygenomic, etc.
               "(2, 0)": {       # For subpopulations, the key here can be tuple, with the first item is the group_year, and the second is the group identifier. For example, this is for year 2, population 0
                   "0.5": 54,
                   "0.7": 41,
                   "0.9": 7,
                   "1": 4
-              }
+                }
 
-          }
-      }
+            }
+        }
+     }
   }
 ~~~    
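One way to read the nested distribution file is sketched below. It assumes the four-level nesting shown above (population, sampling scheme, comparison type, group key) and reuses the counts from the example with the comments removed so the text is valid JSON. The helper names `parse_group_key` and `summarize_histogram` are hypothetical.

```python
import ast
import json

# Content copied from the README example above, without the inline comments.
EXAMPLE_JSON = """
{"population_N": {
    "user_specified_name": {
        "symptomatic": {
            "(2, 0)": {"0.5": 54, "0.7": 41, "0.9": 7, "1": 4}
        }
    }
}}
"""


def parse_group_key(key):
    """Turn a key such as "(2, 0)" into a tuple; fall back to the raw string."""
    try:
        return ast.literal_eval(key)
    except (ValueError, SyntaxError):
        return key


def summarize_histogram(counts):
    """Return (total_pairs, mean_ibx) for one {ibx_value: pair_count} histogram."""
    total = sum(counts.values())
    weighted = sum(float(value) * n for value, n in counts.items())
    return total, weighted / total


distributions = json.loads(EXAMPLE_JSON)
for population, schemes in distributions.items():
    for scheme, comparisons in schemes.items():
        for comparison_type, groups in comparisons.items():
            for key, counts in groups.items():
                group = parse_group_key(key)
                total, mean_ibx = summarize_histogram(counts)
                print(population, scheme, comparison_type, group, total, round(mean_ibx, 3))
```

Because the file stores binned counts rather than raw pairs, summaries computed this way are approximate to the two-decimal binning.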
 
@@ -170,12 +167,28 @@ In the the absence of the mapping file, one can look for the directories belongi
 
 ~~~
 # Example pull of data
-EXPERIMENT_NAME="/mnt/calculon2/jsuresh/output/maka fpg 10k - 6yr - strong ITNs in_20250522_195001/"
+EXPERIMENT_NAME="/mnt/calculon2/{user}/output/{emod_experiment_id}"
 OUTPUT_FILE="experiment_mapping.csv"
 
 { echo "output_name,input_dir"; find "$EXPERIMENT_NAME" -name "output" -type d | sed 's|.*/\([0-9a-f]\{8\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{12\}\)/output$|\1,"\0"|'; } > "$OUTPUT_FILE"
 ~~~
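For environments where the shell pipeline above is inconvenient, a rough Python equivalent could look like the sketch below. The function names `build_mapping` and `write_mapping` are hypothetical. Unlike the `sed` version, this sketch skips `output` directories whose parent is not a UUID-named directory, and it only quotes a path when the CSV format requires it.

```python
import csv
import re
from pathlib import Path

# Same UUID pattern as in the sed expression above.
UUID_RE = re.compile(
    r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
)


def build_mapping(experiment_dir):
    """Return (output_name, input_dir) pairs for each '<uuid>/output' directory."""
    rows = []
    for path in sorted(Path(experiment_dir).rglob('output')):
        if path.is_dir() and UUID_RE.match(path.parent.name):
            rows.append((path.parent.name, str(path)))
    return rows


def write_mapping(rows, output_file):
    """Write the pairs to a CSV file with the header used by the mapping file."""
    with open(output_file, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['output_name', 'input_dir'])
        writer.writerows(rows)

# Example use, with the placeholder path from the shell snippet:
# rows = build_mapping("/mnt/calculon2/{user}/output/{emod_experiment_id}")
# write_mapping(rows, "experiment_mapping.csv")
```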
 
+## IDM Developer Notes
+
+If updating the repository to run the observational model on COMPS, follow these steps to set up the Singularity image.
+
+1) Update the following files with a new `1.0.0.dev{n+1}` version number:
+    - docker/Dockerfile: Line 52
+    - docker/Singularity.def: Line 61
+    - pyproject.toml: Line 7
+
+2) After committing/merging to the EMOD-Hub branch without errors, click on Actions -> Promote package to production and enter the new version name when prompted.
+
+3) Actions -> Build and push Singularity image. Keep all the same information from COMPS or specify file locations as needed.
+
+4) Pass the docker/ObsModel_rocky.id to the emodpy-malaria files to run with new simulations.
+
+
 # Disclaimer
 The code in this repository was developed by IDM and other collaborators to support our joint research on flexible agent-based modeling.
  We've made it publicly available under the MIT License to provide others with a better understanding of our research and an opportunity to build upon it for 
 
@@ -49,7 +49,7 @@ RUN pip3 install --no-cache-dir "python-snappy==0.6.1"
 
 # Install the ObsModel packages
 # RUN bash -c "pip3 install -r /tmp/requirements.txt"
-RUN pip3 install --no-cache-dir "fpg-observational-model==1.0.0.dev5" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple
+RUN pip3 install --no-cache-dir "fpg-observational-model==1.0.0.dev6" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple
 
 # Install the emod_api package
 RUN pip3 install --no-cache-dir "emod-api>=1.33.7,<2" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple
 
@@ -58,7 +58,7 @@ pip3 install --no-cache-dir "python-snappy==0.6.1"
 
 # Install the ObsModel packages
 # RUN bash -c "pip3 install -r /tmp/requirements.txt"
-pip3 install --no-cache-dir "fpg-observational-model==1.0.0.dev5" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple
+pip3 install --no-cache-dir "fpg-observational-model==1.0.0.dev6" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple
 
 # Install the emod_api package
 pip3 install --no-cache-dir "emod-api>=1.33.7,<2" --extra-index-url https://packages.idmod.org/api/pypi/pypi-production/simple
 
@@ -44,13 +44,7 @@ def get_default_config():
             #     'method_params': {
             #         'season': 'full', # Options: full or peak; currently hardcoded to match Senegal's seasonality; update for other scenarios in unified_sampling.py
             #     }
-            # },
-            # 'age': { # Example of how to set-up a sampling scheme based on age, to mirror biased sampling such as school surveys and health facility comparisons. 
-            #     'method': 'age',
-            #     'n_samples_year': 15,
-            #     'replicates': 1
-            # }
-            
+            # } 
         },
         'metrics': {
             'cotransmission_proportion': True,
@@ -65,8 +59,7 @@ def get_default_config():
             'unique_genome_proportion': True # Will calculate both the proportion of unique genomes in the sampled infections to replicate phasing and from monogenomic samples with an effective COI of 1  only to match barcode limits.
         },
         'subpopulation_comparisons': { # Supported for yearly and seasonal temporal sampling schemes, not age-based sampling. 
-            'add_monthly': False,  # Whether to add monthly comparisons within each year
-            'populations': False,  # Defined by the population node in EMOD
+            'add_monthly': False,  # Whether to add monthly comparisons in addition to yearly comparisons for temporal sampling schemes
             'polygenomic': True,  # Is polygenomic = 1, else monogenomic = 0
             'symptomatic': False,  # Is symptomatic = 1, else asymptomatic = 0
             'age_bins': False     # Default age bins: 0-5, 5-15, 15+
@@ -385,7 +378,8 @@ def deep_merge(default_dict, override_dict):
             print(f"Error: {infection_df_path} not found. Loading test data.")
         infection_df = pd.read_csv('test_data/test_fpg_infections.csv')
 
-        # Run sampling model
+    # Run sampling model
+    print(f"Config parameters for sampling model:\n {config}")
     sample_df = run_sampling_model(
         input_df=infection_df,
         config=config,
@@ -405,7 +399,7 @@ def deep_merge(default_dict, override_dict):
     ibs_matrix = None
     # Optional - included if need to filter out non-variant tracked sites, i.e. immunity markers or drugR for calculating genetic metrics only on neutral variant sites.
     variant_indices = None
-    #variant_indices = [0, 3, 5, 6, 8, 9, 11, 12, 14, 16, 17, 19, 20, 22, 24, 25, 27, 28, 30, 32, 33, 35, 37, 38, 40, 42, 43, 45, 47, 48, 50, 52, 53, 55, 57, 58, 60, 62, 63, 65, 67, 68, 69, 71, 73, 75, 77, 78, 79, 80, 82, 84, 86, 88, 89, 90, 91, 93, 94, 96, 98, 99, 101, 102, 103, 104, 106, 107, 109, 110, 112, 113, 115, 116, 117, 118, 119, 121, 122, 124, 125, 127, 128, 130, 131, 132, 133, 134, 135, 137, 138, 139, 141, 142, 144, 145, 146, 148, 149, 150]
+    # variant_indices = []
 
     if config['metrics']['identity_by_descent']:
         user_specified_ibx.append('ibd')
@@ -476,53 +470,8 @@ def deep_merge(default_dict, override_dict):
 
 
 #####################################################################################
-# Parallelizable wrapper function
+# Single run test
 #####################################################################################
-def process_file(file_row, output_summary_dir, config_path=None, verbose=False):
-    """
-    Process a single file for parallel execution.
-    
-    Parameters:
-        file_row: pandas Series or dict with 'output_name' and 'input_dir' columns
-        output_summary_dir: Directory to save outputs
-        config_path: Path to config file (optional)
-        verbose: Whether to print verbose output
-        
-    Returns:
-        str: Name of processed simulation
-    """
-    try:
-        # Extract information from the row
-        sim_name = file_row['output_name']
-        emod_output_path = file_row['input_dir']
-        
-        # Use default config if not specified
-        if config_path is None or not os.path.exists(config_path):
-            config_path = ""  # Will trigger default config usage
-            
-        # Create output directory for this simulation
-        output_path = os.path.join(output_summary_dir, sim_name)
-        
-        # Run the observational model
-        result = run_observational_model(
-            sim_name=sim_name,
-            emod_output_path=emod_output_path,
-            config_path=config_path,
-            output_path=output_path,
-            verbose=verbose
-        )
-        
-        return f"SUCCESS: {sim_name}"
-        
-    except Exception as e:
-        error_msg = f"ERROR processing {file_row.get('output_name', 'unknown')}: {str(e)}"
-        if verbose:
-            import traceback
-            print(f"{error_msg}\n{traceback.format_exc()}")
-        return error_msg
-
-
-# Single file test
 # Single file test
 if __name__ == "__main__":
     import argparse
@@ -567,4 +516,7 @@ def process_file(file_row, output_summary_dir, config_path=None, verbose=False):
         print(f"\nError running model: {e}")
         if args.verbose:
             print("\nFull traceback:")
-            print(traceback.format_exc())
+            print(traceback.format_exc())
+
+
+