Skip to content

Add labels to disease module parameters#1610

Merged
tamuri merged 17 commits intomasterfrom
mnjowe/assign_labels_to_disease_module_parameters
Jun 5, 2025
Merged

Add labels to disease module parameters#1610
tamuri merged 17 commits intomasterfrom
mnjowe/assign_labels_to_disease_module_parameters

Conversation

@mnjowe
Copy link
Copy Markdown
Collaborator

@mnjowe mnjowe commented Apr 23, 2025

This PR is a first step in addressing issue #1036. We aim to have all parameters assigned a label as @marghe-molaro has clearly explained here

@mnjowe mnjowe self-assigned this Apr 23, 2025
Copy link
Copy Markdown
Collaborator

@marghe-molaro marghe-molaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mnjowe,

Many thanks for this, it's looking good! I just had a couple of comments:

  • I think we need to both ensure and test that label, MIN, and MAX for each parameter are either loaded from the relevant resource file, or correctly default to 'unassigned', 'None' and 'None' when not specified. This is to anticipate a situation where developers progressively start to fill these values in, but this is not done simultaneously for all module/parameters and/or modules;

  • It would be amazing if we could have, at the time of the initialisation of the simulation, a log entry of the breakdown of the different parameter types, for each module and then for all combined. Eventually, this would look something like this (for the example case where only Rti and Alri modules are loaded in the sim, and where numbers are obviously made up):

    'unassigned', 'free', 'constant', 'context_specific', 'total'
    Rti: 5, 15, 3, 7, 30
    Alri: 3, 16, 9, 11, 39
    TLO total: 8, 31, 12, 18, 69

It will be incredibly exciting to be able to see this breakdown, as it will give us both an initial insight into the model's predictive validity and an idea of the effort required to generalise the model to other countries, both from a calibration and parameter harvesting perspective!

Parameter(Types.REAL, 'parameter description', label='other')

class TestLoadParametersFromDataframe:
"""Tests for the load_parameters_from_dataframe method
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to modify the load_parameters_from_dataframe function too, to ensure that if resource files do include the label and a MIN/MAX range this is loaded properly?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the lines I have numbered below, load_parameters_from_dataframe will always load properly, regardless of some extra columns added to the resource file.

   --1--      resource.set_index('parameter_name', inplace=True)
                      skipped_data_types = ('DATA_FRAME', 'SERIES')
                      # for each supported parameter, convert to the correct type
                      for parameter_name in resource.index[resource.index.notnull()]:
                          parameter_definition = self.PARAMETERS[parameter_name]
              
                          if parameter_definition.type_.name in skipped_data_types:
                              continue
              
                          # For each parameter, raise error if the value can't be coerced
     --2--              parameter_value = resource.at[parameter_name, 'value']
 

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will load the 'value' correctly, but will it assign label, MIN, and MAX correctly too?

Copy link
Copy Markdown
Collaborator Author

@mnjowe mnjowe Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think we have to decide if we want to capture module parameter stats with regard to the labels here in which case we need to modify load_parameters? i.e. like you said

                        unassigned  free     constant   context_specific      total
module_name:   5                  15            3                  7                         30

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, should we initialise labels, min, and max in both module and resource files? Or should we initialise these during load parameters as we are doing with module parameters?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Labels, min, and max should be defined in the resource files only.
IF they are not specified in the resource files, they should be assigned default values - of 'unassigned', 'None', and 'None' respectively.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will load the 'value' correctly, but will it assign label, MIN, and MAX correctly too?

No. Needs to be modified to accommodate that

@tbhallett tbhallett requested a review from tamuri April 23, 2025 10:05
@tbhallett
Copy link
Copy Markdown
Collaborator

Thanks @mnjowe and @marghe-molaro

Looping in @tamuri here as this is deep inside the framework.

@tamuri
Copy link
Copy Markdown
Collaborator

tamuri commented Apr 23, 2025

Thanks for pinging me. First thought: these changes should be on the Parameter class, not the base Specifiable class. Because these changes do not apply to a Property, which also inherits from Specifiable.

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented Apr 23, 2025

Absolutely true. Thanks Asif

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented Apr 23, 2025

  • 'unassigned', 'free', 'constant', 'context_specific', 'total'
    Rti: 5, 15, 3, 7, 30
    Alri: 3, 16, 9, 11, 39
    TLO total: 8, 31, 12, 18, 69

Just to be clear on how this is to be logged, we are only interested in logging parameter label statistics per each disease module, right? Not including min and max values

@marghe-molaro
Copy link
Copy Markdown
Collaborator

  • 'unassigned', 'free', 'constant', 'context_specific', 'total'
    Rti: 5, 15, 3, 7, 30
    Alri: 3, 16, 9, 11, 39
    TLO total: 8, 31, 12, 18, 69

Just to be clear on how this is to be logged, we are only interested in logging parameter label statistics per each disease module, right? Not including min and max values

Yes that's correct, we are only interested in logging parameter label statistics

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 5, 2025

Hi @marghe-molaro. Following our discussion, I’ve made some updates to how we’ll capture label statistics. A key change is that I’ve reverted the recent modifications to the Parameters class. The main reasons for this are:

  1. The fact that modelers won't define labels or their MIN/MAX values in the parameters section of the module; instead, these will be specified in the resource file.
  2. These label definitions will not be used during simulation runtime within disease modules. They will only be referenced at the end of the simulation when logging statistics.

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 5, 2025

I’d however appreciate some clarification on the preferred way of capturing label statistics. Should these be written to the log file in the outputs folder, or is it sufficient to simply print a statement at the end of the simulation(using logger.info)? Currently, I’m doing the latter. This will inform the checks to be included in test_parameter_label in test_core.py

@marghe-molaro
Copy link
Copy Markdown
Collaborator

I’d however appreciate some clarification on the preferred way of capturing label statistics. Should these be written to the log file in the outputs folder, or is it sufficient to simply print a statement at the end of the simulation(using logger.info)? Currently, I’m doing the latter. This will inform the checks to be included in test_parameter_label in test_core.py

Anywhere is fine for now.

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 7, 2025

Hi @marghe-molaro or @tamuri. Can you advise on the best way to connect this call in core to this function here in simulation. On my machine, it's running just fine, but it's crashing here, something I think I know the reason for. It seems by the time I'm calling the load_parameters function, self.sim hasn't been initialised yet, which results in the following error here

Also, any thoughts on how I can pass the data from core to simulation for computing label stats, but also logging the stats at the end of the simulation? Or if there is another good way of doing this, you can advise, I'm flexible to change.

After this is suggested @marghe-molaro , I will finish by adding Min and Max labels to this function , perhaps also renaming it to something like param_labels_data

@tamuri
Copy link
Copy Markdown
Collaborator

tamuri commented May 7, 2025

Can you advise on the best way to connect this call in core to this function here in simulation.

Is there a reason this information needs to be stored in Simulation rather than being kept inside the module that's loading the resource file? To log, you would iterate over each of the modules and get the module's labels as necessary.

You could add any logging for simulation end to simulation.finalise(). Again, I would call function inside of Module rather than bringing in the logic into Simulation.

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 8, 2025

That's a good question @tamuri. I think the idea is to load the information about modules and their parameter labels once then compute and log their statistics once after simulation end in sim.finalise.

The thought of doing this completely in Module crossed my mind, and I wanted to use the on_simulation_end function, but then that function is sometimes also called by disease modules, and I wouldn't want the compute_label_stats function to be executed in those disease modules.

Apart from simulation_end, I didn't see any other place in the Module where I can do this and can be executed at the end of the simulation. That's why I opted to transfer the labels info to the finalise method in simulation

@tamuri
Copy link
Copy Markdown
Collaborator

tamuri commented May 8, 2025

Yes, so I think not much needs to change. Instead of storing all the modules' labels together, store each module's labels in a field in the module itself. Then, in Simulation.finalise put the minimum amount of logic to collect the required information together. It looks like the key for the param_labels_data dict is simply the module name? So this is the same as looping over each of the modules and collecting the information. You could do that in the same loop that iterates over the modules and calls on_simulation_end on each of them. I mean not to put the logic inside on_simulation_end but rather something like:

def finalise(self, wall_clock_time: Optional[float] = None) -> None:
        for module in self.modules.values():
            module.on_simulation_end()
            labels = module.get_labels()
            # collect labels
        # log labels

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 8, 2025

Ah, I see. Thanks Asif, and yes, the param_labels_data dict key is simply the module name

@marghe-molaro
Copy link
Copy Markdown
Collaborator

Hi @mnjowe, I think we need to include an additional parameter label category, called something like "scenario", which would indicate parameters that are used by developers to define hypothetical future scenarios (and which therefore don't fall into any of the previously defined categories). For example, a parameter like ART_coverage_from_2025 used to compare hypothetical future trends in ART coverage would classify as a "scenario" parameter.

Ideally, these "scenario" parameters should eventually be separated from standard parameters— stored in separate files/kept outside the resources directory + handled in a distinct variable class, but I think we should do this at a later stage. Let's first get all module developers to declare these parameters as "scenario", and then someone can tidy up the structure of the code at a later stage. What do you think?

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 9, 2025

Hi everyone. Now the tests below are failing because of resource files like the attached ones. My assumption is that the lower and upper columns will only take values that are integers, floats, or None. The attached resource file has some values that do not align with my assumption. Should I contact Tara and see how we can resolve them?
parameters.csv

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 9, 2025

Hi @mnjowe, I think we need to include an additional parameter label category, called something like "scenario", which would indicate parameters that are used by developers to define hypothetical future scenarios (and which therefore don't fall into any of the previously defined categories). For example, a parameter like ART_coverage_from_2025 used to compare hypothetical future trends in ART coverage would classify as a "scenario" parameter.

Ideally, these "scenario" parameters should eventually be separated from standard parameters— stored in separate files/kept outside the resources directory + handled in a distinct variable class, but I think we should do this at a later stage. Let's first get all module developers to declare these parameters as "scenario", and then someone can tidy up the structure of the code at a later stage. What do you think?

Looks like a good idea. Just wondering though, do we have enough/more of these parameters that will require a separate folder outside the resources folder and a separate class?

@marghe-molaro
Copy link
Copy Markdown
Collaborator

Hi @mnjowe, it looks like Tara is already defining some columns that most likely overlap with our definition.
My advise would be, for now, to ensure that our column names can't be confused with existing ones - so renaming them something like prior_MIN and prior_MAX - and then let the module developers themselves decide how to merge their existing columns with our new standardised ones. This should circumvent the error.

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 9, 2025

Great, thanks!

@marghe-molaro
Copy link
Copy Markdown
Collaborator

@mnjowe noting here what we decided at today's meeting:

  1. Replace labels used so far as follows:
  • free --> undetermined
  • constant --> universal
  • context_specific --> local
  • scenario --> scenario
  1. Small check that parameter type is respected in [MIN,MAX] columns, to ensure that if parameter is list, there will be a list of MIN values and MAX values respectively.

  2. [In the future] Add a test performing a cold read of the parameter labels, to prevent any merges with master with parameters are unassigned

  3. Current timeline for roll-out: merge this PR into master by the 27th of May so that we can share these changes with developers at that day's Stand Up meeting!

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 26, 2025

Hi @tamuri and @marghe-molaro, kindly find above my latest commits reflecting the discussions we've had so far.

One observation: some disease modules and their parameters are not initializing values via the load_parameters_from_dataframe function. As a result, metadata for those parameters is not being captured.

To flag this, I’ve introduced a not_init_via_load_param label in the logging output. For example, an entry like {'Demography': {'not_init_via_load_param': 7, 'unassigned': 1}} indicates that 7 parameters in the Demography module were initialized outside of load_parameters_from_dataframe, only one parameter was initialised via load_parameters_from_dataframe and was assigned a label of unassigned.

Let me know your thoughts. Thanks

@marghe-molaro
Copy link
Copy Markdown
Collaborator

Hi @mnjowe,

Thank you so much for these latest changes (which I will review asap) and this very interesting observation. I really like your proposed solution of keeping tabs on parameters not initialised through theload_parameters_from_dataframe function. Can I ask, if not initialised through that function, how are they initialised?

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 26, 2025

some modules are initialising parameter values like the below

  1. Simplified births.

  2. HealthBurden

@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented May 26, 2025

and then there are modules like Demography who are combining both ways

@marghe-molaro
Copy link
Copy Markdown
Collaborator

Thank you very much for providing these examples @mnjowe. These instances all seem to be linked with 'operational' modules - HealthBurden, Demography, etc - rather than disease modules, so it's something that we can address ourselves later without requiring the input of disease module developers. So for now your logging solution is perfect.

@marghe-molaro
Copy link
Copy Markdown
Collaborator

Here is the presentation describing how we would like developers to update their reference files: https://docs.google.com/presentation/d/1VvkJiwEUrSxjS9f49IU6KT5aVYFbNqK1qpvKxOSGmuY/edit?usp=sharing

@marghe-molaro
Copy link
Copy Markdown
Collaborator

Hi @tamuri, just a quick reminder that your review on this PR would be highly appreciated, as we're hoping to get it into master before merging PR #1287 and PR #1609 which are almost ready

@marghe-molaro marghe-molaro mentioned this pull request Jun 3, 2025
@tamuri
Copy link
Copy Markdown
Collaborator

tamuri commented Jun 3, 2025

Sorry for the delay - I've blocked all of tomorrow to work through this (and the other PRs). (Inspired by Tim)

@tbhallett tbhallett mentioned this pull request Jun 3, 2025
Copy link
Copy Markdown
Collaborator

@tbhallett tbhallett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the overall design here and will try it out on CC module (#1287) immediately. But, defer to @tamuri on the implementation.
Also, do we not need a test of it?

tamuri and others added 3 commits June 4, 2025 23:09
@mnjowe
Copy link
Copy Markdown
Collaborator Author

mnjowe commented Jun 5, 2025

I'm happy with the overall design here and will try it out on CC module (#1287) immediately. But, defer to @tamuri on the implementation. Also, do we not need a test of it?

Thanks @tbhallett, I have added some tests from here. Or do you mean some other tests?

@tbhallett
Copy link
Copy Markdown
Collaborator

I'm happy with the overall design here and will try it out on CC module (#1287) immediately. But, defer to @tamuri on the implementation. Also, do we not need a test of it?

Thanks @tbhallett, I have added some tests from here. Or do you mean some other tests?

Sorry --- I had missed those: all looks good to me.

@tamuri tamuri merged commit 79faf00 into master Jun 5, 2025
62 checks passed
@tamuri tamuri deleted the mnjowe/assign_labels_to_disease_module_parameters branch June 5, 2025 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants