Add labels to disease module parameters#1610
Conversation
marghe-molaro
left a comment
There was a problem hiding this comment.
Hi @mnjowe,
Many thanks for this, it's looking good! I just had a couple of comments:
-
I think we need to both ensure and test that
label,MIN, andMAXfor each parameter are either loaded from the relevant resource file, or correctly default to 'unassigned', 'None' and 'None' when not specified. This is to anticipate a situation where developers progressively start to fill these values in, but this is not done simultaneously for all module/parameters and/or modules; -
It would be amazing if we could have, at the time of the initialisation of the simulation, a log entry of the breakdown of the different parameter types, for each module and then for all combined. Eventually, this would look something like this (for the example case where only Rti and Alri modules are loaded in the sim, and where numbers are obviously made up):
'unassigned', 'free', 'constant', 'context_specific', 'total'
Rti: 5, 15, 3, 7, 30
Alri: 3, 16, 9, 11, 39
TLO total: 8, 31, 12, 18, 69
It will be incredibly exciting to be able to see this breakdown, as it will give us both an initial insight into the model's predictive validity and an idea of the effort required to generalise the model to other countries, both from a calibration and parameter harvesting perspective!
| Parameter(Types.REAL, 'parameter description', label='other') | ||
|
|
||
| class TestLoadParametersFromDataframe: | ||
| """Tests for the load_parameters_from_dataframe method |
There was a problem hiding this comment.
I think we might need to modify the load_parameters_from_dataframe function too, to ensure that if resource files do include the label and a MIN/MAX range this is loaded properly?
There was a problem hiding this comment.
I think with the lines I have numbered below, load_parameters_from_dataframe will always load properly, regardless of some extra columns added to the resource file.
--1-- resource.set_index('parameter_name', inplace=True)
skipped_data_types = ('DATA_FRAME', 'SERIES')
# for each supported parameter, convert to the correct type
for parameter_name in resource.index[resource.index.notnull()]:
parameter_definition = self.PARAMETERS[parameter_name]
if parameter_definition.type_.name in skipped_data_types:
continue
# For each parameter, raise error if the value can't be coerced
--2-- parameter_value = resource.at[parameter_name, 'value']
There was a problem hiding this comment.
It will load the 'value' correctly, but will it assign label, MIN, and MAX correctly too?
There was a problem hiding this comment.
But I think we have to decide if we want to capture module parameter stats with regard to the labels here in which case we need to modify load_parameters? i.e. like you said
unassigned free constant context_specific total
module_name: 5 15 3 7 30
There was a problem hiding this comment.
Again, should we initialise labels, min, and max in both module and resource files? Or should we initialise these during load parameters as we are doing with module parameters?
There was a problem hiding this comment.
Labels, min, and max should be defined in the resource files only.
IF they are not specified in the resource files, they should be assigned default values - of 'unassigned', 'None', and 'None' respectively.
There was a problem hiding this comment.
It will load the 'value' correctly, but will it assign label, MIN, and MAX correctly too?
No. Needs to be modified to accommodate that
|
Thanks @mnjowe and @marghe-molaro Looping in @tamuri here as this is deep inside the framework. |
|
Thanks for pinging me. First thought: these changes should be on the |
|
Absolutely true. Thanks Asif |
Just to be clear on how this is to be logged, we are only interested in logging parameter label statistics per each disease module, right? Not including min and max values |
Yes that's correct, we are only interested in logging parameter label statistics |
|
Hi @marghe-molaro. Following our discussion, I’ve made some updates to how we’ll capture label statistics. A key change is that I’ve reverted the recent modifications to the Parameters class. The main reasons for this are:
|
|
I’d however appreciate some clarification on the preferred way of capturing label statistics. Should these be written to the log file in the outputs folder, or is it sufficient to simply print a statement at the end of the simulation(using logger.info)? Currently, I’m doing the latter. This will inform the checks to be included in |
Anywhere is fine for now. |
… update labels dictionary in simulation
|
Hi @marghe-molaro or @tamuri. Can you advise on the best way to connect this call in core to this function here in simulation. On my machine, it's running just fine, but it's crashing here, something I think I know the reason for. It seems by the time I'm calling the Also, any thoughts on how I can pass the data from core to simulation for computing label stats, but also logging the stats at the end of the simulation? Or if there is another good way of doing this, you can advise, I'm flexible to change. After this is suggested @marghe-molaro , I will finish by adding Min and Max labels to this function , perhaps also renaming it to something like param_labels_data |
Is there a reason this information needs to be stored in You could add any logging for simulation end to |
|
That's a good question @tamuri. I think the idea is to load the information about modules and their parameter labels once then compute and log their statistics once after simulation end in sim.finalise. The thought of doing this completely in Module crossed my mind, and I wanted to use the Apart from simulation_end, I didn't see any other place in the Module where I can do this and can be executed at the end of the simulation. That's why I opted to transfer the labels info to the |
|
Yes, so I think not much needs to change. Instead of storing all the modules' labels together, store each module's labels in a field in the module itself. Then, in |
|
Ah, I see. Thanks Asif, and yes, the |
|
Hi @mnjowe, I think we need to include an additional parameter label category, called something like "scenario", which would indicate parameters that are used by developers to define hypothetical future scenarios (and which therefore don't fall into any of the previously defined categories). For example, a parameter like ART_coverage_from_2025 used to compare hypothetical future trends in ART coverage would classify as a "scenario" parameter. Ideally, these "scenario" parameters should eventually be separated from standard parameters— stored in separate files/kept outside the resources directory + handled in a distinct variable class, but I think we should do this at a later stage. Let's first get all module developers to declare these parameters as "scenario", and then someone can tidy up the structure of the code at a later stage. What do you think? |
…ng label data to simulation. updated test
|
Hi everyone. Now the tests below are failing because of resource files like the attached ones. My assumption is that the lower and upper columns will only take values that are integers, floats, or None. The attached resource file has some values that do not align with my assumption. Should I contact Tara and see how we can resolve them? |
Looks like a good idea. Just wondering though, do we have enough/more of these parameters that will require a separate folder outside the resources folder and a separate class? |
|
Hi @mnjowe, it looks like Tara is already defining some columns that most likely overlap with our definition. |
|
Great, thanks! |
|
@mnjowe noting here what we decided at today's meeting:
|
|
Hi @tamuri and @marghe-molaro, kindly find above my latest commits reflecting the discussions we've had so far. One observation: some disease modules and their parameters are not initializing values via the To flag this, I’ve introduced a Let me know your thoughts. Thanks |
|
Hi @mnjowe, Thank you so much for these latest changes (which I will review asap) and this very interesting observation. I really like your proposed solution of keeping tabs on parameters not initialised through the |
|
some modules are initialising parameter values like the below |
|
and then there are modules like Demography who are combining both ways |
|
Thank you very much for providing these examples @mnjowe. These instances all seem to be linked with 'operational' modules - HealthBurden, Demography, etc - rather than disease modules, so it's something that we can address ourselves later without requiring the input of disease module developers. So for now your logging solution is perfect. |
|
Here is the presentation describing how we would like developers to update their reference files: https://docs.google.com/presentation/d/1VvkJiwEUrSxjS9f49IU6KT5aVYFbNqK1qpvKxOSGmuY/edit?usp=sharing |
|
Sorry for the delay - I've blocked all of tomorrow to work through this (and the other PRs). (Inspired by Tim) |
…ameters # Conflicts: # src/tlo/simulation.py
Co-authored-by: Asif Tamuri <tamuri@gmail.com>
Co-authored-by: Asif Tamuri <tamuri@gmail.com>
Thanks @tbhallett, I have added some tests from here. Or do you mean some other tests? |
Sorry --- I had missed those: all looks good to me. |
This PR is a first step in addressing issue #1036. We aim to have all parameters assigned a label as @marghe-molaro has clearly explained here