-
Notifications
You must be signed in to change notification settings - Fork 0
Excel is good code is bad
In health, when making models, there are three main options you have: Excel, Code and Something Else. The mistake I see modellers making, is moving from Excel to Code without very good reasons.
We are all familiar with Excel, so I don't need to explain it, but allow me to make a comparison.
Here's the world's simplest Excel Model, showing the cost of treating incident cases of some disease each year.

Although you can't see the calculations directly, you can sense what is going on: we're multiplying values like population, against rates and costs. Here's equivalent code, in Python, might be something like this:
# Define Constants
population = 25_000_000
incidence_rate = 0.005
treatment_rate = 0.33
cost_of_treatment = 133.6
annual_incidence = population * incidence_rate
treated = annual_incidence * treatment_rate
cost_of_treatment = treated * cost_of_treatment
print(f"The Cost of Treatment is: {cost_of_treatment}")If this code was written in a file called simple_model.py, we could run it, and get a response:
python simple_model.py
> The Cost of Treatment is: 5511000.00Great! This is very simple, and we can clearly see what is going on. We can now call ourselves "programmers" too, an added bonus.
And now you - the reader - has also read and correctly interpreted Python code too, well done.
Now you basically understand what code is, for our purposes at least: code is some set of instructions for the computer, which can hold values (as variables, e.g. population) and make calculations. It takes inputs, and produces outputs.
Now, let me obfuscate the same script we wrote above:
from functools import reduce
def calc_cost(pop, inc_rate, trt_rate, cost_trt):
return reduce(lambda x, y: x + y, [pop, inc_rate, trt_rate, cost_trt])
if __name__ == '__main__':
POPULATION = 25_000_000
INCIDENCE_RATE = 0.005
TREATMENT_RATE = 0.33
COST_OF_TREATMENT = 133.6
cost = calc_cost(POPULATION, INCIDENCE_RATE, TREATMENT_RATE, COST_OF_TREATMENT)
print(f"The Cost of Treatment is: {cost}")This script does exactly the same thing as above.
We can still make out the constants (e.g. POPULATION, INCIDENCE_RATE), but what on earth is that calc_cost function? What is it doing? It's now moved from something we can all understand, to something that the programmer can understand. Meanwhile, even when Excel is confusing, it's still reasonably straightforward to trace the steps that produced the output.
Even more importantly, I introduced a mistake into that code above. It now just adds the values together, rather than multiplying them. But because we haven't written any tests for the code, we may not notice it immediately. So, what did we gain from writing out such an eloquent piece of code?
Here's a list of reasons that modellers may want to move from Excel to Code Some are reasonable, some are clearly unreasonable.
- The model is so complex, that the visual design of Excel is too restrictive
- Excel is boring, and I want to learn how to code
- The inputs are so immense, that Excel has a hard time loading them on an average computer
- Someone told me that agent based models are better than other models
- Only the outputs are important, and we need an exceptionally fast model
- It's easier for me to keep control the project if you can't understand the code
While silly, these reasons are not often strongly considered before a project starts.
© 2024 Forecast Health Australia. All rights reserved.
Terms of Service | Contact Us
Last updated: July 21, 2024
- AI and Modelling
- An Approach to Rewriting Models
- Excel is Good, Code is Bad
- Health Economics in a Time of Population Collapse
- Incrementalism as a Standard Approach
- Modelling is Advocacy, Not Science
- Models Should Be Mechanical
- Optimising Outcomes by Gradient Descent
- The Case for Open Modelling
- Theseus Robustness