Skip to content

Excel is good code is bad

Rory Watts edited this page Jul 21, 2024 · 1 revision

Moving from Excel imposes very large costs

In health, when making models, there are three main options you have: Excel, Code and Something Else. The mistake I see modellers making, is moving from Excel to Code without very good reasons.

What is Code?

We are all familiar with Excel, so I don't need to explain it, but allow me to make a comparison. Here's the world's simplest Excel Model, showing the cost of treating incident cases of some disease each year. image

Although you can't see the calculations directly, you can sense what is going on: we're multiplying values like population, against rates and costs. Here's equivalent code, in Python, might be something like this:

# Define Constants
population = 25_000_000
incidence_rate = 0.005
treatment_rate = 0.33
cost_of_treatment = 133.6

annual_incidence = population * incidence_rate
treated = annual_incidence * treatment_rate
cost_of_treatment = treated * cost_of_treatment

print(f"The Cost of Treatment is: {cost_of_treatment}")

If this code was written in a file called simple_model.py, we could run it, and get a response:

python simple_model.py
> The Cost of Treatment is: 5511000.00

Great! This is very simple, and we can clearly see what is going on. We can now call ourselves "programmers" too, an added bonus. And now you - the reader - has also read and correctly interpreted Python code too, well done. Now you basically understand what code is, for our purposes at least: code is some set of instructions for the computer, which can hold values (as variables, e.g. population) and make calculations. It takes inputs, and produces outputs.

Code is generally less interpretable

Now, let me obfuscate the same script we wrote above:

from functools import reduce

def calc_cost(pop, inc_rate, trt_rate, cost_trt):
    return reduce(lambda x, y: x + y, [pop, inc_rate, trt_rate, cost_trt])

if __name__ == '__main__':
    POPULATION = 25_000_000
    INCIDENCE_RATE = 0.005
    TREATMENT_RATE = 0.33
    COST_OF_TREATMENT = 133.6

    cost = calc_cost(POPULATION, INCIDENCE_RATE, TREATMENT_RATE, COST_OF_TREATMENT)
    print(f"The Cost of Treatment is: {cost}")

This script does exactly the same thing as above. We can still make out the constants (e.g. POPULATION, INCIDENCE_RATE), but what on earth is that calc_cost function? What is it doing? It's now moved from something we can all understand, to something that the programmer can understand. Meanwhile, even when Excel is confusing, it's still reasonably straightforward to trace the steps that produced the output.

Even more importantly, I introduced a mistake into that code above. It now just adds the values together, rather than multiplying them. But because we haven't written any tests for the code, we may not notice it immediately. So, what did we gain from writing out such an eloquent piece of code?

Why do people move from Excel to Code?

Here's a list of reasons that modellers may want to move from Excel to Code Some are reasonable, some are clearly unreasonable.

  • The model is so complex, that the visual design of Excel is too restrictive
  • Excel is boring, and I want to learn how to code
  • The inputs are so immense, that Excel has a hard time loading them on an average computer
  • Someone told me that agent based models are better than other models
  • Only the outputs are important, and we need an exceptionally fast model
  • It's easier for me to keep control the project if you can't understand the code

While silly, these reasons are not often strongly considered before a project starts.

Clone this wiki locally