Maybe it's just the number of parameters, but the number of computations (FLOPs) would be interesting too, e.g. to make the point that I can use a simple set-up (a linear model with no hidden layers, i.e. linear regression) or a complex one (3 hidden layers, loads of units per layer) to achieve exactly the same outcome.
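
To put rough numbers on that comparison, here is a minimal sketch that counts parameters and forward-pass FLOPs for a fully connected network. The layer sizes (10 inputs, 1 output, 256 units per hidden layer) and the helper name `dense_cost` are assumptions picked for illustration, not taken from any particular model. The FLOP convention is also an assumption: one multiply and one add per weight, with the bias add folded in and activation functions ignored.

```python
def dense_cost(layer_sizes):
    """Return (parameters, forward-pass FLOPs per example) for a dense network.

    layer_sizes lists the width of each layer, input first, output last,
    e.g. [10, 1] is linear regression on 10 features.
    """
    params = 0
    flops = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Weight matrix plus one bias per output unit.
        params += n_in * n_out + n_out
        # Per output unit: n_in multiplies, n_in - 1 adds, plus 1 bias add.
        flops += 2 * n_in * n_out
    return params, flops


# Hypothetical sizes: 10 input features, 1 output.
linear = dense_cost([10, 1])                # no hidden layers
deep = dense_cost([10, 256, 256, 256, 1])   # 3 hidden layers of 256 units

print("linear:", linear)  # (11, 20)
print("deep:  ", deep)    # (134657, 267776)
```

Under these assumptions the deep set-up costs roughly four orders of magnitude more, both in parameters and in FLOPs per prediction, so if the two reach the same outcome, either metric makes the point.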