Currently, we define error_margin as a fixed percentage. We don't know how stable the tests are or, for example, what standard deviation to expect.
It would be more robust (and more informative) to run the tests multiple times and collect relevant stats such as min, max, median, and stdev.
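
As a rough illustration of the idea, here is a minimal Python sketch. It assumes the quantity being measured is the wall-clock time of a single test run. The names `collect_stats`, `summarize`, `run_once`, and `repeats` are hypothetical and not part of the existing code.

```python
import statistics
import time


def summarize(samples):
    """Return min/max/median/stdev for a list of numeric samples."""
    return {
        "min": min(samples),
        "max": max(samples),
        "median": statistics.median(samples),
        # Sample standard deviation; requires at least two samples.
        "stdev": statistics.stdev(samples),
    }


def collect_stats(run_once, repeats=5):
    """Call run_once() `repeats` times and summarize the durations in seconds.

    `run_once` is a hypothetical zero-argument callable standing in for one
    execution of the test being measured.
    """
    if repeats < 2:
        raise ValueError("need at least 2 runs to compute a standard deviation")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return summarize(durations)
```

With numbers like these, the margin could be derived from the observed spread (for example, a multiple of the stdev around the median) instead of a fixed percentage. Whether that fits depends on how error_margin is used today.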