Running the automatic evaluation is neither interactive nor very interesting. Run it only at the beginning (or at the end?) to motivate the manual evaluation.