How can I reproduce the HumanEval performance reported in this repo?
I tried to evaluate using the evaluation branch, but it seems to be very different from the default branch. Would it be possible to add a full evaluation pipeline to the default branch?
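For reference, below is a minimal sketch of the kind of end-to-end flow I am looking for, assuming the standard openai/human-eval harness (`pip install human-eval`); `generate_one_completion` is a hypothetical stand-in for however this repo's model turns a prompt into code, since I could not find that step documented in the default branch.

```python
# Minimal sketch of the evaluation flow, assuming the openai/human-eval harness.
from human_eval.data import read_problems, write_jsonl

def generate_one_completion(prompt: str) -> str:
    # Hypothetical: replace with this repo's actual generation call.
    raise NotImplementedError

# task_id -> problem dict; each problem has a "prompt" field to complete.
problems = read_problems()

samples = [
    {"task_id": task_id, "completion": generate_one_completion(problem["prompt"])}
    for task_id, problem in problems.items()
]
write_jsonl("samples.jsonl", samples)

# Then score pass@k with the harness's CLI:
#   evaluate_functional_correctness samples.jsonl
```

If the evaluation branch does something materially different (prompt formatting, stop tokens, post-processing), documenting that in the default branch would already help a lot.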
Thanks a lot.