There is a bug in manualRemove_dataset_HumanEvalComm_model_deepseek-coder-6.7b-instruct-finetuned-02212025_topn_1_temperature_1.0.log_1, and in the corresponding logs for the other DeepSeek models and the CodeLlama model: after the first run is executed, two or more additional runs are added.
Although this currently does not affect the eval results (confirmed), we need to investigate why the additional runs are added and why they differ from the first run.
One hypothesis is that results are added to log_1 in an unexpected way when jobs are run outside phase 1.
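
One way to test this hypothesis is to snapshot log_1 right after phase 1 finishes and compare it again after each later job. The sketch below is only an illustration and makes no assumption about the log's internal format: it records the file's size and SHA-256, then reports whether the file is unchanged, has had bytes appended, or was rewritten. The names `LogSnapshot`, `snapshot`, and `diff_since` are hypothetical helpers and are not part of the existing evaluation code.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LogSnapshot:
    """Size and content hash of a log file at one point in time."""
    size: int
    sha256: str


def snapshot(path):
    """Record the size and SHA-256 of the file at `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    return LogSnapshot(size=len(data), sha256=hashlib.sha256(data).hexdigest())


def diff_since(path, before):
    """Classify how the file changed relative to an earlier snapshot.

    Returns (status, appended_bytes), where status is one of
    "unchanged", "appended", or "rewritten".
    """
    with open(path, "rb") as f:
        data = f.read()

    # Same length and same hash: nothing touched the file.
    if len(data) == before.size and hashlib.sha256(data).hexdigest() == before.sha256:
        return "unchanged", b""

    # The old content is still an exact prefix: something appended to the file.
    prefix = data[:before.size]
    if len(data) > before.size and hashlib.sha256(prefix).hexdigest() == before.sha256:
        return "appended", data[before.size:]

    # Otherwise the original content was modified, truncated, or replaced.
    return "rewritten", b""
```

Taking `snap = snapshot(path_to_log_1)` at the end of phase 1 and calling `diff_since(path_to_log_1, snap)` after each later job would show which job, if any, touches the file. An "appended" status after a job outside phase 1 would support the hypothesis, and the returned bytes can be inspected to see what the extra runs contain.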