# **Evaluation Pipeline**

Currently, only QA-pair (question/answer) format evaluation is supported.

## **Quick Start**

```
cd DataFlow
pip install -e .[llamafactory]

cd ..
mkdir workspace
cd workspace

# Place the files you want to evaluate in the working directory

# Initialize the evaluation configuration files
dataflow eval init

# Note: you must modify the configuration file eval_api.py or eval_local.py before running.
# By default, the pipeline picks the most recently fine-tuned model and compares it with its base model.
# The default evaluation method is semantic evaluation, and the metric is accuracy.
dataflow eval api    # or: dataflow eval local
```

## **Step 1: Install Evaluation Environment**

Install the evaluation environment:

```
cd DataFlow
pip install -e .[llamafactory]
cd ..
```
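
As a quick check that the installation succeeded, you can try importing the package. This is a minimal sketch and assumes the package is importable as `dataflow`; adjust the module name if your installation differs.

```
# Minimal installation check (assumption: the package is importable as `dataflow`).
import dataflow

print("dataflow imported successfully")
```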

## **Step 2: Create and Enter the dataflow Working Folder**

```
mkdir workspace
cd workspace
```

## **Step 3: Initialize the Configuration Files**

Initialize the configuration files:

```
dataflow eval init
```

After initialization completes, the project directory looks like this:

```
Project Root Directory/
├── eval_api.py    # Configuration file for the API-model evaluator
└── eval_local.py  # Configuration file for the local-model evaluator
```

## **Step 4: Prepare Evaluation Data**

**Method 1:** Prepare a JSON file whose records follow the format shown below:

```
{
    "input": "What properties indicate that material PI-1 has excellent processing characteristics during manufacturing processes?",
    "output": "Material PI-1 has high tensile strength between 85-105 MPa.\nPI-1 exhibits low melt viscosity below 300 Pa·s indicating good flowability.\n\nThe combination of its high tensile strength and low melt viscosity indicates that it can be easily processed without breaking during manufacturing."
},
```

In this example:

- `input` is the question
- `output` is the standard answer
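
If it helps, the sketch below shows one way to assemble such a file programmatically. It assumes the evaluation file is a JSON array of `input`/`output` objects matching the record above, and the file name `eval_data.json` is only a placeholder; adapt both if your setup differs.

```
import json

# Hypothetical QA pairs in the input/output format shown above.
qa_pairs = [
    {
        "input": "What properties indicate that material PI-1 has excellent processing characteristics?",
        "output": "PI-1 combines high tensile strength with low melt viscosity, so it processes easily.",
    },
    # ... add more QA pairs here
]

# Write the evaluation data into the workspace directory (placeholder file name).
with open("eval_data.json", "w", encoding="utf-8") as f:
    json.dump(qa_pairs, f, ensure_ascii=False, indent=2)
```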

**Method 2:** Alternatively, you can skip reformatting your data (the file must still contain clearly identifiable question and standard-answer fields) and configure the field-name mappings in `eval_api.py` and `eval_local.py`:

```
EVALUATOR_RUN_CONFIG = {
    "input_test_answer_key": "model_generated_answer",  # Field name for the model-generated answer
    "input_gt_answer_key": "output",                     # Field name for the standard answer (in the original data)
    "input_question_key": "input"                        # Field name for the question (in the original data)
}
```
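
For instance, if your dataset already uses its own field names, point the mapping at them. In the sketch below the names `question_text` and `reference_answer`, and the record contents, are purely illustrative:

```
# A dataset record that keeps its original field names (illustrative only):
# {
#     "question_text": "What is the glass transition temperature of PI-1?",
#     "reference_answer": "PI-1 has a glass transition temperature of about 250 °C."
# }

# The corresponding mapping in eval_api.py / eval_local.py:
EVALUATOR_RUN_CONFIG = {
    "input_test_answer_key": "model_generated_answer",  # left at the default
    "input_gt_answer_key": "reference_answer",          # custom standard-answer field
    "input_question_key": "question_text"               # custom question field
}
```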

## **Step 5: Configure Parameters**

If you want to use a local model as the evaluator, modify the parameters in `eval_local.py`. If you want to use an API model as the evaluator, modify the parameters in `eval_api.py`.

```
# Target models configuration (same as in API mode)

TARGET_MODELS = [
    # All supported usage methods are shown below; they can be mixed freely.
    # 1. Local path
    # "./Qwen2.5-3B-Instruct",
    # 2. Hugging Face path
    # "Qwen/Qwen2.5-7B-Instruct",
    # 3. Individual configuration
    # {
    #     "name": "llama_8b",
    #     "path": "meta-llama/Llama-3-8B-Instruct",
    #     "tensor_parallel_size": 2,
    #     "max_tokens": 2048,
    #     "gpu_memory_utilization": 0.9,
    #     # You can customize the prompt for each model; if not specified, it defaults to the template in the build_prompt function.
    #     # IMPORTANT: this is the prompt for the models being evaluated, NOT for the judge model!
    #     "answer_prompt": """please answer the questions:
    #     question:{question}
    #     answer:"""
    # },
    # Add more models...
]
```
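
As a concrete sketch, a filled-in configuration that compares a locally fine-tuned checkpoint against its Hugging Face base model might look like this (the local path and model name are placeholders; replace them with your own):

```
TARGET_MODELS = [
    # Locally fine-tuned checkpoint (placeholder path)
    "./output/qwen2.5-3b-finetuned",
    # Base model from Hugging Face, with explicit per-model settings
    {
        "name": "qwen2.5_3b_base",
        "path": "Qwen/Qwen2.5-3B-Instruct",
        "tensor_parallel_size": 1,
        "max_tokens": 2048,
        "gpu_memory_utilization": 0.9,
    },
]
```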

## **Step 6: Perform Evaluation**

Run local evaluation:

```
dataflow eval local
```

Run API evaluation:

```
dataflow eval api
```
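
For reference, the reported accuracy under semantic evaluation can be read as the fraction of questions for which the judge considers the model's answer semantically equivalent to the standard answer. The sketch below is only a conceptual illustration of that metric, not the pipeline's actual implementation; the `judge` callable stands in for the judge model.

```
from typing import Callable, Dict, List


def semantic_accuracy(records: List[Dict[str, str]],
                      judge: Callable[[str, str, str], bool]) -> float:
    """Accuracy = answers judged semantically correct / total questions.

    `judge(question, model_answer, standard_answer)` is a stand-in for the
    judge model's semantic comparison (conceptual sketch only).
    """
    if not records:
        return 0.0
    correct = sum(
        judge(r["input"], r["model_generated_answer"], r["output"])
        for r in records
    )
    return correct / len(records)
```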