After you have trained an LCM, the checkpoint will be saved in a folder under the …

Since an LCM expects input data at the sentence level, we need to preprocess the evaluation datasets accordingly. This includes parsing the raw content and splitting texts into sentences, then embedding them into vectors using a Sonar encoder.
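
As a rough illustration of these two steps, the snippet below splits a text with wtpsplit and embeds the resulting sentences with a Sonar text encoder. The model names (`sat-3l-sm`, `text_sonar_basic_encoder`) are example choices, not necessarily the ones used by the evaluation script:

```python
from wtpsplit import SaT
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

# Split raw text into sentences with a SaT segmentation model.
sat = SaT("sat-3l-sm")
sentences = sat.split("The cat sat on the mat. It was a sunny day outside.")

# Embed each sentence into a fixed-size vector with a Sonar text encoder.
t2vec = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)
embeddings = t2vec.predict(sentences, source_lang="eng_Latn")
print(embeddings.shape)  # (num_sentences, 1024)
```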
The example below shows how we prepare the data for CNN Dailymail. We load the dataset from Huggingface using the [`datasets` API](https://huggingface.co/docs/datasets/en/index). Sentence splitting is done using [wtpsplit](https://github.com/segment-any-text/wtpsplit). Make sure to specify `--extra data` when installing the project so that these libraries are included.
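
With a `uv`-managed checkout, for example, this can look as follows (assuming the project defines a `data` extra, as the flag above suggests):

```shell
uv sync --extra data
```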
All processing logic is implemented in the file `prepare_evaluation_data.py`, as described below.
### Step 1.1: Process the split
Next, we download and parse the content (source text and summaries), saving the different splits into JSON format:
```shell
uv run --extra data prepare_evaluation_data.py prepare_data \
--dataset_name=cnn_dailymail \
--output_dir=jsonl_dataset \
--source_text_column=article \
--target_text_column=highlights
```

The output will be stored in different files `[split].jsonl` under the directory specified by `--output_dir`.
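
You can sanity-check the result with the same `datasets` API. The path below is illustrative; the exact layout under the output directory may differ:

```python
from datasets import load_dataset

# Load one processed split back from disk to inspect a few records.
ds = load_dataset("json", data_files="jsonl_dataset/test.jsonl", split="train")
print(ds[0])
```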
To perform sentence splitting and Sonar embedding for each split, run the following command:

```shell
uv run --extra data prepare_evaluation_data.py embed \
    ...
```

Depending on your machine, this might take some time. Alternatively, you can run it on your SLURM cluster with the arguments `--mode=slurm --shards=NO_OF_PARALLEL_JOBS`. This requires changing your SLURM config accordingly. We use [submitit](https://github.com/facebookincubator/submitit) to configure the job launcher.
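
As a rough sketch, such a submitit launcher is typically configured along these lines; the worker function and all parameter values below are illustrative assumptions, not the script's actual code:

```python
import submitit

def embed_shard(shard_id: int) -> None:
    # Hypothetical worker: sentence-split and Sonar-embed one shard of the data.
    ...

num_shards = 8  # e.g. --shards=8

# AutoExecutor picks the SLURM backend when available, otherwise runs locally.
executor = submitit.AutoExecutor(folder="submitit_logs")
executor.update_parameters(
    slurm_partition="your_partition",  # adjust to your cluster's SLURM config
    timeout_min=240,
    cpus_per_task=8,
)

# Launch one job per shard and wait for all of them to finish.
jobs = executor.map_array(embed_shard, range(num_shards))
for job in jobs:
    job.result()
```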
> **_NOTE:_** The parameters `source_text_column` and `target_text_column` are omitted, and the new parameters `source_prefix_text` and `target_prefix_text` appear instead, because in this case we do not modify the column schema: the original text columns ("article", "highlights") are kept and need not be specified in the CLI.
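
For instance, such prefix and suffix texts can be passed on the command line. The placement and prompt wording below are illustrative only, with the remaining arguments elided:

```shell
uv run --extra data prepare_evaluation_data.py embed \
    --source_prefix_text="Summarize the following article:" \
    --target_prefix_text="Highlights:" \
    ...
```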
It is also possible to provide the prompt from a YAML file. This is handy when you have to engineer the prompts carefully and have a very long, detailed text. We provide one example prompt in the file [instruction.yaml](./instruction.yaml).
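
For illustration, such a YAML file might carry the prefix and suffix texts as keys; the exact schema of [instruction.yaml](./instruction.yaml) may differ:

```yaml
# Hypothetical shape of a prompt file; the key names are assumptions.
source_prefix_text: |
  You are given a news article. Summarize it into a few short highlight sentences.
source_suffix_text: |
  Highlights:
```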
Similar to LLM evaluation, it is possible to specify the prompt prefix and suffix.

| Parameter | Description |
|---|---|
| `data_loading.batch_size` | Load and evaluate data in batches. By default `batch_size=10`. |
| `dataset_dir` | The directory containing the different JSONL files processed in Step 1. Only used in LLM evaluation. |
| `dataset.parquet_path` | The path to the different Parquet files processed in Step 1. Only used in LCM evaluation. |
| `dataset.source_column` | The column in the data that refers to the input embedding. Not applicable when evaluating LLMs. |
| `dataset.source_text_column` | The column in the data that refers to the input text. |
| `dataset.target_column` | The column in the data that refers to the ground-truth embedding. Not applicable when evaluating LLMs. |
| `dataset.target_text_column` | The column in the data that refers to the ground-truth text. |
| `dataset.source_text_prefix` | The text that will be prepended to each input text to make the prompt for the model. |
| `dataset.source_text_suffix` | The text that will be appended after each input text to make the prompt for the model. |
| `task_args` | The JSON-formatted string that represents the task arguments. See [task param list](#task_param_list) below. |
| `dump_dir` | The directory containing the output of the eval run. If successful, there should be a file `metrics.eval.jsonl` with the metric results, a directory `results` capturing the verbose command line used together with the detailed output scores, and a directory `raw_results` showing the model output for each individual sample, together with the per-sample metric results. |
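
Putting these parameters together, an LCM evaluation run could be launched roughly as follows. The entry point (`lcm.evaluation`) and the paths are assumptions for illustration; substitute the actual command for your setup:

```shell
uv run python -m lcm.evaluation \
    --dataset.parquet_path=parquet_dataset/cnn_dailymail \
    --data_loading.batch_size=10 \
    --dump_dir=eval_outputs
```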