Commit 044bb9d

Update text_process_operators.md
Parent: fb57e67

1 file changed: 1 line added (+1), 1 line deleted (-1)

docs/en/notes/guide/general_operators/text_process_operators.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -240,7 +240,7 @@ DeitaQualityFilter:
 ```
 You can set min/max scores and scorer parameters in `scorer_args` for filtering. For more information on supported scorers, refer to the [evaluation algorithm documentation](/en/guide/text_evaluation_operators/) (excluding the Diversity part).
 
-In addition, heuristic rule filtering plays a significant role in the screening of pre-training data. In this regard, the [Dingo Data Quality Evaluation Tool](https://github.com/DataEval/dingo) has greatly inspired our development. We have integrated some of the rule filtering algorithms used in Dingo, a total of 22 types, into `dataflow/process/text/filters/heuristics.py`. For details, please refer to the [Rules Documentation](https://github.com/DataEval/dingo/blob/dev/docs/rules.md). The names of the filters can be found in the `dataflow/process/text/filters/heuristics.py` file.
+In addition, heuristic rule filtering plays a significant role in the screening of pre-training data. In this regard, the [Dingo Data Quality Evaluation Tool](https://github.com/DataEval/dingo) has greatly inspired our development. We have integrated some of the rule filtering algorithms used in Dingo, a total of 22 types, into `dataflow/operators/filter/GeneralText/heuristics.py`. For details, please refer to the [Rules Documentation](https://github.com/DataEval/dingo/blob/dev/docs/rules.md). The names of the filters can be found in the `dataflow/operators/filter/GeneralText/heuristics.py` file.
 
 
 All 42 data filters mentioned above share the same `yaml` invocation method.
````