-Heuristic rule filtering also plays a significant role in screening pre-training data. Our work here was inspired by the [Dingo Data Quality Evaluation Tool](https://github.com/DataEval/dingo): we have integrated 22 of the rule-based filtering algorithms used in Dingo into `dataflow/process/text/filters/heuristics.py`. For details on each rule, see the [Rules Documentation](https://github.com/DataEval/dingo/blob/dev/docs/rules.md). The filter names are listed in the same `dataflow/process/text/filters/heuristics.py` file.