docs/data-and-cleaning/index.md
4 additions & 1 deletion
@@ -41,7 +41,7 @@ See examples in the directory.
 ### Default configs
-Set `opuscleaner-mode: custom` in the training config to use custom per-dataset and per-language pair configs.
+Set `opuscleaner-mode: custom` (this is the default when generating a config) in the training config to use custom per-dataset and per-language pair configs.
 If no custom config was specified for the dataset,
 the [default config template](https://github.com/mozilla/translations/tree/main/pipeline/clean/opuscleaner/configs/default.filters.json) will be used.
@@ -58,6 +58,9 @@ The config is chosen based on this search order:
 The first found config will be applied.
+If the desired behaviour is to apply only the default config template and skip all possible custom configs
+for the current language pair and/or datasets, set `opuscleaner-mode: defaults`.
+
 ## Bicleaner

 It is recommended to use Bicleaner ML models to filter noisy data.
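
The two `opuscleaner-mode` values described in this change can be illustrated with a minimal sketch. It is not the pipeline's actual code: `choose_config`, its parameters, and the file names in the usage note are hypothetical, and the real search order for custom configs is not shown in this diff, so the candidate list is passed in by the caller.

```python
from typing import Callable, Sequence


def choose_config(
    mode: str,
    candidates: Sequence[str],
    default: str,
    exists: Callable[[str], bool],
) -> str:
    """Pick a filter config for one dataset (hypothetical helper).

    mode       -- value of `opuscleaner-mode`: "custom" or "defaults"
    candidates -- custom config paths, already in search order (assumed input)
    default    -- path of the default config template
    exists     -- predicate telling whether a candidate config is present
    """
    if mode == "defaults":
        # Skip every custom config and always use the default template.
        return default
    if mode != "custom":
        raise ValueError(f"unknown opuscleaner-mode: {mode!r}")
    # "custom": the first config found in the search order is applied.
    for path in candidates:
        if exists(path):
            return path
    # No custom config was specified for the dataset: fall back to the default.
    return default
```

With a hypothetical set of present files such as `{"b.json"}`, calling `choose_config("custom", ["a.json", "b.json"], "default.json", present.__contains__)` would select `b.json`, while the same call with `"defaults"` would select `default.json` regardless of which custom configs exist.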