Open
Labels
feature request
Description
What problem are you trying to solve?
Currently, the cleaner functions treat two strings that differ only in their Harakat (diacritics) as different, which is the correct default behavior.
However, it would be useful to give the user an option to ignore Harakat when matching strings.
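To make "ignore Harakat" concrete: one simple reading is to compare two strings after removing their diacritics. The sketch below assumes Harakat means the Arabic code points U+064B to U+0652 (tanween, fatha, damma, kasra, shadda, sukun). `strip_harakat` and `equal_ignoring_harakat` are hypothetical helpers, not part of Maha.

```python
import re

# Assumption: Harakat = Arabic diacritics U+064B..U+0652.
# These helpers are hypothetical illustrations, not Maha's API.
HARAKAT_RE = re.compile("[\u064B-\u0652]")


def strip_harakat(text: str) -> str:
    """Return ``text`` with every haraka removed."""
    return HARAKAT_RE.sub("", text)


def equal_ignoring_harakat(first: str, second: str) -> bool:
    """Compare two strings as if neither of them carried Harakat."""
    return strip_harakat(first) == strip_harakat(second)
```

For example, `equal_ignoring_harakat("اللُّغَةَ", "اللغة")` is `True`. Stripping only answers whether two strings match. It does not locate a match inside a longer, vocalized text without losing that text's Harakat.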
Examples (if relevant)
Current behavior:
>>> from maha.cleaners.functions import remove
>>> output = remove("يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى", custom_expressions=r"اللغة")
>>> output
'يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى'
Suggested behavior:
>>> from maha.cleaners.functions import remove
>>> output = remove("يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى", custom_expressions=r"اللغة", ignore_harakat=True)
>>> output
'يُدَرِّسُ العَرَبِيَّةَ الفُصْحَى'
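The suggested output keeps the Harakat on the remaining words, so the match has to tolerate diacritics in the text instead of stripping them. A minimal sketch, assuming Harakat are U+064B to U+0652 and that the expression is plain text rather than a regex: each character of the expression is escaped and followed by an optional run of Harakat. `harakat_tolerant_pattern` and `remove_ignoring_harakat` are hypothetical names, not Maha's implementation of `ignore_harakat`.

```python
import re

# Assumption: Harakat = Arabic diacritics U+064B..U+0652.
# Hypothetical helpers for illustration only; not part of Maha.
HARAKAT_CLASS = "[\u064B-\u0652]*"


def harakat_tolerant_pattern(expression: str) -> str:
    """Build a regex where every character may be followed by Harakat.

    The expression is treated as literal text, so "اللغة" also matches
    "اللُّغَةَ" (the shadda/damma and fatha are absorbed by HARAKAT_CLASS).
    """
    return "".join(re.escape(char) + HARAKAT_CLASS for char in expression)


def remove_ignoring_harakat(text: str, expression: str) -> str:
    """Remove every Harakat-insensitive match of ``expression`` from ``text``,
    then collapse the leftover whitespace into single spaces."""
    removed = re.sub(harakat_tolerant_pattern(expression), "", text)
    return " ".join(removed.split())
```

Applied to the example text with `"اللغة"`, this removes `اللُّغَةَ` and leaves the other words with their Harakat intact. Collapsing whitespace also turns newlines into spaces, which a real implementation may want to avoid. Because the expression is escaped, regex metacharacters in it lose their special meaning. Supporting full regex input would need a different pattern transformation.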
Definition of Done
- It must adhere to the coding style used in the defined cleaner functions.
- The implementation should cover most use cases.
- Tests must be added.