Add the option to ignore Harakat when removing or replacing #104

@xaleel

Description

What problem are you trying to solve?

Currently, the cleaner functions treat two strings as different when their Harakat (diacritics) differ, which is the correct default behavior.
However, it would be great if the user had the option to ignore Harakat when matching text to remove or replace.

Examples (if relevant)

Current:

>> from maha.cleaners.functions import remove
>> output = remove("يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى", custom_expressions=r"اللغة")
>> output
يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى

Suggested:

>> from maha.cleaners.functions import remove
>> output = remove("يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى", custom_expressions=r"اللغة", ignore_harakat=True)
>> output
يُدَرِّسُ العَرَبِيَّةَ الفُصْحَى
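
One possible way to get this behavior, sketched with only the standard `re` module: strip Harakat from the search term, then build a pattern that allows any number of Harakat after each of its characters. The names `harakat_insensitive` and `remove_ignoring_harakat` are hypothetical and not part of maha, the Harakat range used here is an assumption, and the sketch only handles literal search terms, not arbitrary regular expressions.

```python
import re

# Assumed set of Harakat: fathatan through sukun (U+064B-U+0652) plus
# superscript alef (U+0670). The library may define Harakat differently.
HARAKAT = "\u064B-\u0652\u0670"
OPTIONAL_HARAKAT = f"[{HARAKAT}]*"


def harakat_insensitive(literal: str) -> str:
    """Build a regex that matches `literal` regardless of Harakat."""
    # Drop any Harakat from the search term itself.
    bare = re.sub(f"[{HARAKAT}]", "", literal)
    # Allow zero or more Harakat after every remaining character.
    return "".join(re.escape(ch) + OPTIONAL_HARAKAT for ch in bare)


def remove_ignoring_harakat(text: str, literal: str) -> str:
    """Hypothetical helper: remove `literal` from `text`, ignoring Harakat."""
    cleaned = re.sub(harakat_insensitive(literal), "", text)
    # Collapse the run of spaces left where the match was removed.
    return re.sub(r" {2,}", " ", cleaned).strip()


text = "يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى"
print(remove_ignoring_harakat(text, "اللغة"))
```

A real `ignore_harakat` option would also need to handle `custom_expressions` that are full regular expressions (character classes, groups, quantifiers), which this per-character expansion does not attempt.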

Definition of Done

  • It must adhere to the coding style used in the defined cleaner functions.
  • The implementation should cover most use cases.
  • Tests must be added for the new option.
