fix: modify getTextLength to use BreakIterator for grapheme count#499
fix: modify getTextLength to use BreakIterator for grapheme count#499Pranavjeet-Naidu wants to merge 3 commits intoopenzim:mainfrom
Conversation
veloman-yunkan
left a comment
There was a problem hiding this comment.
This fix begs a unit test. Therefore the PR should be implemented as follows:
Commit 1: move the function getTextLength() to src/tools.cpp and declare it in src/tools.h.
Commit 2: add a unit test in test/tools-test.cpp. It must include a test-point reflecting the bug being addressed.
Commit 3: fix the bug
|
@veloman-yunkan, the three commits like you've asked are up, once you give a go ahead, I will squash them as well. |
veloman-yunkan
left a comment
There was a problem hiding this comment.
@veloman-yunkan, the three commits like you've asked are up, once you give a go ahead, I will squash them as well.
The commits must not be squashed. The PR must consist of the said three commits at the time of merge. Now there are four commits.
Please also don't abuse the "fix:" label in the commit title. The commit message must not be misleading - adding a unit test is not a fix. Also it must be informative. For example, if a new unit test fails because it demonstrates a pre-existing bug, that must be stated in the commit description.
Signed-off-by: Pranavjeet-Naidu <pranavjeetnaidu@gmail.com>
The new test cases include grapheme clusters (e.g., emoji with skin tone modifiers) which demonstrate bug openzim#498 - the current implementation counts code points instead of graphemes. Signed-off-by: Pranavjeet-Naidu <pranavjeetnaidu@gmail.com>
b25563f to
486325e
Compare
Use ICU's BreakIterator to properly count grapheme clusters. This fixes cases where characters like emoji with skin tone modifiers (e.g., 👍) were incorrectly counted as multiple characters. - Update build configuration to include ICU dependency for tools.cpp - Add tools.cpp to metadata-test dependencies Fixes openzim#498 Signed-off-by: Pranavjeet-Naidu <pranavjeetnaidu@gmail.com>
486325e to
834ff17
Compare
brings up a fix for #498.
Used ICU's BreakIterator, took the definition code from their documentation + added a status check as well.
Please suggest changes if needed, considering this is my first time working with this.