Implement diacritics stripping and string normalization command #2329
MaxKellermann merged 1 commit into MusicPlayerDaemon:master from
src/song/StringFilter.hxx
Outdated
  bool GetFoldCase() const noexcept {
-     return fold_case;
+     return icu_compare;
Is this correct? If case folding is disabled but diacritics stripping is enabled, this will return true, won't it?
Yup, I missed this one, thanks.
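A minimal sketch of the problem pointed out above, assuming a simplified stand-in class (the name `CompareFlags` and the method `UsesIcuCompare()` are hypothetical and not part of MPD's actual `StringFilter`). It shows why a getter for the fold-case setting should not return a combined flag that is also true when only diacritics stripping is enabled:

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the filter's flag handling;
// this is not the actual MPD StringFilter class.
class CompareFlags {
	bool fold_case;
	bool strip_diacritics;

public:
	constexpr CompareFlags(bool _fold_case, bool _strip_diacritics) noexcept
		:fold_case(_fold_case), strip_diacritics(_strip_diacritics) {}

	// True if either normalization is enabled, which is what a
	// combined "icu_compare" flag would express.
	constexpr bool UsesIcuCompare() const noexcept {
		return fold_case || strip_diacritics;
	}

	// Reports only the case-folding setting, independent of
	// diacritics stripping.
	constexpr bool GetFoldCase() const noexcept {
		return fold_case;
	}

	constexpr bool GetStripDiacritics() const noexcept {
		return strip_diacritics;
	}
};
```

With `CompareFlags f(false, true)`, the combined flag is true while `GetFoldCase()` stays false, which is the case the review comment describes.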
Lots and lots of build failures: And some bad coding style, uninitialized variables, .... sigh.
Plus Windows build failures: Same build failure when building without ICU. When you fix a bad commit, don't add a fixup commit - instead, amend the existing known-bad commit.
Compiles for me now both with and without ICU. I have fixed the tests and added a simple test case for diacritic stripping both with and without fold case as well.
I have squashed them into one since there was already more than one commit.
It should hopefully be fine now, but I do not have the means to test a Windows build at the moment.
Not much to go off of here; the dev docs state to use spaces, and I have tried to follow that as well as to copy the style elsewhere. I'm going to need a bit more than "bad".
Which ones?
Where does it say that?
You added new variables to class IcuCompare, but forgot to initialize them in the copy/move constructors.
MaxKellermann
left a comment
Just ... look at the diff and you see your indentation is all wrong.
src/command/ClientCommands.cxx
Outdated
if (request.empty()) {
    string_normalizations_print(client, r);
    return CommandResult::OK;
}
src/lib/icu/Canonicalize.cxx
Outdated
#include <fmt/format.h>
#include <string_view>
src/lib/icu/Canonicalize.cxx
Outdated
strip_diacritics_transliterator =
    new IcuTransliterator(ToStringView(std::span{UCharFromUTF8(strip_diacritics_id)}),
        {});
src/lib/icu/Canonicalize.cxx
Outdated
if (strip_diacritics) {
    if (auto s = strip_diacritics_transliterator->Transliterate(ToStringView(std::span{u}));
        s != nullptr)
src/lib/icu/Compare.cxx
Outdated
fold_case(_fold_case),
strip_diacritics(_strip_diacritics) {}
src/lib/icu/Compare.hxx
Outdated
bool fold_case;
bool strip_diacritics;
src/lib/icu/Compare.hxx
Outdated
bool GetFoldCase() const noexcept {
    return needle != nullptr && fold_case;
}
test/TestStringFilter.cxx
Outdated
    EXPECT_TRUE(f.Match("’"));
    EXPECT_FALSE(f.Match("\""));
    EXPECT_TRUE(StringFilter("áéíóúýčďěňřšťžůåäöüàãâçêõîşûğăôơư", true, true, StringFilter::Position::FULL, false)
        .Match("aeiouycdenrstzuaaouaaaceoisugaoou"));
    EXPECT_FALSE(StringFilter("ÁÉÍÓÚÝČĎĚŇŘŠŤŽŮÅÄÖÜÀÃÂÇÊÕÎŞÛĞĂÔƠƯ", false, true, StringFilter::Position::FULL, false)
        .Match("áéíóúýčďěňřšťžůåäöüàãâçêõîşûğăôơư"));
}
test/TestStringFilter.cxx
Outdated
    EXPECT_TRUE(StringFilter("áéíóúýčďěňřšťžůåäöüàãâçêõîşûğăôơư", true, true, StringFilter::Position::FULL, false)
        .Match("aeiouycdenrstzuaaouaaaceoisugaoou"));
    EXPECT_TRUE(StringFilter("ÁÉÍÓÚÝČĎĚŇŘŠŤŽŮÅÄÖÜÀÃÂÇÊÕÎŞÛĞĂÔƠƯ", true, true, StringFilter::Position::FULL, false)
        .Match("áéíóúýčďěňřšťžůåäöüàãâçêõîşûğăôơư"));
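The expectations in these test cases can be illustrated with a small self-contained sketch. It is an assumption-laden toy: it uses a hand-written four-letter table instead of the ICU transliterator from the pull request, folds only ASCII letters, and all function names (`StripDiacritic`, `FoldAscii`, `Normalize`, `MatchFull`) are hypothetical. It only shows how the two flags combine: stripping alone keeps upper and lower case distinct, while stripping plus folding makes them match.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Illustration only: a tiny hand-written table, NOT the ICU
// transliterator used in the pull request. It covers four letters.
static char32_t StripDiacritic(char32_t c) {
	static const std::unordered_map<char32_t, char32_t> table = {
		{U'\u00C1', U'A'}, // Á
		{U'\u00E1', U'a'}, // á
		{U'\u010C', U'C'}, // Č
		{U'\u010D', U'c'}, // č
	};
	const auto i = table.find(c);
	return i != table.end() ? i->second : c;
}

// ASCII-only case folding; non-ASCII letters are left untouched,
// which real Unicode case folding would not do.
static char32_t FoldAscii(char32_t c) {
	return (c >= U'A' && c <= U'Z')
		? static_cast<char32_t>(c + (U'a' - U'A'))
		: c;
}

// Apply the enabled normalizations to every code point.
static std::u32string Normalize(std::u32string s, bool fold_case,
				bool strip_diacritics) {
	for (auto &c : s) {
		if (strip_diacritics)
			c = StripDiacritic(c);
		if (fold_case)
			c = FoldAscii(c);
	}
	return s;
}

// Full-string match after normalizing both sides the same way.
static bool MatchFull(const std::u32string &needle,
		      const std::u32string &haystack,
		      bool fold_case, bool strip_diacritics) {
	return Normalize(needle, fold_case, strip_diacritics) ==
		Normalize(haystack, fold_case, strip_diacritics);
}
```

For example, `MatchFull(U"\u00C1\u010C", U"\u00E1\u010D", false, true)` compares "AC" with "ac" and does not match, while the same call with `fold_case` set to true does.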
GH Actions disagrees.
ff0f465 to 52cb762
I mistyped in haste; I meant to say tabs, sorry.
My apologies, I had misconfigured my editor (I only set the tab width for one file) and my GitHub was set to 4 spaces, so the diffs looked completely fine to me. Not an excuse, just an explanation. Hopefully they are fine now, but tell me if you find any more.
I am a bit out of my league here since I am not very familiar with C++ in particular, as I have said in the PR description. I have done my reading but am not completely sure I have done the implementation correctly, so I would appreciate feedback on it, thanks!
The GH build action now succeeds in my fork; I had an extra pair of braces in there.
src/lib/icu/Compare.hxx
Outdated
- IcuCompare(IcuCompare &&) = default;
- IcuCompare &operator=(IcuCompare &&) = default;
+ IcuCompare(IcuCompare &&src) noexcept
+     :needle(src
+         ? AllocatedString(std::move(src.needle))
+         : nullptr),
+      fold_case(src.fold_case),
+      strip_diacritics(src.strip_diacritics)
+ {
+     src.needle = nullptr;
+     src.fold_case = false;
+     src.strip_diacritics = false;
+ }
+
+ IcuCompare &operator=(IcuCompare &&src) noexcept {
+     needle = src
+         ? AllocatedString(std::move(src.needle))
+         : nullptr;
+     fold_case = src.fold_case;
+     strip_diacritics = src.strip_diacritics;
+
+     src.needle = nullptr;
+     src.fold_case = false;
+     src.strip_diacritics = false;
+     return *this;
+ }
What's the point of this change?
That is what I was asking about in the last comment. To me it seemed like default should already do exactly what I did here but you specifically pointed out that I did not initialize variables in the move constructor.
If this is indeed not needed then I can roll this one back.
That comment was wrong, sorry - I meant the copy ctor+operator, not copy+move ctor.
All good, I have rolled the change back.
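To make the distinction in this thread concrete, here is a hedged sketch using a hypothetical class `Needle` (not MPD's `IcuCompare`; the `std::unique_ptr<std::string>` member is an assumption chosen to make the class non-copyable by default). A defaulted move constructor and move assignment transfer every member automatically, so newly added fields need no extra code there. A hand-written copy constructor and copy assignment, by contrast, must name each new member explicitly, or it is silently left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a class like IcuCompare: it owns its
// needle through a move-only pointer, so copying is written by hand.
class Needle {
	std::unique_ptr<std::string> needle;
	bool fold_case = false;
	bool strip_diacritics = false;

public:
	Needle() = default;

	Needle(std::string s, bool _fold_case, bool _strip_diacritics)
		:needle(std::make_unique<std::string>(std::move(s))),
		 fold_case(_fold_case),
		 strip_diacritics(_strip_diacritics) {}

	// User-written copy: every member has to be listed explicitly,
	// including any member added later.
	Needle(const Needle &src)
		:needle(src.needle
			? std::make_unique<std::string>(*src.needle)
			: nullptr),
		 fold_case(src.fold_case),
		 strip_diacritics(src.strip_diacritics) {}

	Needle &operator=(const Needle &src) {
		if (this != &src) {
			needle = src.needle
				? std::make_unique<std::string>(*src.needle)
				: nullptr;
			fold_case = src.fold_case;
			strip_diacritics = src.strip_diacritics;
		}
		return *this;
	}

	// Defaulted moves transfer all members without further code.
	Needle(Needle &&) noexcept = default;
	Needle &operator=(Needle &&) noexcept = default;

	bool GetFoldCase() const noexcept {
		return needle != nullptr && fold_case;
	}

	bool GetStripDiacritics() const noexcept {
		return needle != nullptr && strip_diacritics;
	}
};
```

Copying a `Needle` constructed with both flags set yields a copy with both flags set only because the copy operations list the two booleans; dropping either line from the copy constructor would compile and quietly reset that flag to its default.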
Attempt to implement #2327
stringnormalization protocol command
search and the related commands
The implementation works, but I am unsure whether I took the right approach (and it's my first go at C++). The threading through of booleans is also getting pretty gnarly in there. So I am mostly looking for feedback at this point. We can also workshop the stringnormalization name.