Skip to content

Comments

Implement diacritics stripping and string normalization command#2329

Merged
MaxKellermann merged 1 commit intoMusicPlayerDaemon:masterfrom
mierak:master
Aug 8, 2025
Merged

Implement diacritics stripping and string normalization command#2329
MaxKellermann merged 1 commit intoMusicPlayerDaemon:masterfrom
mierak:master

Conversation

@mierak
Copy link
Contributor

@mierak mierak commented Jun 28, 2025

Attempt to implement #2327

  • Introduces new stringnormalization protocol command
  • Implements first string normalization to strip diacritics when using search and the related commands

The implementation works, but I am unsure whether I took the right approach (and its my first go at CPP). The threading through of booleans is also getting pretty gnarly in there. So I am mostly looking for feedback at this point. We can also workshop the stringnormalization name.

@mierak mierak mentioned this pull request Jun 28, 2025
5 tasks

bool GetFoldCase() const noexcept {
return fold_case;
return icu_compare;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? If case folding is disabled but diacritics stripping is enabled, this will return true, won't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I missed this one, thanks.

@MaxKellermann
Copy link
Member

Lots and lots of build failures:

../../test/TestStringFilter.cxx: In member function ‘virtual void StringFilterTest_ASCII_Test::TestBody()’:
../../test/TestStringFilter.cxx:23:82: error: no matching function for call to ‘StringFilter::StringFilter(<brace-enclosed initializer list>)’
   23 |         const StringFilter f{"needle", false, StringFilter::Position::FULL, false};
      |                                                                                  ^

And some bad coding style, uninitialized variables, .... sigh.

@MaxKellermann
Copy link
Member

Plus Windows build failures:

../../src/lib/icu/Compare.cxx:34:1: error: no declaration matches 'IcuCompare::IcuCompare(std::string_view)'
   34 | IcuCompare::IcuCompare(std::string_view _needle) noexcept
      | ^~~~~~~~~~

Same build failure when building without ICU.

When you fix a bad commit, don't add a fixup commit - instead, amend the existing known-bad commit.

@mierak
Copy link
Contributor Author

mierak commented Jul 31, 2025

Lots and lots of build failures:
Same build failure when building without ICU.

Compiles for me now both with and without ICU. I have fixed the tests and added a simple test case for diacritic stripping both with and without fold case as well.

When you fix a bad commit, don't add a fixup commit - instead, amend the existing known-bad commit.

I have squashed them into one since there was already more than one commit.

Plus Windows build failures

Hopefully should be fine now, but I do not have means to test a windows build atm.

bad coding style

Not much to go of off here, dev docs state to use spaces, I have tried to follow that as well as to copy the style elsewhere. Im going to need a bit more than "bad"

uninitialized variables, .... sigh.

Which ones?

@MaxKellermann
Copy link
Member

bad coding style

Not much to go of off here, dev docs state to use spaces, I have tried to follow that as well as to copy the style elsewhere. Im going to need a bit more than "bad"

Where does it say that?

uninitialized variables, .... sigh.

Which ones?

You added new variables to class IcuCompare, but forgot to initialize them in the copy/move constructors.

Copy link
Member

@MaxKellermann MaxKellermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just ... look at the diff and you see your indentation is all wrong.

if (request.empty()) {
string_normalizations_print(client, r);
return CommandResult::OK;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indent

Comment on lines 5 to 6
#include <fmt/format.h>
#include <string_view>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

????


strip_diacritics_transliterator =
new IcuTransliterator(ToStringView(std::span{UCharFromUTF8(strip_diacritics_id)}),
{});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indent


if (strip_diacritics) {
if (auto s = strip_diacritics_transliterator->Transliterate(ToStringView(std::span{u}));
s != nullptr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indent

Comment on lines 18 to 19
fold_case(_fold_case),
strip_diacritics(_strip_diacritics) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indent

Comment on lines 28 to 29
bool fold_case;
bool strip_diacritics;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indent

Comment on lines 65 to 67
bool GetFoldCase() const noexcept {
return needle != nullptr && fold_case;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indent

EXPECT_TRUE(f.Match("’"));
EXPECT_FALSE(f.Match("\""));
EXPECT_TRUE(StringFilter("áéíóúýčďěňřšťžůåäöüàãâçêõîşûğăôơư", true, true, StringFilter::Position::FULL, false)
.Match("aeiouycdenrstzuaaouaaaceoisugaoou"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indent

EXPECT_FALSE(StringFilter("ÁÉÍÓÚÝČĎĚŇŘŠŤŽŮÅÄÖÜÀÃÂÇÊÕÎŞÛĞĂÔƠƯ", false, true, StringFilter::Position::FULL, false)
.Match("áéíóúýčďěňřšťžůåäöüàãâçêõîşûğăôơư"));
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indent

EXPECT_TRUE(StringFilter("áéíóúýčďěňřšťžůåäöüàãâçêõîşûğăôơư", true, true, StringFilter::Position::FULL, false)
.Match("aeiouycdenrstzuaaouaaaceoisugaoou"));
EXPECT_TRUE(StringFilter("ÁÉÍÓÚÝČĎĚŇŘŠŤŽŮÅÄÖÜÀÃÂÇÊÕÎŞÛĞĂÔƠƯ", true, true, StringFilter::Position::FULL, false)
.Match("áéíóúýčďěňřšťžůåäöüàãâçêõîşûğăôơư"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indent

@MaxKellermann
Copy link
Member

Plus Windows build failures

Hopefully should be fine now, but I do not have means to test a windows build atm.

GH Actions disagrees.

../src/lib/icu/Compare.cxx: In constructor 'IcuCompare::IcuCompare(std::string_view, bool, bool)':
../src/lib/icu/Compare.cxx:23:41: warning: unused parameter '_needle' [-Wunused-parameter]
   23 | IcuCompare::IcuCompare(std::string_view _needle, bool _fold_case, bool _strip_diacritics) noexcept
      |                        ~~~~~~~~~~~~~~~~~^~~~~~~
../src/lib/icu/Compare.cxx: At global scope:
../src/lib/icu/Compare.cxx:27:1: error: expected unqualified-id before '{' token
   27 | {
      | ^

@mierak mierak force-pushed the master branch 2 times, most recently from ff0f465 to 52cb762 Compare August 5, 2025 18:46
@mierak
Copy link
Contributor Author

mierak commented Aug 5, 2025

Where does it say that?

I mistyped in a haste, meant to say tabs, sorry.

Just ... look at the diff and you see your indentation is all wrong.

My apologies, I have misconfigured my editor (only set tabwidth for one file) and my github was on 4 spaces so the diffs looked completely fine to me. Not an excuse, just an explanation. Hopefully they are fine now but tell me if you find any more.

You added new variables to class IcuCompare, but forgot to initialize them in the copy/move constructors.

I am a bit out of my league here since I am not very familiar with cpp in particular as I have said in the PR description. I have done my reading but am not completely sure I have done the implementation correctly so I would appreciate a feedback on them, thanks!
The bools are initialized in the copy constructors and the move constructor as well and I have attempted to default the source in the latter.

GH Actions disagrees.

The GH build action now succeeded in my fork I had an extra pair of braces in there.

@mierak mierak requested a review from MaxKellermann August 5, 2025 19:58
Comment on lines 46 to 75
IcuCompare(IcuCompare &&) = default;
IcuCompare &operator=(IcuCompare &&) = default;
IcuCompare(IcuCompare &&src) noexcept
:needle(src
? AllocatedString(std::move(src.needle))
: nullptr),
fold_case(src.fold_case),
strip_diacritics(src.strip_diacritics)
{
src.needle = nullptr;
src.fold_case = false;
src.strip_diacritics = false;
}

IcuCompare &operator=(IcuCompare &&src) noexcept {
needle = src
? AllocatedString(std::move(src.needle))
: nullptr;
fold_case = src.fold_case;
strip_diacritics = src.strip_diacritics;

src.needle = nullptr;
src.fold_case = false;
src.strip_diacritics = false;
return *this;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this change?

Copy link
Contributor Author

@mierak mierak Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I was asking about in the last comment. To me it seemed like default should already do exactly what I did here but you specifically pointed out that I did not initialize variables in the move constructor.
If this is indeed not needed then I can roll this one back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment was wrong, sorry - I meant the copy ctor+operator, not copy+move ctor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, I have rolled the change back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants