Implementing group 3 noun rules for Serbian. by nciric · Pull Request #173 · unicode-org/inflection

nciric · 2025-08-12T03:35:05Z

Fully implements group 3 (all nouns ending with -a).

This removes the need to add a large number of nouns to Wikidata.

Resolves part of #172.

grhoten · 2025-08-15T16:14:58Z

inflection/src/inflection/grammar/synthesis/SrGrammarSynthesizer_SrDisplayFunction.cpp

+    static constexpr auto suffix_sg = ::std::to_array<::std::u16string_view>({u"а", u"е", u"и", u"у", u"а", u"ом", u"и"});
+    static constexpr auto suffix_pl = ::std::to_array<::std::u16string_view>({u"е", u"а", u"ама", u"е", u"е", u"ама", u"ама"});
+
+    ::std::u16string base = lemma;
+    // Remove trailing a and apply suffix.
+    base.pop_back();
+    base = applySuffix(base, suffix_sg, suffix_pl, number, targetCase);
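
To make the hunk above easier to follow, here is a minimal, self-contained sketch of what a table-driven suffix step could look like. It is not the PR's actual `applySuffix`; the enums, `pickSuffix`, and `inflectGroup3` are hypothetical names, and the case order (nominative, genitive, dative, accusative, vocative, instrumental, locative) is an assumption read off the two suffix tables.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical grammeme enums; the real project models grammemes differently.
enum class Number { Singular, Plural };
// Assumed order, matching the positions in the suffix tables below.
enum class Case : std::size_t {
    Nominative, Genitive, Dative, Accusative, Vocative, Instrumental, Locative
};

using SuffixTable = std::array<std::u16string_view, 7>;

// Same suffixes as in the diff, indexed by Case.
inline constexpr SuffixTable kSuffixSg{u"а", u"е", u"и", u"у", u"а", u"ом", u"и"};
inline constexpr SuffixTable kSuffixPl{u"е", u"а", u"ама", u"е", u"е", u"ама", u"ама"};

// Selects the table by number, then the entry by case.
constexpr std::u16string_view pickSuffix(const SuffixTable& sg, const SuffixTable& pl,
                                         Number number, Case targetCase) {
    const auto& table = (number == Number::Singular) ? sg : pl;
    return table[static_cast<std::size_t>(targetCase)];
}

// Drops the final code unit of the lemma (the trailing "а") and appends the suffix.
// Each Cyrillic letter used here is a single UTF-16 code unit.
inline std::u16string inflectGroup3(std::u16string_view lemma, Number number, Case targetCase) {
    if (lemma.empty()) {
        return {};
    }
    std::u16string base(lemma.substr(0, lemma.size() - 1));
    base += pickSuffix(kSuffixSg, kSuffixPl, number, targetCase);
    return base;
}
```

Under these assumptions, `inflectGroup3(u"жена", Number::Plural, Case::Dative)` builds `жен` + `ама`. The sketch deliberately ignores stem-dependent forms such as the genitive plurals `конзерви` and `битака` exercised by the tests in this PR.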


For this kind of mapping, you could take inspiration from the Arabic, German, or Italian implementations. They convert a string to a numeric key (makeLookupKey) that combines multiple grammemes, and they map that key to a string. The mapping is initialized once in the constructor rather than on each call.
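
As a rough illustration of the key-based approach described in this comment, the sketch below packs two grammemes into one integer and fills a map once in a constructor. Only the name `makeLookupKey` comes from the comment; its signature here, the enums, the bit layout, and the `Group3SuffixLookup` class are assumptions made for the example, not the Arabic, German, or Italian code.

```cpp
#include <cstdint>
#include <map>
#include <string_view>

// Hypothetical grammeme enums with explicit numeric values.
enum class Number : std::uint32_t { Singular = 0, Plural = 1 };
enum class Case : std::uint32_t {
    Nominative = 0, Genitive, Dative, Accusative, Vocative, Instrumental, Locative
};

// Packs both grammemes into one key: case in the low 3 bits, number above them.
constexpr std::uint32_t makeLookupKey(Number number, Case targetCase) {
    return (static_cast<std::uint32_t>(number) << 3) | static_cast<std::uint32_t>(targetCase);
}

class Group3SuffixLookup {
public:
    // The key-to-suffix map is built once here, not on every inflection call.
    Group3SuffixLookup() {
        constexpr std::u16string_view sg[] = {u"а", u"е", u"и", u"у", u"а", u"ом", u"и"};
        constexpr std::u16string_view pl[] = {u"е", u"а", u"ама", u"е", u"е", u"ама", u"ама"};
        for (std::uint32_t c = 0; c < 7; ++c) {
            const auto targetCase = static_cast<Case>(c);
            suffixes_.emplace(makeLookupKey(Number::Singular, targetCase), sg[c]);
            suffixes_.emplace(makeLookupKey(Number::Plural, targetCase), pl[c]);
        }
    }

    // Returns the suffix for the grammeme combination, or an empty view if unknown.
    std::u16string_view find(Number number, Case targetCase) const {
        const auto it = suffixes_.find(makeLookupKey(number, targetCase));
        return it == suffixes_.end() ? std::u16string_view{} : it->second;
    }

private:
    std::map<std::uint32_t, std::u16string_view> suffixes_;
};
```

The appeal of this layout is that adding another grammeme, for example gender, only widens the key; the lookup code stays the same. For a single declension group with two small arrays, the direct indexing used in the PR is arguably simpler.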

nciric · 2025-08-15

Is the concern the runtime size increase (static constexpr)? If so, I can remove the static, since creating these arrays is cheap.
Otherwise, the current approach looks simpler. I will look into refactoring this code as I add more cases, potentially adopting an Arabic-like approach.

WDYT?

grhoten · 2025-08-19T05:47:40Z

It was to make it more scalable, but this is fine too.

grhoten

Changes look fine. Optional comments to consider were also provided.

grhoten · 2025-08-19T05:46:06Z

inflection/test/resources/inflection/dialog/inflection/sr.xml

+    <test><source case="genitive" number="plural" gender="feminine" pos="noun">конзерва</source><result>конзерви</result></test>
+    <test><source case="genitive" number="plural" gender="feminine" pos="noun">гошћа</source><result>гошћа</result></test>
+    <test><source case="genitive" number="plural" gender="feminine" pos="noun">двојка</source><result>двојака</result></test>
+    <test><source case="genitive" number="plural" gender="feminine" pos="noun">битка</source><result>битака</result></test>


These tests specify a lot of fully fleshed-out constraints. Tests for most other languages change only specific grammemes, sometimes just the case, number, or gender, and they usually default to noun. These tests are fine as they are, but common usage starts from any surface form (ideally a unique one) and then modifies only the relevant grammemes.

nciric · 2025-08-19

I can probably remove the noun info, but I do need case, gender, and number for the rule-based approach to work.
I also assume nominative input (the lemma); otherwise the rules would be more complex or would need dictionary support.

grhoten · 2025-08-19T05:47:40Z

It was to make it more scalable, but this is fine too.

Implementing group 3 noun rules for Serbian.

64e1fd0

nciric requested a review from grhoten August 12, 2025 03:35

nciric added 4 commits August 12, 2025 04:09

Convert to_array to manual initialization because of macOS.

097fda4

Using u16string_view to avoid allocation in constexpr

9be78bb

Fix spelling

1f37045

Replace regex with a simple loop for performance reasons.

49295e3

grhoten reviewed Aug 15, 2025

nciric added 3 commits August 15, 2025 19:02

Use [[maybe_unused]] on unused parameters.

0a365cc

Remove unnecessary includes, optimize suffix handling code.

67ccfcc

Add isConsonant/isVowel functions and simplify code with them.

b857505

grhoten approved these changes Aug 19, 2025

Enable inflection guess check.

78062a9

nciric merged commit 60043f1 into main Aug 19, 2025
7 checks passed

nciric deleted the cira-sr branch August 19, 2025 20:06

nciric restored the cira-sr branch August 19, 2025 20:13

nciric deleted the cira-sr branch August 19, 2025 20:15

nciric merged 9 commits into main from cira-sr
