Skip to content

Conversation

@btangmu
Copy link
Member

@btangmu btangmu commented Dec 4, 2024

-Each CLDRFile has a NameGetter; all getName methods moved to NameGetter

-Give GetName methods distinct names for clarify and to avoid excessive overloading

-Make methods private where possible

-Move some calls to nameGetter() out of loops, re-use

-Remove some dead code

-Comments

CLDR-15830

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

-Each CLDRFile has a NameGetter; all getName methods moved to NameGetter
-All three CLDRFile constructors need to initialize NameGetter
-Further renaming for clarity

-Make methods private where possible

-Move some calls to nameGetter() out of loops, re-use

-Remove some dead code

-Comments
Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just had a few comments.

*/
public NameGetter(CLDRFile cldrFile) {
if (cldrFile == null) {
throw new RuntimeException("NameGetter must have non-null CLDRFile");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IllegalArgumentException is more specific.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

this.cldrFile = cldrFile;
}

private static final String GETNAME_LOCALE_SEPARATOR =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might put PATH in the name, also for similar ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

* @return the name
*/
public String getNameFromTypestrCode(String type, String code) {
return getNameFromTypenumCode(CLDRFile.typeNameToCode(type), code);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you get this merged, a second refactoring is to inline this and similar ones. 'Typenum' is hard to read; TypeEnum is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll address that in a subsequent PR. By "inline", do you mean, for example, to change getNameFromTypestrCode("territory", territory) to getNameFromTypenumCode(CLDRFile.TERRITORY_NAME, territory)? And then subsequently (or in the same PR) change CLDRFile.TERRITORY_NAME, etc., to an enum?

* @param code a code such as "JP"
* @return the name
*/
public String getNameFromTypenumCode(int type, String code) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also get rid of the CLDR fake enums like CLDRFile.TERRITORY_NAME in favor of enums. This is probably easier in a second PR, later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right; see above comment

* @return the name of the given bcp47 identifier
*/
public synchronized String getNameFromBCP47(String localeOrTZID) {
return getNameFromBCP47Bool(localeOrTZID, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we are writing new code, best to avoid having a boolean in any API, and rather use a meaningful enum. Again, could be done in later PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I'll include that in a subsequent PR

-Make CLDRFile.nameGetter final

-Remove 2 unused constants in CLDRFile

-Change RuntimeException to IllegalArgumentException in NameGetter constructor

-Rename 3 constants to end in _PATH

-In TestUtilities.printCountries call nameGetter() once instead of repeatedly
@btangmu btangmu marked this pull request as ready for review December 6, 2024 15:34
@btangmu btangmu requested review from macchiati and srl295 December 6, 2024 15:34
@btangmu btangmu changed the title Refactor getName: new class NameGetter CLDR-15830 Refactor getName: new class NameGetter Dec 6, 2024
@btangmu
Copy link
Member Author

btangmu commented Dec 6, 2024

I've made this ready for review. So far this is fairly trivial refactoring but affects many files, so better not to keep changing on the same PR

Copy link
Member

@srl295 srl295 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK but the naming is awkward. Many of the functions could have the same name, just with different overloaded parameters. A builder (fluent) API model might work better, to mutate a set of options rather than have a confusing and similar set of functions to choose from. BCP47 does not have @ signs in it, so a different name should be used.

I'm approving this because it's an important refactor.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameGetter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming is a little awkward, isn't it a display name formatter?

Perhaps the naming should parallel ICU4J at least somewhat. https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/LocaleDisplayNames.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it a display name formatter?

Good question! Is that what it is?

BCP47 does not have @ signs in it, so a different name should be used.

That sounds important. Some of the methods already had comments saying parameters were bcp47 identifiers, and I carried that over to the related methods (callers/callees). If that's inaccurate, what would be a more accurate description? Part of the problem with this whole module is that there are strings and integers coming in without a clear explanation of what strings and integers are actually expected/allowed, and that's why Mark and I have been discussing an Enum instead of strings and integers, wherever that's possible...

* @param code a code such as "JP"
* @return the name
*/
public String getNameFromTypestrCode(String type, String code) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe forTypeAndCode() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, eventually we might not need/want to keep "getName" in the method names


/**
* Returns the name of the given bcp47 identifier. Note that extensions must be specified using
* the old "\@key=type" syntax.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it's not BCP47!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is just moved here from CLDRFile; I haven't really written any new code other than the glue between CLDRFile and NameGetter. Is the given identifier really "a bcp47 identifier or an extension thereof"? Pardon my ignorance!

return getNameFromBCP47BoolAlt(localeOrTZID, onlyConstructCompound, null);
}

public String getNameFromParserBool(LanguageTagParser lparser, boolean onlyConstructCompound) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be an overload, forLocale() taking a LanguageTagParser as a et

null);
}

public synchronized String getNameFromBCP47Etc(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Etc is a little awkward, a Builder model might be better.

* @param paths if non-null, fillin with contributory paths
* @return the name of the given bcp47 identifier
*/
private synchronized String getNameFromManyThings(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, a builder model that sets up options might be better than a kitchen sink set of parameters.

new NameFormatter(locale).onlyConstructCompound(true).withLocalePatter(pattern)…

@btangmu
Copy link
Member Author

btangmu commented Dec 6, 2024

I think it's OK but the naming is awkward. Many of the functions could have the same name, just with different overloaded parameters...

Steven, one reason for this refactor is that it was difficult to see how any particular version of getName() was actually used. With current main, IntelliJ reports getName "has 347 callers". Probably there's a way to find callers without including overloaded versions, but anyway 14 different methods in the same class with the same name is just too much for my little brain to work with. Refactoring is often easier if you can use global search-and-replace; methods with identical names get in the way of that.

Another reason for the refactor is that these methods were tangled up in the middle of CLDRFile which is just too large for maintainable code.

I agree that the names are somewhat awkward or arbitrary, but for me at least they give me a fighting chance to make sense what's calling what. After the code is reasonably clean, a nice finishing touch might be to restore some overloading, within reason.

I complete agree with you that some of these methods have a "kitchen sink of parameters" -- best practice is that no method should have more than about 4 parameters. A possible solution is to make them members of a class, and I suspect that's what you mean by a Builder model.

I can still hardly understand half of what's going on in this new class, so I can only revise so much at a time :-)

@srl295
Copy link
Member

srl295 commented Dec 6, 2024

I think it's OK but the naming is awkward. Many of the functions could have the same name, just with different overloaded parameters...

Steven, one reason for this refactor is that it was difficult to see how any particular version of getName() was actually used. With current main, IntelliJ reports getName "has 347 callers". Probably there's a way to find callers without including overloaded versions, but anyway 14 different methods in the same class with the same name is just too much for my little brain to work with. Refactoring is often easier if you can use global search-and-replace; methods with identical names get in the way of that.

That's a good point for that analysis. Since it's a library, I think it's harder to get this kind of analysis otherwise - SurveyTool might call it deep inside something for example (cross library), so "function is unused" warnings don't help.

Another reason for the refactor is that these methods were tangled up in the middle of CLDRFile which is just too large for maintainable code.

💯

I agree that the names are somewhat awkward or arbitrary, but for me at least they give me a fighting chance to make sense what's calling what. After the code is reasonably clean, a nice finishing touch might be to restore some overloading, within reason.

sounds good… 

I complete agree with you that some of these methods have a "kitchen sink of parameters" -- best practice is that no method should have more than about 4 parameters. A possible solution is to make them members of a class, and I suspect that's what you mean by a Builder model.

At leisure take a look at https://www.baeldung.com/java-fluent-interface-vs-builder-pattern I think it's the Builder model (not fluent, though they have similarities). Yes, you'd have basically a Settings-type object that gets modified by calling functions on it (which you could chain) and then it ends up with a toString() call when you're ready to output it as text.

I can still hardly understand half of what's going on in this new class, so I can only revise so much at a time :-)

I'm with you! But it's a good refactor—— and Bottom line is that's why I hit Approve.

@btangmu btangmu merged commit d4fffd7 into unicode-org:main Dec 6, 2024
9 checks passed
@btangmu btangmu deleted the t15830_a branch December 6, 2024 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants