Skip to content

Conversation

btangmu
Copy link
Member

@btangmu btangmu commented Sep 17, 2025

-In PathDescriptions.md and PathDescriptionHints.md, use shorter forms of paths, omit ldml, no anchors; use variable anyAttribute instead of raw regex

-Store variables in the .md files; new VARIABLES section, new methods inVariables, addVariable

-In PathDescriptionParser, call lookup.add instead of lookup.addWithoutVariables

-Use lazy initialization for pathHandling and pathHintsHandling, to facilitate debugging

-Revise TestPathHeader.Test8414 to check for brackets instead of http, since URLs moved to bottom of file

-Revise PathDescriptionParser.parse to take file name instead of big string as parameter, so error message has the correct file name

-Make parts of PathDescription.java private if they do not need to be public

CLDR-17599

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

-In PathDescriptions.md and PathDescriptionHints.md, use shorter forms of paths, omit ldml, no anchors; use variable anyAttribute instead of raw regex

-Store variables in the .md files; new VARIABLES section, new methods inVariables, addVariable

-In PathDescriptionParser, call lookup.add instead of lookup.addWithoutVariables

-Use lazy initialization for pathHandling and pathHintsHandling, to facilitate debugging

-Revise TestPathHeader.Test8414 to check for brackets instead of http, since URLs moved to bottom of file

-Revise PathDescriptionParser.parse to take file name instead of big string as parameter, so error message has the correct file name

-Make parts of PathDescription.java private if they do not need to be public
@btangmu btangmu self-assigned this Sep 17, 2025
@btangmu btangmu requested a review from macchiati September 17, 2025 12:53
## HINT descriptions

### HINT

- `^//ldml/units/unitLength\[@type="(long|narrow|short)"\]/unit\[@type="duration-day"\]/(displayName|unitPattern\[@count="one"\]|unitPattern\[@count="other"\])$`
- `units/unitLength\[@type="%unitLengths"\]/unit\[@type="duration-day"\](/displayName|/unitPattern\[@count="%anyAttribute"\])`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to add all variables before they are encountered?

## HINT descriptions

### HINT

- `^//ldml/units/unitLength\[@type="(long|narrow|short)"\]/unit\[@type="duration-day"\]/(displayName|unitPattern\[@count="one"\]|unitPattern\[@count="other"\])$`
- `units/unitLength\[@type="%unitLengths"\]/unit\[@type="duration-day"\](/displayName|/unitPattern\[@count="%anyAttribute"\])`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the short forms, you should be removing the '' in front of each [

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you mean removing the backslash \ (which github hides unless escaped).

I tried removing backslash in front of both left and right brackets, and exceptions get thrown from exampleGenerator.getHelpHtml. The first such error occurs for

units/unitLength[@type="%anyAttribute"]/unit[@type="duration-day"]/displayName

In order to use the "short forms" do I need to create the RegexLookup differently? Currently it's

    private final RegexLookup<Pair<String, String>> lookup =
            new RegexLookup<>(RegexLookup.LookupType.OPTIMIZED_DIRECTORY_PATTERN_LOOKUP);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's one error:

Caused by: java.util.regex.PatternSyntaxException: Illegal character range near index 61
units/unitLength[@type="%anyAttribute"]/unit[@type="duration-day"]/displayName
                                                             ^
	at java.base/java.util.regex.Pattern.error(Pattern.java:2028)

It thinks the hyphen in "duration-day" is part of a regex

@srl295
Copy link
Member

srl295 commented Sep 18, 2025

use shorter forms of paths, omit ldml

Why is ^//ldml/ dropped? won't the substring form be less performant as well?

no anchors

Actually, only redundant HINT anchors were removed. The design was to be able to name the sections for maintenance.

@srl295
Copy link
Member

srl295 commented Sep 18, 2025

LGTM otherwise but questions

@macchiati
Copy link
Member

When building a regex lookup, you have the option of supplying simpler paths. It just handles some of the quoting so that you don't have to explicitly escape the leading sq brackets for an attribute, and adds a prefix. Done at RL build time so no apprec. Perf cost

@macchiati
Copy link
Member

PathHeader.java shows the right incantation (and PathHeader.txt shows it in action).

    static final RegexLookup<RawData> lookup =
            RegexLookup.of(new PathHeaderTransform())
                    .setPatternTransform(RegexLookup.RegexFinderTransformPath)
                    .loadFromFile(PathHeader.class, "data/PathHeader.txt");

I was wrong about the prefix. It just adds ^ to //, so you'd still need the //ldml/

Hope that helps.

@btangmu
Copy link
Member Author

btangmu commented Sep 19, 2025

PathHeader.java shows the right incantation (and PathHeader.txt shows it in action)...

That involves RegexLookup<RawData> instead of RegexLookup<Pair<String, String>>, where RawData looks very path-header-specific. I suspect that's not the relevant part of the incantation.

I suspect setPatternTransform is the relevant part, and that's what I'll study next. There's a risk of me spending too many hours just to simplify some of the regex notation a bit. I'll try not to go too much further down the rabbit hole.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants