-
-
Notifications
You must be signed in to change notification settings - Fork 851
ICU-23301 Grammatical UnicodeSet variables #3825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Still "draft". Not yet ready to look at? |
|
Oops, undrafted. |
| "[[a-z]-[c-z]-]", nullptr, | ||
| "string", "{", "end", "}", "[ $string Zeichenkette $end ]", "[{Zeichenkette}]", nullptr, | ||
| // Variables do not expand inside string literals. | ||
| "us", "[a-z]", "[$us{$us}]", R"([a-z{\$us}])", nullptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the $ in the string literal need to be escaped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input is [$us{$us}], no escaping here.
The [a-z{\$us}] is the output of UnicodeSet pattern rewriting; that always escapes dollar signs (I think it applies the same logic inside and outside strings).
Maybe it shouldn’t do that inside strings anymore, but this like a separate issue; [a-z{\$us}] is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that - also gets escaped in toPattern strings, even though it has never meant anything inside a string literal:
https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%7BBaden-W%C3%BCrttemberg%7D%5D&g=&i=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird. Please add it to your pile of stuff to revisit later.
markusicu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks... plausible
| if (UnicodeString name = symbols_->parseReference(pattern_, nameEnd, pattern_.length()); | ||
| !name.isEmpty()) { | ||
| chars_.jumpahead(nameEnd.getIndex() - (start + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard-to-read indentation style... but if I don't find anything else, let's go with it.
9109c05 to
1655f87
Compare
|
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
1655f87 to
908e5a2
Compare
|
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
Require that UnicodeSet variables to correspond to a set, a single element, or a stand-in for a single pre-parsed set (via lookupMatcher, to be removed later as part of ICU-23297). Variables in string literals are no longer expanded:
{$this}now means{\$this}rather than{Bobby Tables}]&[]&[{}.Remove applyPropertyPattern in favour of the new lexer as discussed in #3604 (comment).
No other changes in behaviour: named-element is still a set, spaces are still ignored in string-literal, spaces are still allowed in
[: ^(the change for that first one will come in a subsequent pull request, the others will be proposed shortly).Checklist