Open
80 commits
185a38f ICU-22851 Test the error paths in UnicodeSet parsing (eggrobin, Aug 11, 2025)
6a650e7 Call it a day (eggrobin, Aug 11, 2025)
85b8b50 Some progress, toPattern is wrong, but what is right? (eggrobin, Aug 13, 2025)
07ab1c1 ICU-22851 Test the exact behaviour of UnicodeSet::toPattern (eggrobin, Aug 13, 2025)
fd6940c Merge branch 'unicodeset-parser' into recursive-descent (eggrobin, Aug 13, 2025)
b489fa0 call it a day (eggrobin, Aug 13, 2025)
e147b15 Pattern-rebuilding logic (eggrobin, Aug 14, 2025)
f47cac4 More tests of toPattern (eggrobin, Aug 14, 2025)
e3efa59 ICU-22851 Test the exact behaviour of UnicodeSet::toPattern (eggrobin, Aug 13, 2025)
1e34418 Merge commit 'f47cac4' into recursive-descent (eggrobin, Aug 14, 2025)
6276769 Merge branch 'unicodeset-parser' into recursive-descent (eggrobin, Aug 14, 2025)
a7a4035 Print strings (eggrobin, Aug 14, 2025)
cef298e Appease the warnings even though these are string_views (eggrobin, Aug 14, 2025)
8f8dcca Merge branch 'unicodeset-parser' into recursive-descent (eggrobin, Aug 14, 2025)
b4e365b ICU-22851 Test the exact behaviour of UnicodeSet::toPattern (eggrobin, Aug 13, 2025)
51e2702 Merge branch 'unicodeset-parser' into recursive-descent (eggrobin, Aug 14, 2025)
cef2093 ICU-22851 Test various edge cases with $ in the absence of variables (eggrobin, Aug 14, 2025)
33b1075 Merge branch 'unicodeset-parser' into recursive-descent (eggrobin, Aug 14, 2025)
9e126dd $ handling (eggrobin, Aug 14, 2025)
c8d2b9e comment (eggrobin, Aug 14, 2025)
bbcc231 ICU-22851 Even more $ edge cases (eggrobin, Aug 14, 2025)
ec299be Merge branch 'unicodeset-parser' into recursive-descent (eggrobin, Aug 14, 2025)
876d338 ICU-22851 Test various edge cases with $ in the absence of variables (eggrobin, Aug 14, 2025)
5d19376 Merge branch 'unicodeset-parser' into recursive-descent (eggrobin, Aug 14, 2025)
a6d9182 ICU-22851 Test UnicodeSet with lookupMatcher (eggrobin, Aug 14, 2025)
27728c7 Merge branch 'unicodeset-lookup-matcher' into recursive-descent-into-… (eggrobin, Aug 14, 2025)
e81735c Something that works in the same silly way as it used to. (eggrobin, Aug 15, 2025)
4beef14 indentation on the parse error tests (eggrobin, Aug 18, 2025)
d0bc4fa Merge branch 'unicodeset-parser' into recursive-descent-into-madness (eggrobin, Aug 18, 2025)
18f2b7b ICU-22851 Test the error paths in UnicodeSet parsing (eggrobin, Aug 11, 2025)
03f792b ICU-22851 Test the exact behaviour of UnicodeSet::toPattern (eggrobin, Aug 13, 2025)
8cc53b9 ICU-22851 Test various edge cases with $ in the absence of variables (eggrobin, Aug 14, 2025)
e89bbd2 Merge branch 'unicodeset-parser' into recursive-descent-into-madness (eggrobin, Aug 18, 2025)
65fe08e dedent the pattern output test (eggrobin, Aug 18, 2025)
ed6dde4 Merge branch 'unicodeset-parser' into recursive-descent-into-madness (eggrobin, Aug 18, 2025)
b478400 ICU-22851 Test the exact behaviour of UnicodeSet::toPattern (eggrobin, Aug 13, 2025)
ae81d41 ICU-22851 Test various edge cases with $ in the absence of variables (eggrobin, Aug 14, 2025)
52d0d47 Merge branch 'unicodeset-parser' into recursive-descent-into-madness (eggrobin, Aug 18, 2025)
8eec971 ICU-23179 Test the error paths in UnicodeSet parsing (eggrobin, Aug 11, 2025)
dabce0b ICU-23179 Test the exact behaviour of UnicodeSet::toPattern (eggrobin, Aug 13, 2025)
6bd0425 ICU-23179 Test various edge cases with $ in the absence of variables (eggrobin, Aug 14, 2025)
3d9f84f Merge branch 'unicodeset-parser' into recursive-descent-into-madness (eggrobin, Aug 18, 2025)
d6fc731 meow (eggrobin, Aug 18, 2025)
38bfae0 Merge branch 'unicodeset-lookup-matcher' into recursive-descent-into-… (eggrobin, Aug 18, 2025)
83bf69b ICU-23179 Test UnicodeSet with lookupMatcher (eggrobin, Aug 18, 2025)
f6d865c Merge branch 'unicodeset-lookup-matcher' into recursive-descent-into-… (eggrobin, Aug 18, 2025)
ed395a6 meow (eggrobin, Aug 18, 2025)
93b31e7 Merge branch 'unicodeset-lookup-matcher' into recursive-descent-into-… (eggrobin, Aug 18, 2025)
3a4ab45 ICU-23179 Test UnicodeSet with lookupMatcher (eggrobin, Aug 18, 2025)
15e620d Merge branch 'unicodeset-lookup-matcher' into recursive-descent-into-… (eggrobin, Aug 18, 2025)
d5e73a8 ICU-23179 Test the exact sequence of lookups (eggrobin, Aug 18, 2025)
c620b47 Merge branch 'unicodeset-lookup-matcher' into recursive-descent-into-… (eggrobin, Aug 18, 2025)
ef59acb ICU-23179 Test the exact sequence of lookups (eggrobin, Aug 18, 2025)
770c9aa Ignore warnings (eggrobin, Aug 20, 2025)
a36a0b6 Merge branch 'unicodeset-lookup-matcher' into HEAD (eggrobin, Aug 20, 2025)
1da1ffc Merge remote-tracking branch 'la-vache/main' into HEAD (eggrobin, Aug 20, 2025)
218cec4 Merge commit '1da1ffc' into recursive-descent-into-madness (eggrobin, Aug 20, 2025)
110a54d Abstract away the getPos/next/setPos/lookupMatcher dance (eggrobin, Aug 21, 2025)
34bc05d Drop some traces (eggrobin, Aug 21, 2025)
5c44163 ifdef out the remaining traces (eggrobin, Aug 21, 2025)
da4b123 Remove the old code (eggrobin, Aug 21, 2025)
ff092dc Unused variables (eggrobin, Aug 21, 2025)
f0bd37b Some work towards a proper lexer (eggrobin, Aug 26, 2025)
b78c0ce A proper lexer (eggrobin, Aug 27, 2025)
d61b090 Don’t report end of text as a literal-element (eggrobin, Aug 27, 2025)
40460d9 Turn off traces (eggrobin, Aug 27, 2025)
e39c4d1 ICU-23179 Test more edge cases when mapping syntax characters to sets (eggrobin, Aug 27, 2025)
9014b75 Merge branch 'doctor-it-hurts-when-i-do-this' into dura-lex-sed-lex (eggrobin, Aug 27, 2025)
93d9296 Deal with the ambiguous - and ^ (eggrobin, Aug 27, 2025)
7940892 Update sequence expectations (eggrobin, Aug 27, 2025)
d3cc9ea warnings (eggrobin, Sep 2, 2025)
3cfc4ae Clarify some comments (eggrobin, Sep 2, 2025)
629bc89 more discursive comments (eggrobin, Sep 3, 2025)
cace9d7 make it compile (eggrobin, Sep 3, 2025)
0323593 libstdc++ dependencies (eggrobin, Sep 8, 2025)
bcb7ac0 quote? (eggrobin, Sep 8, 2025)
66cceeb No infinite loops in the lexer (eggrobin, Sep 9, 2025)
f79b35c That is well-formed (eggrobin, Sep 9, 2025)
5f40223 Merge remote-tracking branch 'la-vache/main' into recursive-descent-i… (eggrobin, Sep 11, 2025)
94cc56c dedent (eggrobin, Sep 11, 2025)
60 changes: 51 additions & 9 deletions icu4c/source/common/unicode/uniset.h
@@ -1696,13 +1696,58 @@ class U_COMMON_API UnicodeSet final : public UnicodeFilter {
  const SymbolTable* symbols,
  UErrorCode& status);

- void applyPattern(RuleCharacterIterator& chars,
- const SymbolTable* symbols,
- UnicodeString& rebuiltPat,
+ void applyPattern(const UnicodeString &pattern,
+ const ParsePosition& parsePosition,
+ RuleCharacterIterator &chars,
+ const SymbolTable *symbols,
+ UnicodeString &rebuiltPat,
  uint32_t options,
- UnicodeSet& (UnicodeSet::*caseClosure)(int32_t attribute),
- int32_t depth,
- UErrorCode& ec);
+ UnicodeSet &(UnicodeSet::*caseClosure)(int32_t attribute),
+ UErrorCode &ec);
+
+ // Recursive-descent predictive parsing. These functions parse the syntactic categories
+ // with matching names in the base grammar of PD UTR #61 (before the highlighted changes are
+ // applied).
+ // See https://www.unicode.org/reports/tr61/tr61-1.html#Set-Operations.
+ // `parseUnicodeSet` clears `*this` and makes it represent the parsed UnicodeSet; all other functions
+ // add the set represented by the parsed construct to `*this`.
+
+ class Lexer;
+
+ void parseUnicodeSet(Lexer &lexer,
+ UnicodeString &rebuiltPat,
+ uint32_t options,
+ UnicodeSet &(UnicodeSet::*caseClosure)(int32_t attribute),
+ int32_t depth,
+ UErrorCode &ec);
+
+ void parseUnion(Lexer &lexer,
+ UnicodeString &rebuiltPat,
+ uint32_t options,
+ UnicodeSet &(UnicodeSet::*caseClosure)(int32_t attribute),
+ int32_t depth,
+ bool &containsRestrictions,
+ UErrorCode &ec);
+
+ void parseTerm(Lexer &lexer,
+ UnicodeString &rebuiltPat,
+ uint32_t options,
+ UnicodeSet &(UnicodeSet::*caseClosure)(int32_t attribute),
+ int32_t depth,
+ bool &containsRestrictions,
+ UErrorCode &ec);
+
+ void parseRestriction(Lexer &lexer,
+ UnicodeString &rebuiltPat,
+ uint32_t options,
+ UnicodeSet &(UnicodeSet::*caseClosure)(int32_t attribute),
+ int32_t depth,
+ UErrorCode &ec);
+
+ void parseElements(Lexer &lexer,
+ UnicodeString &rebuiltPat,
+ UErrorCode &ec);
+

  void closeOverCaseInsensitive(bool simple);
  void closeOverAddCaseMappings();
@@ -1754,9 +1799,6 @@ class U_COMMON_API UnicodeSet final : public UnicodeFilter {
  static UBool resemblesPropertyPattern(const UnicodeString& pattern,
  int32_t pos);

- static UBool resemblesPropertyPattern(RuleCharacterIterator& chars,
- int32_t iterOpts);
-
  /**
  * Parse the given property pattern at the given parse position
  * and set this UnicodeSet to the result.
2 changes: 1 addition & 1 deletion icu4c/source/common/uniset_closure.cpp
@@ -101,7 +101,7 @@ UnicodeSet& UnicodeSet::applyPattern(const UnicodeString& pattern,
  // _applyPattern calls add() etc., which set pat to empty.
  UnicodeString rebuiltPat;
  RuleCharacterIterator chars(pattern, symbols, pos);
- applyPattern(chars, symbols, rebuiltPat, options, &UnicodeSet::closeOver, 0, status);
+ applyPattern(pattern, pos, chars, symbols, rebuiltPat, options, &UnicodeSet::closeOver, status);
  if (U_FAILURE(status)) return *this;
  if (chars.inVariable()) {
  // syntaxError(chars, "Extra chars in variable value");