Commit 539a623

Add v2.0.0: Table-driven O(log n) architecture
Performance improvements:
- UAX #9: Binary search bidi class lookup (3,060 ranges)
- UAX #29: Packed 16-bit data structure (4,673 ranges)
- Character classification: ~60-100 ns/op, 0 allocations
- All data generated from official Unicode 17.0.0 files

New APIs:
- uax29.FindAllBreaks() for single-pass break detection
- Framework for future hierarchical optimization

Maintains 100% Unicode conformance:
- UAX #9: 513,494/513,494 tests
- UAX #14: 19,338/19,338 tests
- UAX #29: 3,222/3,222 tests
- UTS #51: 5,223/5,223 tests

1 parent 7af44c3 commit 539a623
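
The commit message mentions binary search over sorted range tables for character class lookup. The generated tables themselves are not shown in this part of the diff, so the following is only a minimal sketch of that general technique. The `classRange` type, the `table` contents, the class constants, and the `lookup` function are all hypothetical names chosen for illustration; they are not taken from the repository.

```go
package main

import (
	"fmt"
	"sort"
)

// classRange is a hypothetical table entry: an inclusive code point
// range [lo, hi] mapped to a small class ID.
type classRange struct {
	lo, hi rune
	class  uint8
}

// Illustrative class IDs, not the repository's constants.
const (
	classOther uint8 = iota
	classLetter
	classDigit
)

// table must be sorted by lo and contain no overlapping ranges.
// A real table would be generated from the Unicode data files.
var table = []classRange{
	{'0', '9', classDigit},
	{'A', 'Z', classLetter},
	{'a', 'z', classLetter},
}

// lookup returns the class of r in O(log n) comparisons.
func lookup(r rune) uint8 {
	// Find the first range whose upper bound is not below r.
	i := sort.Search(len(table), func(i int) bool { return table[i].hi >= r })
	// r belongs to that range only if it is also at or above the lower bound.
	if i < len(table) && table[i].lo <= r {
		return table[i].class
	}
	return classOther
}

func main() {
	for _, r := range []rune{'7', 'q', '#'} {
		fmt.Printf("%q -> class %d\n", r, lookup(r))
	}
}
```

Because the table is an immutable sorted slice, a lookup of this shape needs no heap allocation, which would be consistent with the "0 allocations" figure quoted in the commit message.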

19 files changed: +11160 -446 lines changed

.github/workflows/ci.yml

Lines changed: 143 additions & 3 deletions
@@ -2,9 +2,9 @@ name: CI

 on:
   push:
-    branches: [ main, develop ]
+    branches: [ main ]
   pull_request:
-    branches: [ main, develop ]
+    branches: [ main ]

 jobs:
   test:
@@ -47,6 +47,141 @@ jobs:
           flags: unittests
           name: codecov-umbrella

+  fetch-conformance-tests:
+    name: Fetch & Verify Official Conformance Tests
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25'
+
+      - name: Download official Unicode 17.0.0 test files
+        run: |
+          echo "Downloading official Unicode 17.0.0 conformance test files..."
+
+          # UAX #29 - Text Segmentation test files
+          curl -fsSL -o uax29/GraphemeBreakTest.txt.new \
+            https://www.unicode.org/Public/17.0.0/ucd/auxiliary/GraphemeBreakTest.txt
+          curl -fsSL -o uax29/WordBreakTest.txt.new \
+            https://www.unicode.org/Public/17.0.0/ucd/auxiliary/WordBreakTest.txt
+          curl -fsSL -o uax29/SentenceBreakTest.txt.new \
+            https://www.unicode.org/Public/17.0.0/ucd/auxiliary/SentenceBreakTest.txt
+
+          # UAX #14 - Line Breaking test file
+          curl -fsSL -o uax14/LineBreakTest.txt \
+            https://www.unicode.org/Public/17.0.0/ucd/auxiliary/LineBreakTest.txt
+
+          echo "Download complete!"
+
+      - name: Verify downloaded test files
+        run: |
+          echo "Verifying test files..."
+
+          # Check UAX29 files exist and are valid
+          for file in GraphemeBreakTest WordBreakTest SentenceBreakTest; do
+            if [ ! -f "uax29/${file}.txt.new" ]; then
+              echo "ERROR: uax29/${file}.txt.new not found"
+              exit 1
+            fi
+            if [ ! -s "uax29/${file}.txt.new" ]; then
+              echo "ERROR: uax29/${file}.txt.new is empty"
+              exit 1
+            fi
+            # Check for Unicode 17.0.0 version marker
+            if ! head -1 "uax29/${file}.txt.new" | grep -q "17.0.0"; then
+              echo "ERROR: uax29/${file}.txt.new does not appear to be version 17.0.0"
+              head -1 "uax29/${file}.txt.new"
+              exit 1
+            fi
+            echo "✓ uax29/${file}.txt.new is valid (Unicode 17.0.0)"
+          done
+
+          # Check UAX14 file
+          if [ ! -f "uax14/LineBreakTest.txt" ]; then
+            echo "ERROR: uax14/LineBreakTest.txt not found"
+            exit 1
+          fi
+          if [ ! -s "uax14/LineBreakTest.txt" ]; then
+            echo "ERROR: uax14/LineBreakTest.txt is empty"
+            exit 1
+          fi
+          echo "✓ uax14/LineBreakTest.txt is valid"
+
+          echo "All test files verified successfully!"
+
+      - name: Run conformance tests with fresh test files
+        run: |
+          echo "Running UAX #29 conformance tests with freshly downloaded files..."
+
+          # Replace test files with newly downloaded ones
+          mv uax29/GraphemeBreakTest.txt.new uax29/GraphemeBreakTest.txt
+          mv uax29/WordBreakTest.txt.new uax29/WordBreakTest.txt
+          mv uax29/SentenceBreakTest.txt.new uax29/SentenceBreakTest.txt
+
+          # Run UAX29 conformance tests
+          cd uax29
+          go test -v -run TestGraphemeBreakOfficial
+          go test -v -run TestWordBreakOfficial
+          go test -v -run TestSentenceBreakOfficial
+          cd ..
+
+          # Run UAX14 conformance tests
+          echo "Running UAX #14 conformance tests..."
+          cd uax14
+          go test -v -run TestOfficialUnicodeVectors
+          cd ..
+
+          echo "All conformance tests passed with official Unicode 17.0.0 test files!"
+
+  conformance:
+    name: Unicode Conformance Tests
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25'
+
+      - name: UAX #11 - East Asian Width Tests
+        run: |
+          cd uax11
+          go test -v
+
+      - name: UAX #14 - Line Breaking Tests
+        run: |
+          cd uax14
+          go test -v -run TestEdgeCases
+          go test -v -run TestUnicodeControlCharacters
+          go test -v -run TestAsianScripts
+
+      - name: UAX #29 - Text Segmentation Conformance Tests
+        run: |
+          cd uax29
+          go test -v -run TestGraphemeBreakOfficial
+          go test -v -run TestWordBreakOfficial
+          go test -v -run TestSentenceBreakOfficial
+
+      - name: UAX #50 - Vertical Text Layout Tests
+        run: |
+          cd uax50
+          go test -v
+
+      - name: UTS #51 - Unicode Emoji Conformance Tests
+        run: |
+          cd uts51
+          go test -v -run TestEmojiTestFileConformance
+          go test -v -run TestEmojiProperties
+          go test -v -run TestEmojiSequences
+
   lint:
     name: Lint
     runs-on: ubuntu-latest
@@ -80,4 +215,9 @@ jobs:
           go-version: '1.25'

       - name: Build all packages
-        run: go build ./...
+        run: |
+          go build ./uax11
+          go build ./uax14
+          go build ./uax29
+          go build ./uax50
+          go build ./uts51

.gitignore

Lines changed: 0 additions & 14 deletions
@@ -1,16 +1,2 @@
-# Unicode test data files
 *.txt
-
-# Test binaries
 test_fsm
-
-# IDE
-.vscode/
-.idea/
-*.swp
-*.swo
-*~
-
-# OS
-.DS_Store
-Thumbs.db
