Commit c29572d

Fixes from Markus's review
1 parent 5c28004 commit c29572d

5 files changed: 26 additions, 19 deletions

unicodetools/data/links/dev/LinkPairedOpener.txt

Lines changed: 4 additions & 4 deletions
@@ -1,6 +1,6 @@
 # LinkPairedOpener.txt
-# Date: 2025-01-09
-# © 2024-2025 Unicode®, Inc.
+# Date: 2025-06-27
+# © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
 #
@@ -12,10 +12,10 @@
 # else if cp == ">" then Link_Paired_Opener(cp) = "<"
 # else Link_Paired_Opener(cp) = \x{0}
 
-# Property: Link_Paired_Opener
-
 # @missing: 0000..10FFFF; <none>
 
+# Property: Link_Paired_Opener
+
 0029; 0028 # “)” RIGHT PARENTHESIS 🡆 “(” LEFT PARENTHESIS
 003E; 003C # “>” GREATER-THAN SIGN 🡆 “<” LESS-THAN SIGN
 005D; 005B # “]” RIGHT SQUARE BRACKET 🡆 “[” LEFT SQUARE BRACKET
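
For readers unfamiliar with the layout of this data file, the sketch below shows one way the `closer; opener` lines in the hunk above could be read into a lookup table. It is an illustration only: the function name `parse_paired_openers` and the choice to return a plain `dict` (with absent keys standing in for `<none>`) are assumptions, not part of the commit or of UTS #58.

```python
def parse_paired_openers(text):
    """Map each closing character to its opener, from lines like '0029; 0028 # ...'."""
    mapping = {}
    for line in text.splitlines():
        # In this file everything from '#' onward is a comment.
        data = line.split("#", 1)[0].strip()
        # Skip blank lines and directives such as '@missing'.
        if not data or data.startswith("@"):
            continue
        fields = data.split(";")
        closer = chr(int(fields[0].strip(), 16))
        opener = chr(int(fields[1].strip(), 16))
        mapping[closer] = opener
    return mapping


SAMPLE = """\
# @missing: 0000..10FFFF; <none>

# Property: Link_Paired_Opener

0029; 0028 # “)” RIGHT PARENTHESIS 🡆 “(” LEFT PARENTHESIS
003E; 003C # “>” GREATER-THAN SIGN 🡆 “<” LESS-THAN SIGN
005D; 005B # “]” RIGHT SQUARE BRACKET 🡆 “[” LEFT SQUARE BRACKET
"""

openers = parse_paired_openers(SAMPLE)
print(openers.get(")"))  # the opener paired with ")"
print(openers.get("a"))  # None: no paired opener for this code point
```

Code points that do not appear in the file are simply absent from the table, which is how this sketch models the `@missing` default.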

unicodetools/data/links/dev/LinkTermination.txt

Lines changed: 4 additions & 4 deletions
@@ -1,16 +1,16 @@
 # LinkTermination.txt
-# Date: 2025-01-09
-# © 2024-2025 Unicode®, Inc.
+# Date: 2025-06-27
+# © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
 #
 # For documentation and usage, see https://www.unicode.org/reports/tr58/
 #
 
-# Property: Link_Termination=Include
-
 @missing: 0000..10FFFF; Include
 
+# Property: Link_Termination=Include
+
 # Link_Termination=Hard
 # derived from [\p{whitespace}\p{NChar}[\p{C}-\p{Cf}]\p{deprecated}]
 

unicodetools/data/links/dev/LinkificationTest.txt

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # LinkificationTest.txt
-# Date: 2025-01-09
-# © 2024-2025 Unicode®, Inc.
+# Date: 2025-06-27
+# © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
 #

unicodetools/data/links/dev/ReadMe.txt

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@
 # See https://www.unicode.org/reports/tr58/ for documentation.
 
 This directory contains data files for version 17.0.0
-of UTS #48 Link Detection and Serialization.
+of UTS #58 Link Detection and Serialization.

unicodetools/data/links/dev/SerializationTest.txt

Lines changed: 15 additions & 8 deletions
@@ -1,19 +1,26 @@
 # SerializationTest.txt
-# Date: 2025-01-09
-# © 2024-2025 Unicode®, Inc.
+# Date: 2025-06-27
+# © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
 #
 # For documentation and usage, see https://www.unicode.org/reports/tr58/
 #
-# Field 0: Path
-# Field 1: Query
-# Field 2: Fragment
+# Field 0: Pre-Path portion of URL
+# Field 1: Path
+# Field 2: Query
+# Field 3: Fragment
 # Field 4: Expected result
 #
+# The input is 4 separate fields, 0..4. It represents the internal form of a URL, pre-escaping.
+# The result is an escaped string for output.
+#
+# The reason the input is represented as 4 different fields is that the algorithm applies different escaping
+# to each piece.
+#
 # Notes:
-# - The # character only begins a comment if it is the first character on a line.
-# - Leading and trailing spaces in a field are to be omitted, but interior spaces retained
+# - The # character is represented by \x{23} when it is part of a field (instead of introducing a comment in the data file)
+# - Leading and trailing spaces in a field are to be omitted, but interior spaces are retained
 
 # Path only
 https://example.com; α; ; ; https://example.com/α
@@ -31,7 +38,7 @@ https://example.com; αβγ/δεζ; θ=ικλ&μ=γξο; πρς; https://examp
 https://example.com; α?μπ; ; ; https://example.com/α%3Fμπ
 
 # Escape # in Path/Query
-https://example.com; α#β; γ=δ#ε; ; https://example.com/α%23β?γ=δ%23ε
+https://example.com; α\x{23}β; γ=δ\x{23}ε; ; https://example.com/α%23β?γ=δ%23ε
 
 # Escape hard (' ')
 https://example.com; αβ γ/δεζ; θ=ικ λ&=γξο; πρ σ; https://example.com/αβ%20γ/δεζ?θ=ικ%20λ&=γξο#πρ%20σ
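
As a reading aid for the revised file header, here is a minimal sketch of how a test harness might split one data line into its five fields. The names `SerializationCase` and `parse_test_line` are hypothetical, and the sketch makes two assumptions: fields 0 to 3 are the inputs and field 4 is the expected result, and only lines that begin with `#` are comments, because the expected result in the last data line shown above still contains a literal `#` before the fragment. It only parses the line; it does not implement the UTS #58 serialization algorithm.

```python
import re
from typing import NamedTuple, Optional

# Matches escapes such as \x{23}, which the file header uses for a literal '#'.
_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


class SerializationCase(NamedTuple):
    pre_path: str   # Field 0: Pre-Path portion of URL
    path: str       # Field 1: Path
    query: str      # Field 2: Query
    fragment: str   # Field 3: Fragment
    expected: str   # Field 4: Expected result


def parse_test_line(line: str) -> Optional[SerializationCase]:
    """Return the five fields of a data line, or None for blank and comment lines."""
    stripped = line.strip()
    # Assumption: only a leading '#' marks a comment line.
    if not stripped or stripped.startswith("#"):
        return None
    fields = []
    for raw in stripped.split(";"):
        # Leading and trailing spaces are omitted; interior spaces are retained.
        field = raw.strip()
        # Decode \x{...} escapes into the characters they stand for.
        fields.append(_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), field))
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return SerializationCase(*fields)


case = parse_test_line(
    r"https://example.com; α\x{23}β; γ=δ\x{23}ε; ; https://example.com/α%23β?γ=δ%23ε"
)
print(case.path, case.query, case.expected)
```

Keeping the four input pieces separate until after parsing matches the stated reason for the format: each piece is escaped differently, so they should not be joined before the algorithm under test runs.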
