Unescaped final Soft characters in LinkFormattingTest.txt #527 (#1417)

Merged
markusicu merged 8 commits into main from Unescaped-final-Soft-characters-in-LinkFormattingTest.txt-#527
May 20, 2026

Conversation

@macchiati
Member

@macchiati macchiati commented May 19, 2026

A bad parameter was passed in when processing the wiki URLs for use in the test data file, which exposed a bug in how that parameter was used. This PR corrects that and gives the parameters more appropriate names (using an enum instead of a naked boolean).
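To illustrate the enum-instead-of-boolean point, here is a minimal sketch. The class, enum, and method names are hypothetical and are not taken from LinkUtilities.java; the sketch only shows why a call site reads more safely with a named constant than with a bare `true` or `false`.

```java
final class EnumParameterSketch {
    /** Hypothetical replacement for a boolean flag such as "escapeEverything". */
    enum EscapeMode {
        MINIMAL,
        FULL
    }

    // Boolean form: a call like render("a b", true) gives no hint what "true" selects,
    // so passing the wrong value is easy to miss in review.
    static String render(String text, boolean escapeEverything) {
        return render(text, escapeEverything ? EscapeMode.FULL : EscapeMode.MINIMAL);
    }

    // Enum form: the call site names the behavior it asks for.
    static String render(String text, EscapeMode mode) {
        switch (mode) {
            case FULL:
                return "[full] " + text;
            case MINIMAL:
                return "[minimal] " + text;
            default:
                throw new AssertionError(mode);
        }
    }
}
```

With the enum overload, `render(url, EscapeMode.FULL)` documents itself, whereas `render(url, true)` depends on the reader remembering the parameter order and meaning.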

Fixes https://github.com/unicode-org/properties/issues/527

NOTE: I could also regenerate the v17 file if needed. It has ~10 fixes also.

  • Approver: Feel free to merge on my behalf
    • rebase & merge one or more commits
    • squash & merge multiple commits into one

@macchiati macchiati requested a review from markusicu May 19, 2026 16:57
Comment thread unicodetools/src/main/java/org/unicode/utilities/LinkUtilities.java
@markusicu
Member

> NOTE: I could also regenerate the v17 file if needed. It has ~10 fixes also.

I think we should treat published files as immutable, as we normally do.

Comment thread unicodetools/data/linkification/dev/LinkFormattingTest.txt
Comment thread unicodetools/data/linkification/dev/LinkFormattingTest.txt Outdated
Comment thread unicodetools/src/main/java/org/unicode/utilities/LinkUtilities.java Outdated
Comment thread unicodetools/src/main/java/org/unicode/utilities/LinkUtilities.java
@macchiati
Member Author

> > NOTE: I could also regenerate the v17 file if needed. It has ~10 fixes also.
>
> I think we should treat published files as immutable, as we normally do.

Of course. The question is whether we should make a 17.1 file available, or document the problem so that people don't stumble over it until they implement 18.

Member

@markusicu markusicu left a comment

output lgtm now.

looks like spotless is unhappy, and maybe some other CI check :-/

Comment thread unicodetools/src/main/java/org/unicode/tools/GenerateLinkData.java Outdated
@macchiati
Member Author

I think this should do it; I am also escaping any final Soft characters in the full-escaped string.

Member

@markusicu markusicu left a comment

lgtm tnx

@markusicu markusicu merged commit fb660c4 into main May 20, 2026
21 checks passed
@markusicu markusicu deleted the Unescaped-final-Soft-characters-in-LinkFormattingTest.txt-#527 branch May 20, 2026 15:50
@macchiati
Member Author

BTW, if we want to be more explicit, we could make the following change in the header of the test file.

OLD

# The fully-escaped field percent-escapes all literal syntax characters and all characters above ASCII.

NEW

# The fully-escaped field percent-escapes all code points based on https://url.spec.whatwg.org/#percent-encoded-bytes.
# This means all literal syntax characters in each Part and all code points above ASCII.
# It also percent-escapes the last character, if it is Link_Term:Soft.
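The rule in the proposed header can be sketched roughly as follows. This is a simplified illustration, not the code in LinkUtilities.java: `SAMPLE_SOFT` is a hypothetical stand-in for the Link_Term=Soft property (the real set comes from the property data), and the sketch deliberately omits the per-Part escaping of literal syntax characters. It only covers the two other clauses: code points above ASCII are escaped as UTF-8 percent-encoded bytes, and the last code point is escaped if it is in the Soft set.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

final class FullEscapeSketch {
    // Hypothetical sample of Link_Term=Soft code points: . , ! ? : ;
    // The real membership is defined by the property data, not by this list.
    private static final Set<Integer> SAMPLE_SOFT = Set.of(0x2E, 0x2C, 0x21, 0x3F, 0x3A, 0x3B);

    // Percent-encodes one code point as its UTF-8 bytes, e.g. U+00E9 becomes %C3%A9.
    static String percentEscape(int codePoint) {
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Escapes every code point above ASCII, plus the final code point if it is "Soft".
    static String escapeNonAsciiAndFinalSoft(String s) {
        int[] cps = s.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cps.length; i++) {
            int cp = cps[i];
            boolean isLast = (i == cps.length - 1);
            if (cp > 0x7F || (isLast && SAMPLE_SOFT.contains(cp))) {
                out.append(percentEscape(cp));
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, a Soft character in the middle of the string is left alone and only a trailing one is escaped, so `escapeNonAsciiAndFinalSoft("a.b.")` yields `a.b%2E`.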

@markusicu
Member

> BTW, if we want to be more explicit, we could make the following change in the header of the test file.

good idea

3 participants