Regenerate linkification data in MakeUnicodeFiles #1286

Merged
eggrobin merged 6 commits into unicode-org:main from eggrobin:linkification-in-MakeUnicodeFiles on Feb 3, 2026

@eggrobin eggrobin commented Feb 3, 2026

And only regenerate it if it changes.
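
The "only regenerate it if it changes" idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the class `WriteIfChangedSketch` and the method `writeIfChanged` are hypothetical names, and the sketch assumes a plain whole-file text comparison.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; this is not the code from MakeUnicodeFiles. */
final class WriteIfChangedSketch {

    /**
     * Writes newContent to target unless the file already holds exactly that text.
     * Returns true if the file was (re)written, false if it was left untouched.
     */
    static boolean writeIfChanged(Path target, String newContent) throws IOException {
        if (Files.exists(target)) {
            String oldContent = Files.readString(target, StandardCharsets.UTF_8);
            if (oldContent.equals(newContent)) {
                return false; // Nothing changed: keep the existing file and its timestamp.
            }
        }
        Files.writeString(target, newContent, StandardCharsets.UTF_8);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("linkification-sketch", ".txt");
        System.out.println(writeIfChanged(file, "example data\n")); // first write
        System.out.println(writeIfChanged(file, "example data\n")); // same content, skipped
        Files.delete(file);
    }
}
```

Skipping the write when nothing changed keeps regenerated files out of the diff unless their content really differs.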

@eggrobin eggrobin requested a review from markusicu February 3, 2026 13:38
@markusicu

spotless looks unhappy

  writePropHeader(out.tempPrintWriter, HEADER_PROP_TERM, "LinkTerm", "Link_Term", "Hard");
  for (LinkTermination propValue : LinkTermination.NON_MISSING) {
-     bf.showSetNames(out, propValue.base);
+     bf.showSetNames(out.tempPrintWriter, propValue.base);
Member

When do we use out.tempPrintWriter vs. just out?

- # Date: 2026-01-31, 12:27:25 GMT
- # © 2026 Unicode®, Inc.
+ # Date: 2026-02-03, 13:35:49 GMT
+ # © 2025 Unicode®, Inc.
Member

We should definitely write 2026 now.

Member Author

Yes, but we should make this change consistently and globally by updating MakeUnicodeFiles.txt. This will rewrite the whole UCD. Let’s do that in another PR.


  public static void main(String[] args) throws IOException {
      generatePropertyData();
      System.out.println("TLDs=\t" + Joiner.on(' ').join(LinkUtilities.TLDS));
Member

what is LinkUtilities.TLDS?

Member Author

I haven’t the faintest idea. This print statement was at the beginning of generatePropertyData and wasn’t part of the generation of any of the three files, so I lifted it here to preserve the behaviour of this tool.

I guess it has something to do with top-level domains?

@markusicu

now you have a merge conflict...

markusicu previously approved these changes Feb 3, 2026

          String filename,
          String testName,
          String copyrightYear) {
      out.println(simpleFormatter.format(filename, dt.format(now), copyrightYear, testName));
Member

curious / probably for later: are dt.format(now) and copyrightYear used for the same output?

Member Author

@eggrobin eggrobin Feb 3, 2026

dt.format(now) is for the Date: line; that one is actually the time of regeneration, but ignored for diffing.

copyrightYear is not ignored for diffing, so when running MakeUnicodeFiles it is taken from MakeUnicodeFiles.txt rather than from the clock; that way we don’t break on the 1st of January.

(Of course the emoji do break on the 1st of January, see #1273. I should fix that, timebombs in CI are annoying.)
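
To illustrate the distinction above, here is a rough sketch of a comparison that ignores the Date: line but not the copyright line. The names `HeaderDiffSketch`, `header`, `withoutDateLines`, and `sameIgnoringDate` are hypothetical and not taken from the tool. The header layout follows the two lines quoted earlier in this thread, and the year is passed in as a string, standing in for the value that comes from MakeUnicodeFiles.txt.

```java
import java.util.stream.Collectors;

/** Hypothetical sketch; this is not the comparison code used by MakeUnicodeFiles. */
final class HeaderDiffSketch {

    /** Builds the two header lines; the year is a parameter, not read from the clock. */
    static String header(String timestamp, String copyrightYear) {
        // \u00A9 is the copyright sign and \u00AE the registered sign.
        return "# Date: " + timestamp + " GMT\n"
                + "# \u00A9 " + copyrightYear + " Unicode\u00AE, Inc.\n";
    }

    /** Drops "# Date:" lines so that the time of regeneration never counts as a change. */
    static String withoutDateLines(String content) {
        return content.lines()
                .filter(line -> !line.startsWith("# Date:"))
                .collect(Collectors.joining("\n"));
    }

    /** True if the two texts differ at most in their "# Date:" lines. */
    static boolean sameIgnoringDate(String a, String b) {
        return withoutDateLines(a).equals(withoutDateLines(b));
    }

    public static void main(String[] args) {
        String old = header("2026-01-31, 12:27:25", "2026") + "data\n";
        String regenerated = header("2026-02-03, 13:35:49", "2026") + "data\n";
        String newYear = header("2026-02-03, 13:35:49", "2025") + "data\n";
        System.out.println(sameIgnoringDate(old, regenerated)); // only the Date: line differs
        System.out.println(sameIgnoringDate(old, newYear));     // the copyright year differs too
    }
}
```

Under this scheme a year derived from the system clock would turn every file into a diff on the 1st of January, which is why a fixed, configured year is the safer input.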

@eggrobin eggrobin merged commit 51ecc75 into unicode-org:main Feb 3, 2026
15 of 16 checks passed
Member

@macchiati macchiati left a comment

Thanks for all the cleanup and integration!
