ICU-11443 WIP: Link detection according to UTS#58 #3878
arnt wants to merge 1 commit into unicode-org:main
@@ -0,0 +1,71 @@
// © 2025 and later: Unicode, Inc. and others.
Most Unicode properties are supported more directly in ICU, so additional data files and parsing code are not necessary. I need to check with @markusicu whether the UTS58 properties are, or will be, supported that way.
Right. If they are, I assume most or all of this can be dropped.
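For context, this is roughly the kind of parsing code a separate property data file requires, and what could be dropped if ICU exposes the UTS58 properties natively. It is a minimal sketch, not the PR's code: it assumes the usual UCD line format (`XXXX[..YYYY] ; Value # comment`), and the class name `UcdRangeMap` and the values `Hard` and `Bracket` are hypothetical placeholders, not UTS58's actual property values.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not the PR's code): parses UCD-style data lines of the form
 *   XXXX[..YYYY] ; Value  # comment
 * into a range map, then looks up a code point's property value.
 */
public final class UcdRangeMap {
    /** Inclusive end of a range and the value assigned to it. */
    private record Range(int end, String value) {}

    // Keyed by range start; assuming ranges never overlap, floorEntry() finds
    // the only range that could contain a given code point.
    private final TreeMap<Integer, Range> ranges = new TreeMap<>();

    public static UcdRangeMap parse(String data) {
        UcdRangeMap map = new UcdRangeMap();
        for (String raw : data.split("\\R")) {
            int hash = raw.indexOf('#');
            // Drop the comment, then surrounding whitespace.
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split(";");
            String cps = fields[0].trim();
            int dots = cps.indexOf("..");
            int start = Integer.parseInt(dots < 0 ? cps : cps.substring(0, dots), 16);
            int end = dots < 0 ? start : Integer.parseInt(cps.substring(dots + 2), 16);
            map.ranges.put(start, new Range(end, fields[1].trim()));
        }
        return map;
    }

    /** Returns the value whose range contains cp, or defaultValue if none does. */
    public String get(int cp, String defaultValue) {
        Map.Entry<Integer, Range> e = ranges.floorEntry(cp);
        return (e != null && cp <= e.getValue().end()) ? e.getValue().value() : defaultValue;
    }
}
```

With ICU support, all of this would typically collapse into a single property lookup per code point.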
@@ -0,0 +1,210 @@
// © 2025 and later: Unicode, Inc. and others.
This might be better architected as a plain-text file matching the table at https://www.iana.org/domains/root/db, with this code changed to just construct a HashSet statically from that file (and maybe no .py file). That way, updating the list is a simple drop-in.
Yes. I optimised for runtime performance, but there's much to be said for a simple drop-in.
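A minimal sketch of the drop-in approach, assuming a hypothetical resource `tlds.txt` with one TLD per line and `#` starting a comment. The class `TopLevelDomains` and its methods are illustrative names, not part of ICU or this PR. The set is built once during class initialization, so each check remains a single hash-set lookup, and updating the list only means replacing the text file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: TLDs live in a plain-text resource (one per line,
 * '#' starts a comment) and are loaded once into a HashSet.
 */
public final class TopLevelDomains {
    // Hypothetical resource name, resolved relative to this class's package.
    private static final String RESOURCE = "tlds.txt";

    // Built once when the class is initialized; never modified afterwards.
    private static final Set<String> TLDS = load();

    private TopLevelDomains() {}

    /** Parses the plain-text format into a set of lowercase TLDs. */
    public static Set<String> parse(String text) {
        return text.lines()
                .map(line -> line.replaceFirst("#.*", "").trim()) // strip comments
                .filter(line -> !line.isEmpty())
                .map(line -> line.toLowerCase(Locale.ROOT))       // normalize case
                .collect(Collectors.toCollection(HashSet::new));
    }

    private static Set<String> load() {
        // A null resource is allowed here; try-with-resources skips close() for null.
        try (InputStream in = TopLevelDomains.class.getResourceAsStream(RESOURCE)) {
            if (in == null) {
                return new HashSet<>(); // resource missing: empty set in this sketch
            }
            return parse(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if label (in ASCII or punycode form) is a listed TLD. */
    public static boolean isTld(String label) {
        return TLDS.contains(label.toLowerCase(Locale.ROOT));
    }
}
```

A caller would then use something like `TopLevelDomains.isTld("com")`, which returns true only if the resource lists that TLD.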
@@ -0,0 +1,444 @@
// © 2016 and later: Unicode, Inc. and others.
Tests like these are better structured as a plain-text file that can be shared across different implementations, including ones in different programming languages.
The file can have a simple structure: a series of inputs and expected outputs, either as a semicolon-delimited file or in a simple format like JSON.
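One possible shape for such a file, as a minimal sketch: each non-comment line holds an input followed by zero or more expected links, all separated by semicolons. The format and the `LinkTestData` class are hypothetical. The detector under test is passed in as a `Function<String, List<String>>`, so the same data file could drive the Java, C++, or Ruby implementation through an equivalent runner in each language.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of a data-driven test runner. Each non-comment line is
 *   input ; expected link 1 ; expected link 2 ...
 * and a line with only an input expects no links.
 */
public final class LinkTestData {
    public record TestCase(int lineNumber, String input, List<String> expected) {}

    public static List<TestCase> parse(String text) {
        List<TestCase> cases = new ArrayList<>();
        String[] lines = text.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.isBlank() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] fields = line.split(";", -1);
            List<String> expected = new ArrayList<>();
            for (int f = 1; f < fields.length; f++) {
                String link = fields[f].trim();
                if (!link.isEmpty()) {
                    expected.add(link);
                }
            }
            cases.add(new TestCase(i + 1, fields[0].trim(), expected));
        }
        return cases;
    }

    /** Returns a description of each failing case; an empty list means all passed. */
    public static List<String> run(List<TestCase> cases,
                                   Function<String, List<String>> detector) {
        List<String> failures = new ArrayList<>();
        for (TestCase c : cases) {
            List<String> actual = detector.apply(c.input());
            if (!actual.equals(c.expected())) {
                failures.add("line " + c.lineNumber() + ": expected "
                        + c.expected() + ", got " + actual);
            }
        }
        return failures;
    }
}
```

One limitation of this format is that inputs containing `;`, or significant leading and trailing whitespace, would need an escape convention, which is one argument for JSON instead.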
BTW, just taking a quick look; will need to dig into the guts of the code later.
This is a UTS58 link detector in Java. I haven't done the C++ side yet.
The new code is in .../LinkDetector.java and supporting files.
I also haven't done the PR submission chores or looked for a JIRA issue. But it's Friday, 17:25, I need to be somewhere at 18:30, and I want to go out feeling the joy of closure.
My rough plan is to look for a JIRA ticket and do the other chores on Tuesday, update the Java implementation based on review comments, and, once the Java implementation looks good to merge, write a corresponding ICU4C implementation. At that point I'll also extend the Ruby implementation this is based on.
For now this is just to give you a look at the code.