Closed
Hello,
Thanks for all your work on utf8proc. It’s much appreciated.
While rewriting data_generator.rb & charwidths.jl (v2.6.1) in Python, I spotted a couple of potential issues.
- The following lines in data_generator.rb produce spurious 0s, which are added to $exclusions and $excl_version. (This occurs because the input contains comment lines, and String#hex returns 0 for any line that does not start with a hex digit.)
134: $exclusions = $exclusions.chomp.split("\n").collect { |e| e.hex }
...
137: $excl_version = $excl_version.chomp.split("\n").collect { |e| e.hex }
This results in utf8proc_property_struct.comp_exclusion = true for U+0000. Without the spurious 0s it is false.
- The following line in data_generator.rb looks wrong:
250: "#{%W[Zl Zp Cc Cf].include?(category) and not [0x200C, 0x200D].include?(category)}, " <<
                                                                             ^^^^^^^^
should (probably) be:
250: "#{%W[Zl Zp Cc Cf].include?(category) and not [0x200C, 0x200D].include?(code)}, " <<
                                                                             ^^^^
This results in utf8proc_property_struct.control_boundary = true for U+200C and U+200D. With the change it is false.
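To make the line 250 point easier to check, here is a minimal standalone sketch. It assumes that category is the two-letter General_Category string and that code is the integer code point. The constant and method names are hypothetical and are not taken from data_generator.rb.

```ruby
# Hypothetical stand-ins for the expression on line 250 (not the generator's actual code).
BOUNDARY_CATEGORIES = %w[Zl Zp Cc Cf].freeze
JOINERS = [0x200C, 0x200D].freeze # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER

# Current form: the second test looks for a category String in a list of
# Integers, so it never matches and the exclusion has no effect.
def control_boundary_current(code, category)
  BOUNDARY_CATEGORIES.include?(category) && !JOINERS.include?(category)
end

# Proposed form: the second test looks for the code point instead.
def control_boundary_proposed(code, category)
  BOUNDARY_CATEGORIES.include?(category) && !JOINERS.include?(code)
end

# U+200C, U+200D and U+00AD all have General_Category Cf.
[0x200C, 0x200D, 0x00AD].each do |code|
  puts format('U+%04X current=%s proposed=%s', code,
              control_boundary_current(code, 'Cf'),
              control_boundary_proposed(code, 'Cf'))
end
```

Under these assumptions only the two joiners change value. Other Cf characters such as U+00AD stay true in both forms.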
Can anyone definitively state if these property changes are correct?
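For the first item, the spurious 0s can be reproduced in isolation. The sketch below uses a short made-up excerpt in the style of CompositionExclusions.txt. The method names and the filtering approach are my own assumptions about one possible fix, not what the generator does.

```ruby
# Hypothetical excerpt in the style of CompositionExclusions.txt.
SAMPLE = <<~'TEXT'
  # (1) Script Specifics
  0958    #  DEVANAGARI LETTER QA
  0959    #  DEVANAGARI LETTER KHHA

  # Total code points: 2
TEXT

# Same shape as lines 134 and 137: every line is converted, and String#hex
# returns 0 for comment lines and blank lines.
def naive_hex_values(text)
  text.chomp.split("\n").collect { |e| e.hex }
end

# One possible fix: convert only lines that begin with a hex digit.
def filtered_hex_values(text)
  text.each_line.grep(/\A\h+/) { |e| e.hex }
end

p naive_hex_values(SAMPLE)    # => [0, 2392, 2393, 0, 0]
p filtered_hex_values(SAMPLE) # => [2392, 2393]
```

With the filter in place, 0 no longer appears in the list, so U+0000 would not be marked as a composition exclusion.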
I hope this info is helpful.
If there’s interest I will open a separate issue for my new Python data generator.
Regards,
CHRIS