A couple of potential issues in data_generator.rb #226

@chris0e3

Hello,

Thanks for all your work on utf8proc. It’s much appreciated.

While rewriting data_generator.rb & charwidths.jl (v2.6.1) in Python I spotted a couple of potential issues.

  1. The following lines in data_generator.rb produce spurious 0s, which are added to $exclusions and $excl_version. (This happens because the input contains comment lines, and String#hex returns 0 for any line that does not start with hex digits.)
    134: $exclusions = $exclusions.chomp.split("\n").collect { |e| e.hex }
    ...
    137: $excl_version = $excl_version.chomp.split("\n").collect { |e| e.hex }

This results in utf8proc_property_struct.comp_exclusion = true for U+0000. Without the spurious 0s it is false.
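To illustrate the first point, here is a minimal sketch. The sample text is a hypothetical stand-in for the slice of input that ends up in $exclusions, not the real file contents, and the filtering shown is just one possible fix, not necessarily the one the maintainers would choose.

```ruby
# Hypothetical sample: two data lines surrounded by comment and blank lines.
sample = <<~TEXT
  # (1) Script Specifics
  0958    #  DEVANAGARI LETTER QA
  0959    #  DEVANAGARI LETTER KHHA

  # Total code points: 2
TEXT

# Same shape as lines 134 and 137: every line is passed to String#hex,
# which returns 0 for lines that do not start with hex digits.
naive = sample.chomp.split("\n").collect { |e| e.hex }

# One possible fix (an assumption): keep only lines that begin with hex digits.
filtered = sample.each_line.map { |line| line[/\A\h+/] }.compact.map(&:hex)

puts naive.inspect
puts filtered.inspect
```

With the naive version, each comment or blank line contributes a 0, which is how U+0000 ends up in the exclusion list.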

  2. The following line in data_generator.rb looks wrong:
    250:    "#{%W[Zl Zp Cc Cf].include?(category) and not [0x200C, 0x200D].include?(category)}, " <<
                                                                                    ^^^^^^^^

should (probably) be:

    250:    "#{%W[Zl Zp Cc Cf].include?(category) and not [0x200C, 0x200D].include?(code)}, " <<
                                                                                    ^^^^

This results in utf8proc_property_struct.control_boundary = true for U+200C and U+200D. With the change it is false.
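The difference between the two versions of line 250 can be sketched as below. The method names are hypothetical and exist only for this comparison; `&&` and `!` are used in place of `and not`, which gives the same result here. The sketch assumes, as in the generator, that `category` is a general-category string and `code` is an integer code point.

```ruby
# As written: the second include? compares integers against a category
# string, so it is always false and the exclusion never takes effect.
def control_boundary_as_written(category, code)
  %w[Zl Zp Cc Cf].include?(category) && ![0x200C, 0x200D].include?(category)
end

# As proposed: the second include? tests the code point instead.
def control_boundary_as_proposed(category, code)
  %w[Zl Zp Cc Cf].include?(category) && ![0x200C, 0x200D].include?(code)
end

# U+200C and U+200D both have general category Cf.
[0x200C, 0x200D, 0x00AD].each do |code|
  puts format("U+%04X written=%s proposed=%s", code,
              control_boundary_as_written("Cf", code),
              control_boundary_as_proposed("Cf", code))
end
```

Only U+200C and U+200D change; other Cf code points such as U+00AD give the same result in both versions.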

Can anyone definitively state if these property changes are correct?
I hope this info is helpful.

If there’s interest I will open a separate issue for my new Python data generator.

Regards,

CHRIS
