A couple of potential issues in data_generator.rb

Hello,

Thanks for all your work on utf8proc.  It’s much appreciated.

While rewriting data_generator.rb and charwidths.jl (v2.6.1) in Python, I spotted a couple of potential issues.

1. The following lines in data_generator.rb produce spurious 0s, which are added to $exclusions and $excl_version.  (This occurs because the input contains comment lines, and `String#hex` returns 0 for a string that does not start with a hexadecimal number.)
```
    134: $exclusions = $exclusions.chomp.split("\n").collect { |e| e.hex }
    ...
    137: $excl_version = $excl_version.chomp.split("\n").collect { |e| e.hex }
```
This results in `utf8proc_property_struct.comp_exclusion = true` for U+0000.  Without the spurious 0s it is `false`.
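To illustrate item 1, here is a minimal sketch that reproduces the effect on made-up input. The `sample` text and the `grep(/\A\h/)` filter are my own assumptions for demonstration, not the actual contents of the data files or a patch against data_generator.rb; the real input may need a different filter.

```ruby
# Hypothetical input: a few code points mixed with comment and blank lines.
sample = <<~DATA
  # Comment line (hypothetical sample data)
  0958
  0959

  # Another comment
  FB1D
DATA

# Same shape as the quoted collect call: every line goes through String#hex,
# which returns 0 for text that is not a hexadecimal number.
naive = sample.chomp.split("\n").collect { |e| e.hex }

# One possible fix (an assumption): keep only lines that begin with a hex
# digit before converting them.
filtered = sample.chomp.split("\n").grep(/\A\h/).collect { |e| e.hex }

p naive     # includes a 0 for each comment or blank line
p filtered  # only the real code points
```

With the filter in place, U+0000 no longer ends up in the list unless the input really names it.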

2. The following line in data_generator.rb looks wrong:
```
    250:    "#{%W[Zl Zp Cc Cf].include?(category) and not [0x200C, 0x200D].include?(category)}, " <<
                                                                                    ^^^^^^^^
```
should (probably) be:
```
    250:    "#{%W[Zl Zp Cc Cf].include?(category) and not [0x200C, 0x200D].include?(code)}, " <<
                                                                                    ^^^^
```
This results in `utf8proc_property_struct.control_boundary = true` for U+200C and U+200D.  With the change it is `false`.
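The difference in item 2 can be checked in isolation. The two helper methods below are hypothetical names I made up to wrap the quoted expression and the proposed one; they are not part of data_generator.rb. I use `&&` and `!` instead of `and not` purely for readability. The sketch relies on U+200B, U+200C and U+200D all having general category Cf.

```ruby
# Hypothetical wrapper around the expression as it appears in the issue:
# the second include? compares integers against the category string,
# so it can never match and the exception has no effect.
def control_boundary_as_written(category, code)
  %W[Zl Zp Cc Cf].include?(category) && ![0x200C, 0x200D].include?(category)
end

# Hypothetical wrapper around the proposed expression: the exception list
# is compared against the code point instead.
def control_boundary_proposed(category, code)
  %W[Zl Zp Cc Cf].include?(category) && ![0x200C, 0x200D].include?(code)
end

# Code point and general category pairs used for the comparison.
samples = [[0x200B, "Cf"], [0x200C, "Cf"], [0x200D, "Cf"], [0x0041, "Lu"]]

samples.each do |code, category|
  written  = control_boundary_as_written(category, code)
  proposed = control_boundary_proposed(category, code)
  puts format("U+%04X %s as_written=%s proposed=%s", code, category, written, proposed)
end
```

Only U+200C and U+200D change between the two versions; other Cf characters such as U+200B stay `true` either way.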

Can anyone definitively state whether these property changes are correct?
I hope this info is helpful.

If there’s interest I will open a separate issue for my new Python data generator.

Regards,

CHRIS

