Commit 1fe43f5

Remove ruby compat hacks (#259)
* Fix two minor bugs from the Ruby code

  First, `category` rather than `code` was used in constructing the `control_boundary` property for the characters U+200C and U+200D. This seemed incorrect and should be fixed. This could be an observable bugfix for any C code which inspects the `control_boundary` property.

  Second, when reading composition exclusions, Ruby's String `hex` method produces zero rather than nil if no number is found. For example

      $ ruby -e 'puts "# blah".hex'
      0

  This led to the character `'\0'` being included in the `exclusions` and `excl_versions` sets, which is incorrect. However, this seems asymptomatic because `'\0'` is never part of a composition. (In terms of the C code, the use of `comp_exclusion` is guarded by the `comb_index` property, which is `UINT16_MAX` for `'\0'`.)

* Cleanup: Remove sequence ordering hack

  This hack changed the ordering of sequences encoded in the sequences table and was added so we could easily prove equivalence to the Ruby data generator code. However, it's no longer needed and removing it shouldn't result in any functional change.
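The two bugs described above can be reproduced in isolation. The following is a minimal sketch, not code from the repository: the sample input lines and all variable names are assumptions made for illustration, and the zero fallback only imitates the "no number found" case of Ruby's `hex`.

```julia
# Bug 1: a category string never equals a code point, so the old guard was a no-op.
category = "Cf"      # general category of U+200C (ZERO WIDTH NON-JOINER)
code = 0x200C

old_control_boundary = category in ("Zl", "Zp", "Cc", "Cf") &&
                       !(category in (0x200C, 0x200D))   # always true for "Cf"
new_control_boundary = category in ("Zl", "Zp", "Cc", "Cf") &&
                       !(code in (0x200C, 0x200D))       # false for U+200C

# Bug 2: falling back to zero on unparseable input adds a spurious 0 entry.
# Hypothetical sample lines in the style of a composition exclusions file.
lines = ["# (1) Script Specifics",
         "0958    #  DEVANAGARI LETTER QA",
         "# Total code points: 1"]

strict = Set{UInt32}()    # skip lines that do not start with a hex number
lenient = Set{UInt32}()   # imitate the Ruby behaviour: no number found => 0
for line in lines
    token = first(split(line))
    parsed = tryparse(UInt32, token; base=16)   # `nothing` if not a hex number
    parsed === nothing || push!(strict, parsed)
    push!(lenient, something(parsed, UInt32(0)))
end
```

Under these assumptions, `old_control_boundary` and `new_control_boundary` differ for U+200C, and only the `lenient` set contains the spurious zero.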
1 parent a78bee9 commit 1fe43f5

2 files changed, +13800 -13812 lines changed

data/data_generator.jl

Lines changed: 1 addition & 13 deletions
@@ -95,10 +95,6 @@ end
 exclusions = Set(read_composition_exclusions(r"# \(1\) Script Specifics.*?# Total code points:"s))
 excl_version = Set(read_composition_exclusions(r"# \(2\) Post Composition Version precomposed characters.*?# Total code points:"s))
 
-# FIXME: Replicate a bug in the ruby code
-push!(exclusions, 0)
-push!(excl_version, 0)
-
 #-------------------------------------------------------------------------------
 function read_case_folding(filename)
     case_folding = Dict{UInt32,Vector{UInt32}}()
@@ -396,8 +392,7 @@ function char_table_properties!(sequences, char)
         comp_exclusion = code in exclusions || code in excl_version,
         ignorable = code in ignorable,
         control_boundary = char.category in ("Zl", "Zp", "Cc", "Cf") &&
-                           # FIXME: Ruby bug compat - should be `code in (0x200C, 0x200D)`
-                           !(char.category in (0x200C, 0x200D)),
+                           !(char.code in (0x200C, 0x200D)),
         charwidth = derive_char_width(code, char.category),
         boundclass = get_grapheme_boundclass(code),
         indic_conjunct_break = get_indic_conjunct_break(code),
@@ -407,13 +402,6 @@ end
 # Many character properties are duplicates. Deduplicate them, constructing a
 # per-character array of indicies into the properties array
 sequences = UTF16Sequences()
-
-# FIXME: Hack to force ordering compat with Ruby code
-for c in char_props
-    encode_sequence!(sequences, c.decomp_mapping)
-    encode_sequence!(sequences, get_case_folding(c.code))
-end
-
 char_table_props = [char_table_properties!(sequences, cp) for cp in char_props]
 
 deduplicated_props = Origin(0)(Vector{eltype(char_table_props)}())
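
Why removing the pre-encoding loop should reorder the sequences table without changing behaviour can be sketched with a toy table. This is an illustration under stated assumptions, not the repository's `UTF16Sequences`: `SeqTable`, `encode!`, `decode`, `main_pass!` and the sample data are all hypothetical, and the sketch assumes the main pass visits sequences in a different order than the removed loop did.

```julia
# Hypothetical stand-in for a sequence table: each distinct sequence is stored once.
struct SeqTable
    storage::Vector{UInt16}
    index::Dict{Vector{UInt16},Int}   # sequence => 0-based offset into storage
end
SeqTable() = SeqTable(UInt16[], Dict{Vector{UInt16},Int}())

# Return the offset of `seq`, appending it to storage on first sight.
function encode!(t::SeqTable, seq::Vector{UInt16})
    get!(t.index, seq) do
        offset = length(t.storage)
        append!(t.storage, seq)
        offset
    end
end

decode(t::SeqTable, offset::Int, len::Int) = t.storage[offset+1:offset+len]

# Made-up sample data: one decomposition and one case folding per character.
decomp  = [UInt16[0x0041, 0x0300], UInt16[0x0045, 0x0301]]
folding = [UInt16[0x0061], UInt16[0x0065]]

# Assumed main pass: encodes the folding before the decomposition.
main_pass!(t) = [(encode!(t, f), encode!(t, d)) for (d, f) in zip(decomp, folding)]

# Table `a` gets a pre-pass in (decomp, folding) order, like the removed loop.
a = SeqTable()
for (d, f) in zip(decomp, folding)
    encode!(a, d)
    encode!(a, f)
end
offsets_a = main_pass!(a)

# Table `b` is filled by the main pass alone.
b = SeqTable()
offsets_b = main_pass!(b)

# Decode every stored offset back into its sequence.
roundtrip(t, offs) = [(decode(t, foff, length(f)), decode(t, doff, length(d)))
                      for ((foff, doff), (d, f)) in zip(offs, zip(decomp, folding))]
```

In this sketch the two tables lay out their storage differently and hand out different offsets, yet every offset decodes to the same sequence, which is the sense in which the change is ordering-only.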
