This refactoring makes it easier to add or remove allowed characters without directly manipulating the regular expression.
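To illustrate the idea, here is a minimal sketch of building the filter from a plain list of allowed characters instead of a hand-written regular expression. The names `build_filter`, `sanitize`, `ALLOWED`, and `FILTER` are hypothetical and are not taken from the pull request; the character list is an assumption for the example and could just as well be read from a text file.

```ruby
# Hypothetical helper: builds a "forbidden character" regex from a plain
# string of allowed characters (for example, the contents of a text file).
def build_filter(allowed_chars)
  # Regexp.escape protects characters such as '-', '(' and '.' that would
  # otherwise have a special meaning inside the character class.
  Regexp.new("[^#{Regexp.escape(allowed_chars)}]")
end

# Assumed allowed set for this sketch.
ALLOWED = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ÄÖÜäöüß&*$% ':?,-(+.)/"
FILTER = build_filter(ALLOWED)

# Removes every character that is not in the allowed set.
def sanitize(text, filter = FILTER)
  text.gsub(filter, '')
end

puts sanitize("Hello, Wörld! <ok>")

# Extending the allowed set only means extending the string.
puts sanitize("αβγ abc", build_filter(ALLOWED + "αβγ"))
```

With this shape, adding or removing a character only touches the list, and the regular expression is regenerated from it.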
|
This refactoring could facilitate adding more characters to the allowed set. @flooose what do you think? |
It seems like it would be fine, but I'd be interested in what happens in terms of performance when we start adding more characters to allowed_chars.txt. My understanding of the xls file was that all of the characters marked in green should be allowed. That would be a large regex :) Any idea of how to test this? Have you tried adding just the Greek alphabet to see if there are any performance hits? |
|
I ran the following code:

require 'benchmark'

STRING_LENGTH = 100_000

# Negated filters: everything NOT listed here gets removed.
regex_filter = /[^a-zA-Z0-9ÄÖÜäöüß&*$%\ \'\:\?\,\-\(\+\.\)\/]/
string_filter = "^#{Regexp.escape("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ÄÖÜäöüß&*$% ':?,-(+.)/")}"
regex_filter_with_greek = /[^a-zA-ZΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩαβγδεζηθικλμνξοπρστυφχψωάόίήώύΆΉΏΎ0-9ÄÖÜäöüß&*$%\ \'\:\?\,\-\(\+\.\)\/]/
string_filter_with_greek = "^#{Regexp.escape("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩαβγδεζηθικλμνξοπρστυφχψωάόίήώύΆΉΏΎ0123456789ÄÖÜäöüß&*$% ':?,-(+.)/")}"

# Sample pool for the random input. Each %w[...] literal contributes a single
# multi-character element, so the joined string can exceed STRING_LENGTH characters.
charset = Array('A'..'Z') + Array('a'..'z') + Array(0..9) + %w[ÄÖÜäöüß&*$%] + %w[!^@~\\] + %w[ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩαβγδεζηθικλμνξοπρστυφχψωάόίήώύΆΉΏΎ]
my_string = Array.new(STRING_LENGTH) { charset.sample }.join

results = Benchmark.bm do |x|
  x.report('gsub with greek:') { my_string.gsub(regex_filter_with_greek, '') }
  x.report('gsub no greek:') { my_string.gsub(regex_filter, '') }
  x.report('tr with greek:') { my_string.tr(string_filter_with_greek, '') }
  x.report('tr no greek:') { my_string.tr(string_filter, '') }
end
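Before comparing timings, it may be worth checking that the `gsub` and `tr` variants actually remove the same characters. The following is a small sketch under stated assumptions: the constant and method names and the sample string are made up for this check, and the allowed set is copied from the non-Greek filters in the benchmark. It relies on `String#tr` deleting the matched characters when the replacement string is empty.

```ruby
# Allowed characters, copied from the non-Greek filters in the benchmark.
ALLOWED_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ÄÖÜäöüß&*$% ':?,-(+.)/"

ESCAPED_CHARS = Regexp.escape(ALLOWED_CHARS)
REGEX_FILTER  = Regexp.new("[^#{ESCAPED_CHARS}]") # negated character class for gsub
STRING_FILTER = "^#{ESCAPED_CHARS}"               # negated character set for tr

# Hypothetical wrappers around the two approaches being benchmarked.
def filter_with_gsub(text)
  text.gsub(REGEX_FILTER, '')
end

def filter_with_tr(text)
  # With an empty replacement string, tr deletes the matched characters.
  text.tr(STRING_FILTER, '')
end

# Made-up sample mixing allowed and disallowed characters.
SAMPLE = "Hello, Wörld! (a+b)/c - 100% <ok> αβγ"

puts filter_with_gsub(SAMPLE) == filter_with_tr(SAMPLE)
```

One caveat with reusing `Regexp.escape` for `tr`: it is only a convenience here. For whitespace such as a newline it emits `\n` as two characters, which `tr` would read as a literal `n`, so the shortcut is safe only for character sets like the one above.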
For a
What I find somewhat surprising is the fact that |
|
Interesting. I guess, let's see what the maintainers say. |
|
Ping. Maybe no one is responding because the pull request is still a "draft"? |
|
Closed in favour of #105. |
Resolves #92.