
character-sets [rt.cpan.org #80306] #3

@toddr

Description


Migrated from rt.cpan.org#80306 (status was 'new')

Requestors:

From markov@cpan.org on 2012-10-20 22:42:12:

Hi,

Perl is very good at handling character sets... if you follow one very
simple rule: at every place where characters enter or leave the program,
you have to be very explicit.  IMO, digesting strings is a kind of
output filter.

Digest ignores the character-set problem.  For instance, if I read text
from a file which contains äø as valid latin1 and I digest that, I will
get a different digest than for the same characters encoded as UTF-8.
The problem is that my latin1 string might silently be converted into
utf8!  The programmer does not always know whether Perl has converted
the input data.
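A minimal sketch of the underlying effect, using the core Encode and Digest::SHA modules (chosen here for illustration; the report does not name a specific backend): the same two characters produce different SHA-1 digests depending on which character set they are serialized to before digesting.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Digest::SHA qw(sha1_hex);

# The same two characters, U+00E4 and U+00F8 ("äø"), as a Perl character string.
my $text = "\x{E4}\x{F8}";

# Serialize the characters explicitly in two different character sets.
my $latin1_bytes = encode('ISO-8859-1', $text);   # 2 bytes: E4 F8
my $utf8_bytes   = encode('UTF-8',      $text);   # 4 bytes: C3 A4 C3 B8

my $latin1_digest = sha1_hex($latin1_bytes);
my $utf8_digest   = sha1_hex($utf8_bytes);

print "latin1: $latin1_digest\n";
print "utf-8:  $utf8_digest\n";
print $latin1_digest eq $utf8_digest ? "same\n" : "different\n";
```

The digest is computed over bytes, so which bytes the characters become is exactly the decision the report wants made explicit.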

The solution would be to add an explicit character-set option to new():

  Digest->new('SHA-1', encoding => 'utf8')

to specify which character set the text must be in to be signed.  When
specified, add() should call Encode::encode with it.  When not
specified, the input should be interpreted as raw bytes, so add()
should croak when the UTF-8 flag is on.
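The proposed behaviour can be approximated today with a small wrapper. This is a hypothetical sketch: the `My::EncodedDigest` package and its `encoding` option are illustrative names, not part of the Digest distribution. It only delegates to the real `Digest->new`, `add` and `hexdigest` methods, and uses `utf8::is_utf8` to detect character strings when no encoding was given.

```perl
use strict;
use warnings;

package My::EncodedDigest;    # hypothetical wrapper, not part of Digest

use Carp qw(croak);
use Digest;
use Encode ();

sub new {
    my ($class, $algorithm, %opts) = @_;
    return bless {
        digest   => Digest->new($algorithm),
        encoding => $opts{encoding},    # undef means "raw bytes only"
    }, $class;
}

sub add {
    my ($self, @chunks) = @_;
    for my $chunk (@chunks) {
        if (defined $self->{encoding}) {
            # Explicit character set: turn characters into bytes first.
            $self->{digest}->add(Encode::encode($self->{encoding}, $chunk));
        }
        elsif (utf8::is_utf8($chunk)) {
            # No character set given: refuse UTF-8 flagged (character) strings.
            croak "add() got a character string but no 'encoding' was given to new()";
        }
        else {
            $self->{digest}->add($chunk);    # already raw bytes
        }
    }
    return $self;
}

sub hexdigest { return $_[0]{digest}->hexdigest }

package main;

my $d = My::EncodedDigest->new('SHA-1', encoding => 'UTF-8');
$d->add("\x{E4}\x{F8}");
print $d->hexdigest, "\n";
```

The character set is stated once, at construction, and every later add() call inherits it.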

The current work-around is

  $digest->add(encode 'utf8', $text)

on every call to add().  The "utf8" information is in the wrong place:
the character set belongs to the digest as an output filter, not to
each individual add() call.
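Spelled out as a complete program, the work-around looks roughly like the sketch below. The sample lines are illustrative stand-ins for text read through an `:encoding(UTF-8)` layer, and Encode's strict `'UTF-8'` name is used instead of `'utf8'`. Note that `encode` needs `use Encode`, and the character set has to be restated at every `add()` call site.

```perl
use strict;
use warnings;
use Digest;
use Encode qw(encode);

# Character strings, as they might arrive from a handle opened with :encoding(UTF-8).
my @lines = ("caf\x{E9}\n", "\x{E4}\x{F8}\n");

my $digest = Digest->new('SHA-1');
for my $line (@lines) {
    # The character set must be repeated on every add() call.
    $digest->add(encode('UTF-8', $line));
}
my $hex = $digest->hexdigest;
print "$hex\n";
```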

I hope you will consider this improvement.


Labels: enhancement (New feature or request)
