Needs “bytes” string mode. #24

@FGasper

> perl -MDevel::Peek -MJSON -e'my $str = "é"; my $str2 = JSON::decode_json( JSON::encode_json([$str]) )->[0]; Dump $str; Dump $str2'
SV = PV(0xd3cd60) at 0xd58620
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0xd5f3b0 "\303\251"\0
  CUR = 2
  LEN = 10
  COW_REFCNT = 1
SV = PV(0xd3ceb0) at 0xd58698
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0xd4b0f0 "\303\203\302\251"\0 [UTF8 "\x{c3}\x{a9}"]
  CUR = 4
  LEN = 10

In default mode, CDB_File stores $str and $str2 differently: as 2 bytes and 4 bytes, respectively, because it writes each string's internal PV as-is.

The utf8 => 1 mode offers one fix by storing both strings as 4 bytes. Effectively this mode says, “my strings are characters; please store their UTF-8 representation.”
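
To make the two behaviors concrete, here is a minimal pure-Perl sketch of the byte counts described above. It does not call CDB_File at all; the helper name as_utf8 is hypothetical, and utf8::upgrade is used only as an assumed stand-in for the JSON round trip that produced the UTF8-flagged $str2.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling byte semantics

# $str: two bytes with the UTF8 flag off, as in the first Dump above.
my $str = "\xc3\xa9";

# $str2: the same two characters, but stored upgraded (UTF8 flag on).
# Assumption: utf8::upgrade stands in for the JSON round trip.
my $str2 = "\xc3\xa9";
utf8::upgrade($str2);

# Default mode as described in this issue: the internal PV is stored as-is,
# so the stored size equals the size of the internal buffer.
my @default_lengths = ( bytes::length($str), bytes::length($str2) );

# utf8 => 1 mode as described: store the UTF-8 encoding of the characters.
# as_utf8 is a hypothetical helper, not part of CDB_File.
sub as_utf8 {
    my ($s) = @_;        # work on a copy
    utf8::encode($s);    # characters -> UTF-8 octets
    return $s;
}
my @utf8_lengths = ( length( as_utf8($str) ), length( as_utf8($str2) ) );

print "default: @default_lengths\n";    # the two strings differ in size
print "utf8:    @utf8_lengths\n";       # both strings have the same size
```

Although $str and $str2 compare equal with eq, their internal buffers differ, which is why the default mode treats them differently.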

Ideally there should be another mode that tells the encoder that the strings are bytes. It would do the opposite of utf8 => 1: store $str’s internal PV as-is, but store the result of sv_2pvbyte() for $str2, so that both strings are written as the same 2 bytes. This has the desirable effect of croaking if any UTF8-flagged SVPV contains a code point that exceeds 255 (and thus cannot be a byte string).
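
The requested semantics can be approximated in pure Perl with utf8::downgrade, which dies when a string contains a code point above 255. This is only a sketch of the proposed behavior, not CDB_File's implementation; the helper name as_bytes is hypothetical, and utf8::upgrade is again an assumed stand-in for the JSON round trip.

```perl
use strict;
use warnings;

# Hypothetical helper sketching the proposed "bytes" mode: return the
# byte-string form of the argument, or die if it cannot be one.
sub as_bytes {
    my ($s) = @_;           # work on a copy
    utf8::downgrade($s);    # dies if any code point exceeds 255
    return $s;
}

my $str  = "\xc3\xa9";      # UTF8 flag off
my $str2 = "\xc3\xa9";
utf8::upgrade($str2);       # UTF8 flag on, same characters

# Both strings reduce to the same two octets.
my $bytes1 = as_bytes($str);
my $bytes2 = as_bytes($str2);

# A string holding a code point above 255 cannot be a byte string.
my $wide_ok = eval { as_bytes("\x{100}"); 1 } ? 1 : 0;

printf "lengths: %d %d, wide accepted: %d\n",
    length($bytes1), length($bytes2), $wide_ok;
```

In this sketch the caller sees an exception for wide characters rather than silently stored UTF-8, which matches the croaking behavior asked for above.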

Thank you!
