This repository was archived by the owner on Sep 27, 2020. It is now read-only.

Non-ascii characters corrupt the range data #90

@cgewecke

(Originally reported at solidity-coverage 418)

("corrupt" might be overdramatizing this a little.)

It looks like ranges are calculated by character (code point) count rather than string length, and some non-ascii characters are 'wider' than length 1: a character outside the Basic Multilingual Plane is stored as a surrogate pair and has a string length of 2. This can introduce unexpected drift if you're using the parser to identify string injection points when modifying source files.

Ascii: length 36

contract A {
    /// S
    uint x;
}

Non Ascii: length 37

contract A {
    /// π•Š
    uint x;
}
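
The length difference can be reproduced with a small sketch. It assumes the non-ascii character in the second contract is U+1D54A (mathematical double-struck capital S), which is consistent with the reported length of 37, and that lines end in `\n`. The variable names are only for illustration.

```javascript
// The two contracts from the report, written as string literals.
const ascii = 'contract A {\n    /// S\n    uint x;\n}';
const nonAscii = 'contract A {\n    /// \u{1D54A}\n    uint x;\n}';

// String.prototype.length counts UTF-16 code units,
// so the surrogate pair for U+1D54A contributes 2.
const asciiLength = ascii.length;       // 36
const nonAsciiLength = nonAscii.length; // 37

// Spreading a string iterates by code point,
// so both contracts have the same number of "characters".
const asciiCodePoints = [...ascii].length;       // 36
const nonAsciiCodePoints = [...nonAscii].length; // 36

console.log({ asciiLength, nonAsciiLength, asciiCodePoints, nonAsciiCodePoints });
```

If the parser counts code points, both sources would yield identical ranges even though their string lengths differ by one, which matches the behaviour described here.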

These two contracts produce the same range data. Not sure this can (or should?) be fixed here. A simple work-around for my case is to sanitize files before parsing.
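
As an alternative to sanitizing, a consumer could translate the parser's offsets into string indexes before injecting. The following is a minimal sketch, not something the parser provides: `codePointOffsetToIndex` is a hypothetical helper, and it assumes the reported ranges really are code point offsets, as the behaviour above suggests. The offset `27` is hand-computed for this example rather than taken from parser output.

```javascript
// Hypothetical helper: convert a code point offset into a UTF-16 string index.
function codePointOffsetToIndex(source, codePointOffset) {
  let index = 0; // position in UTF-16 code units
  let seen = 0;  // number of code points consumed so far
  while (seen < codePointOffset && index < source.length) {
    const codePoint = source.codePointAt(index);
    // Code points above U+FFFF occupy two UTF-16 code units.
    index += codePoint > 0xffff ? 2 : 1;
    seen += 1;
  }
  return index;
}

// Assumed input: the non-ascii contract, with U+1D54A in the comment.
const source = 'contract A {\n    /// \u{1D54A}\n    uint x;\n}';

// Counting by code point, `uint` starts at offset 27 (assumed range start).
const start = codePointOffsetToIndex(source, 27);
console.log(source.slice(start, start + 4)); // prints "uint"
```

Using the raw offset directly (`source.slice(27, 31)`) would start one code unit too early, which is the drift described above.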

The issue that raised this at solidity-coverage (SC) involved scientific notation in a natspec comment.
