Non-ascii characters corrupt the range data #90

(Originally reported at [solidity-coverage 418][1])

("corrupt" might be overdramatizing this a little.)

It looks like ranges are calculated by character count rather than string length, and some non-ASCII characters are 'wider' than length 1 (for example, `𝕊` has a JavaScript string length of 2). This can introduce unexpected drift if you're using the parser's ranges to identify string injection points when modifying source files.

**ASCII: length 36**
```solidity
contract A {
    /// S
    uint x;
}
```

**Non-ASCII: length 37**
```solidity
contract A {
    /// 𝕊
    uint x;
}
```
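To make the length difference concrete, here is a minimal sketch that measures both sources two ways. `countCodePoints` is a hypothetical helper written for this illustration and is not part of the parser; it only shows that the two strings agree when counted by code point and disagree when measured with `String.length`.

```typescript
// The two example sources, identical except for the character in the comment.
const asciiSource: string = "contract A {\n    /// S\n    uint x;\n}";
const nonAsciiSource: string = "contract A {\n    /// 𝕊\n    uint x;\n}";

// Hypothetical helper (not part of the parser): counts Unicode code points
// by treating each valid surrogate pair as a single character.
function countCodePoints(text: string): number {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      // Skip the low surrogate so the pair is counted once.
      if (next >= 0xdc00 && next <= 0xdfff) i++;
    }
    count++;
  }
  return count;
}

// String.length counts UTF-16 code units, so 𝕊 (U+1D54A) contributes 2.
const asciiLength: number = asciiSource.length;
const nonAsciiLength: number = nonAsciiSource.length;

// Counted by code point, both sources are the same size.
const asciiCodePoints: number = countCodePoints(asciiSource);
const nonAsciiCodePoints: number = countCodePoints(nonAsciiSource);

console.log({ asciiLength, nonAsciiLength, asciiCodePoints, nonAsciiCodePoints });
```

If the ranges really are code point offsets, any index after the `𝕊` would be off by one when used with `slice` or `substring` on the original string.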

These two contracts produce the same range data. I'm not sure this can (or should?) be fixed here. A simple workaround for my case is to sanitize files before parsing.

The solidity-coverage issue that raised this involved [scientific notation in a natspec comment][2].
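One way the sanitizing workaround could look, as a rough sketch only: replace every non-ASCII character with a single ASCII placeholder before parsing, so the string length and the character count agree. `sanitizeToAscii` and its `placeholder` parameter are hypothetical names for this illustration, not what solidity-coverage necessarily does, and the sketch assumes the placeholder is exactly one ASCII character.

```typescript
// Hypothetical helper: replaces each non-ASCII character with one ASCII
// placeholder. A valid surrogate pair is replaced as a single character,
// so the result has one UTF-16 code unit per original code point.
function sanitizeToAscii(source: string, placeholder: string = "?"): string {
  let result = "";
  for (let i = 0; i < source.length; i++) {
    const unit = source.charCodeAt(i);
    if (unit < 0x80) {
      // Plain ASCII is copied through unchanged.
      result += source.charAt(i);
      continue;
    }
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < source.length) {
      const next = source.charCodeAt(i + 1);
      // Consume the low surrogate so the pair yields one placeholder.
      if (next >= 0xdc00 && next <= 0xdfff) i++;
    }
    result += placeholder;
  }
  return result;
}

const original: string = "contract A {\n    /// 𝕊\n    uint x;\n}";
const sanitized: string = sanitizeToAscii(original);

console.log(original.length, sanitized.length);
```

The trade-off is that the sanitized text, not the original, is what gets modified afterwards, so the non-ASCII characters are lost from the output. That is acceptable when they only appear in comments, as in the example above.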

[1]: https://github.com/sc-forks/solidity-coverage/issues/418
[2]: https://github.com/omisego/plasma-contracts/commit/c7ff3eb8d469ab010b39c121a5b218275d44e86f
