char_range() function for indexing into source strings #457

@davidanthoff

We have a crash from the VS Code extension that seems to originate from JuliaSyntax. Repro steps are:

  1. pkg> dev Sunny
  2. Run
using JuliaSyntax

src = read(joinpath(homedir(), ".julia/dev/Sunny/test/test_tensors.jl"), String)

tree = parseall(SyntaxNode, src, filename="foo.jl")

i = JuliaSyntax.last_byte(tree[2][end][end])

src[i]

That crashes with

ERROR: StringIndexError: invalid index [2142], valid nearby indices [2140]=>'′', [2143]=>'\r'

I assume that last_byte should always return a valid index, right? So this seems like a bug.
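For reference, the same error shape can be reproduced on a tiny string without JuliaSyntax at all. This is only a sketch using Base functions: the string `s` and the offset `last_b` are made-up stand-ins, and it assumes (not confirmed here) that last_byte reports the offset of the token's final byte rather than the index where its final character starts.

```julia
# Stand-in for the tail of the source file: '′' (U+2032) takes 3 bytes in UTF-8.
s = "x′\r\n"
# Byte layout: 'x' => 1, '′' => 2:4, '\r' => 5, '\n' => 6

# Hypothetical offset of the final byte of the token `x′`.
last_b = 4

# Byte 4 is a continuation byte, so it is not a valid String index.
valid = isvalid(s, last_b)

# Indexing with it throws the same kind of error as in the report.
err = try
    s[last_b]
catch e
    e
end

# thisind maps any byte offset to the start of the character containing it.
i = thisind(s, last_b)
c = s[i]

# Raw bytes are always reachable through codeunit, whatever the offset.
b = codeunit(s, last_b)
```

If that assumption holds, `src[thisind(src, i)]` in the repro above would return the character instead of throwing.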

The rest of this issue is some speculation from my end, might all be wrong :)

I started looking a bit around in the JuliaSyntax code, and there seems to be a fair bit of code that assumes 1-byte characters, which strikes me as incorrect? Or maybe this is carefully only done when there is a guarantee that only 1-byte codepoints can appear? An example is

span(t::FullToken) = 1 + last_byte(t) - first_byte(t)
Doesn't that (incorrectly) assume that the character at index position last_byte is only one code unit? Generally if I search the repo for - 1 or + 1 I see a fair bit of code where I would have assumed that a prevind or nextind would be needed?
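A small sketch of the distinction in question, using only Base functions. The names `first_b`, `last_b` and `span_bytes` are illustrative and not taken from JuliaSyntax, and it assumes both ends of the range are byte offsets. Under that assumption the `1 + last - first` arithmetic is a plain byte count and holds for any character width; prevind or nextind only become necessary once a byte offset is used to index the String.

```julia
s = "a′b"
# Byte layout: 'a' => 1, '′' => 2:4, 'b' => 5

# Hypothetical byte range of the token `a′`.
first_b = 1
last_b = 4

# Byte arithmetic: counts code units, independent of character width.
span_bytes = 1 + last_b - first_b
token_bytes = ncodeunits("a′")

# Character indexing: step back from the byte just past the token.
last_char = prevind(s, last_b + 1)
token = s[first_b:last_char]

# nextind goes the other way, from a character start to the next one.
after_token = nextind(s, last_char)
```

So `+ 1` and `- 1` are fine as long as the result stays a byte offset; the conversion is only needed at the point where it is passed to `getindex` on a String.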
