
Question: proper way of matching unicode categories? #3

@ethindp


I'm writing an Ada parser (I've tried a few different tools) and came across Parsel via This Week in Rust (thank you!). I find it fascinating; it's a very inventive way of writing a parser. The Ada grammar uses Unicode general categories (GCs) to match identifiers. It also has custom separators, some of which aren't in parsel::ast::token, so I'm unsure how to match those.
I have a couple of options for doing this:

  1. I already have a custom lexer that I could just use, but I'm unsure how I'd incorporate it into Parsel.
  2. I could use embedded Unicode tables. (This would be rather annoying, however, since I'd have to expand them, which could get costly very quickly.)

Are there any other options or suggestions that I could try? If not, how could I incorporate my lexer? (It defines a custom Token struct, but I could probably transform it into something that Parsel would accept.)
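
For concreteness, here is a minimal sketch of what I mean by option 2. It uses only the standard library and is independent of Parsel. The `Gc` enum, `TABLE`, and all function names are hypothetical, and the table is a tiny hand-picked subset (Basic Latin, Latin-1 letters, basic Greek, Arabic-Indic digits); a real version would be generated from the Unicode Character Database and would also need Lt, Lm, Lo, Nl, Mn, and Mc. The start/extend split follows my reading of the Ada identifier rule, and the sketch omits the restriction on consecutive or trailing connector punctuation.

```rust
use std::cmp::Ordering;

/// Hypothetical, coarse set of general categories (subset only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Gc {
    Lu,    // Letter, uppercase
    Ll,    // Letter, lowercase
    Nd,    // Number, decimal digit
    Pc,    // Punctuation, connector
    Other, // anything not covered by the table below
}

/// Sorted, non-overlapping inclusive code point ranges.
/// Illustrative subset, NOT the full Unicode data.
const TABLE: &[(u32, u32, Gc)] = &[
    (0x0030, 0x0039, Gc::Nd), // 0-9
    (0x0041, 0x005A, Gc::Lu), // A-Z
    (0x005F, 0x005F, Gc::Pc), // _
    (0x0061, 0x007A, Gc::Ll), // a-z
    (0x00C0, 0x00D6, Gc::Lu), // À-Ö
    (0x00D8, 0x00DE, Gc::Lu), // Ø-Þ
    (0x00DF, 0x00F6, Gc::Ll), // ß-ö
    (0x00F8, 0x00FF, Gc::Ll), // ø-ÿ
    (0x0391, 0x03A1, Gc::Lu), // Α-Ρ
    (0x03A3, 0x03A9, Gc::Lu), // Σ-Ω
    (0x03B1, 0x03C9, Gc::Ll), // α-ω
    (0x0660, 0x0669, Gc::Nd), // Arabic-Indic digits
];

/// Look up a character's category by binary search over the range table.
fn general_category(c: char) -> Gc {
    let cp = c as u32;
    TABLE
        .binary_search_by(|&(lo, hi, _)| {
            if hi < cp {
                Ordering::Less // range lies entirely below the code point
            } else if lo > cp {
                Ordering::Greater // range lies entirely above the code point
            } else {
                Ordering::Equal // code point falls inside this range
            }
        })
        .map(|i| TABLE[i].2)
        .unwrap_or(Gc::Other)
}

/// Categories allowed as the first character (subset of the real rule).
fn is_identifier_start(c: char) -> bool {
    matches!(general_category(c), Gc::Lu | Gc::Ll)
}

/// Categories allowed only after the first character (subset of the real rule).
fn is_identifier_extend(c: char) -> bool {
    matches!(general_category(c), Gc::Nd | Gc::Pc)
}

/// One start character followed by any number of start or extend characters.
fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) if is_identifier_start(first) => {
            chars.all(|c| is_identifier_start(c) || is_identifier_extend(c))
        }
        _ => false,
    }
}
```

With this table, `is_identifier("Größe_1")` accepts and `is_identifier("1abc")` rejects. Storing ranges rather than one entry per code point is what keeps the size manageable, but the full tables are still large, which is the cost I'm worried about.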
