Yoplitein/nbnf

nbnf

A parser generator based on nom, with syntax inspired by EBNF and regex.

Syntax overview

A grammar is a series of rules containing expressions. Whitespace is insignificant, C-like comments (which may be nested) are allowed, and every rule must end with a semicolon:

// foo
rule = ...; // bar
rule2 =
    /* /*
        baz
        qux
    */ */
    ...
    ...;
...

A rule generates a parser function as Rust code, and so its name must be a valid Rust identifier. The output type of the generated function can be specified, defaulting to &str if omitted:

rule<Output> = ...;

Any valid Rust code denoting a type is permitted between the chevrons.

The input type can be specified as well, also defaulting to &str, but doing so requires the output type to be specified too:

rule<Input><Output> = ...;

Expressions can invoke any parser function defined in Rust, with other rules simply being resolved as symbols in the same enclosing module:

top = inner external_rule nbnf::nom::combinator::eof;
inner = ...;

A literal Rust expression can also be inserted, e.g. to invoke parametric parsers:

two_chars = <nbnf::nom::bytes::complete::take(2usize)>;

Rules can match literal chars, strings, or regex-like character ranges, all of which support Rust-like escapes:

top = 'a' "bc" [de-g] '\x2A' "\"\0\r\n\t\x7F\u{FF}";

Expressions can be grouped with parentheses, and alternated between with slash:

top = ('a' 'b') / ('c' 'd');
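
To make the behaviour of sequencing and alternation concrete, here is a minimal std-only sketch of what `top = ('a' 'b') / ('c' 'd');` accepts. It is hand-written for illustration and is not the code nbnf generates: the names `PResult` and `pair` are hypothetical, and for simplicity it returns the matched slice, whereas the generated parser may return a different output type. It assumes the alternatives are tried in order, as with nom's `alt`.

```rust
/// Result of a hand-rolled parser: `Some((rest, matched))` on success.
/// (Hypothetical alias, for illustration only.)
type PResult<'a> = Option<(&'a str, &'a str)>;

/// Matches the char `first` followed by the char `second`.
fn pair(input: &str, first: char, second: char) -> PResult<'_> {
    let mut chars = input.chars();
    if chars.next()? == first && chars.next()? == second {
        let len = first.len_utf8() + second.len_utf8();
        Some((&input[len..], &input[..len]))
    } else {
        None
    }
}

/// Hand-written analogue of `top = ('a' 'b') / ('c' 'd');`
/// The second group is only attempted when the first one fails.
fn top(input: &str) -> PResult<'_> {
    pair(input, 'a', 'b').or_else(|| pair(input, 'c', 'd'))
}
```

Under these assumptions, `top("abz")` succeeds with `"z"` left over, while `top("ad")` fails because neither group matches in full.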

Expressions can be repeated with regex-like syntax:

r1 = 'a'?;      // zero or one
r2 = 'b'*;      // zero or more
r3 = 'c'+;      // one or more
r4 = 'd'{2};    // exactly two
r5 = 'e'{2,};   // at least two
r6 = 'f'{,2};   // at most two
r7 = 'g'{2,4};  // between two and four
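
All of the repetition forms above can be read as a `{min,max}` range with one or both bounds omitted. The following std-only sketch illustrates that reading; the function `repeat` is a hypothetical helper, not part of nbnf or nom, and it assumes greedy matching (as nom's repetition combinators do). It returns the matched slice for simplicity.

```rust
/// Greedily matches `c` at least `min` and at most `max` times.
/// `max = None` means there is no upper bound.
/// Returns `Some((rest, matched))`, or `None` if fewer than `min` were found.
fn repeat(input: &str, c: char, min: usize, max: Option<usize>) -> Option<(&str, &str)> {
    let limit = max.unwrap_or(usize::MAX);
    // Count leading occurrences of `c`, stopping once `limit` is reached.
    let count = input.chars().take(limit).take_while(|&ch| ch == c).count();
    if count < min {
        return None;
    }
    let len = count * c.len_utf8();
    Some((&input[len..], &input[..len]))
}
```

With this helper, `?` corresponds to `repeat(input, c, 0, Some(1))`, `*` to `(0, None)`, `+` to `(1, None)`, `{2}` to `(2, Some(2))`, `{2,}` to `(2, None)`, `{,2}` to `(0, Some(2))`, and `{2,4}` to `(2, Some(4))`.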

Expressions can be tagged with various modifiers, wrapping them in combinators:

  • !! (cut) prevents backtracking, e.g. when you know no other expressions can match
json_object_pair<(String, Json)> = string !!(-':' json_value);
  • ! (not) matches only when the expression does not match, consuming no input
ident = -![0-9] ~[a-zA-Z0-9_]+;
  • ~ (recognize) will discard the output and instead yield the portion of the input that was matched
r1<(i32, f64)> = ...;
r2<&str> = ~r1;
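
As an illustration of how `!` and `~` combine, here is a std-only sketch of what the `ident` rule above accepts. It is hand-written and not nbnf's generated code, and it assumes that `~` applies to the whole repetition `[a-zA-Z0-9_]+`.

```rust
/// Hand-written analogue of `ident = -![0-9] ~[a-zA-Z0-9_]+;`
/// Returns `Some((rest, matched))` on success.
fn ident(input: &str) -> Option<(&str, &str)> {
    // `![0-9]`: fail if the next char is a digit; nothing is consumed either way.
    if input.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    // `~[a-zA-Z0-9_]+`: yield the slice covering one or more identifier chars.
    let len = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    if len == 0 {
        return None;
    }
    Some((&input[len..], &input[..len]))
}
```

The negative lookahead is what rejects `"1abc"`: the character class on its own would happily match the leading digit.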

Expressions can be discarded from output by prefixing them with -:

string<&str> = -'"' ~(string_char+) -'"';

For this particular grammar, forgoing the discards would require a tuple as the return type, because the quote chars would be included in the output:

string<(char, &str, char)> = ...;

The empty string can be matched with &, allowing various interesting grammar constructs:

parens = ~('(' parens ')') / ~&;
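
The role of the empty alternative is easier to see in a hand-written equivalent. The sketch below is std-only and illustrative rather than what nbnf generates: it tries the recursive alternative first and falls back to matching nothing, so the rule itself never fails.

```rust
/// Hand-written analogue of `parens = ~('(' parens ')') / ~&;`
/// Returns `(rest, matched)`; the empty alternative means it always succeeds.
fn parens(input: &str) -> (&str, &str) {
    // First alternative: '(' parens ')'
    if let Some(after_open) = input.strip_prefix('(') {
        let (after_inner, _) = parens(after_open);
        if let Some(rest) = after_inner.strip_prefix(')') {
            let len = input.len() - rest.len();
            return (rest, &input[..len]);
        }
    }
    // Second alternative: `&` matches the empty string and consumes nothing.
    (input, "")
}
```

Note that an unbalanced input such as `"(()"` does not produce an error here: the first alternative fails as a whole, so the rule backtracks and matches the empty string instead.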

Types and output values can be massaged in a few ways by passing any valid Rust expression:

  • @<...> (value) discards output and instead returns the given literal
token<Token> =
    ... /
    '/'@<Token::Slash> /
    ...;
  • |<...> (map) runs a mapping function over the output
object<HashMap> =
    -'{' object_pair+ -'}'
    |<HashMap::from_iter>;
  • |?<...> (map_opt) runs a mapping function returning Option over the output
even_int<i32> =
    int
    |?<|v| (v & 1 == 0).then_some(v)>;
  • |!<...> (map_res) runs a mapping function returning Result over the output
number<i32> =
    ~([0-9]+)
    |!<i32::from_str>;
  • ||<...> (no corresponding nom combinator) wraps the expression in arbitrary Rust code, which should contain a placeholder $expr (explained below)
comma = ",";
pairs =
    ("foo" "bar")
    ||<nbnf::nom::multi::separated_list1(comma, $expr)>;
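
The difference between the `|!` (map_res) and `|?` (map_opt) forms is only in what the mapping function returns; in both cases a failed mapping turns into a parse failure. A std-only sketch of the `number` and `even_int` rules above, hand-written for illustration (it is not nbnf's output, and it assumes `int` behaves like `number`):

```rust
use std::str::FromStr;

/// Analogue of `number<i32> = ~([0-9]+) |!<i32::from_str>;`
fn number(input: &str) -> Option<(&str, i32)> {
    // `~([0-9]+)`: the slice covering one or more ASCII digits.
    let len = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if len == 0 {
        return None;
    }
    // map_res: an `Err` from the mapping function becomes a parse failure.
    let value = i32::from_str(&input[..len]).ok()?;
    Some((&input[len..], value))
}

/// Analogue of `even_int<i32> = int |?<|v| (v & 1 == 0).then_some(v)>;`
fn even_int(input: &str) -> Option<(&str, i32)> {
    let (rest, v) = number(input)?;
    // map_opt: a `None` from the mapping function becomes a parse failure.
    let v = (v & 1 == 0).then_some(v)?;
    Some((rest, v))
}
```

In this sketch a digit run that overflows `i32` makes `number` fail rather than panic, which mirrors how a `Result`-returning mapping function is meant to be used.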

Certain behavior can be modified with pragma directives:

  • #input <ty> allows specifying the default input type of all following rules
#input <&[u8]>
binary_rule<()> = b"foo"@<()>;
  • #output <ty> similarly allows specifying the default output type
  • #error <ty> allows setting the error type passed to IResult, e.g. to use VerboseError
// note that the type should not include generics, the input type is substituted per-rule
#error <nom_language::error::VerboseError>

rule = ...;
// generates `fn rule(input: &str) -> IResult<&str, &str, VerboseError<&str>>`
  • #placeholder <name> <expr> allows defining new placeholders (explained below), and overriding those built into nbnf
#placeholder myparsers my_lib::parsers
rule = $myparsers::parser;

Each pragma also allows clearing user-defined values:

// default input type is reset to `&str`
#input $reset
// likewise for `#output`/`#error`

// placeholder `foo` is reset (to default, if any)
#placeholder foo $reset

// all user-defined placeholders are reset
#placeholder $reset

Placeholders are a syntax for arbitrary substitutions. nbnf has a few predefined placeholders that can be overridden to alter the generated parsers:

  • $nom defaults to nbnf::nom, and is used by the generator to qualify foundational parsers. Overriding can be used to e.g. swap nom out for winnow
#placeholder nom winnow
// subsequent parsers now use winnow
  • $complete_or_streaming defaults to complete and is used by the generator to qualify foundational parsers that come in complete or streaming variants (see nom docs for more info)
  • $expr is only defined in the wrapping code of wrap syntax (||<...>) and expands to the expression being wrapped
rule = inner||<foo($expr)>; // expands to `foo(inner)`
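
The define/reset semantics described for placeholders can be modelled as a table of built-in defaults shadowed by user overrides. The sketch below is a hypothetical model written for illustration; the `Placeholders` type and its methods are assumptions and say nothing about how nbnf implements this internally. Only the two default values are taken from the text above.

```rust
use std::collections::HashMap;

/// Hypothetical placeholder table: built-in defaults plus user overrides.
struct Placeholders {
    defaults: HashMap<String, String>,
    user: HashMap<String, String>,
}

impl Placeholders {
    fn new() -> Self {
        let mut defaults = HashMap::new();
        defaults.insert("nom".to_string(), "nbnf::nom".to_string());
        defaults.insert("complete_or_streaming".to_string(), "complete".to_string());
        Self { defaults, user: HashMap::new() }
    }

    /// Models `#placeholder <name> <expr>`.
    fn define(&mut self, name: &str, expr: &str) {
        self.user.insert(name.to_string(), expr.to_string());
    }

    /// Models `#placeholder <name> $reset`: falls back to the default, if any.
    fn reset(&mut self, name: &str) {
        self.user.remove(name);
    }

    /// Models `#placeholder $reset`: drops every user-defined value.
    fn reset_all(&mut self) {
        self.user.clear();
    }

    /// Looks up `$name`, preferring a user override over a built-in default.
    fn resolve(&self, name: &str) -> Option<&str> {
        self.user
            .get(name)
            .or_else(|| self.defaults.get(name))
            .map(String::as_str)
    }
}
```

In this model, overriding `nom` with `winnow` and then resetting it yields `nbnf::nom` again, while a purely user-defined placeholder such as `myparsers` stops resolving once it is reset.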

Example Usage

The main entrypoint is nbnf::nbnf, a proc macro that expands to parsers generated from the given grammar. Note that the grammar must be passed as a string (preferably a raw string), because some expressions that are valid in a grammar are not valid Rust tokens (e.g. the unbalanced quote in [^"]).

use nbnf::nbnf;

nbnf!(r#"
    top = ~('a' top 'b') / ~&;
"#);

fn main() {
    let input = "aabbc";
    let (rest, output) = top.parse(input).unwrap();
    assert_eq!(rest, "c");
    assert_eq!(output, "aabb");
}
