Replace regex parser with context free grammar parser #80

@maweki

If you've looked at the offending code behind #79, you'll see that the amalgamation of regexes looks like an unmaintainable mess. In particular, string replacement inside regex strings and the other combination mechanisms seem dangerous.

Python regex objects cannot be combined with the usual regex operations (concatenation, alternation, repetition), and I don't know of a performant regex engine that allows that.
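To illustrate why combining by string manipulation is fragile, here is a minimal sketch. The two patterns are made up for this example and are not taken from the code behind #79. Joining them with `|` silently renumbers the capture groups, so the backreference in the second pattern no longer points at its own group.

```python
import re

# Two hypothetical patterns that each work on their own.
doubled_a = r"(a)\1"  # matches "aa"
doubled_b = r"(b)\1"  # matches "bb"

# Compiled patterns offer no combination operators, e.g.
# re.compile(doubled_a) | re.compile(doubled_b) raises TypeError,
# so the only way to combine them is to join the pattern strings.
combined = re.compile(doubled_a + "|" + doubled_b)

print(re.fullmatch(doubled_b, "bb") is not None)  # True
print(combined.fullmatch("aa") is not None)       # True
# In the joined pattern "(b)" is group 2, but "\1" still refers to
# group 1, which did not take part in this match.
print(combined.fullmatch("bb") is not None)       # False
```

Named groups or manual renumbering can work around this, but every such workaround is another string-processing step on top of the regexes.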

A more maintainable alternative could be a tokenizer/parser combination from the realm of context-free grammars. In my experience, grammars can be combined more easily and are therefore more maintainable. The result is a tree instead of a list, which is no problem: some tree nodes would be replaced by data constructors (for example, creating a decimal from a comma separator and two number nodes), and other parts of the tree would be flattened to a list, basically as it is now.
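A rough sketch of what this could look like, using only the standard library. Everything here is an assumption for illustration: the token names, the tiny grammar (`line := quantity? WORD+`, `quantity := NUMBER (COMMA NUMBER)?`), and the function names `tokenize`, `parse_quantity`, and `parse_line` do not exist in the project. A parser generator could replace the hand-written parser.

```python
import re
from decimal import Decimal

# Hypothetical token set: each token is described by one simple regex.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("COMMA", r","),
    ("WORD", r"[^\W\d_]+"),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Split text into (kind, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_quantity(tokens, pos):
    """quantity := NUMBER (COMMA NUMBER)?"""
    kind, whole = tokens[pos]
    if kind != "NUMBER":
        raise ValueError(f"expected NUMBER, got {kind}")
    pos += 1
    has_fraction = (
        pos + 1 < len(tokens)
        and tokens[pos][0] == "COMMA"
        and tokens[pos + 1][0] == "NUMBER"
    )
    if has_fraction:
        # Data constructor: two number nodes and a comma become a Decimal.
        fraction = tokens[pos + 1][1]
        return ("quantity", Decimal(f"{whole}.{fraction}")), pos + 2
    return ("quantity", Decimal(whole)), pos


def parse_line(text):
    """line := quantity? WORD+"""
    tokens = tokenize(text)
    pos = 0
    quantity = None
    if tokens and tokens[0][0] == "NUMBER":
        quantity, pos = parse_quantity(tokens, pos)
    # The WORD+ part of the tree is flattened to a plain list.
    words = []
    while pos < len(tokens) and tokens[pos][0] == "WORD":
        words.append(tokens[pos][1])
        pos += 1
    if not words or pos != len(tokens):
        raise ValueError(f"cannot parse {text!r}")
    return ("line", quantity, words)


print(parse_line("1,5 kg flour"))
```

Each rule is a small function that can be read and tested on its own, and a new rule is added by writing another function instead of editing a regex string.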

I also have more confidence in us writing and reading simple regexes for tokens plus a context-free grammar than whole regexes for complex expressions that span multiple lines and are later changed through string-processing functions.

In the longer run we could make Chomsky proud and even supply some general CFGs for the supported languages, identifying verbs, adjectives, and nouns through the grammar, which would give us more confidence in extracting recipe steps.

I know it's not a priority, but is this something we would put on the agenda?
