Replace regex parser with context-free grammar parser #80

If you've looked at the offending code behind #79, you'll see that the amalgamation of regexes looks like an unmaintainable mess. The string replacement inside regex strings and the other combination mechanisms seem especially dangerous.

Python regex objects cannot be combined with the usual regex operations (concatenation, alternation, repetition), and I don't know of a performant regex engine that allows that.

A maintainable alternative could be a tokenizer/parser combination from the realm of context-free grammars. In my experience, grammars can be combined more easily and are therefore more maintainable. The result is a tree instead of a list, which is no problem: some tree nodes would be replaced by data constructors (for example, creating a decimal from a comma separator and two number nodes), and other parts of the tree would be flattened to a list, basically as it is now.
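To make the idea concrete, here is a minimal sketch of such a tokenizer/parser pair using only the standard library. The token set, the tiny grammar, and the names `TOKEN_SPEC`, `tokenize`, `parse_quantity`, and `parse_line` are all made up for illustration and are not taken from the existing code. It only shows the decimal case mentioned above.

```python
import re
from decimal import Decimal

# One simple regex per token type (hypothetical token set, for illustration).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("COMMA", r","),
    ("WORD", r"[^\W\d_]+"),  # one or more letters, Unicode-aware
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (kind, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# Grammar (assumed, deliberately tiny):
#   line     := (quantity | WORD)*
#   quantity := NUMBER (COMMA NUMBER)?
def parse_quantity(tokens, i):
    """Parse a quantity starting at index i; return (Decimal, next index)."""
    whole = tokens[i][1]
    has_fraction = (
        i + 2 < len(tokens)
        and tokens[i + 1][0] == "COMMA"
        and tokens[i + 2][0] == "NUMBER"
    )
    if has_fraction:
        # Data constructor: comma separator plus two number nodes -> Decimal.
        return Decimal(f"{whole}.{tokens[i + 2][1]}"), i + 3
    return Decimal(whole), i + 1


def parse_line(tokens):
    """Parse a token list and flatten the result into a plain list."""
    result = []
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "NUMBER":
            node, i = parse_quantity(tokens, i)
            result.append(node)
        elif kind == "WORD":
            result.append(value)
            i += 1
        else:
            raise ValueError(f"unexpected token {kind} at index {i}")
    return result
```

With this sketch, `parse_line(tokenize("1,5 kg flour"))` yields a `Decimal` followed by the two words. Each grammar rule is its own function, so adding a rule means adding a function rather than patching a regex string.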

I also have more confidence in me/us writing and reading simple regexes for tokens plus a context-free grammar than whole regexes for complex expressions that span multiple lines and are later changed through string-processing functions.

In the longer run we could make Chomsky proud and even supply some general CFGs for the supported languages and identify verbs, adjectives, and nouns through the grammar, giving us more confidence in extracting recipe steps.

I know it's not a priority, but is this something we would put on the agenda?
