- Maybe we could also use the PHOG probabilities to generate syntactically valid strings within a language and make them look more "real". Alternatively, we could randomly mangle the probability distribution for a single symbol, or even for the entire grammar, in a swarm-fuzzing style to generate plausible-looking syntax trees.
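A minimal sketch of both ideas, assuming the learned probabilities are stored as per-nonterminal rule distributions (the toy grammar, the `sample` helper, and the `mangle` helper are all hypothetical, not part of any existing system):

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (probability, expansion).
# Uppercase keys are nonterminals; everything else is a terminal token.
GRAMMAR = {
    "EXPR": [(0.5, ["TERM"]), (0.5, ["TERM", "+", "EXPR"])],
    "TERM": [(0.7, ["num"]), (0.3, ["(", "EXPR", ")"])],
}

def sample(grammar, symbol="EXPR", depth=0, max_depth=8):
    """Sample a token sequence top-down, following the learned probabilities."""
    if symbol not in grammar:
        return [symbol]  # terminal
    rules = grammar[symbol]
    if depth >= max_depth:
        # Past the depth budget, force the shortest expansion to terminate.
        rules = [min(rules, key=lambda r: len(r[1]))]
    probs, expansions = zip(*rules)
    expansion = random.choices(expansions, weights=probs, k=1)[0]
    out = []
    for s in expansion:
        out.extend(sample(grammar, s, depth + 1, max_depth))
    return out

def mangle(grammar, symbol, rng=random):
    """Swarm-fuzzing style: replace one symbol's distribution with
    random weights, keeping the expansions themselves intact."""
    rules = grammar[symbol]
    weights = [rng.random() for _ in rules]
    total = sum(weights)
    grammar[symbol] = [(w / total, exp) for w, (_, exp) in zip(weights, rules)]
```

Mangling only the weights keeps every output syntactically valid while shifting which constructs dominate, which is the "plausible but unusual" shape we want from a swarm member.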
- After generating an initial grammar, we could do several rounds of generating strings and validating them with the reference parser, refining our grammar on strings and trees that neither side has seen before
- This will surface examples that are either parsed differently by the reference parser or that it considers invalid outright;
- Being able to absorb counter-examples will be very important for this step!
- Could we home in on exactly the problematic subsections of the automaton? Or would that prevent us from finding the right answer in some cases?
- If I learn a grammar without repnodes, is that basically just a compressor for a set of strings? It feels almost like a finite-state transducer minus the transducer part. Could I optimize for a different set of metrics or goals so that I end up with something like a compressor?
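One way to make the compressor framing concrete is a two-part MDL objective: bits to write the grammar down plus bits to encode each string's derivation under the grammar's probabilities. This is a speculative sketch, not an existing metric in the system; the per-symbol model charge is a deliberately crude assumption:

```python
import math

def derivation_bits(grammar, derivation):
    """Bits to encode one string as its sequence of rule choices:
    -log2 of the product of the chosen rules' probabilities.
    `derivation` is a list of (nonterminal, rule_index) pairs."""
    return sum(-math.log2(grammar[sym][idx][0]) for sym, idx in derivation)

def mdl_cost(grammar, derivations, bits_per_symbol=8):
    """Two-part MDL objective: model cost (a flat per-symbol charge for
    each rule's left-hand side and expansion) plus data cost (bits for
    every string's derivation). Minimizing this treats grammar induction
    as compression of the string set."""
    model_bits = sum(
        bits_per_symbol * (1 + len(expansion))
        for rules in grammar.values()
        for _, expansion in rules
    )
    data_bits = sum(derivation_bits(grammar, d) for d in derivations)
    return model_bits + data_bits
```

A bigger grammar lowers the data cost but raises the model cost, so the optimum is exactly the trade-off a compressor makes; repnodes would then just be one more way to shrink the model term.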