
Flood-counting #1451

@linas

Description

One of the "deep" conceptual problems with LG parsing is that there are usually either not enough parses (i.e. null words) or too many (count overflow), and it's hard to strike a middle ground. This particularly affects linkage generation: there (currently) isn't any way to know the cost of a linkage without actually computing it.

This issue is meant to be a place to daydream about possible solutions to this.

I have only one very vague and not-really-workable suggestion for right now. During counting, instead of adding 1 for each link, accumulate a weighted count of $2^{-cost}$ for each link.

Counting in this way replaces the grand total by a weighted count, hinting at how many of the potential linkages are low-cost vs. how many are high-cost. We could even make it smell more like simulated annealing by using $\exp(-\beta \times cost)$ instead of $2^{-cost}$, where $\beta$ is a user-supplied "inverse temperature" parameter $\beta = 1/kT$.
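For concreteness, here is a minimal sketch (in C, since that is what link-grammar is written in) of what such a weighted count might look like, assuming per-link costs are available during counting. The names `link_weight()` and `weighted_count()` are purely illustrative, not the actual counting API:

```c
/* A minimal sketch of the weighted-count idea, assuming per-link costs
 * are available at counting time.  link_weight() and weighted_count()
 * are hypothetical names, not the actual link-grammar API. */
#include <math.h>

/* Boltzmann weight of a single link: exp(-beta * cost).
 * Setting beta = ln(2) recovers the 2^{-cost} weighting. */
static double link_weight(double cost, double beta)
{
    return exp(-beta * cost);
}

/* Instead of adding 1 per link, accumulate the weighted total.
 * A given total made of many high-cost links then looks different
 * from the same total made of a few low-cost ones. */
static double weighted_count(const double *link_costs, int n_links,
                             double beta)
{
    double total = 0.0;
    for (int i = 0; i < n_links; i++)
        total += link_weight(link_costs[i], beta);
    return total;
}
```

As $\beta \to 0$ this degenerates back to the plain count (every link weighs 1), and as $\beta \to \infty$ only zero-cost links contribute, so the parameter interpolates between the current behavior and a strict low-cost count.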

What is not yet clear to me is how to use this to replace the random sampling of linkages (when there is a count overflow) with a weighted random sampling that would make lower-cost linkages more likely to be sampled.
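One standard trick that might apply here, assuming the weighted sub-counts above were stored at each branch point of linkage extraction: draw each alternative with probability proportional to its weight (roulette-wheel selection) instead of uniformly. A hedged sketch, with all names hypothetical:

```c
/* Sketch of weighted (roulette-wheel) sampling: draw index i with
 * probability w[i] / sum(w).  If w[i] is the weighted sub-count of
 * alternative i at a branch point, repeating this at every branch
 * biases the sampled linkages toward low cost.  Purely illustrative. */
#include <stdlib.h>

static int sample_weighted(const double *w, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += w[i];

    /* Uniform point in [0, total), then find the bin containing it. */
    double r = ((double) rand() / ((double) RAND_MAX + 1.0)) * total;
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        acc += w[i];
        if (r < acc)
            return i;
    }
    return n - 1;  /* guard against floating-point round-off */
}
```

The appeal of doing it per branch point is that no enumeration of the (overflowed) set of linkages is ever needed; whether the sub-counts can be kept at the right granularity in the existing counting code is exactly the open question.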
