
Get rid of null_count>0 parsing #1449

@linas

The proposal, in short: get rid of all support for parsing with null words. Supporting it is complicated, requiring extra checks and tests in all sorts of delicate locations, and it adds complexity to the code. Perhaps it should simply be dropped?

The replacement is to extend/alter the ZZZ-link mechanism to handle the case of unlinkable words. It seems there are several ways to do this:

  1. If there are no parses, then rebuild the expressions for the words, and automatically add ({XXX-} & {XXX+}) to every expression, and then re-parse.

  2. Always add ({XXX-} & {XXX+})[2.5] to every expression, even from the very beginning. This would avoid the need for a second parse in all situations: there would always be some linkage, no matter what. (One possible reading of this is sketched after this list.)
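
For concreteness, here is a hypothetical sketch of option 2 in 4.0.dict-style notation. The word entry is made up, the fractional-cost bracket syntax is assumed, and the cost is placed on the connectors themselves (so that, presumably, it is incurred only when an XXX link is actually used); exactly how the XXX terms should be combined with the original expression is one of the things that would have to be worked out:

% Hypothetical entry, before augmentation:
blah.n: {@A-} & Ds- & (S+ or O-);

% Augmented: the word may optionally sprout an XXX link to its left
% or right neighbor, at a cost of 2.5 each, so that otherwise
% unlinkable neighbors can still be attached somewhere.
blah.n: {[XXX-]2.5} & ({@A-} & Ds- & (S+ or O-)) & {[XXX+]2.5};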

Both of these have problems with cost:

  • Just adding XXX connectors with zero cost will distort parsing; we want to use them only as a last resort.
  • Adding XXX with a high cost also distorts parsing: at a cost of 2.6999, any disjunct that combines XXX with even one other connector of non-zero cost is pushed over the English dictionary's 2.7 cost cutoff (e.g. 2.6999 + 0.1 = 2.7999 > 2.7), so all such parses are rejected.

Both of these might (?) be solvable with a minor redesign of how cost-max (aka cost_cutoff) actually works. Consider this comment in the code about Dennis's original version:

/* c now points to the list of clauses */
for (Clause *c1 = c; c1 != NULL; c1 = c1->next)
{
    c1->cost += e->cost;
    /* c1->maxcost = MAX(c1->maxcost,e->cost); */
    /* Above is how Dennis had it. Someone changed it to below.
     * However, this can sometimes lead to a maxcost that is less
     * than the cost! -- which seems wrong to me ... seems Dennis
     * had it right!?
     */
    c1->maxcost += e->cost;
    /* Note: The above computation is used as a saving shortcut in
     * the inner loop of AND_type. If it is changed here, it needs to be
     * changed there too. */
}

I think maybe Dennis had it right? That is, if we set:

c1->cost += e->cost;
c1->maxcost = MAX(c1->maxcost,e->cost);

then, yes, it will often be the case that max-cost is less than cost. But that is OK. It allows expressions such as A-[1.6] & XXX+[2.5] to have a grand total cost of 4.1 = 1.6 + 2.5, and thus get ranked appropriately, while maxcost is just 2.5, which is still below the cost_cutoff of 2.7, so the expression still gets included in the parse. This would allow option 2 to be used in place of null-counts.
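
A minimal stand-alone sketch of that arithmetic, with the clause bookkeeping from the snippet above reduced to a toy struct and the two costs hard-coded:

#include <stdio.h>

#define MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Toy stand-in for the clause cost bookkeeping shown above. */
typedef struct { double cost; double maxcost; } Clause;

int main(void)
{
    Clause c1 = { 0.0, 0.0 };
    double term_costs[] = { 1.6, 2.5 };  /* A-[1.6] and XXX+[2.5] */
    const double cost_cutoff = 2.7;      /* English-dict cutoff */

    for (int i = 0; i < 2; i++)
    {
        c1.cost += term_costs[i];                    /* ends at 4.1 */
        c1.maxcost = MAX(c1.maxcost, term_costs[i]); /* ends at 2.5 */
    }

    /* cost=4.1 ranks the disjunct; maxcost=2.5 <= 2.7 keeps it. */
    printf("cost=%.1f maxcost=%.1f kept=%s\n",
           c1.cost, c1.maxcost, (c1.maxcost <= cost_cutoff) ? "yes" : "no");
    return 0;
}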

Does this make sense? @ampli do you like this? I think this would even be an easy experiment to make.

FWIW, I like this idea, because it is effectively what I am already doing in the atomese dict: either I am very certain of a disjunct (so it has a low cost); or I don't really know, so I assemble one from word pairs (at a higher cost); or I really don't know (e.g. an unknown word), in which case a random high-cost link is allowed.

Long-term, there is another issue: the interplay between cost-max and combinatorial explosions. It would be nice (nicer) to have "flood-level" counting: slowly raise cost-max until the count becomes greater than zero. But I don't know how to implement this efficiently/automatically (see, however, issue #1451).
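
From outside the library, a naive version of flood-level counting can already be sketched with the public API, just by re-parsing at successively higher cost-max values (the starting value, step, and ceiling below are arbitrary; doing this inside the library, without a full re-parse per step, is the part I don't know how to do efficiently):

#include <stdio.h>
#include <link-grammar/link-includes.h>

int main(void)
{
    Dictionary dict = dictionary_create_lang("en");
    Parse_Options opts = parse_options_create();
    Sentence sent = sentence_create("this is a test sentence", dict);

    /* Flood-level counting, done naively: raise cost-max in small
     * steps until at least one linkage is found (or we give up). */
    for (double cost_max = 1.0; cost_max <= 4.0; cost_max += 0.5)
    {
        parse_options_set_disjunct_cost(opts, cost_max);
        int num_linkages = sentence_parse(sent, opts);
        if (num_linkages > 0)
        {
            printf("%d linkages at cost-max %.1f\n", num_linkages, cost_max);
            break;
        }
    }

    sentence_delete(sent);
    parse_options_delete(opts);
    dictionary_delete(dict);
    return 0;
}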
