
Get rid of null_count>0 parsing #1449

@linas

The proposal, in short: get rid of all support for parsing with null words. Supporting it is complicated, requiring extra checks and tests in all sorts of delicate locations, and it adds complexity to the code. Perhaps it should simply be dropped?

The replacement is to extend/alter the ZZZ-link mechanism to handle the case of unlinkable words. It seems there are several ways to do this:

  1. If there are no parses, then rebuild the expressions for the words, and automatically add ({XXX-} & {XXX+}) to every expression, and then re-parse.

  2. Always add ({XXX-} & {XXX+})[2.5] to every expression, even from the very beginning. This would avoid the need for a second parse in all situations: there would always be some linkage, no matter what. (One possible reading of this is sketched after this list.)
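
For concreteness, here is a hypothetical sketch of option 2 in 4.0.dict-style notation. The word entry is made up, the fractional-cost bracket syntax is assumed, and the cost is placed on the connectors themselves (so that, presumably, it is incurred only when an XXX link is actually used); exactly how the XXX terms should be combined with the original expression is one of the things that would have to be worked out:

% Hypothetical entry, before augmentation:
blah.n: {@A-} & Ds- & (S+ or O-);

% Augmented: the word may optionally sprout an XXX link to its left
% or right neighbor, at a cost of 2.5 each, so that otherwise
% unlinkable neighbors can still be attached somewhere.
blah.n: {[XXX-]2.5} & ({@A-} & Ds- & (S+ or O-)) & {[XXX+]2.5};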

Both of these have problems with cost:

  • Just adding XXX connectors with zero cost will distort parsing; we want to use them only as a last resort.
  • Adding XXX with a high cost also distorts parsing: at a cost of 2.6999, any disjunct that combines XXX with even one other connector of non-zero cost is pushed over the English dictionary's 2.7 cost cutoff (e.g. 2.6999 + 0.1 = 2.7999 > 2.7), so all such parses are rejected.

Both of these might (?) be solvable with a minor redesign of how cost-max (aka cost_cutoff) actually works. Consider this comment in the code about Dennis's original version:

/* c now points to the list of clauses */
for (Clause *c1 = c; c1 != NULL; c1 = c1->next)
{
    c1->cost += e->cost;
    /* c1->maxcost = MAX(c1->maxcost,e->cost); */
    /* Above is how Dennis had it. Someone changed it to below.
     * However, this can sometimes lead to a maxcost that is less
     * than the cost! -- which seems wrong to me ... seems Dennis
     * had it right!?
     */
    c1->maxcost += e->cost;
    /* Note: The above computation is used as a saving shortcut in
     * the inner loop of AND_type. If it is changed here, it needs to be
     * changed there too. */
}

I think maybe Dennis had it right? That is, if we set:

c1->cost += e->cost;
c1->maxcost = MAX(c1->maxcost,e->cost);

then, yes, it will often be the case that max-cost is less than cost. But that is OK. It allows expressions such as A-[1.6] & XXX+[2.5] to have a grand total cost of 4.1 = 1.6 + 2.5, and thus get ranked appropriately, while maxcost is just 2.5, which is still below the cost_cutoff of 2.7, so the expression still gets included in the parse. This would allow option 2 to be used in place of null-counts.
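
A minimal stand-alone sketch of that arithmetic, with the clause bookkeeping from the snippet above reduced to a toy struct and the two costs hard-coded:

#include <stdio.h>

#define MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Toy stand-in for the clause cost bookkeeping shown above. */
typedef struct { double cost; double maxcost; } Clause;

int main(void)
{
    Clause c1 = { 0.0, 0.0 };
    double term_costs[] = { 1.6, 2.5 };  /* A-[1.6] and XXX+[2.5] */
    const double cost_cutoff = 2.7;      /* English-dict cutoff */

    for (int i = 0; i < 2; i++)
    {
        c1.cost += term_costs[i];                    /* ends at 4.1 */
        c1.maxcost = MAX(c1.maxcost, term_costs[i]); /* ends at 2.5 */
    }

    /* cost=4.1 ranks the disjunct; maxcost=2.5 <= 2.7 keeps it. */
    printf("cost=%.1f maxcost=%.1f kept=%s\n",
           c1.cost, c1.maxcost, (c1.maxcost <= cost_cutoff) ? "yes" : "no");
    return 0;
}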

Does this make sense? @ampli do you like this? I think this would even be an easy experiment to make.

FWIW, I like this idea, because it is effectively what I am already doing in the atomese dict: either I am very certain of a disjunct (so it has a low cost); or I don't really know, so I assemble one from word pairs (at a higher cost); or I really don't know (e.g. an unknown word), in which case a random high-cost link is allowed.

Long-term, there is another issue: the interplay between cost-max and combinatorial explosions. It would be nice (nicer) to have "flood-level" counting: slowly raise cost-max until the count becomes greater than zero. But I don't know how to implement this efficiently/automatically (see, however, issue #1451).
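
From outside the library, a naive version of flood-level counting can already be sketched with the public API, just by re-parsing at successively higher cost-max values (the starting value, step, and ceiling below are arbitrary; doing this inside the library, without a full re-parse per step, is the part I don't know how to do efficiently):

#include <stdio.h>
#include <link-grammar/link-includes.h>

int main(void)
{
    Dictionary dict = dictionary_create_lang("en");
    Parse_Options opts = parse_options_create();
    Sentence sent = sentence_create("this is a test sentence", dict);

    /* Flood-level counting, done naively: raise cost-max in small
     * steps until at least one linkage is found (or we give up). */
    for (double cost_max = 1.0; cost_max <= 4.0; cost_max += 0.5)
    {
        parse_options_set_disjunct_cost(opts, cost_max);
        int num_linkages = sentence_parse(sent, opts);
        if (num_linkages > 0)
        {
            printf("%d linkages at cost-max %.1f\n", num_linkages, cost_max);
            break;
        }
    }

    sentence_delete(sent);
    parse_options_delete(opts);
    dictionary_delete(dict);
    return 0;
}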
