
optimizing queries with LIMIT or extremely large intermediate results #2697

@pfps

Description


The following query from WDbench opts is silly, but it illustrates a place where QLever could be faster. Because the OPTIONAL can only increase the size of the result, it is only necessary to create 10,000,000 results for the join in the main basic graph pattern. What QLever used to do was run out of memory for the join relatively quickly, which wasn't great, since only a few of these results are needed to produce a final result. With the recent changes, QLever instead runs out of time, which is arguably worse. If the LIMIT were pushed down through the sort and onto the join, QLever would perform much better on this query, and also on other queries where a LIMIT can be used to bound the number of intermediate results.
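The idea of pushing a LIMIT down onto a join can be sketched as follows. This is not QLever's actual implementation; it is a minimal illustration, assuming a simple hash join over rows represented as dicts, where materialization stops as soon as the limit is reached.

```python
from collections import defaultdict

def limited_hash_join(left, right, key, limit):
    """Hash join two lists of dicts on `key`, producing at most `limit` rows.

    Because the caller's LIMIT has been pushed down, we can stop as soon as
    `limit` rows have been produced instead of materializing the full join.
    """
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    out = []
    for row in left:
        for match in index.get(row[key], []):
            out.append({**row, **match})
            if len(out) == limit:
                return out  # stop early: no need to materialize more rows
    return out

# Illustrative inputs: the unrestricted join would produce 100 * 100 rows,
# but with the pushed-down limit only 10 are ever materialized.
left = [{"k": 1, "a": i} for i in range(100)]
right = [{"k": 1, "b": j} for j in range(100)]
rows = limited_hash_join(left, right, "k", 10)
```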

Even if the LIMIT isn't pushed down, the size of the intermediate result is known, so a good estimate of the time and space cost of computing it can be determined. If either estimate significantly exceeds the available resources, the query could be aborted without actually performing the join. Care does have to be taken here, as evidenced by Virtuoso's laughable time estimates.
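A sketch of the abort-before-joining check, under the assumption that input sizes are known exactly and that output size is estimated as a selectivity fraction of the cross product (the function names and the byte figures are illustrative, not from QLever):

```python
def estimated_join_cost(left_rows, right_rows, selectivity, bytes_per_row):
    """Estimate output rows and memory for a join with known input sizes."""
    est_rows = left_rows * right_rows * selectivity
    return est_rows, est_rows * bytes_per_row

def should_abort(left_rows, right_rows, selectivity, bytes_per_row, memory_budget):
    """Return True if the estimated join result would not fit in the budget."""
    _, est_bytes = estimated_join_cost(left_rows, right_rows,
                                       selectivity, bytes_per_row)
    return est_bytes > memory_budget
```

With such a check, a query whose join is estimated at, say, 10^12 rows can be rejected immediately rather than running out of memory or time midway. The weak point, as the Virtuoso example shows, is the quality of the selectivity estimate.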

SELECT * WHERE {
  ?x1 <http://www.wikidata.org/prop/direct/P106> <http://www.wikidata.org/entity/Q188094> .
  ?x2 <http://www.wikidata.org/prop/direct/P27> <http://www.wikidata.org/entity/Q155> .
  ?x3 <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .
  OPTIONAL {
    ?x3 <http://www.wikidata.org/prop/direct/P21> ?x4 .
  }
}
LIMIT 10000000
