
optimizing queries with LIMIT or extremely large intermediate results #2697

@pfps

Description


The following query from WDbench opts is silly, but it illustrates a place where QLever could be faster. Because the OPTIONAL can only increase the size of the result, it is only necessary to create 10,000,000 results for the join in the main basic graph pattern. What QLever used to do was run out of memory for the join relatively quickly, which wasn't great, since only a few of these results are needed to produce a final result. With the recent changes, QLever instead runs out of time, which is arguably worse. If the LIMIT were pushed down through the sort and onto the join, QLever would perform much better on this query, and also on other queries where a LIMIT can be used to bound the number of intermediate results.
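The idea of pushing a LIMIT down onto a join can be sketched as follows. This is not QLever's actual implementation; it is a minimal illustration, assuming a simple hash join over rows represented as dicts, where materialization stops as soon as the limit is reached.

```python
from collections import defaultdict

def limited_hash_join(left, right, key, limit):
    """Hash join two lists of dicts on `key`, producing at most `limit` rows.

    Because the caller's LIMIT has been pushed down, we can stop as soon as
    `limit` rows have been produced instead of materializing the full join.
    """
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    out = []
    for row in left:
        for match in index.get(row[key], []):
            out.append({**row, **match})
            if len(out) == limit:
                return out  # stop early: no need to materialize more rows
    return out

# Illustrative inputs: the unrestricted join would produce 100 * 100 rows,
# but with the pushed-down limit only 10 are ever materialized.
left = [{"k": 1, "a": i} for i in range(100)]
right = [{"k": 1, "b": j} for j in range(100)]
rows = limited_hash_join(left, right, "k", 10)
```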

Even if the LIMIT isn't pushed down, the size of the intermediate result is known, so a good estimate of the time and space cost of computing it can be determined. If either estimate significantly exceeds the available resources, the query could be aborted without actually performing the join. Care does have to be taken here, as evidenced by Virtuoso's laughable time estimates.
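A sketch of the abort-before-joining check, under the assumption that input sizes are known exactly and that output size is estimated as a selectivity fraction of the cross product (the function names and the byte figures are illustrative, not from QLever):

```python
def estimated_join_cost(left_rows, right_rows, selectivity, bytes_per_row):
    """Estimate output rows and memory for a join with known input sizes."""
    est_rows = left_rows * right_rows * selectivity
    return est_rows, est_rows * bytes_per_row

def should_abort(left_rows, right_rows, selectivity, bytes_per_row, memory_budget):
    """Return True if the estimated join result would not fit in the budget."""
    _, est_bytes = estimated_join_cost(left_rows, right_rows,
                                       selectivity, bytes_per_row)
    return est_bytes > memory_budget
```

With such a check, a query whose join is estimated at, say, 10^12 rows can be rejected immediately rather than running out of memory or time midway. The weak point, as the Virtuoso example shows, is the quality of the selectivity estimate.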

SELECT * WHERE {
  ?x1 <http://www.wikidata.org/prop/direct/P106> <http://www.wikidata.org/entity/Q188094> .
  ?x2 <http://www.wikidata.org/prop/direct/P27> <http://www.wikidata.org/entity/Q155> .
  ?x3 <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .
  OPTIONAL {
    ?x3 <http://www.wikidata.org/prop/direct/P21> ?x4 .
  }
}
LIMIT 10000000
