Order of ResultRows for SELECT * #166

@chiarcos

I might be missing something, but the order of variables in the ResultRows of a SELECT * query seems to be random and changes from run to run. This is unexpected, because the other SPARQL engines I work with generally return the variables in the order in which they occur in the WHERE block.

Sample code:

# g is the rdflib.Graph described below
query = "SELECT * { ?a ?b ?c } LIMIT 10"
qres = g.query(query)
print(qres.vars)

Expected output:
[rdflib.term.Variable('a'), rdflib.term.Variable('b'), rdflib.term.Variable('c')]

Real output:
[rdflib.term.Variable('c'), rdflib.term.Variable('a'), rdflib.term.Variable('b')] (first run)
[rdflib.term.Variable('a'), rdflib.term.Variable('b'), rdflib.term.Variable('c')] (second run)
[rdflib.term.Variable('b'), rdflib.term.Variable('a'), rdflib.term.Variable('c')] (third run)
[rdflib.term.Variable('a'), rdflib.term.Variable('c'), rdflib.term.Variable('b')] (fourth run)
[you get the idea]

(Tested on the HDT edition of DBpedia 2016, created with g = rdflib.Graph(store=rdflib_hdt.HDTStore(rdf_file)), but that shouldn't matter.)

Our use case: we run SPARQL queries whose number of variables isn't known in advance, we return a binding for every variable, and the WHERE block (and only the WHERE block) is provided by the client. I could enforce a constant order by sorting the keys (variables) lexicographically, but that order might again be unexpected to users, as it depends on their naming preferences.
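
One possible client-side workaround, as a rough sketch only: derive the column order from the WHERE block text by order of first appearance, then read each row by variable name instead of by position. The helper names variables_in_order and reorder_row are hypothetical and not part of rdflib. The regular-expression scan is an assumption that holds for simple patterns; it is not a SPARQL parser and can be confused by variable-like text inside string literals or comments.

```python
import re

# Hypothetical helpers, not part of rdflib.
_IRI = re.compile(r"<[^<>\s]*>")   # IRIs may contain '?', so blank them out first
_VAR = re.compile(r"[?$](\w+)")    # ?name or $name

def variables_in_order(where_block):
    """Return variable names in order of first appearance in the text."""
    text = _IRI.sub(" ", where_block)
    seen = []
    for name in _VAR.findall(text):
        if name not in seen:
            seen.append(name)
    return seen

def reorder_row(row, names):
    """Read a row by variable name so the output order follows `names`."""
    return [row[name] for name in names]

where = "{ ?a ?b ?c }"
names = variables_in_order(where)
# A plain dict stands in for a result row here; with rdflib the rows
# would come from g.query("SELECT * " + where).
print(reorder_row({"c": 3, "a": 1, "b": 2}, names))
```

As far as I can tell, rdflib's ResultRow also supports lookup by variable name, so reorder_row should work on real rows as well, but that part is untested here.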
