Optimizations #114

@nsbgn

Description

SPARQL can only take us so far, so we will probably need a smarter approach to make this properly scalable. That said, for https://github.com/quangis/quangis-workflow we need to get rid of the memory errors ASAP, so:

  • Be smarter about selecting on the bag-of-types by keeping only the most specific types and avoiding SPARQL unions unless absolutely necessary (implemented as of e0a1a7d, 30bc861, 499d255, but poorly thought through, poorly implemented and poorly tested). In essence, we now drop constraints that are already guaranteed to hold in the presence of other constraints.
  • Also eliminate pointless UNIONs in ordered data (671de1f, a258e93)
  • Order the bag-of-types such that the most specific types come first
  • Use subqueries to limit ordered queries
  • Separate :contains predicates for types and operators, so that the search is more directed.
  • Annotate the transformation graphs as directly as possible (saving all supertypes of each conceptual step on the step itself), so that we can write ?workflow :containsType <A> and ?step :subtypeOf <B> instead of, respectively, ?workflow :containsType ?A. ?A rdfs:subClassOf* <A> and ?step :type ?B. ?B rdfs:subClassOf* <B>. This is the biggest improvement.
  • Every step in the transformation graph should record from which steps it is reachable, or which steps it depends on. Then we don't need property paths and can just select on type, select on reachability, and be done. This should make for another huge improvement. (Note: if we were using trees, we could also record the "path" on each step, but that gets exponential for DAGs.)
  • Given the above, we can drop steps that themselves depend on other steps that match.
  • Record the distance from the output on every step. That way, we can force breadth-first search even in SPARQL.
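
To make the bag-of-types pruning concrete, here is a minimal sketch in plain Python. It is not the code from the commits mentioned above. The type names, the PARENTS table and both helper functions are made up for illustration, and it assumes each constraint means "the workflow contains some subtype of T", so that a supertype of another requested type adds nothing.

```python
from collections import deque

# Hypothetical toy hierarchy: each type maps to its direct supertypes.
PARENTS = {
    "Top": set(),
    "A": {"Top"},
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


def ancestors(t, parents=PARENTS):
    """Return all strict supertypes of t (transitive closure)."""
    seen = set()
    queue = deque(parents[t])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def most_specific(bag, parents=PARENTS):
    """Drop every type that is a strict supertype of another type in the
    bag, then order the rest so that the most specific types come first."""
    implied = set()
    for t in bag:
        implied |= ancestors(t, parents)
    kept = [t for t in set(bag) if t not in implied]
    # More ancestors is used here as a rough proxy for "more specific";
    # ties are broken by name to keep the order deterministic.
    return sorted(kept, key=lambda t: (-len(ancestors(t, parents)), t))


print(most_specific(["Top", "A", "B", "C"]))  # ['B', 'C']
```

In this toy hierarchy, "Top" and "A" are dropped because asking for a "B" already implies them, so only two constraints would have to be turned into query patterns.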
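
The annotation ideas (supertypes, dependencies and distance stored on every step) can be sketched the same way. This is only an illustration under assumptions: the step names, the DEPENDS_ON and STEP_TYPE tables and the find helper are hypothetical, and a real implementation would write these annotations as triples rather than keep them in Python dictionaries. The point is that, once everything is precomputed, matching becomes plain set lookups instead of path traversal.

```python
from collections import deque

# Hypothetical toy data: a type hierarchy and a small workflow DAG.
TYPE_PARENTS = {"Top": set(), "A": {"Top"}, "B": {"A"}, "C": {"A"}}
STEP_TYPE = {"out": "B", "s1": "C", "s2": "A", "s3": "A"}
DEPENDS_ON = {"out": {"s1", "s2"}, "s1": {"s3"}, "s2": {"s3"}, "s3": set()}


def closure(start, edges):
    """Everything reachable from start by following edges (start excluded)."""
    seen = set()
    queue = deque(edges[start])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(edges[current])
    return seen


def distances_from(output, edges):
    """Shortest number of dependency edges from the output to each step."""
    dist = {output: 0}
    queue = deque([output])
    while queue:
        current = queue.popleft()
        for nxt in edges[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def annotate(output="out"):
    """Precompute, per step, what a query would otherwise traverse.
    Assumes every step feeds into the output, directly or indirectly."""
    dist = distances_from(output, DEPENDS_ON)
    return {
        step: {
            "subtype_of": {STEP_TYPE[step]} | closure(STEP_TYPE[step], TYPE_PARENTS),
            "depends_on": closure(step, DEPENDS_ON),
            "distance": dist[step],
        }
        for step in DEPENDS_ON
    }


def find(annotations, step_type, upstream_type):
    """Steps that are a subtype of step_type and depend on some step that
    is a subtype of upstream_type, nearest to the output first."""
    hits = [
        step
        for step, info in annotations.items()
        if step_type in info["subtype_of"]
        and any(upstream_type in annotations[d]["subtype_of"] for d in info["depends_on"])
    ]
    return sorted(hits, key=lambda s: (annotations[s]["distance"], s))


annotations = annotate()
print(find(annotations, "B", "A"))  # ['out']
```

Sorting on the stored distance is what stands in for breadth-first order here: candidates closest to the output are considered first without walking the graph at query time.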
