Optimizations

SPARQL can only take us so far, so we will probably need a smarter approach to make this properly scalable. That said, for https://github.com/quangis/quangis-workflow we need to get rid of the memory errors as soon as possible, so:

- [x] Be smarter about selecting on the bag-of-types by only keeping the most specific types and not having SPARQL unions unless absolutely necessary (implemented as of e0a1a7d66ca9f3ce84d2a0f6ea307747de59b979, 30bc861c14f4dfb40b0b260e72057084292e72dd, 499d2557f10ddda32d4761b8b81e85897acf008e, but poorly thought through, poorly implemented and poorly tested). In essence, we're now removing some constraints that are already guaranteed to hold in the presence of other constraints.
- [x] Also eliminate pointless UNIONs in ordered data ([671de1f](https://github.com/quangis/transforge/commit/671de1f5a6d5563520030065558e7490b6ef66b0), [a258e93](https://github.com/quangis/transforge/commit/a258e93c8849775e1be560866e37cc3396d9202d))
- [ ] Order the bag-of-types such that the most specific types come first
- [x] Use [subqueries](https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#subqueries) to limit ordered queries
- [x] Separate `:contains` predicates for types and operators, so that the search is more directed.
- [x] Annotate the transformation graphs as directly as possible (saving all supertypes of each conceptual step on the step itself), so that we can do `?workflow :containsType <A>` and `?step :subtypeOf <B>` instead of, respectively, `?workflow :containsType ?A. ?A rdfs:subClassOf* <A>` and `?step :type ?B. ?B rdfs:subClassOf* <B>`. This is the biggest improvement.
- [x] Every step in the transformation graph should record from which steps it is reachable/which steps it depends on. Then we don't need property paths and can just select on type, select on reachability, done. Should make for another huge improvement. (Note: if we were using trees, we could also record the "path" on each step, but that gets exponential for DAGs)
- [ ] Given the above, we can drop steps that themselves depend on other steps that match.
- [ ] Record the distance from the output on every step. That way, we can force breadth-first search even in SPARQL.
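
To make the annotation ideas above concrete, here is a minimal sketch in plain Python. It is not transforge's actual code and involves no RDF: the toy type hierarchy, the step names and the functions `supertypes`, `most_specific`, `annotate` and `matches` are all hypothetical, chosen only for illustration. It shows how a bag of types can be reduced to its most specific members, and how storing supertypes, transitive dependencies and distance-from-output on each step turns a query into plain set lookups instead of path traversals.

```python
from collections import deque

# Hypothetical toy hierarchy (assumption): type -> its direct supertypes.
PARENTS = {
    "Ratio": {"Interval"},
    "Interval": {"Ordinal"},
    "Ordinal": {"Value"},
    "Value": set(),
}


def supertypes(t, parents=PARENTS):
    """Return every type that `t` is a subtype of, including `t` itself."""
    seen, todo = {t}, [t]
    while todo:
        for p in parents[todo.pop()]:
            if p not in seen:
                seen.add(p)
                todo.append(p)
    return seen


def most_specific(types):
    """Drop types already implied by a more specific type in the bag."""
    return {
        t for t in types
        if not any(u != t and t in supertypes(u) for u in types)
    }


def annotate(inputs, types, output):
    """Precompute per-step annotations for a workflow DAG.

    `inputs` maps each step to the steps it directly depends on,
    `types` maps each step to its type, `output` is the final step.
    """
    closure = {}

    def deps(step):
        # Transitive dependencies, memoized so shared steps are visited once.
        if step not in closure:
            found = set()
            for d in inputs[step]:
                found |= {d} | deps(d)
            closure[step] = found
        return closure[step]

    # Breadth-first search from the output gives the shortest distance.
    distance = {output: 0}
    queue = deque([output])
    while queue:
        step = queue.popleft()
        for d in inputs[step]:
            if d not in distance:
                distance[d] = distance[step] + 1
                queue.append(d)

    return {
        step: {
            "supertypes": supertypes(types[step]),
            "depends_on": deps(step),
            "distance": distance.get(step),  # None if unreachable from output
        }
        for step in inputs
    }


def matches(annotated, later, earlier):
    """Steps that are a subtype of `later` and depend on a subtype of `earlier`."""
    return sorted(
        step for step, a in annotated.items()
        if later in a["supertypes"]
        and any(earlier in annotated[d]["supertypes"] for d in a["depends_on"])
    )


# Hypothetical workflow: s4 is the output; s3 combines s1 and s2.
INPUTS = {"s1": set(), "s2": set(), "s3": {"s1", "s2"}, "s4": {"s3", "s1"}}
TYPES = {"s1": "Ratio", "s2": "Ordinal", "s3": "Interval", "s4": "Value"}
ANNOTATED = annotate(INPUTS, TYPES, "s4")

if __name__ == "__main__":
    print(most_specific({"Ratio", "Ordinal", "Value"}))
    print(matches(ANNOTATED, "Ordinal", "Ratio"))
    print({s: a["distance"] for s, a in ANNOTATED.items()})
```

The trade-off is the one the list describes: the annotations cost extra storage per step (the dependency sets in particular can grow large for big DAGs), but every check at query time becomes a membership test, which is roughly what replacing `rdfs:subClassOf*` and property paths with direct predicates buys on the SPARQL side.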

Optimizations #114
