|
| 1 | += CIP2017-01-18 - Configurable Pattern Matching Semantics |
| 2 | +:numbered: |
| 3 | +:toc: |
| 4 | +:toc-placement: macro |
| 5 | +:source-highlighter: codemirror |
| 6 | + |
| 7 | +*Author:* Stefan Plantikow <stefan.plantikow@neotechnology.com> |
| 8 | + |
| 9 | +This proposal is a response to CIR-2017-174. |
| 10 | + |
| 11 | +== Motivation |
| 12 | + |
| 13 | +Currently, Cypher uses pattern matching semantics that treat all patterns occurring in a `MATCH` clause as a unit (called a *uniqueness scope*) and only consider pattern instances that bind different relationships to each fixed length relationship pattern variable and to each element of a variable length relationship pattern variable. |
| 14 | +This has informally come to be called *cyphermorphism* and is a variation of edge isomorphism. |
| 15 | + |
| 16 | +Cyphermorphism lies at the intersection of returning as many results as possible while still ruling out returning an infinite number of paths when matching graphs that contain cycles. |
| 17 | + |
| 18 | +However, the notion of *uniqueness scope* has proven to be non-standard and is occasionally confusing for users. In addition, cyphermorphic matching is not tractable in terms of computational complexity for some graphs. |
| 19 | + |
| 20 | +This CIP aims to address these issues. |
| 21 | + |
| 22 | +== Background |
| 23 | + |
| 24 | +This CIP relies on the terminology introduced by the openCypher grammar. |
| 25 | + |
| 26 | +Most notably, a pattern in Cypher consists of a comma separated list of *pattern parts*. |
| 27 | +Pattern parts may be bound to a path variable and consist of a linear chain of connected node and relationship patterns. |
| 28 | + |
| 29 | +While Cypher allows omitting path, node, and relationship variables in a pattern, this is just syntactic sugar, i.e. all parts of a pattern should be considered to be bound to a variable name from the viewpoint of pattern matching semantics (names are either provided in the query or automatically generated by a conforming implementation). |
| 30 | + |
| 31 | +== Proposal |
| 32 | + |
| 33 | +This CIP proposes to replace the notion of *uniqueness scope* and *cyphermorphism* and all associated rules by providing new, configurable pattern matching semantics for Cypher as outlined in this section. |
| 34 | + |
| 35 | +This CIP has been submitted in the belief that *CIP2017-02-06 Path Pattern Queries* will be accepted and is aligned with it. |
| 36 | + |
| 37 | +=== Walks |
| 38 | + |
| 39 | +This CIP introduces the following kinds of walks: |
| 40 | + |
| 41 | +* `WALK`: A walk is an arbitrary, non-empty sequence of alternating nodes and relationships that starts with a node and ends with a node. |
| 42 | +* `TRAIL`: A trail is a walk that does not contain the same relationship twice. |
| 43 | +* `PATH`: A simple path is a trail that does not contain the same node twice unless that node is both the start node and the end node of the path. |
| 44 | + |
| 45 | +Note that every `PATH` is a `TRAIL` and that every `TRAIL` is a `WALK`. |
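
The three definitions above can be read as successively stricter checks. The following Python sketch is illustrative only and is not part of the proposal: it assumes that a walk is encoded as a list alternating node and relationship identifiers, and the helper names `is_walk`, `is_trail`, and `is_path` are hypothetical.

```python
def is_walk(seq):
    # Non-empty, alternating sequence that starts and ends with a node,
    # so its length must be odd (node, rel, node, ..., node).
    return len(seq) % 2 == 1


def is_trail(seq):
    # A trail is a walk that does not contain the same relationship twice.
    rels = seq[1::2]
    return is_walk(seq) and len(rels) == len(set(rels))


def is_path(seq):
    # A simple path is a trail without repeated nodes, except that the
    # start node may also be the end node.
    if not is_trail(seq):
        return False
    nodes = seq[0::2]
    if len(nodes) > 1 and nodes[0] == nodes[-1]:
        nodes = nodes[:-1]  # ignore the closing occurrence of the start node
    return len(nodes) == len(set(nodes))


# Assumed encoding: node ids at even positions, relationship ids at odd ones.
repeated_rel = ["a", "r1", "b", "r2", "a", "r1", "b"]   # walk only
repeated_node = ["a", "r1", "b", "r2", "c", "r3", "b"]  # trail, not a path
closed = ["a", "r1", "b", "r2", "a"]                    # closed simple path
```

Because each check calls the weaker one first, the containment stated above (every `PATH` is a `TRAIL`, every `TRAIL` is a `WALK`) holds by construction in this sketch.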
| 46 | + |
| 47 | +This CIP proposes to rename the Cypher type `PATH` to `WALK`. |
| 48 | + |
| 49 | +=== Pattern binders |
| 50 | + |
| 51 | +This CIP proposes naming the path variable that occurs before a pattern element of a pattern part the *pattern binder* in the grammar. |
| 52 | +Note that such a variable is always bound to a linear sequence of node, relationship, and path query patterns of its pattern element. |
| 53 | + |
| 54 | +This CIP proposes introducing the notion of a *pattern binder class* that may be written before a pattern binder in a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and restricts the set of valid pattern matches for the following pattern element. |
| 55 | +The proposed pattern binder classes are: |
| 56 | + |
| 57 | +* `WALK` This pattern binder should only be bound to a `WALK` that matches all node, relationship, and path query patterns given in the following pattern element |
| 58 | +* `TRAIL` This pattern binder should only be bound to a `TRAIL` that matches all node, relationship, and path query patterns given in the following pattern element |
| 59 | +* `PATH` This pattern binder should only be bound to a simple `PATH` that matches all node, relationship, and path query patterns given in the following pattern element |
| 60 | + |
| 61 | +The pattern binder class may be further qualified with one of the following prefixes: |
| 62 | + |
| 63 | +* `OPEN WALK|TRAIL|PATH` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _not the same node_ |
| 64 | +* `CLOSED WALK|TRAIL|PATH` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _the same node_ |
| 65 | + |
| 66 | +The following additional pattern binder classes are proposed to accommodate existing terminology that is commonly used in graph theory: |
| 67 | + |
| 68 | +* `CIRCUIT` is a synonym for `CLOSED TRAIL` |
| 69 | +* `CYCLE` is a synonym for `CLOSED PATH` |
| 70 | + |
| 71 | +Implementations are advised to signal a warning for every use of an `OPEN` pattern binder class if the two endpoints of the pattern element are both unbound and both use the same variable name. |
| 72 | + |
| 73 | +Implementations are advised to signal a warning for every use of a `CLOSED` pattern binder class if the two endpoints of the pattern element are both unbound and use different variable names. |
| 74 | + |
| 75 | +=== Pattern match modes |
| 76 | + |
| 77 | +This CIP proposes introducing the notion of a *pattern match mode* that may be written before a pattern binder in a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and restricts the set of valid pattern matches for the following pattern element. |
| 78 | + |
| 79 | +A pattern match mode is always written before any pattern binder class that has been explicitly given for the same pattern binder. |
| 80 | + |
| 81 | +==== MATCH EVERY mode |
| 82 | + |
| 83 | +This CIP proposes the new `MATCH EVERY` pattern match mode that matches every walk (or trail, or path respectively) as described by all node, relationship, and path query patterns given in the following pattern elements. |
| 84 | +This may return an infinite or at least a very large result for some graphs. |
| 85 | + |
| 86 | +Implementations are advised to signal a warning for every use of `MATCH EVERY (OPEN|CLOSED) WALK` that may lead to the generation of an infinite result set. |
| 87 | + |
| 88 | +==== MATCH SHORTEST mode |
| 89 | + |
| 90 | +This CIP proposes the new `MATCH SHORTEST` pattern match mode that matches every _shortest_ walk (or trail, or path respectively) as described by all node, relationship, and path query patterns in the following pattern elements. |
| 91 | + |
| 92 | +This CIP proposes to deprecate the existing syntax for both `shortestPath` and `allShortestPaths` matching of Cypher. |
| 93 | + |
| 94 | +==== Weight declarations |
| 95 | + |
| 96 | +This CIP proposes that pattern elements may optionally be followed by weight declarations of one of the following forms: |
| 97 | + |
| 98 | +* `WEIGHT <numerical-aggregation> OVER <rel> AS <weight>` Calculates a weight `<weight>` by evaluating the given `<numerical-aggregation>` for each relationship `<rel>` in the associated match |
| 99 | +* `WEIGHT |<expr>| AS <weight>` Calculates a weight `<weight>` by summing the results of evaluating `abs(<expr>)` for each relationship `r` in the associated match in a special scope that only contains all properties of `r` as variables |
| 100 | + |
| 101 | +Multiple weight declarations may be given as long as they do not define the same `<weight>` variable. |
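
The two forms of weight declaration can be illustrated with a short Python sketch. It is only an approximation of the intended semantics: relationships are modelled as property dictionaries, expressions as Python callables, and the names `weight_over` and `weight_abs` are hypothetical.

```python
def weight_over(aggregate, expr, rels):
    # WEIGHT <numerical-aggregation> OVER <rel> AS <weight>:
    # evaluate expr for each relationship, then aggregate the results.
    return aggregate(expr(r) for r in rels)


def weight_abs(expr, rels):
    # WEIGHT |<expr>| AS <weight>: sum abs(<expr>) over all relationships.
    # expr only receives the properties of the current relationship.
    return sum(abs(expr(props)) for props in rels)


# Relationships of one match, modelled as property dictionaries (assumption).
match_rels = [{"strength": 3}, {"strength": -2}, {"strength": 4}]

w1 = weight_over(sum, lambda r: r["strength"], match_rels)    # 3 - 2 + 4
w2 = weight_abs(lambda props: props["strength"], match_rels)  # 3 + 2 + 4
```

In this sketch the second form never decreases as relationships are added, because every contribution is an absolute value.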
| 102 | + |
| 103 | +==== MATCH CHEAPEST mode |
| 104 | + |
| 105 | +This CIP proposes the new `MATCH CHEAPEST` pattern match mode that matches every cheapest walk (or trail, or path respectively) as described by all node, relationship, and path query patterns given in the following pattern element and according to the pattern element's concluding first _mandatory_ weight declaration. |
| 106 | + |
| 107 | +==== Mandatory weight declarations |
| 108 | + |
| 109 | +A mandatory weight declaration is prefixed with `BY`, may omit specifying a variable name for the computed weight, and its aggregation must be monotone (i.e. the sequence of intermediary results obtained by computing the aggregation incrementally over all input values in any order is always monotonically increasing). |
| 110 | + |
| 111 | +A conforming implementation is expected to raise a runtime error when the monotonicity of a mandatory weight declaration is violated at runtime. |
| 112 | + |
| 113 | +A conforming implementation may raise a compile time error when it can statically prove that the monotonicity of a mandatory weight declaration may be violated at runtime. |
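
One way to picture the runtime check is an incremental fold that fails as soon as an intermediate result decreases. The Python sketch below makes several assumptions: the names `MonotonicityError` and `monotone_weight` are hypothetical, "monotonically increasing" is read as non-decreasing, and only the order in which the values happen to arrive is checked, not every possible order.

```python
class MonotonicityError(RuntimeError):
    """Raised when an incremental weight computation decreases."""


def monotone_weight(step, initial, values):
    # Fold the values incrementally and fail as soon as an intermediate
    # result is smaller than the previous one.
    total = initial
    for v in values:
        nxt = step(total, v)
        if nxt < total:
            raise MonotonicityError(f"weight decreased from {total} to {nxt}")
        total = nxt
    return total


def add(acc, v):
    return acc + v


ok = monotone_weight(add, 0, [1, 2, 3])  # every partial sum grows

try:
    monotone_weight(add, 0, [1, -2, 3])  # the second step decreases the sum
    violated = False
except MonotonicityError:
    violated = True
```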
| 114 | + |
| 115 | +Additional weight declarations may be given after a mandatory weight declaration as long as no two weight declarations define conflicting aliases. |
| 116 | + |
| 117 | +==== Singular matches |
| 118 | + |
| 119 | +This CIP proposes optionally prefixing pattern match modes and pattern binder classes with the `ONE [OF]` marker to support returning at most one match. |
| 120 | + |
| 121 | +=== Multiple pattern parts |
| 122 | + |
| 123 | +If a pattern consists of multiple pattern parts, they are first solved independently before returning their cross product as the final result of the pattern. |
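
As a rough illustration, the Python sketch below combines independently solved pattern parts with a cross product. It assumes that each pattern part has already been solved into a list of variable bindings and that the parts bind distinct variable names. The function name `solve_pattern` is hypothetical.

```python
from itertools import product


def solve_pattern(part_matches):
    # part_matches holds one list of bindings (dicts) per pattern part,
    # each solved independently of the others.
    results = []
    for combo in product(*part_matches):
        row = {}
        for binding in combo:
            row.update(binding)  # assumes variable names do not clash
        results.append(row)
    return results


part1 = [{"p1": "walk-1"}, {"p1": "walk-2"}]
part2 = [{"p2": "walk-a"}, {"p2": "walk-b"}, {"p2": "walk-c"}]
rows = solve_pattern([part1, part2])
```

With two matches for the first part and three for the second, the pattern yields six result rows. If any part has no match, the result is empty.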
| 124 | + |
| 125 | +=== Default pattern matching semantics |
| 126 | + |
| 127 | +This CIP defines three classes of pattern parts: |
| 128 | + |
| 129 | +* *Fixed length pattern parts* are top-level pattern parts that may consist of node patterns or single length relationship patterns only. |
| 130 | +* *Variable length pattern parts* are top-level pattern parts that may consist of node patterns, single length relationship patterns, or path query patterns only. |
| 131 | +* *Legacy variable length pattern parts* are top-level pattern parts that may consist of node patterns, single length relationship patterns, or path query patterns and contain at least one legacy variable length pattern (including chains of single length patterns expressed as bounded variable length patterns). |
| 132 | + |
| 133 | +Current Cypher pattern matching semantics correspond to using `MATCH EVERY TRAIL` by default for all top-level pattern parts (i.e. `MATCH` behaves like `MATCH EVERY TRAIL`). |
| 134 | + |
| 135 | +This CIP proposes to adopt the following new default pattern match modes and default pattern binder classes: |
| 136 | + |
| 137 | +* `EVERY WALK` for fixed length pattern parts, |
| 138 | +* `SHORTEST WALK` for variable length pattern parts, and |
| 139 | +* `EVERY TRAIL` for legacy variable length pattern parts only. |
| 140 | + |
| 141 | +This CIP aligns with the introduction of path query patterns by proposing that existing bounded and unbounded variable length patterns are to be deprecated in favor of path query patterns. |
| 142 | + |
| 143 | +This changes Cypher to use homomorphic matching for all non-deprecated pattern parts. |
| 144 | + |
| 145 | +=== Predicates and functions for working with walks |
| 146 | + |
| 147 | +This CIP proposes to introduce the following additional predicates and functions for working with walks: |
| 148 | + |
| 149 | +* `open(p)`: true if the start node and the end node of `p` are not the same node |
| 150 | +* `closed(p)`: true if the start node and the end node of `p` are the same node |
| 151 | +* `trail(p)`: `p` if `p` contains no duplicate relationships, `NULL` otherwise |
| 152 | +* `path(p)`: `p` if `p` contains no duplicate relationships and either no duplicate nodes at all or the start node and the end node are the same node, `NULL` otherwise |
| 153 | +* `circuit(p)`: `trail(p)`, if `closed(p)` is true, `NULL` otherwise |
| 154 | +* `cycle(p)`: `path(p)`, if `closed(p)` is true, `NULL` otherwise |
| 155 | +* `disjoint(list1, list2, ..., list_n)` is true if the lists do not share any elements |
| 156 | + |
| 157 | +To support a common family of weight calculations, this CIP proposes the introduction of a new aggregate function `product` for computing the product of a set of numbers. |
| 158 | + |
| 159 | +Evaluating `product` for an empty set returns `1`. |
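
The intended behaviour of `disjoint` and `product` can be sketched in Python as follows. This is an illustration rather than a normative definition: it assumes that repeated elements within a single list do not count as shared, and it relies on `math.prod` from the Python standard library, which returns `1` for an empty input.

```python
import math


def disjoint(*lists):
    # True if no element occurs in more than one of the given lists.
    seen = set()
    for lst in lists:
        items = set(lst)
        if items & seen:
            return False
        seen |= items
    return True


def product(numbers):
    # Product of all numbers; the empty product is 1.
    return math.prod(numbers)
```

For example, `disjoint(["r1", "r2"], ["r3"])` holds while `disjoint(["r1", "r2"], ["r2", "r3"])` does not, which is the property needed when comparing the relationship lists of several matched trails.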
| 160 | + |
| 161 | +== Examples |
| 162 | + |
| 163 | +The following examples demonstrate various ways in which the newly proposed constructs may be used if this CIP is adopted. |
| 164 | + |
| 165 | +=== Matching shortest paths |
| 166 | + |
| 167 | +[source=cypher] |
| 168 | +---- |
| 169 | +// shortestPath(...) today becomes: |
| 170 | +MATCH ONE SHORTEST [TRAIL] p=(a)-[r*]->(b) |
| 171 | +RETURN * |
| 172 | +
|
| 173 | +// allShortestPaths(...) today becomes: |
| 174 | +MATCH SHORTEST [TRAIL] p=(a)-[r*]->(b) |
| 175 | +RETURN p |
| 176 | +---- |
| 177 | + |
| 178 | +=== Matching cheapest paths |
| 179 | + |
| 180 | +[source=cypher] |
| 181 | +---- |
| 182 | +MATCH CHEAPEST PATH p=(a)-/(:LOVES|:LIKES)*/->(b) BY WEIGHT |strength| AS w |
| 183 | +RETURN p AS path, w AS weight |
| 184 | +---- |
| 185 | + |
| 186 | +=== Matching one path and computing its weight |
| 187 | + |
| 188 | +[source=cypher] |
| 189 | +---- |
| 190 | +MATCH ONE PATH p=(a)-[*]->(b) WEIGHT product(r.score+r.handicap) OVER r AS w |
| 191 | +RETURN p, w |
| 192 | +---- |
| 193 | + |
| 194 | +=== Matching with existing semantics |
| 195 | + |
| 196 | +`disjoint` may be used to precisely express Cypher's current pattern matching semantics. |
| 197 | + |
| 198 | +[source=cypher] |
| 199 | +---- |
| 200 | +// Today (using same uniqueness scope for pat1, pat2, and pat3) |
| 201 | +MATCH pat1=..., pat2=..., pat3=... |
| 202 | +
|
| 203 | +// This CIP |
| 204 | +MATCH EVERY TRAIL pat1=... |
| 205 | +MATCH EVERY TRAIL pat2=... |
| 206 | +MATCH EVERY TRAIL pat3=... |
| 207 | +WHERE disjoint(rels(pat1), rels(pat2), rels(pat3)) |
| 208 | +---- |
| 209 | + |
| 210 | +== Pre-parser options |
| 211 | + |
| 212 | +It is suggested that a conforming implementation should provide pre-parser options for defining the default pattern binder class for each pattern match mode as well as the default pattern match mode for each class of pattern parts: |
| 213 | + |
| 214 | +* `match-every=walk|trail|path` for configuring the default pattern binder class for each use of the `MATCH EVERY` pattern match mode |
| 215 | +* `match-shortest=walk|trail|path` for configuring the default pattern binder class for each use of the `MATCH SHORTEST` pattern match mode |
| 216 | +* `match-cheapest=walk|trail|path` for configuring the default pattern binder class for each use of the `MATCH CHEAPEST` pattern match mode |
| 217 | +* `fixlen-mode=every|shortest` for configuring the default pattern match mode of fixed length pattern parts |
| 218 | +* `varlen-mode=every|shortest` for configuring the default pattern match mode of variable length pattern parts |
| 219 | + |
| 220 | +== Benefits to this proposal |
| 221 | + |
| 222 | +This proposal adds a generic facility to Cypher for expressing desired pattern matching semantics. |
| 223 | + |
| 224 | +== Caveats to this proposal |
| 225 | + |
| 226 | +A moderate increase in language complexity. |
| 227 | + |
| 228 | +A substantial departure from current pattern matching semantics. |
| 229 | +However, care has been taken to retain access to current semantics. |
| 230 | + |
| 231 | +`MATCH EVERY [OPEN|CLOSED] WALK` allows for non-terminating queries. |