| 1 | += CIP2017-01-18 - Configurable Pattern Matching Semantics |
| 2 | +:numbered: |
| 3 | +:toc: |
| 4 | +:toc-placement: macro |
| 5 | +:source-highlighter: codemirror |
| 6 | + |
| 7 | +*Author:* Stefan Plantikow <stefan.plantikow@neotechnology.com> |
| 8 | + |
| 9 | +This proposal is a response to CIR-2017-174. |
| 10 | + |
| 11 | +== Motivation |
| 12 | + |
| 13 | +Currently, Cypher uses pattern matching semantics that treat all patterns that occur in a `MATCH` clause as a unit (called a *uniqueness scope*) and only consider pattern instances that bind different relationships to each simple relationship pattern variable and to each element of a variable length relationship pattern variable. |
| 14 | +This has come to be called *cyphermorphism* informally and is a variation of edge isomorphism. |
| 15 | + |
| 16 | +Cyphermorphism lies at the intersection of returning as many results as possible while still ruling out returning an infinite number of paths when matching graphs that contain cycles. |
| 17 | + |
| 18 | +However, the notion of *uniqueness scope* has proven to be non-standard and occasionally confusing for users, and cyphermorphic matching is not tractable in terms of computational complexity for some graphs. |
| 19 | + |
| 20 | +This CIP aims to address these issues. |
| 21 | + |
| 22 | +== Background |
| 23 | + |
| 24 | +Each pattern consists of a comma separated list of *pattern parts*. |
| 25 | +Pattern parts are bound to a variable and consist of a linear chain of connected node and relationship patterns. |
| 26 | + |
| 27 | +Note that while Cypher allows omitting path, node, and relationship variables in a pattern this should only be considered as syntactic sugar, i.e. all parts of a pattern are always bound to a variable name from the viewpoint of pattern matching semantics (names are either provided in the query or automatically generated by a conforming implementation). |
| 28 | + |
| 29 | +== Proposal |
| 30 | + |
| 31 | +This CIP proposes to replace the notion of *uniqueness scope* and *cyphermorphism* and all associated rules by providing new, configurable pattern matching semantics for Cypher as outlined in this section. |
| 32 | + |
| 33 | +=== Rename PATH type |
| 34 | + |
| 35 | +This CIP proposes to rename the Cypher type `PATH` to `WALK`. |
| 36 | + |
| 37 | +=== Definitions |
| 38 | + |
| 39 | +This CIP introduces the following kinds of walks: |
| 40 | + |
| 41 | +* `WALK`: A walk is an arbitrary, non-empty sequence of alternating nodes and relationships that starts with a node and ends with a node. |
| 42 | +* `TRAIL`: A trail is a walk that does not contain the same relationship twice. |
| 43 | +* `PATH`: A simple path is a trail that does not contain the same node twice unless that node is both the start node and the end node of the path. |
| 44 | + |
| 45 | +Note that every `PATH` is a `TRAIL` and that every `TRAIL` is a `WALK`. |
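The three definitions can be checked mechanically. The following is a minimal Python sketch and not part of the proposal. It assumes a walk is represented as a list that alternates node and relationship identifiers, and it does not verify that each relationship actually connects its neighbouring nodes in a graph. The helper names `is_walk`, `is_trail`, and `is_path` are hypothetical.

```python
def is_walk(seq):
    """A walk is a non-empty alternating sequence that starts and ends with a node,
    so its length is odd (nodes at even indexes, relationships at odd indexes)."""
    return len(seq) % 2 == 1


def is_trail(seq):
    """A trail is a walk that does not contain the same relationship twice."""
    rels = seq[1::2]
    return is_walk(seq) and len(rels) == len(set(rels))


def is_path(seq):
    """A simple path is a trail without repeated nodes, except that the start
    node may also be the end node."""
    if not is_trail(seq):
        return False
    nodes = seq[0::2]
    # For a closed walk, drop the end node so the single permitted repeat is ignored.
    inner = nodes[:-1] if len(nodes) > 1 and nodes[0] == nodes[-1] else nodes
    return len(inner) == len(set(inner))


closed_cycle = ["a", "r1", "b", "r2", "a"]           # trail and path (closed)
back_and_forth = ["a", "r1", "b", "r1", "a"]         # walk only: r1 is repeated
lasso = ["a", "r1", "b", "r2", "c", "r3", "b"]       # trail, but node b repeats
```

In this sketch every list accepted by `is_path` is also accepted by `is_trail`, and every list accepted by `is_trail` is also accepted by `is_walk`, which mirrors the containment noted above.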
| 46 | + |
| 47 | +=== Pattern binder type |
| 48 | + |
| 49 | +This CIP proposes to call the variable of a pattern element of a pattern part a *pattern binder* in the grammar. |
| 50 | +Note that a pattern binder is always bound to a linear sequence of patterns of its pattern element. |
| 51 | + |
| 52 | +This CIP proposes introducing the notion of a *pattern binder type* that may be written before a pattern binder in a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and limits the set of valid pattern instances that are considered as potential matches for the following pattern element: |
| 53 | + |
| 54 | +* `WALK` This pattern binder should only be bound to a `WALK` that matches all node, relationship, and path patterns in the following pattern element |
| 55 | +* `TRAIL` This pattern binder should only be bound to a `TRAIL` that matches all node, relationship, and path patterns in the following pattern element |
| 56 | +* `PATH` This pattern binder should only be bound to a simple `PATH` that matches all node, relationship, and path patterns in the following pattern element |
| 57 | + |
| 58 | +The pattern binder type may be further qualified with one of the following prefixes: |
| 59 | + |
| 60 | +* `OPEN WALK|TRAIL|PATH` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are not the same node |
| 61 | +* `CLOSED WALK|TRAIL|PATH` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are the same node |
| 62 | + |
| 63 | +The following additional pattern binder types are proposed to accommodate existing terminology that is commonly used in graph theory: |
| 64 | + |
| 65 | +* `CIRCUIT` is a synonym for `CLOSED TRAIL` |
| 66 | +* `CYCLE` is a synonym for `CLOSED PATH` |
| 67 | + |
| 68 | +Implementations are advised to signal a warning for every use of an `OPEN` pattern binder type if the two endpoints of the pattern element are both unbound and both use the same variable name. |
| 69 | + |
| 70 | +Implementations are advised to signal a warning for every use of a `CLOSED` pattern binder type if the two endpoints of the pattern element are both unbound and use different variable names. |
| 71 | + |
| 72 | +=== Pattern matching mode |
| 73 | + |
| 74 | +This CIP proposes introducing the notion of a *pattern matching mode* that may be written as a prefix to a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and applies to all pattern parts in that pattern. |
| 75 | + |
| 76 | +=== MATCH EVERY mode |
| 77 | + |
| 78 | +This CIP proposes the new `MATCH EVERY [WALK|TRAIL|PATH]` pattern matching mode that matches every walk (or trail, or path respectively) as described by all node, relationship, and path patterns in the following pattern element. |
| 79 | +This may return an infinite or at least a very large result for some graphs. |
| 80 | + |
| 81 | +Implementations are advised to signal a warning for every use of `MATCH EVERY (OPEN|CLOSED) WALK` that may lead to the generation of an infinite result set. |
| 82 | + |
| 83 | +=== MATCH SHORTEST mode |
| 84 | + |
| 85 | +This CIP proposes the new `MATCH SHORTEST [WALK|TRAIL|PATH]` pattern matching mode that matches every _shortest_ walk (or trail, or path respectively) as described by all node, relationship, and path patterns in the following pattern element. |
| 86 | + |
| 87 | +This CIP proposes to deprecate the existing syntax for both `shortestPath` and `allShortestPaths` matching of Cypher. |
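As an illustration of what "every shortest walk" means, the following Python sketch enumerates all shortest walks between two nodes with a breadth-first search. It is not part of the proposal and makes several assumptions: the pattern is an unconstrained directed variable length pattern, the graph is given as a list of hypothetical `(relationship, start, end)` triples, and length is counted in relationships. The function name `all_shortest_walks` is made up for this example.

```python
from collections import deque


def all_shortest_walks(edges, source, target):
    """Return every shortest walk from source to target as an alternating
    list of node and relationship identifiers."""
    outgoing = {}
    for rel, start, end in edges:
        outgoing.setdefault(start, []).append((rel, end))

    dist = {source: 0}
    preds = {source: []}  # node -> (previous node, relationship) pairs on shortest walks
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for rel, nxt in outgoing.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [(node, rel)]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another way of reaching nxt with the same minimal length.
                preds[nxt].append((node, rel))

    if target not in dist:
        return []

    def build(node):
        if node == source:
            return [[source]]
        return [walk + [rel, node]
                for prev, rel in preds[node]
                for walk in build(prev)]

    return build(target)


diamond = [("r1", "a", "b"), ("r2", "a", "c"), ("r3", "b", "d"),
           ("r4", "c", "d"), ("r5", "d", "e")]
walks = all_shortest_walks(diamond, "a", "d")
```

Here `walks` contains both two-relationship walks from `a` to `d`. A `SINGLE` variant would return an arbitrary one of them.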
| 88 | + |
| 89 | +=== Weighting rules |
| 90 | + |
| 91 | +This CIP proposes that pattern elements may be further suffixed with a weighting rule of one of the following forms: |
| 92 | + |
| 93 | +* `WEIGHT r IN <aggregation> AS <weight>` Calculates a weight `<weight>` by evaluating the given `<aggregation>` for each relationship `r` in the currently matched walk |
| 94 | +* `WEIGHT |<expr>| AS <weight>` Calculates a weight `<weight>` by summing the results of evaluating `abs(<expr>)` for each relationship `r` in the currently matched walk in a special scope that only contains all properties of `r` as variables |
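The two forms of weighting rule can be sketched in Python as follows. This is an illustration under stated assumptions and not a definition of the semantics: each relationship of the matched walk is modelled as a dictionary of its properties, `<expr>` and `<aggregation>` are modelled as Python callables, and the names `abs_weight` and `aggregate_weight` are hypothetical.

```python
def aggregate_weight(relationships, aggregation):
    """Sketch of `WEIGHT r IN <aggregation> AS <weight>`: evaluate the
    aggregation over the relationships of the matched walk."""
    return aggregation(relationships)


def abs_weight(relationships, expr):
    """Sketch of `WEIGHT |<expr>| AS <weight>`: sum abs(expr) over the
    relationships, where expr only sees the properties of one relationship."""
    return sum(abs(expr(props)) for props in relationships)


# Property maps of the three relationships of a matched walk (made-up data).
walk_rels = [{"strength": 2}, {"strength": -3}, {"strength": 0.5}]

total = abs_weight(walk_rels, lambda props: props["strength"])
strongest = aggregate_weight(walk_rels, lambda rels: max(r["strength"] for r in rels))
```

With the made-up data above, `total` is `2 + 3 + 0.5` because the negative strength contributes its absolute value.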
| 95 | + |
| 96 | +=== Product function |
| 97 | + |
| 98 | +To support a common family of weight calculations, this CIP proposes the introduction of a new aggregate function `product` for computing the product of a set of numbers. |
| 99 | + |
| 100 | +=== MATCH CHEAPEST mode |
| 101 | + |
| 102 | +This CIP proposes the new `MATCH CHEAPEST [WALK|TRAIL|PATH]` pattern matching mode that matches every cheapest walk (or trail, or path respectively) as described by all node, relationship, and path patterns in the following pattern element and its concluding mandatory weighting rule that is prefixed with `BY`. |
| 103 | + |
| 104 | +The mandatory weighting rule may omit specifying an alias name for the computed weight, and its aggregation must be monotone (i.e. the sequence of intermediary results obtained by computing the aggregation incrementally over all input values in any order is always monotonically increasing). |
| 105 | + |
| 106 | +A conforming implementation is expected to raise a runtime error when the monotonicity of a monotone aggregation is violated at runtime. |
| 107 | + |
| 108 | +A conforming implementation may raise a compile time error when it can statically prove that the monotonicity of a monotone aggregation may be violated at runtime. |
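A runtime monotonicity check of this kind could look like the following Python sketch. It is only an illustration: it checks the single order in which the values arrive rather than every possible order, it treats "monotonically increasing" as non-decreasing, and the names `MonotonicityError` and `monotone_fold` are hypothetical. Using `operator.mul` with an initial value of `1` stands in for the proposed `product` aggregate.

```python
import operator


class MonotonicityError(RuntimeError):
    """Raised when an intermediate aggregation result decreases."""


def monotone_fold(values, step, initial):
    """Aggregate incrementally and fail as soon as the running result decreases."""
    acc = initial
    for value in values:
        nxt = step(acc, value)
        if nxt < acc:
            raise MonotonicityError(f"aggregate dropped from {acc} to {nxt}")
        acc = nxt
    return acc


total = monotone_fold([1, 2, 3], operator.add, 0)   # running sums: 1, 3, 6
scaled = monotone_fold([2, 3, 4], operator.mul, 1)  # running products: 2, 6, 24
```

Under these assumptions, a negative summand or a factor between 0 and 1 makes the running result drop, so `monotone_fold` raises `MonotonicityError` instead of returning a weight.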
| 109 | + |
| 110 | +=== Nondeterministic matching modes |
| 111 | + |
| 112 | +This CIP proposes using the keywords `SINGLE` and `FIRST` to modify pattern matching modes to return at most one arbitrarily chosen match. |
| 113 | + |
| 114 | +The supported forms are: |
| 115 | + |
| 116 | +* `MATCH SINGLE SHORTEST [WALK|TRAIL|PATH]` to return at most a single shortest walk (or trail, or path respectively) |
| 117 | +* `MATCH SINGLE CHEAPEST [WALK|TRAIL|PATH]` to return at most a single cheapest walk (or trail, or path respectively) |
| 118 | +* `MATCH FIRST [WALK|TRAIL|PATH]` to return at most a single arbitrary walk (or trail, or path respectively); this is the counterpart of `MATCH EVERY` |
| 119 | +* `MATCH SINGLE [WALK|TRAIL|PATH]` to return at most a single walk (or trail, or path respectively) using default pattern matching semantics (defined below) |
| 120 | + |
| 121 | +=== Default pattern matching semantics |
| 122 | + |
| 123 | +It is proposed that a conforming implementation should provide a pre-parser option for defining the default pattern binder type for each pattern matching mode as well as the default pattern matching mode: |
| 124 | + |
| 125 | +* `match-every=walk|trail|path` for configuring the default pattern binder type for each use of the `MATCH EVERY` pattern matching mode |
| 126 | +* `match-shortest=walk|trail|path` for configuring the default pattern binder type for each use of the `MATCH SHORTEST` pattern matching mode |
| 127 | +* `match-cheapest=walk|trail|path` for configuring the default pattern binder type for each use of the `MATCH CHEAPEST` pattern matching mode |
| 128 | +* `match=every|shortest` for configuring the default pattern matching mode |
| 129 | + |
| 130 | +Using these pre-parser options, current Cypher pattern matching semantics closely correspond to `match-every=trail`, `match-shortest=trail`, `match=every` (except for the use of the uniqueness scope). |
| 131 | + |
| 132 | +This CIP proposes to change Cypher's default pattern matching semantics to `match-every=trail`, `match-shortest=walk`, `match-cheapest=walk`, `match=shortest`. |
| 133 | + |
| 134 | +=== Variable length patterns |
| 135 | + |
| 136 | +This CIP aligns with the introduction of path patterns by proposing that variable length patterns are to be deprecated in favor of path patterns. |
| 137 | + |
| 138 | +To simplify this migration and deprecation, this CIP proposes that any pattern element that contains a variable length pattern but no path pattern should match a `TRAIL` by default. |
| 139 | + |
| 140 | +=== Path predicates |
| 141 | + |
| 142 | +This CIP further proposes to introduce additional predicates and functions for working with walks: |
| 143 | + |
| 144 | +* `open(p)`: true if the start and the end node of `p` are not the same node |
| 145 | +* `closed(p)`: true if the start and the end node of `p` are the same node |
| 146 | +* `trail(p)`: `p` if `p` contains no duplicate relationships, `NULL` otherwise |
| 147 | +* `path(p)`: `p` if `p` contains no duplicate relationships and either no duplicate nodes at all or the start node and the end node are the same node, `NULL` otherwise |
| 148 | +* `circuit(p)`: `trail(p)`, if `closed(p)` is true, `NULL` otherwise |
| 149 | +* `cycle(p)`: `path(p)`, if `closed(p)` is true, `NULL` otherwise |
| 150 | +* `overlap(p1, p2)`: the shared walk between `p1` and `p2`, or `NULL` otherwise |
| 151 | +* `overlap(nodes(p1), nodes(p2))`: the shared walk between `nodes(p1)` and `nodes(p2)`, or `NULL` otherwise |
| 152 | +* `overlap(rels(p1), rels(p2))`: the shared walk between `rels(p1)` and `rels(p2)`, or `NULL` otherwise |
| 153 | +* `adjacent(p1, p2)`: true if `startNode(p1) IN [startNode(p2), endNode(p2)]` or `endNode(p1) IN [startNode(p2), endNode(p2)]` |
| 154 | +* `adjacent(nodes(p1), nodes(p2))`: true if `startNode(p1) IN [startNode(p2), endNode(p2)]` or `endNode(p1) IN [startNode(p2), endNode(p2)]` |
| 155 | +* `adjacent(rels(p1), rels(p2))`: true if `startNode(p1) IN [startNode(p2), endNode(p2)]` or `endNode(p1) IN [startNode(p2), endNode(p2)]` |
| 156 | +* `adjacent(r1, r2)`: true if `startNode(r1) IN [startNode(r2), endNode(r2)]` or `endNode(r1) IN [startNode(r2), endNode(r2)]` |
| 157 | +* `adjacent(n1, n2)`: true if `EXISTS (n1)-[]-(n2)` |
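The endpoint-based predicates in this list can be sketched in Python as follows. This is an illustration and not part of the proposal: a walk is assumed to be a list alternating node and relationship identifiers, and the names `endpoints`, `is_open`, and `is_closed` are hypothetical (`open` is avoided because it is a Python builtin).

```python
def endpoints(p):
    """Return (start node, end node) of a walk given as an alternating list."""
    return p[0], p[-1]


def is_closed(p):
    """Sketch of closed(p): the start node and the end node are the same node."""
    start, end = endpoints(p)
    return start == end


def is_open(p):
    """Sketch of open(p): the start node and the end node differ."""
    return not is_closed(p)


def adjacent(p1, p2):
    """Sketch of adjacent(p1, p2): an endpoint of p1 is also an endpoint of p2."""
    start1, end1 = endpoints(p1)
    ends2 = endpoints(p2)
    return start1 in ends2 or end1 in ends2


first = ["a", "r1", "b"]
second = ["b", "r2", "c"]
loop = ["c", "r3", "d", "r4", "c"]
```

In this example `first` and `second` are adjacent because they share the endpoint `b`, while `first` and `loop` have no endpoint in common.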
| 158 | + |
| 159 | +== Examples |
| 160 | + |
| 161 | +The following examples demonstrate various ways in which the newly proposed constructs may be used if this CIP is adopted. |
| 162 | + |
| 163 | +=== Matching shortest paths |
| 164 | + |
| 165 | +[source,cypher] |
| 166 | +---- |
| 167 | +// shortestPath(...) today becomes: |
| 168 | +MATCH SHORTEST TRAIL p=(a)-[r*]->(b) |
| 169 | +RETURN * LIMIT 1 |
| 170 | + |
| 171 | +// allShortestPaths(...) today becomes: |
| 172 | +MATCH SHORTEST TRAIL p=(a)-[r*]->(b) |
| 173 | +RETURN p |
| 174 | +---- |
| 175 | + |
| 176 | +=== Matching cheapest paths |
| 177 | + |
| 178 | +[source,cypher] |
| 179 | +---- |
| 180 | +MATCH CHEAPEST PATH p=(a)-/(:LOVES|:LIKES)*/->(b) BY WEIGHT |strength| AS w |
| 181 | +RETURN p AS path, w AS weight |
| 182 | +---- |
| 183 | + |
| 184 | +=== Matching with existing semantics |
| 185 | + |
| 186 | +`overlap` may be used to express Cypher's current pattern matching semantics. |
| 187 | + |
| 188 | +[source,cypher] |
| 189 | +---- |
| 190 | +// Today (using the same uniqueness scope for pat1, pat2, and pat3) |
| 191 | +MATCH pat1=..., pat2=..., pat3=... |
| 192 | + |
| 193 | +// This CIP |
| 194 | +MATCH EVERY pat1=... |
| 195 | +MATCH EVERY pat2=... WHERE length(overlap(pat1, pat2)) > 1 |
| 196 | +MATCH EVERY pat3=... |
| 197 | +WHERE |
| 198 | + length(overlap(pat1, pat3))>1 OR |
| 199 | + length(overlap(pat2, pat3))>1 |
| 201 | +---- |
| 202 | + |
| 203 | +== Benefits to this proposal |
| 204 | + |
| 205 | +This proposal adds a generic facility to Cypher for expressing desired pattern matching semantics. |
| 206 | + |
| 207 | +== Caveats to this proposal |
| 208 | + |
| 209 | +A moderate increase in language complexity. |
| 210 | + |
| 211 | +A substantial departure from current pattern matching semantics. |
| 212 | +However, care has been taken to retain access to current semantics. |
| 213 | + |
| 214 | +`MATCH EVERY (OPEN|CLOSED) WALK` allows for non-terminating queries. |