Commit 804b10e

Simplified set of match modes and default handling
1 parent 6117618 commit 804b10e

1 file changed

+112
-98
lines changed

cip/1.accepted/CIP2017-01-18-configurable-pattern-matching-semantics.adoc

Lines changed: 112 additions & 98 deletions
Original file line number | Diff line number | Diff line change
@@ -30,9 +30,60 @@ While Cypher allows omitting path, node, and relationship variables in a pattern
3030

3131
== Proposal
3232

33+
This CIP has been submitted in the belief that *CIP2017-02-06 Path Pattern Queries* will be accepted and is aligned with it.
34+
35+
=== Deprecations
36+
3337
This CIP proposes to replace the notions of *uniqueness scope* and *cyphermorphism*, and all associated rules, by providing new, configurable pattern matching semantics for Cypher as outlined in this section.
3438

35-
This CIP has been submitted in the belief that *CIP2017-02-06 Path Pattern Queries* will be accepted and is aligned with it.
39+
This CIP proposes to deprecate support for binding relationship list variables in variable length relationship patterns.
40+
41+
This CIP proposes to deprecate the existing syntax for both `shortestPath` and `allShortestPaths` matching of Cypher.
42+
43+
44+
=== Basic pattern matching semantics
45+
46+
Each pattern consists of one or more top-level pattern parts that are given in a comma-separated list.
47+
48+
[source=cypher]
49+
----
50+
MATCH (a)-->(b), (c)<--(d)
51+
RETURN *
52+
----
53+
54+
The solution (set of successful matches) of a pattern is the cross product over the solutions of all its top-level pattern parts, i.e. the above is the same as
55+
56+
[source=cypher]
57+
----
58+
MATCH (a)-->(b)
59+
// sequence of matches acts like a cross product:
60+
// for each incoming row with a and b, find all matches (c)<--(d)
61+
MATCH (c)<--(d)
62+
RETURN *
63+
----
64+
65+
(ignoring uniqueness).
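The equivalence above can be sketched outside Cypher. The following Python fragment is only an illustration under stated assumptions: the toy edge list and the helper names `match_out`, `match_in`, `match_comma`, and `match_sequence` are hypothetical and not part of this CIP, and uniqueness is ignored as in the text.

```python
from itertools import product

# Hypothetical toy graph: directed relationships as (source, target) pairs.
EDGES = [("n1", "n2"), ("n2", "n3")]

def match_out(edges):
    """All bindings for (a)-->(b): one row per directed relationship."""
    return [{"a": s, "b": t} for s, t in edges]

def match_in(edges):
    """All bindings for (c)<--(d): c is the target, d is the source."""
    return [{"c": t, "d": s} for s, t in edges]

def match_comma(edges):
    """MATCH (a)-->(b), (c)<--(d): cross product of both pattern parts."""
    return [{**left, **right}
            for left, right in product(match_out(edges), match_in(edges))]

def match_sequence(edges):
    """MATCH (a)-->(b) MATCH (c)<--(d): extend each incoming row in turn."""
    rows = []
    for row in match_out(edges):
        for extra in match_in(edges):
            rows.append({**row, **extra})
    return rows
```

Both functions produce the same rows for this toy graph, which is the sense in which a comma-separated pattern and a sequence of `MATCH` clauses coincide here.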
66+
67+
Binding any two node patterns, relationship patterns, or path patterns that are contained in the same pattern to the same pattern variable describes an implicit join, i.e.
68+
69+
[source=cypher]
70+
----
71+
MATCH (a)-->()<--(a)
72+
RETURN a
73+
----
74+
75+
is semantically the same as
76+
77+
[source=cypher]
78+
----
79+
MATCH (n1)-->(n2), (n3)<--(n4) WHERE n1 = n4 AND n2 = n3
80+
RETURN n1 AS a
81+
----
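A minimal Python sketch of this implicit join, assuming homomorphic matching (relationships may be reused) and a hypothetical edge list with parallel relationships; the function names are illustrative only:

```python
from itertools import product

# Hypothetical toy graph with two parallel relationships from x to y.
EDGES = [("x", "y"), ("x", "y"), ("z", "y")]

def shared_variable(edges):
    """(a)-->()<--(a): both relationships leave a and meet in one node."""
    rows = []
    for a, mid in edges:
        for src, tgt in edges:
            if src == a and tgt == mid:
                rows.append(a)
    return rows

def explicit_join(edges):
    """(n1)-->(n2), (n3)<--(n4) WHERE n1 = n4 AND n2 = n3, returning n1."""
    rows = []
    for (n1, n2), (n4, n3) in product(edges, edges):
        if n1 == n4 and n2 == n3:
            rows.append(n1)
    return rows
```

Under these assumptions both formulations return the same multiset of bindings for `a`.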
82+
83+
=== Pattern binders
84+
85+
This CIP proposes to call the path variable that occurs before a pattern element of a pattern part a *pattern binder* in the grammar.
86+
Note that such a variable is always bound to a linear sequence of node, relationship, and path patterns of its pattern element.
3687

3788
=== Walks
3889

@@ -46,27 +97,28 @@ Note that every `PATH` is a `TRAIL` and that every `TRAIL` is a `WALK`.
4697

4798
This CIP proposes to rename the Cypher type `PATH` to `WALK`.
4899

49-
=== Pattern binders
50-
51-
This CIP proposes to name the path variable that occurs before a pattern element of a pattern part to *pattern binder* in the grammar.
52-
Note that such variables are always bound to a linear sequence of node, relationship, and path query patterns of its pattern element.
100+
=== Pattern binder class
53101

54102
This CIP proposes introducing the notion of a *pattern binder class* that may be written before a pattern binder in a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and restricts the set of valid pattern matches for the following pattern element.
55-
The proposed pattern binder classes are:
103+
The proposed pattern binder classes in both singular and plural form are:
104+
105+
* `WALK` (plural: `WALKS`) This pattern binder should only be bound to a `WALK` that matches all node, relationship, and path patterns given in the following pattern element
106+
* `TRAIL` (plural: `TRAILS`) This pattern binder should only be bound to a `TRAIL` that matches all node, relationship, and path patterns given in the following pattern element
107+
* `PATH` (plural: `PATHS`) This pattern binder should only be bound to a simple `PATH` that matches all node, relationship, and path patterns given in the following pattern element
56108

57-
* `WALK` This pattern binder should only be bound to a `WALK` that matches all node, relationship, and path query patterns given in the following pattern element
58-
* `TRAIL` This pattern binder should only be bound to a `TRAIL` that matches all node, relationship, and path query patterns given in the following pattern element
59-
* `PATH` This pattern binder should only be bound to a simple `PATH` that matches all node, relationship, and path query patterns given in the following pattern element
109+
This CIP proposes the default pattern binder class to be `WALK`.
60110

61111
The pattern binder class may be further qualified with one of the following prefixes:
62112

63-
* `OPEN WALK|TRAIL|PATH` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _not the same node_
64-
* `CLOSED WALK|TRAIL|PATH` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _the same node_
113+
* `OPEN WALK[S]|TRAIL[S]|PATH[S]` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _not the same node_
114+
* `CLOSED WALK[S]|TRAIL[S]|PATH[S]` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _the same node_
65115

66116
The following additional pattern binder classes are proposed to accommodate existing terminology that is commonly used in graph theory:
67117

68118
* `CIRCUIT` is a synonym for `CLOSED TRAIL`
69119
* `CYCLE` is a synonym for `CLOSED PATH`
120+
* `CIRCUITS` is a synonym for `CLOSED TRAILS`
121+
* `CYCLES` is a synonym for `CLOSED PATHS`
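One way to read the binder class keywords, prefixes, and synonyms listed above is as a small normalisation step. The Python sketch below is an assumption-laden illustration, not a proposed grammar: `parse_binder_class` and the returned triple `(qualifier, base, plural)` are hypothetical names.

```python
# Synonyms as listed in this section.
SYNONYMS = {
    "CIRCUIT": "CLOSED TRAIL",
    "CYCLE": "CLOSED PATH",
    "CIRCUITS": "CLOSED TRAILS",
    "CYCLES": "CLOSED PATHS",
}
BASE_CLASSES = {"WALK", "TRAIL", "PATH"}

def parse_binder_class(text):
    """Return (qualifier, base, plural) for a binder class such as 'OPEN WALKS'."""
    normalized = " ".join(text.upper().split())
    normalized = SYNONYMS.get(normalized, normalized)
    words = normalized.split()
    qualifier = None
    if words and words[0] in ("OPEN", "CLOSED"):
        qualifier = words.pop(0)
    if len(words) != 1:
        raise ValueError(f"not a binder class: {text!r}")
    word = words[0]
    # A plural form is a base class followed by 'S' (WALKS, TRAILS, PATHS).
    plural = word.endswith("S") and word[:-1] in BASE_CLASSES
    base = word[:-1] if plural else word
    if base not in BASE_CLASSES:
        raise ValueError(f"not a binder class: {text!r}")
    return qualifier, base, plural
```

For example, `parse_binder_class("CIRCUITS")` yields `("CLOSED", "TRAIL", True)` in this sketch.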
70122

71123
Implementations are advised to signal a warning for every use of an `OPEN` pattern binder class if the two endpoints of the pattern element are both unbound and both use the same variable name.
72124

@@ -78,86 +130,61 @@ This CIP proposes introducing the notion of a *pattern match mode* that may be w
78130

79131
A pattern match mode is always written before any pattern binder class that has been explicitly given for the same pattern binder.
80132

81-
==== MATCH EVERY mode
82-
83-
This CIP proposes the new `MATCH EVERY` pattern match mode that matches every walk (or trail, or path respectively) as described by all node, relationship, and path query patterns given in the following pattern elements.
84-
This may return an infinite or at least a very large result for some graphs.
85-
86-
Implementations are advised to signal a warning for every use of `MATCH EVERY (OPEN|CLOSED) WALK` that may lead to the generation of an infinite result set.
87-
88-
==== MATCH SHORTEST mode
89-
90-
This CIP proposes the new `MATCH SHORTEST` pattern match mode that matches every _shortest_ walk (or trail, or path respectively) as described by all node, relationship, and path query patterns in the following pattern elements.
91-
92-
This CIP proposes to deprecate the existing syntax for both `shortestPath` and `allShortestPaths` matching of Cypher.
93-
94-
==== Weight declarations
95-
96-
This CIP proposes that pattern elements may optionally be followed by weight declarations of one of the following forms:
97-
98-
* `WEIGHT <numerical-aggregation> OVER <rel> AS <weight>` Calculates a weight `<weight>` by evaluating the given `<numerical-aggregation>` for each relationship `<rel>` in the associated match
99-
* `WEIGHT |<expr>| AS <weight>` Calculates a weight `<weight>` by summing the results of evaluating `abs(<expr>)` for each relationship `r` in the associated match in a special scope that only contains all properties of `r` as variables
133+
==== Matching node patterns
100134

101-
Multiple weight declarations may be given as long as they do not define the same `<weight>` variable.
135+
A node pattern always matches all described nodes from the graph.
102136

103-
==== MATCH CHEAPEST mode
137+
Different pattern match modes do not influence the set of matched nodes.
104138

105-
This CIP proposes the new `MATCH CHEAPEST` pattern match mode that matches every cheapest walk (or trail, or path respectively) as described by all node, relationship, and path query patterns given in the following pattern element and according to the pattern element's concluding first _mandatory_ weight declaration.
139+
==== MATCH ALL mode
106140

107-
==== Mandatory weight declarations
141+
This CIP proposes the new `MATCH ALL` pattern match mode that matches every walk (or trail, or path respectively) as described by all node, relationship, and path patterns given in the following pattern elements.
108142

109-
A mandatory weight declaration is prefixed with `BY`, may omit specifying a variable name for the computed weight, and its aggregation must be monotone (i.e. the sequence of intermediary results obtained by computing the aggregation incrementally over all input values in any order is always monotonically increasing).
143+
`MATCH ALL` may only be used in conjunction with a binder class in plural form (i.e. `WALKS`, `TRAILS`, `PATHS`).
110144

111-
A conforming implementation is expected to raise a runtime error when the monotonicity of a mandatory weight declaration is violated at runtime.
145+
This CIP proposes that an error should be raised for any use of `MATCH ALL` without an explicit binder class in combination with variable length relationship or path patterns.
112146

113-
A conforming implementation may raise a compile time error when it can statically prove that the monotonicity of a mandatory weight declaration may be violated at runtime.
147+
Implementations are advised to signal a warning for any use of `MATCH ALL (OPEN|CLOSED) WALKS` that may return an infinite or prohibitively large result.
114148

115-
Additional weight declarations may be given after a mandatory weight declaration as long as no two weight declarations define conflicting aliases.
149+
==== MATCH ALL SHORTEST mode
116150

117-
==== Singular matches
151+
This CIP proposes the new `MATCH ALL SHORTEST` pattern match mode that matches every _shortest_ walk (or trail, or path respectively) as described by all node, relationship, and path patterns in the following pattern elements.
118152

119-
This CIP proposes optionally prefixing pattern match modes and pattern binder classes with the `ONE [OF]` marker to support returning at most one match.
153+
`MATCH ALL SHORTEST` may only be used in conjunction with a binder class in plural form (i.e. `WALKS`, `TRAILS`, `PATHS`).
120154

121-
=== Multiple pattern parts
122-
123-
If a pattern consists of multiple pattern parts, they are first solved independently before returning their cross product as the final result of the pattern.
155+
==== MATCH SHORTEST mode
124156

125-
=== Default pattern matching semantics
157+
This CIP proposes the new `MATCH SHORTEST` pattern match mode that matches one _shortest_ walk (or trail, or path respectively) as described by all node, relationship, and path patterns in the following pattern elements.
126158

127-
This CIP defines three classes of pattern parts:
159+
`MATCH SHORTEST` may only be used in conjunction with a binder class in singular form (i.e. `WALK`, `TRAIL`, `PATH`).
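The difference between matching every shortest walk and matching one shortest walk can be sketched with a breadth-first search. This is a simplified illustration, assuming an unweighted graph given as a hypothetical adjacency dictionary without parallel relationships, and walks represented as node lists; it is not the matching algorithm prescribed by this CIP.

```python
from collections import deque

def all_shortest_walks(adj, start, goal):
    """Every shortest walk from start to goal, as lists of nodes."""
    dist = {start: 0}
    parents = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in dist:
                # First time nxt is reached: record its distance.
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another predecessor on an equally short walk.
                parents[nxt].append(node)
    if goal not in dist:
        return []

    def build(node):
        if node == start:
            return [[start]]
        return [walk + [node] for p in parents[node] for walk in build(p)]

    return build(goal)

def one_shortest_walk(adj, start, goal):
    """A single shortest walk, or None if goal is unreachable."""
    walks = all_shortest_walks(adj, start, goal)
    return walks[0] if walks else None
```

In this sketch `all_shortest_walks` corresponds loosely to `MATCH ALL SHORTEST WALKS` and `one_shortest_walk` to `MATCH SHORTEST WALK`; which of several equally short walks is returned by the latter is an arbitrary choice of the sketch.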
128160

129-
* *Fixed length pattern parts* are top-level pattern parts that may consist of node patterns or single length relationship patterns only.
130-
* *Variable length pattern parts* are top-level pattern parts that may consist of node patterns, single length relationship patterns, or path query patterns only.
131-
* *Legacy variable length pattern parts* are top-level pattern parts that may consist of node patterns, single length relationship patterns, or path query patterns and contain at least one legacy variable length pattern (including chains of single length patterns expressed as bounded variable length patterns).
161+
=== Default MATCH mode
132162

133-
Current Cypher pattern matching semantics correspond to using `MATCH EVERY TRAIL` by default for all top-level pattern parts (i.e. `MATCH` behaves like `MATCH EVERY TRAIL`)
163+
This CIP proposes a new default pattern match mode that assigns a different pattern match mode to each type of pattern element:
134164

135-
This CIP proposes to adopt the following new default pattern match modes and default pattern binder classes:
165+
* Simple relationship patterns (e.g. `()-[]->()`) are to be matched using `MATCH ALL` (which is identical to `MATCH ALL SHORTEST` for simple relationship patterns)
166+
* Bounded variable length relationship patterns (e.g. `()-[*2..4]->()`) are to be matched using `MATCH ALL`
167+
* Unbounded variable length relationship patterns (e.g. `()-[*]->()`) are to be matched using `MATCH ALL`
168+
* Path patterns (e.g. `()-/../->()`) are to be matched using `MATCH ALL SHORTEST`
136169

137-
* `EVERY WALK` for fixed length pattern parts,
138-
* `SHORTEST WALK` for variable length pattern parts, and
139-
* `EVERY TRAIL` for legacy variable length pattern parts only.
170+
This CIP proposes that an error should be raised for any use of the default pattern match mode without an explicit binder class in combination with variable length relationship patterns.
140171

141-
This CIP aligns with the introduction of path query patterns by proposing that existing bounded and unbounded variable length patterns are to be deprecated in favor of path query patterns.
172+
The default pattern match mode may only be used in conjunction with a binder class in plural form (i.e. `WALKS`, `TRAILS`, `PATHS`).
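The defaults and the singular/plural rules stated for each mode can be summarised as two lookup tables. The following Python sketch is illustrative only; the keys (`"simple"`, `"bounded_varlen"`, and so on) and function names are hypothetical labels chosen for this example.

```python
# Default pattern match mode per kind of pattern element, as listed above.
DEFAULT_MODE = {
    "simple": "ALL",                 # e.g. ()-[]->()
    "bounded_varlen": "ALL",         # e.g. ()-[*2..4]->()
    "unbounded_varlen": "ALL",       # e.g. ()-[*]->()
    "path_pattern": "ALL SHORTEST",  # e.g. ()-/../->()
}

# Whether a mode requires the plural form of the binder class.
REQUIRES_PLURAL = {"ALL": True, "ALL SHORTEST": True, "SHORTEST": False}

def effective_mode(element_kind, explicit_mode=None):
    """Explicit mode if given, otherwise the default for the element kind."""
    return explicit_mode or DEFAULT_MODE[element_kind]

def binder_form_ok(mode, binder_class):
    """True if the binder class has the form (singular/plural) the mode needs."""
    is_plural = binder_class.upper().endswith("S")
    return REQUIRES_PLURAL[mode] == is_plural
```

For instance, `binder_form_ok("SHORTEST", "TRAIL")` holds, while `binder_form_ok("ALL", "TRAIL")` does not.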
142173

143-
This changes Cypher to use homomorphic matching for all non-deprecated pattern parts.
174+
This changes Cypher to use homomorphic matching for simple relationship patterns.
144175

145176
=== Predicates and functions for working with walks
146177

147178
This CIP proposes to introduce additional predicates and functions for working with walks:
148179

149-
* `open(p)`: true if the start node and the end node of `p` are not the same node
150-
* `closed(p)`: true if the start node and the end node of `p` are the same node
151-
* `trail(p)`: `p` if `p` contains no duplicate relationships, `NULL` otherwise
152-
* `path(p)`: `p` if `p` contains no duplicate relationships and either no duplicate nodes at all or the start node and the end node are the same node, `NULL` otherwise
153-
* `circuit(p)`: `trail(p)`, if `closed(p)` is true, `NULL` otherwise
154-
* `cycle(p)`: `path(p)`, if `closed(p)` is true, `NULL` otherwise
180+
* `isOpen(p)`: true if the start node and the end node of `p` are not the same node
181+
* `isClosed(p)`: true if the start node and the end node of `p` are the same node
182+
* `toTrail(p)`: `p` if `p` contains no duplicate relationships, `NULL` otherwise
183+
* `toPath(p)`: `p` if `p` contains no duplicate relationships and either no duplicate nodes at all or the start node and the end node are the same node, `NULL` otherwise
184+
* `toCircuit(p)`: returns `toTrail(p)` if `isClosed(p)` is true, `NULL` otherwise
185+
* `toCycle(p)`: returns `toPath(p)` if `isClosed(p)` is true, `NULL` otherwise
155186
* `disjoint(list1, list2, ..., list_n)`: true if the lists do not share any elements
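The predicates and functions above can be sketched in Python as follows. This is a hedged illustration: the `Walk` representation (a tuple of node ids and a tuple of relationship ids) is an assumption, `None` stands in for `NULL`, and `to_path` reads the definition as allowing a repeated node only when it is the shared start and end node.

```python
from typing import NamedTuple

class Walk(NamedTuple):
    nodes: tuple  # node ids n0..nk in traversal order
    rels: tuple   # relationship ids r1..rk in traversal order

def is_closed(p):
    return p.nodes[0] == p.nodes[-1]

def is_open(p):
    return not is_closed(p)

def to_trail(p):
    """p if no relationship repeats, None otherwise."""
    return p if len(set(p.rels)) == len(p.rels) else None

def to_path(p):
    """p if it is a trail whose only allowed node repeat is start == end."""
    if to_trail(p) is None:
        return None
    inner = p.nodes[:-1] if is_closed(p) else p.nodes
    return p if len(set(inner)) == len(inner) else None

def to_circuit(p):
    return to_trail(p) if is_closed(p) else None

def to_cycle(p):
    return to_path(p) if is_closed(p) else None

def disjoint(*lists):
    """True if no element occurs in more than one of the given lists."""
    seen = set()
    for items in lists:
        current = set(items)
        if seen & current:
            return False
        seen |= current
    return True
```

A closed walk such as `Walk(("a", "b", "a"), ("r1", "r2"))` is both a circuit and a cycle under this reading.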
156187

157-
To support a common family of weight calculations, this CIP proposes the introduction of a new aggregate function `product` for computing the product of a set of numbers.
158-
159-
Evaluating `product` for an empty set returns `1`.
160-
161188
== Examples
162189

163190
The following examples demonstrate various ways in which the newly proposed constructs may be used if this CIP is adopted.
@@ -166,60 +193,47 @@ The following examples demonstrates various ways in which the newly proposed con
166193

167194
[source=cypher]
168195
----
169-
// shortestPath(...) today becomes:
170-
MATCH ONE SHORTEST [TRAIL] p=(a)-[r*]->(b)
196+
// MATCH p=shortestPath((a)-[:X*]->()) today becomes:
197+
MATCH SHORTEST TRAIL p=(a)-[:X*]->()
171198
RETURN *
172199
173-
// allShortestPaths(...) today becomes:
174-
MATCH SHORTEST [TRAIL] p=(a)-[r*]->(b)
175-
RETURN p
176-
----
177-
178-
=== Matching cheapest paths
179-
180-
[source=cypher]
181-
----
182-
MATCH CHEAPEST PATH p=(a)-/(:LOVES|:LIKES)*/->(b) BY WEIGHT |strength| AS w
183-
RETURN p AS path, w AS weight
184-
----
185-
186-
=== Matching one path and computing its weight
200+
// MATCH p=allShortestPaths((a)-[:X*]->()) today becomes:
201+
MATCH ALL SHORTEST TRAILS p=(a)-[:X*]->()
202+
RETURN *
187203
188-
[source=cypher]
189-
----
190-
MATCH ONE PATH p=(a)-[*]->(b) WEIGHT product(r.score+r.handicap) OVER r AS w
191-
RETURN p, w
204+
// MATCH p=allShortestPaths((a)-[:X*]->()) today using path patterns becomes:
205+
MATCH p=(a)-/:X*/->()
206+
RETURN *
192207
----
193208

194209
=== Matching with existing semantics
195210

196-
`overlap` may be used to precisely express Cypher's current pattern matching semantics.
211+
`disjoint` may be used to precisely express Cypher's current pattern matching semantics.
197212

198213
[source=cypher]
199214
----
200215
// Today (using the same uniqueness scope for pat1, pat2, and pat3)
201216
MATCH pat1=..., pat2=..., pat3=...
202217
203218
// This CIP
204-
MATCH EVERY TRAIL pat1=...
205-
MATCH EVERY TRAIL pat2=...
206-
MATCH EVERY TRAIL pat3=...
219+
MATCH pat1=...
220+
MATCH pat2=...
221+
MATCH pat3=...
207222
WHERE disjoint(rels(pat1), rels(pat2), rels(pat3))
208223
----
209224

210-
== Per-parser options
225+
== Pre-parser options
226+
227+
It is suggested that a conforming implementation should provide pre-parser options for defining the default pattern binder class as well as the default pattern match mode:
211228

212-
It is suggested that a conforming implementation should provide pre-parser options for defining the default pattern binder class for each pattern match mode as well as the default pattern match mode for each class of pattern parts:
229+
213230

214-
* `match-every=walk|trail|path` for configuring the default pattern binder class for each use of the `MATCH EVERY` pattern match mode
215-
* `match-shortest=walk|trail|path` for configuring the default pattern binder class for each use of the `MATCH SHORTEST` pattern match mode
216-
* `match-cheapest=walk|trail|path` for configuring the default pattern binder class for each use of the `MATCH CHEAPEST` pattern match mode
217-
* `fixlen-mode=every|shortest` for configuring the default pattern match mode of fixed length pattern parts
218-
* `varlen-mode=every|shortest` for configuring the default pattern match mode of variable length pattern parts
231+
* `binder-class=walk[s]|trail[s]|path[s]` for configuring a different default pattern binder class
232+
* `match-mode=all|all-shortest|shortest` for configuring a different default pattern match mode
219233

220234
== Benefits to this proposal
221235

222-
This proposal adds a generic facility to Cypher for expressing desired pattern matching semantics.
236+
This proposal adds a facility to Cypher for selecting from multiple desirable pattern matching semantics.
223237

224238
== Caveats to this proposal
225239

@@ -228,4 +242,4 @@ A moderate increase in language complexity.
228242
A substantial departure from current pattern matching semantics.
229243
However, care has been taken to retain access to current semantics.
230244

231-
`MATCH EVERY [OPEN|CLOSED] WALK` allows for non-terminating queries.
245+
`MATCH ALL [OPEN|CLOSED] WALKS` allows for non-terminating queries.
