
Commit 6117618

committed
Rework CIP
1 parent ed1a824 commit 6117618

File tree

2 files changed

+231
-214
lines changed

Lines changed: 231 additions & 0 deletions
@@ -0,0 +1,231 @@
1+
= CIP2017-01-18 - Configurable Pattern Matching Semantics
2+
:numbered:
3+
:toc:
4+
:toc-placement: macro
5+
:source-highlighter: codemirror
6+
7+
*Author:* Stefan Plantikow <stefan.plantikow@neotechnology.com>
8+
9+
This proposal is a response to CIR-2017-174.
10+
11+
== Motivation
12+
13+
Currently, Cypher uses pattern matching semantics that treat all patterns that occur in a `MATCH` clause as a unit (called a *uniqueness scope*) and only consider pattern instances that bind different relationships to each fixed length relationship pattern variable and to each element of a variable length relationship pattern variable.
14+
This has come to be informally called *cyphermorphism* and is a variation of edge isomorphism.
15+
16+
Cyphermorphism strikes a balance between returning as many results as possible and ruling out an infinite number of paths when matching graphs that contain cycles.
17+
18+
However, the notion of *uniqueness scope* has proven to be non-standard and is occasionally confusing for users, and cyphermorphic matching is not tractable in terms of computational complexity for some graphs.
19+
20+
This CIP aims to address these issues.
21+
22+
== Background
23+
24+
This CIP relies on the terminology introduced by the openCypher grammar.
25+
26+
Most notably, a pattern in Cypher consists of a comma-separated list of *pattern parts*.
27+
Pattern parts may be bound to a path variable and consist of a linear chain of connected node and relationship patterns.
28+
29+
While Cypher allows omitting path, node, and relationship variables in a pattern, this is just syntactic sugar, i.e., all parts of a pattern should be considered to be bound to a variable name from the viewpoint of pattern matching semantics (names are either provided in the query or automatically generated by a conforming implementation).
30+
31+
== Proposal
32+
33+
This CIP proposes to replace the notion of *uniqueness scope* and *cyphermorphism* and all associated rules by providing new, configurable pattern matching semantics for Cypher as outlined in this section.
34+
35+
This CIP has been submitted in the belief that *CIP2017-02-06 Path Pattern Queries* will be accepted and is aligned with it.
36+
37+
=== Walks
38+
39+
This CIP introduces the following kinds of walks:
40+
41+
* `WALK`: A walk is an arbitrary, non-empty sequence of alternating nodes and relationships that starts with a node and ends with a node.
42+
* `TRAIL`: A trail is a walk that does not contain the same relationship twice.
43+
* `PATH`: A simple path is a trail that does not contain the same node twice unless that node is both the start node and the end node of the path.
44+
45+
Note that every `PATH` is a `TRAIL` and that every `TRAIL` is a `WALK`.
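
The three definitions above can be illustrated with a minimal Python sketch. It assumes a representation that this CIP does not prescribe: a walk is a list that alternates node identifiers and relationship identifiers, such as `["a", "r1", "b"]`. The helper names `is_walk`, `is_trail`, `is_path`, and `classify` are hypothetical, and the sketch checks only the shape of the sequence, not whether the relationships actually connect the nodes in a graph.

```python
def is_walk(seq):
    # Non-empty and of odd length, so it starts and ends with a node.
    return len(seq) >= 1 and len(seq) % 2 == 1


def is_trail(seq):
    # A trail is a walk without a repeated relationship (odd positions).
    rels = seq[1::2]
    return is_walk(seq) and len(set(rels)) == len(rels)


def is_path(seq):
    # A path is a trail without a repeated node, except that the
    # start node and the end node may be the same node.
    if not is_trail(seq):
        return False
    nodes = seq[0::2]
    inner = nodes[:-1] if len(nodes) > 1 and nodes[0] == nodes[-1] else nodes
    return len(set(inner)) == len(inner)


def classify(seq):
    # Most specific kind that applies, or None if seq is not a walk.
    if is_path(seq):
        return "PATH"
    if is_trail(seq):
        return "TRAIL"
    if is_walk(seq):
        return "WALK"
    return None
```

For example, `classify(["a", "r1", "b", "r2", "c", "r3", "b"])` yields `"TRAIL"` because node `b` repeats, while `classify(["a", "r1", "b", "r2", "a"])` yields `"PATH"` because the only repeated node is both the start node and the end node.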
46+
47+
This CIP proposes to rename the Cypher type `PATH` to `WALK`.
48+
49+
=== Pattern binders
50+
51+
This CIP proposes to call the path variable that occurs before a pattern element of a pattern part a *pattern binder* in the grammar.
52+
Note that such a variable is always bound to a linear sequence of the node, relationship, and path query patterns of its pattern element.
53+
54+
This CIP proposes introducing the notion of a *pattern binder class* that may be written before a pattern binder in a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and restricts the set of valid pattern matches for the following pattern element.
55+
The proposed pattern binder classes are:
56+
57+
* `WALK`: This pattern binder should only be bound to a `WALK` that matches all node, relationship, and path query patterns given in the following pattern element.
58+
* `TRAIL`: This pattern binder should only be bound to a `TRAIL` that matches all node, relationship, and path query patterns given in the following pattern element.
59+
* `PATH`: This pattern binder should only be bound to a simple `PATH` that matches all node, relationship, and path query patterns given in the following pattern element.
60+
61+
The pattern binder class may be further qualified with one of the following prefixes:
62+
63+
* `OPEN WALK|TRAIL|PATH`: This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _not the same node_.
64+
* `CLOSED WALK|TRAIL|PATH`: This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _the same node_.
65+
66+
The following additional pattern binder classes are proposed to accommodate existing terminology that is commonly used in graph theory:
67+
68+
* `CIRCUIT` is a synonym for `CLOSED TRAIL`
69+
* `CYCLE` is a synonym for `CLOSED PATH`
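
As a rough illustration of how the qualifiers and synonyms above relate, the following Python sketch normalizes a pattern binder class into a `(qualifier, kind)` pair. The function names `parse_binder_class` and `endpoints_allowed` are hypothetical. Rejecting a qualified synonym such as `OPEN CIRCUIT` is an assumption of this sketch; the CIP does not state how such a combination should be treated.

```python
SYNONYMS = {"CIRCUIT": ("CLOSED", "TRAIL"), "CYCLE": ("CLOSED", "PATH")}
KINDS = {"WALK", "TRAIL", "PATH"}
QUALIFIERS = {"OPEN", "CLOSED"}


def parse_binder_class(text):
    # Returns (qualifier, kind); qualifier is None when not given.
    words = text.upper().split()
    if len(words) == 1 and words[0] in SYNONYMS:
        return SYNONYMS[words[0]]
    if len(words) == 1 and words[0] in KINDS:
        return (None, words[0])
    if len(words) == 2 and words[0] in QUALIFIERS and words[1] in KINDS:
        return (words[0], words[1])
    # Assumption: anything else, including "OPEN CIRCUIT", is rejected.
    raise ValueError(f"not a pattern binder class: {text!r}")


def endpoints_allowed(qualifier, walk):
    # walk alternates node and relationship identifiers.
    closed = walk[0] == walk[-1]
    if qualifier == "OPEN":
        return not closed
    if qualifier == "CLOSED":
        return closed
    return True
```

With this normalization, `parse_binder_class("CYCLE")` and `parse_binder_class("CLOSED PATH")` produce the same pair, which mirrors the synonym rule.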
70+
71+
Implementations are advised to signal a warning for every use of an `OPEN` pattern binder class if the two endpoints of the pattern element are both unbound and both use the same variable name.
72+
73+
Implementations are advised to signal a warning for every use of a `CLOSED` pattern binder class if the two endpoints of the pattern element are both unbound and use different variable names.
74+
75+
=== Pattern match modes
76+
77+
This CIP proposes introducing the notion of a *pattern match mode* that may be written before a pattern binder in a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and restricts the set of valid pattern matches for the following pattern element.
78+
79+
A pattern match mode is always written before any pattern binder class that has been explicitly given for the same pattern binder.
80+
81+
==== MATCH EVERY mode
82+
83+
This CIP proposes the new `MATCH EVERY` pattern match mode that matches every walk (or trail, or path respectively) as described by all node, relationship, and path query patterns given in the following pattern elements.
84+
This may return an infinite or at least a very large result for some graphs.
85+
86+
Implementations are advised to signal a warning for every use of `MATCH EVERY [OPEN|CLOSED] WALK` that may lead to the generation of an infinite result set.
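
To see why `MATCH EVERY ... WALK` can produce an unbounded result while `MATCH EVERY ... TRAIL` cannot, consider the following Python sketch. The three-relationship graph in `EDGES` and the function names are hypothetical. The enumeration is cut off at `max_rels` relationships only so that the walk case terminates.

```python
# Hypothetical directed graph: a -> b -> a forms a cycle, b -> c leaves it.
EDGES = {"r1": ("a", "b"), "r2": ("b", "a"), "r3": ("b", "c")}


def extend(walk, allow_repeat):
    # Yield every one-relationship extension of the given walk.
    last = walk[-1]
    for rel, (src, dst) in EDGES.items():
        if src == last and (allow_repeat or rel not in walk[1::2]):
            yield walk + [rel, dst]


def enumerate_from(start, max_rels, allow_repeat):
    # allow_repeat=True enumerates walks, False enumerates trails.
    frontier = [[start]]
    found = []
    for _ in range(max_rels):
        frontier = [w for walk in frontier for w in extend(walk, allow_repeat)]
        found.extend(frontier)
    return found


def count_ending_in(goal, max_rels, allow_repeat):
    return sum(1 for w in enumerate_from("a", max_rels, allow_repeat)
               if w[-1] == goal)
```

In this graph the number of walks from `a` to `c` keeps growing as the length bound is raised (3 with at most 6 relationships, 5 with at most 10), whereas there is exactly one trail from `a` to `c` regardless of the bound.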
87+
88+
==== MATCH SHORTEST mode
89+
90+
This CIP proposes the new `MATCH SHORTEST` pattern match mode that matches every _shortest_ walk (or trail, or path respectively) as described by all node, relationship, and path query patterns in the following pattern elements.
91+
92+
This CIP proposes to deprecate the existing syntax for both `shortestPath` and `allShortestPaths` matching of Cypher.
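
The CIP does not prescribe an algorithm for `MATCH SHORTEST`. As one possible reading for a plain directed variable length pattern, the sketch below uses breadth-first search to collect every walk with the minimum number of relationships between two nodes. The function name `all_shortest_walks` and the edge dictionary format are assumptions of this sketch.

```python
from collections import deque


def all_shortest_walks(edges, start, goal):
    # edges maps a relationship id to its (source, target) node pair.
    out = {}
    for rel, (src, dst) in edges.items():
        out.setdefault(src, []).append((rel, dst))

    dist = {start: 0}
    preds = {start: []}  # node -> list of (previous node, relationship)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for rel, nxt in out.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [(node, rel)]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another equally short way to reach nxt.
                preds[nxt].append((node, rel))

    if goal not in dist:
        return []

    def build(node):
        if node == start:
            return [[start]]
        return [w + [rel, node] for prev, rel in preds[node] for w in build(prev)]

    return build(goal)
```

Taking only the first element of the returned list would correspond to the `ONE` marker, which returns at most one match.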
93+
94+
==== Weight declarations
95+
96+
This CIP proposes that pattern elements may optionally be followed by weight declarations of one of the following forms:
97+
98+
* `WEIGHT <numerical-aggregation> OVER <rel> AS <weight>` Calculates a weight `<weight>` by evaluating the given `<numerical-aggregation>` for each relationship `<rel>` in the associated match
99+
* `WEIGHT |<expr>| AS <weight>` Calculates a weight `<weight>` by summing the results of evaluating `abs(<expr>)` for each relationship `r` in the associated match in a special scope that only contains all properties of `r` as variables
100+
101+
Multiple weight declarations may be given as long as they do not define the same `<weight>` variable.
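
The two weight declaration forms can be modelled in a few lines of Python. This is a sketch under stated assumptions: each relationship is represented as a dictionary of its properties, and the helper names `weight_over` and `weight_abs` are hypothetical. In `weight_abs`, passing the properties as keyword arguments imitates the special scope in which only the properties of `r` are visible.

```python
def weight_over(aggregate, expr, rels):
    # Models: WEIGHT <numerical-aggregation> OVER <rel> AS <weight>
    # expr is evaluated once per relationship, then aggregated.
    return aggregate(expr(r) for r in rels)


def weight_abs(expr, rels):
    # Models: WEIGHT |<expr>| AS <weight>
    # expr sees only the properties of each relationship as variables.
    return sum(abs(expr(**r)) for r in rels)


# Hypothetical relationships of one match, given as property maps.
rels = [
    {"strength": -2, "score": 1, "handicap": 2},
    {"strength": 3, "score": 4, "handicap": 0},
]

# WEIGHT sum(r.score + r.handicap) OVER r AS w1
w1 = weight_over(sum, lambda r: r["score"] + r["handicap"], rels)

# WEIGHT |strength| AS w2
w2 = weight_abs(lambda strength, **_: strength, rels)
```

Here `w1` is 7 and `w2` is 5. The negative `strength` contributes its absolute value, as required by the `|<expr>|` form.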
102+
103+
==== MATCH CHEAPEST mode
104+
105+
This CIP proposes the new `MATCH CHEAPEST` pattern match mode that matches every cheapest walk (or trail, or path respectively) as described by all node, relationship, and path query patterns given in the following pattern element and according to the pattern element's concluding first _mandatory_ weight declaration.
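
The CIP leaves the search strategy for `MATCH CHEAPEST` open. The sketch below swaps in Dijkstra's algorithm as one plausible choice for a summed, non-negative weight (such as the one produced by the `|<expr>|` form) and returns a single cheapest walk, which corresponds to combining the mode with the `ONE` marker. The function name `one_cheapest_walk` and the edge format are assumptions.

```python
import heapq


def one_cheapest_walk(edges, start, goal):
    # edges maps a relationship id to (source, target, cost), cost >= 0.
    out = {}
    for rel, (src, dst, cost) in edges.items():
        out.setdefault(src, []).append((rel, dst, cost))

    # Heap entries: (accumulated cost, tie-breaker, walk so far).
    heap = [(0, 0, [start])]
    counter = 1
    settled = set()
    while heap:
        cost, _, walk = heapq.heappop(heap)
        node = walk[-1]
        if node == goal:
            return cost, walk
        if node in settled:
            continue
        settled.add(node)
        for rel, nxt, step in out.get(node, []):
            if nxt not in settled:
                heapq.heappush(heap, (cost + step, counter, walk + [rel, nxt]))
                counter += 1
    return None  # goal is unreachable
```

Because all costs are non-negative, the first time the goal is popped from the heap its accumulated cost is minimal, which is the same property that the monotonicity requirement on mandatory weight declarations is meant to guarantee.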
106+
107+
==== Mandatory weight declarations
108+
109+
A mandatory weight declaration is prefixed with `BY`, may omit specifying a variable name for the computed weight, and its aggregation must be monotone (i.e. the sequence of intermediary results obtained by computing the aggregation incrementally over all input values in any order is always monotonically increasing).
110+
111+
A conforming implementation is expected to raise a runtime error when the monotonicity of a mandatory weight declaration is violated at runtime.
112+
113+
A conforming implementation may raise a compile time error when it can statically prove that the monotonicity of a mandatory weight declaration may be violated at runtime.
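
A runtime monotonicity check of the kind described above might look like the following Python sketch. The names `MonotonicityError` and `monotone_aggregate` are hypothetical, and the sketch assumes that "monotonically increasing" means non-decreasing, so equal consecutive intermediate results are accepted.

```python
import operator


class MonotonicityError(RuntimeError):
    """Raised when an intermediate aggregation result decreases."""


def monotone_aggregate(step, initial, values):
    # Fold the values incrementally and verify after every step that
    # the running result did not decrease (assumption: ties are fine).
    total = initial
    for value in values:
        updated = step(total, value)
        if updated < total:
            raise MonotonicityError(
                f"aggregation decreased from {total} to {updated}"
            )
        total = updated
    return total
```

For instance, `monotone_aggregate(operator.add, 0, [1, 2, 3])` returns 6, while a negative input such as `[1, -2]` triggers `MonotonicityError` as soon as the running sum drops.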
114+
115+
Additional weight declarations may be given after a mandatory weight declaration as long as no two weight declarations define conflicting aliases.
116+
117+
==== Singular matches
118+
119+
This CIP proposes optionally prefixing pattern match modes and pattern binder classes with the `ONE [OF]` marker to support returning at most one match.
120+
121+
=== Multiple pattern parts
122+
123+
If a pattern consists of multiple pattern parts, they are first solved independently before returning their cross product as the final result of the pattern.
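
The cross product rule can be sketched as follows. The sketch assumes that each pattern part has already been solved into a list of variable bindings (one dictionary per match) and that the parts do not share variable names; how shared variables would be reconciled is not covered here. The function name `combine_pattern_parts` is hypothetical.

```python
from itertools import product


def combine_pattern_parts(*part_results):
    # Each argument is the independently computed result of one
    # pattern part: a list of {variable: value} bindings.
    rows = []
    for combo in product(*part_results):
        row = {}
        for bindings in combo:
            # Assumption: variable names are distinct across parts.
            row.update(bindings)
        rows.append(row)
    return rows


part1 = [{"a": 1}, {"a": 2}]
part2 = [{"b": "x"}, {"b": "y"}, {"b": "z"}]
rows = combine_pattern_parts(part1, part2)
```

Two matches for the first part and three for the second yield six combined rows, and an empty result for any part makes the whole pattern return no rows.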
124+
125+
=== Default pattern matching semantics
126+
127+
This CIP defines three classes of pattern parts:
128+
129+
* *Fixed length pattern parts* are top-level pattern parts that may consist of node patterns or single length relationship patterns only.
130+
* *Variable length pattern parts* are top-level pattern parts that may consist of node patterns, single length relationship patterns, or path query patterns only.
131+
* *Legacy variable length pattern parts* are top-level pattern parts that may consist of node patterns, single length relationship patterns, or path query patterns and contain at least one legacy variable length pattern (including chains of single length patterns expressed as bounded variable length patterns).
132+
133+
Current Cypher pattern matching semantics correspond to using `MATCH EVERY TRAIL` by default for all top-level pattern parts (i.e. `MATCH` behaves like `MATCH EVERY TRAIL`).
134+
135+
This CIP proposes to adopt the following new default pattern match modes and default pattern binder classes:
136+
137+
* `EVERY WALK` for fixed length pattern parts,
138+
* `SHORTEST WALK` for variable length pattern parts, and
139+
* `EVERY TRAIL` for legacy variable length pattern parts only.
140+
141+
This CIP aligns with the introduction of path query patterns by proposing that existing bounded and unbounded variable length patterns are to be deprecated in favor of path query patterns.
142+
143+
This changes Cypher to use homomorphic matching for all non-deprecated pattern parts.
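
The practical effect of moving from trail semantics to homomorphic (walk) semantics shows up even for fixed length patterns. The Python sketch below matches the pattern `(a)-[r1]->(b)<-[r2]-(c)` against a tiny hypothetical graph, once allowing `r1` and `r2` to bind the same relationship and once requiring them to differ. The graph and the function name `match_two_hop` are assumptions made for illustration.

```python
# Hypothetical graph: two relationships pointing at the same node.
EDGES = {"x": ("n1", "n2"), "y": ("n3", "n2")}


def match_two_hop(edges, distinct_rels):
    # Matches (a)-[r1]->(b)<-[r2]-(c).
    # distinct_rels=True models trail semantics (current default),
    # distinct_rels=False models walk semantics (proposed default).
    rows = []
    for r1, (a, b1) in edges.items():
        for r2, (c, b2) in edges.items():
            if b1 != b2:
                continue
            if distinct_rels and r1 == r2:
                continue
            rows.append({"a": a, "r1": r1, "b": b1, "r2": r2, "c": c})
    return rows


walk_rows = match_two_hop(EDGES, distinct_rels=False)
trail_rows = match_two_hop(EDGES, distinct_rels=True)
```

Under walk semantics the pattern has four matches, including the two in which `r1` and `r2` are the same relationship. Under trail semantics only the two matches with different relationships remain.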
144+
145+
=== Predicates and functions for working with walks
146+
147+
This CIP proposes to introduce additional predicates and functions for working with walks:
148+
149+
* `open(p)`: true if the start node and the end node of `p` are not the same node
150+
* `closed(p)`: true if the start node and the end node of `p` are the same node
151+
* `trail(p)`: `p` if `p` contains no duplicate relationships, `NULL` otherwise
152+
* `path(p)`: `p` if `p` contains no duplicate relationships and either no duplicate nodes at all or the start node and the end node are the same node, `NULL` otherwise
153+
* `circuit(p)`: `trail(p)`, if `closed(p)` is true, `NULL` otherwise
154+
* `cycle(p)`: `path(p)`, if `closed(p)` is true, `NULL` otherwise
155+
* `disjoint(list1, list2, ..., list_n)`: true if the lists do not share any elements
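
As an illustration of the intended behaviour of `disjoint`, here is a minimal Python sketch. It assumes that list elements are hashable and that duplicates inside a single list do not count as sharing, since the definition only speaks about elements shared between lists.

```python
def disjoint(*lists):
    # True if no element occurs in more than one of the given lists.
    seen = set()
    for lst in lists:
        items = set(lst)  # duplicates within one list are ignored
        if seen & items:
            return False
        seen |= items
    return True


# Hypothetical relationship lists of three matched patterns.
rels_pat1 = ["r1", "r2"]
rels_pat2 = ["r3"]
rels_pat3 = ["r4", "r2"]
```

In this example `disjoint(rels_pat1, rels_pat2)` is true, while `disjoint(rels_pat1, rels_pat2, rels_pat3)` is false because `r2` occurs in both the first and the third list.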
156+
157+
To support a common family of weight calculations, this CIP proposes the introduction of a new aggregate function `product` for computing the product of a set of numbers.
158+
159+
Evaluating `product` for an empty set returns `1`.
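
The empty-set rule matches the usual convention that the empty product is the multiplicative identity. Python's standard `math.prod` follows the same convention, so the proposed aggregate can be sketched with it. The wrapper name `product` merely mirrors the proposed Cypher function.

```python
import math


def product(values):
    # math.prod returns 1 for an empty iterable by default.
    return math.prod(values)
```

This choice keeps weights composable: multiplying the product of an empty match into any other weight leaves that weight unchanged.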
160+
161+
== Examples
162+
163+
The following examples demonstrate various ways in which the newly proposed constructs may be used if this CIP is adopted.
164+
165+
=== Matching shortest paths
166+
167+
[source,cypher]
168+
----
169+
// shortestPath(...) today becomes:
170+
MATCH ONE SHORTEST [TRAIL] p=(a)-[r*]->(b)
171+
RETURN *
172+
173+
// allShortestPaths(...) today becomes:
174+
MATCH SHORTEST [TRAIL] p=(a)-[r*]->(b)
175+
RETURN p
176+
----
177+
178+
=== Matching cheapest paths
179+
180+
[source,cypher]
181+
----
182+
MATCH CHEAPEST PATH p=(a)-/(:LOVES|:LIKES)*/->(b) BY WEIGHT |strength| AS w
183+
RETURN p AS path, w AS weight
184+
----
185+
186+
=== Matching one path and computing its weight
187+
188+
[source,cypher]
189+
----
190+
MATCH ONE PATH p=(a)-[*]->(b) WEIGHT product(r.score+r.handicap) OVER r AS w
191+
RETURN p, w
192+
----
193+
194+
=== Matching with existing semantics
195+
196+
`disjoint` may be used to precisely express Cypher's current pattern matching semantics.
197+
198+
[source,cypher]
199+
----
200+
// Today (using the same uniqueness scope for pat1, pat2, and pat3)
201+
MATCH pat1=..., pat2=..., pat3=...
202+
203+
// This CIP
204+
MATCH EVERY TRAIL pat1=...
205+
MATCH EVERY TRAIL pat2=...
206+
MATCH EVERY TRAIL pat3=...
207+
WHERE disjoint(rels(pat1), rels(pat2), rels(pat3))
208+
----
209+
210+
== Pre-parser options
211+
212+
It is suggested that a conforming implementation should provide pre-parser options for defining the default pattern binder class for each pattern match mode as well as the default pattern match mode for each class of pattern parts:
213+
214+
* `match-every=walk|trail|path` for configuring the default pattern binder class for each use of the `MATCH EVERY` pattern match mode
215+
* `match-shortest=walk|trail|path` for configuring the default pattern binder class for each use of the `MATCH SHORTEST` pattern match mode
216+
* `match-cheapest=walk|trail|path` for configuring the default pattern binder class for each use of the `MATCH CHEAPEST` pattern match mode
217+
* `fixlen-mode=every|shortest` for configuring the default pattern match mode of fixed length pattern parts
218+
* `varlen-mode=every|shortest` for configuring the default pattern match mode of variable length pattern parts
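
How these options are written in a query is not specified by this CIP. Assuming they arrive as `name=value` tokens, a validating reader could be sketched in Python as follows. The table simply restates the option names and values listed above; the function name `parse_options` is hypothetical.

```python
BINDER_CLASSES = {"walk", "trail", "path"}
MATCH_MODES = {"every", "shortest"}

ALLOWED = {
    "match-every": BINDER_CLASSES,
    "match-shortest": BINDER_CLASSES,
    "match-cheapest": BINDER_CLASSES,
    "fixlen-mode": MATCH_MODES,
    "varlen-mode": MATCH_MODES,
}


def parse_options(tokens):
    # Assumption: each token has the form "name=value".
    options = {}
    for token in tokens:
        name, sep, value = token.partition("=")
        if not sep or name not in ALLOWED:
            raise ValueError(f"unknown pre-parser option: {token!r}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid value for {name}: {value!r}")
        options[name] = value
    return options
```

For example, `parse_options(["match-every=trail", "varlen-mode=every"])` would select trail matching for `MATCH EVERY` and make `EVERY` the default mode for variable length pattern parts, while a value such as `fixlen-mode=cheapest` is rejected because it is not among the listed alternatives.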
219+
220+
== Benefits to this proposal
221+
222+
This proposal adds a generic facility to Cypher for expressing desired pattern matching semantics.
223+
224+
== Caveats to this proposal
225+
226+
A moderate increase in language complexity.
227+
228+
A substantial departure from current pattern matching semantics.
229+
However, care has been taken to retain access to current semantics.
230+
231+
`MATCH EVERY [OPEN|CLOSED] WALK` allows for non-terminating queries.
