Commit b119da3

Fixes #1064: Add procedure to compare graphs (#3041) (#4443) (#4445)
* Fixes #1064: Add procedure to compare graphs (#3041) (#4443)
* Fixes #1064: Add procedure to compare graphs (#3041)
* Fixes #1064: Add procedure to compare graphs
* added extended tag
* added multi-db support, tests and adoc
* changed VirtualNode with Node
* fix ci errors
* fixed invalid apoc.meta.Meta import
* fix MapSubGraph classes
* changed MapSubgraph implementation
* fix tests
* fix test CI
* fix tests - unboxed issue
* fix wrong imports
* fix test
* fix compile errors
1 parent 5f31b77 commit b119da3

File tree

15 files changed

+1348
-43
lines changed

docs/asciidoc/modules/ROOT/nav.adoc

Lines changed: 2 additions & 0 deletions
@@ -54,6 +54,8 @@ include::partial$generated-documentation/nav.adoc[]
5454
** xref::graph-updates/ttl.adoc[]
5555
** xref::graph-updates/graph-generators.adoc[]
5656
57+
* xref:comparing-graphs/index.adoc[]
58+
** xref::comparing-graphs/graph-difference.adoc[]
5759
5860
* xref:cypher-execution/index.adoc[]
5961
** xref::cypher-execution/running-cypher.adoc[]
Lines changed: 166 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,166 @@
1+
== `apoc.diff.graphs` procedure
2+
3+
The procedure accepts two string arguments, `source` and `dest`, representing the two queries to compare,
4+
and an optional `config` map as a 3rd parameter.
5+
6+
The procedure compares the `source` and `dest` graphs and returns the differences in terms of:
7+
8+
* same node count
9+
* same count per label
10+
* same relationship count
11+
* same count per rel-type
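
The count checks listed above can be illustrated with a minimal Java sketch. It is not the APOC implementation: the class `LabelCountDiff`, its methods, and the modeling of a node as a plain set of labels are all assumptions made for illustration. Only the difference names ("Total count", "Count by Label") are taken from the result tables in this document.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not the code added by this commit.
class LabelCountDiff {

    /** Counts how many nodes carry each label; a node is modeled as its set of labels. */
    static Map<String, Long> countByLabel(List<Set<String>> nodes) {
        Map<String, Long> counts = new TreeMap<>();
        for (Set<String> labels : nodes) {
            for (String label : labels) {
                counts.merge(label, 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Returns one entry per count check that differs between the two node lists. */
    static Map<String, String> diffCounts(List<Set<String>> source, List<Set<String>> dest) {
        Map<String, String> differences = new LinkedHashMap<>();
        if (source.size() != dest.size()) {
            differences.put("Total count", source.size() + " vs " + dest.size());
        }
        Map<String, Long> sourceCounts = countByLabel(source);
        Map<String, Long> destCounts = countByLabel(dest);
        if (!sourceCounts.equals(destCounts)) {
            differences.put("Count by Label", sourceCounts + " vs " + destCounts);
        }
        return differences;
    }

    public static void main(String[] args) {
        // Two Person nodes on the source side, one on the destination side.
        List<Set<String>> source = List.of(Set.of("Person"), Set.of("Person"));
        List<Set<String>> dest = List.of(Set.of("Person"));
        System.out.println(diffCounts(source, dest));
    }
}
```

The same idea would apply to relationships, with relationship types in place of labels.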
12+
13+
For each node in the `source` graph with a certain label, the procedure looks for the same node in the `dest` graph, by keys or by internal id, and if found:
14+
* compare all labels
15+
* compare all properties
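
The per-node comparison can be sketched as follows. This is a simplified illustration, not the procedure's code: the `Node` class, the `compare` method, and the use of a single key property (standing in for a uniqueness constraint such as `Person.name`) are assumptions. The difference names are the ones shown in the result tables in this document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the code added by this commit.
class NodeDiffSketch {

    /** Minimal stand-in for a graph node: labels plus properties. */
    static final class Node {
        final Set<String> labels;
        final Map<String, Object> properties;

        Node(Set<String> labels, Map<String, Object> properties) {
            this.labels = labels;
            this.properties = properties;
        }
    }

    /** Matches nodes on one key property and reports label and property differences. */
    static List<String> compare(List<Node> source, List<Node> dest, String keyProperty) {
        // Index destination nodes by the key that a uniqueness constraint would provide.
        Map<Object, Node> destByKey = new HashMap<>();
        for (Node node : dest) {
            destByKey.put(node.properties.get(keyProperty), node);
        }
        List<String> differences = new ArrayList<>();
        for (Node s : source) {
            Object key = s.properties.get(keyProperty);
            Node d = destByKey.get(key);
            if (d == null) {
                differences.add("Destination Entity not found: " + key);
                continue;
            }
            if (!s.labels.equals(d.labels)) {
                differences.add("Different Labels: " + key);
            }
            if (!s.properties.equals(d.properties)) {
                differences.add("Different Properties: " + key);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        List<Node> source = List.of(
                new Node(Set.of("Person", "Other"), Map.of("name", "Michael Jordan", "age", 54)),
                new Node(Set.of("Person"), Map.of("name", "Tom Burton", "age", 47)));
        List<Node> dest = List.of(
                new Node(Set.of("Person"), Map.of("name", "Michael Jordan", "age", 54)),
                new Node(Set.of("Person"), Map.of("name", "Tom Burton", "age", 23)));
        compare(source, dest, "name").forEach(System.out::println);
    }
}
```

Matching by internal id would use the same loop with the node id as the lookup key instead of a constrained property.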
16+
17+
Please note that the node lookup leverages existing constraints to find an equivalent node. To find a node by its internal id instead, use the `findById: true` config (see below).
18+
19+
For each relationship in the `source` graph, the procedure takes the two nodes of the relationship and checks whether the `dest`
20+
graph contains a relationship with the same properties and the same start and end nodes.
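
The relationship check can be illustrated with a small Java sketch. It is an approximation under stated assumptions, not the procedure's code: the `Rel` class and `missingInDest` method are hypothetical, start and end nodes are identified by a key value, and the relationship type is compared as well, which the text above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; not the code added by this commit.
class RelationshipDiffSketch {

    /** Minimal stand-in for a relationship; start and end nodes are identified by a key value. */
    static final class Rel {
        final String type;
        final Object startKey;
        final Object endKey;
        final Map<String, Object> properties;

        Rel(String type, Object startKey, Object endKey, Map<String, Object> properties) {
            this.type = type;
            this.startKey = startKey;
            this.endKey = endKey;
            this.properties = properties;
        }

        /** Same type (an assumption), same start and end node, same properties. */
        boolean sameAs(Rel other) {
            return Objects.equals(type, other.type)
                    && Objects.equals(startKey, other.startKey)
                    && Objects.equals(endKey, other.endKey)
                    && Objects.equals(properties, other.properties);
        }
    }

    /** Returns the source relationships that have no equivalent in the destination. */
    static List<Rel> missingInDest(List<Rel> source, List<Rel> dest) {
        List<Rel> missing = new ArrayList<>();
        for (Rel s : source) {
            boolean found = dest.stream().anyMatch(s::sameAs);
            if (!found) {
                missing.add(s);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Rel> source = List.of(new Rel("KNOWS", "Tom Burton", "John William", Map.of("since", 2016)));
        List<Rel> dest = List.of(new Rel("KNOWS", "Jerry Burton", "Jack William", Map.of("since", 1999)));
        // Prints the number of source relationships without a match in dest.
        System.out.println(missingInDest(source, dest).size());
    }
}
```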
21+
22+
23+
The procedure supports the following `config` parameters:
24+
25+
.config parameters
26+
[opts=header]
27+
|===
28+
| name | type | default | description
29+
| findById | boolean | false | find nodes by internal id instead of using an existing constraint
30+
| relsInBetween | boolean | false | if enabled, also consider the other terminal nodes when the query returns relationships together with start or end nodes
31+
| boltConfig | Map | {} | additional config passed to `apoc.bolt.load` when `type: URL` is used (see the `target parameters` table below)
32+
| source | Map | {} | see below
33+
| dest | Map | {} | see below
34+
|===
35+
36+
The `source` and `dest` maps apply to the 1st and the 2nd procedure arguments respectively. They can have the following keys:
37+
38+
.source/dest parameters
39+
[opts=header]
40+
|===
41+
| name | type | default | description
42+
| target | Map | {} | see below
43+
| params | Map | {} | to pass additional query params
44+
|===
45+
46+
The `target` param accepts:
47+
48+
.target parameters
49+
[opts=header]
50+
|===
51+
| name | type | default | description
52+
| type | Enum[URL, DATABASE] | `URL` | where to run the query: against an external bolt url, leveraging the `apoc.bolt.load` procedure (with `URL`), or against another database in the same instance (with `DATABASE`)
53+
| value | String | {} | with `type: "URL"`, the bolt url; with `type: "DATABASE"`, the target database name
54+
|===
55+
56+
57+
=== Usage examples
58+
59+
Given this dataset in the `neo4j` database:
60+
[source,cypher]
61+
----
62+
CREATE CONSTRAINT IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE;
63+
CREATE (m:Person {name: 'Michael Jordan', age: 54});
64+
CREATE (q:Person {name: 'Tom Burton', age: 23})
65+
CREATE (p:Person {name: 'John William', age: 22})
66+
CREATE (q)-[:KNOWS{since:2016, time:time('125035.556+0100')}]->(p);
67+
----
68+
69+
70+
We can compare two sets of nodes in the same database:
71+
72+
[source,cypher]
73+
----
74+
CALL apoc.diff.graphs("MATCH (start:Person) WHERE start.age < $age RETURN start", "MATCH (start:Person) WHERE start.age > $age RETURN start", {source: {params: {age: 25}}, dest: {params: {age: 25}}})
75+
----
76+
77+
.Results
78+
[opts="header"]
79+
|===
80+
| difference | entityType | id | sourceLabel | destLabel | source | dest
81+
| "Total count" | "Node" | null | null | null | 2 | 1
82+
| "Count by Label" | "Node" | null | null | null |{"Person": 2 } | {"Person": 1 }
83+
| "Destination Entity not found" | "Node" | 1 | "Person" | null | {"name": "Tom Burton" } | null
84+
| "Destination Entity not found" | "Node" | 2 | "Person" | null | {"name": "John William" } | null
85+
|===
86+
87+
88+
89+
If we create another dataset in a new `secondDb` database:
90+
[source,cypher]
91+
----
92+
CREATE CONSTRAINT IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE;
93+
CREATE (m:Person:Other {name: 'Michael Jordan', age: 54}),
94+
(n:Person {name: 'Tom Burton', age: 47}),
95+
(q:Person:Other {name: 'Jerry Burton', age: 23}),
96+
(p:Person {name: 'Jack William', age: 22}),
97+
(q)-[:KNOWS{since:1999, time:time('125035.556+0100')}]->(p);
98+
----
99+
100+
We can execute, in the `neo4j` database:
101+
[source,cypher]
102+
----
103+
CALL apoc.diff.graphs("MATCH p = (start:Person)-[rel:KNOWS]->(end) RETURN start, rel, end",
104+
"MATCH p = (start)-[rel:KNOWS]->(end) RETURN start, rel, end",
105+
{dest: {target: {type: "DATABASE", value: "secondDb"}}})
106+
----
107+
108+
.Results
109+
[opts="header"]
110+
|===
111+
| difference | entityType | id | sourceLabel | destLabel | source | dest
112+
| "Destination Entity not found" | "Node" | 1 | "Person" | null | {"name": "Tom Burton" } | null
113+
| "Destination Entity not found" | "Node" | 2 | "Person" | null |{"name": "John William"}| null
114+
| "Destination Entity not found" | "Relationship" | 0 | "KNOWS" | null | {"start":{"name":"Tom Burton"},"end":{"name":"John William"},"properties":{"time":"12:50:35.556000000+01:00","since":2016}} | null
115+
|===
116+
117+
118+
Conversely, we can compare the two datasets starting from the `secondDb` database:
119+
120+
[source,cypher]
121+
----
122+
CALL apoc.diff.graphs("MATCH (node:Person) RETURN node",
123+
"MATCH (node:Person) RETURN node",
124+
{dest: {target: {type: "DATABASE", value: "neo4j"}}})
125+
----
126+
127+
.Results
128+
[opts="header"]
129+
|===
130+
| difference | entityType | id | sourceLabel | destLabel | source | dest
131+
| "Total count" | "Node" | null | null | null | 6 | 3
132+
| "Count by Label" | "Node" | null | null | null |{"Person": 4, "Other": 2 } | {"Person": 3 }
133+
| "Different Labels" | "Node" | 0 | "Person" | "Person" | ["Other", "Person"] | ["Person"]
134+
| "Different Properties" | "Node" | 1 | "Person" | "Person" |{"age": 47 } | {"age": 23 }
135+
| "Destination Entity not found" | "Node" | 2 | "Person" | null | {"name": "Jerry Burton" } | null
136+
| "Destination Entity not found" | "Node" | 7 | "Person" | null | {"name": "Jack William" } |null
137+
|===
138+
139+
140+
141+
If we create another DBMS instance with the same dataset as `secondDb`, we can compare the two graphs by leveraging `apoc.bolt.load`:
142+
143+
[source,cypher]
144+
----
145+
CALL apoc.diff.graphs("MATCH p = (start:Person)-[rel:KNOWS]->(end) RETURN start, rel, end", "MATCH p = (start)-[rel:KNOWS]->(end) RETURN start, rel, end", {dest: {target: {type: "URL", value: "<MY_BOLT_URL>"}}})
146+
----
147+
148+
.Results
149+
[opts="header"]
150+
|===
151+
| difference | entityType | id | sourceLabel | destLabel | source | dest
152+
| "Destination Entity not found" | "Node" | 1 | "Person" | null | {"name": "Tom Burton" } | null
153+
| "Destination Entity not found" | "Node" | 2 | "Person" | null |{"name": "John William"}| null
154+
| "Destination Entity not found" | "Relationship" | 0 | "KNOWS" | null | {"start":{"name":"Tom Burton"},"end":{"name":"John William"},"properties":{"time":"12:50:35.556000000+01:00","since":2016}} | null
155+
|===
156+
157+
158+
If we want to point to a `secondDestDb` database in a remote `target` instance, we can use the `boltConfig` parameter to pass additional parameters to `apoc.bolt.load(url, query, params, <boltConfig>)`.
159+
In this case we can pass the `databaseName`, that is:
160+
161+
[source,cypher]
162+
----
163+
CALL apoc.diff.graphs("MATCH p = (start:Person)-[rel:KNOWS]->(end) RETURN start, rel, end", "MATCH p = (start)-[rel:KNOWS]->(end) RETURN start, rel, end", {boltConfig: {databaseName: "secondDestDb"}, dest: {target: {type: "URL", value: "bolt://neo4j:apoc@localhost:7687"}}})
164+
----
165+
166+
The result is the same as above, provided the dataset is the same.

docs/asciidoc/modules/ROOT/pages/database-integration/bolt-neo4j.adoc

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -65,7 +65,9 @@ apoc.bolt.production.url=bolt://password:test@localhost:7688
6565
The available configs are:
6666

6767
* `statistics`: possible values are true/false, the default value is false. This config prints the execution statistics;
68-
* `virtual`: possible values are true/false, the default value is false. This config return result in virtual format and not in map format, in apoc.bolt.load.
68+
* `virtual`: possible values are true/false, the default value is false. This config returns the result in virtual format and not in map format. *N.B.* If `withRelationshipNodeProperties=false`, the `VirtualRelationship` contains only the ids of the nodes connected by the relationship.
69+
* `readOnly`: possible values are true/false, the default value is true. Defines whether the operations performed on the remote instance are read-only.
70+
* `withRelationshipNodeProperties`: possible values are true/false, the default value is false. If `virtual=true`, it returns the `VirtualRelationship` with nodes that also contain their properties.
6971
* `databaseName`: the database name on the remote Neo4j instance. The default value is 'neo4j'. Set it to `null` to connect through a protocol that does not support database names (for Neo4j before 4.x).
7072

7173

extended/src/main/java/apoc/bolt/BoltConfig.java

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
package apoc.bolt;
22

3+
import apoc.util.Util;
34
import org.neo4j.driver.AccessMode;
45
import org.neo4j.driver.Config;
56
import org.neo4j.driver.SessionConfig;
@@ -20,6 +21,7 @@ public class BoltConfig {
2021
private final Map<String, Object> localParams;
2122
private final Map<String, Object> remoteParams;
2223
private final String databaseName;
24+
private final boolean withRelationshipNodeProperties;
2325

2426
public BoltConfig(Map<String, Object> config) {
2527
if (config == null) config = Collections.emptyMap();
@@ -31,6 +33,7 @@ public BoltConfig(Map<String, Object> config) {
3133
this.driverConfig = toDriverConfig((Map<String, Object>) config.getOrDefault("driverConfig", Collections.emptyMap()));
3234
this.localParams = (Map<String, Object>) config.getOrDefault("localParams", Collections.emptyMap());
3335
this.remoteParams = (Map<String, Object>) config.getOrDefault("remoteParams", Collections.emptyMap());
36+
this.withRelationshipNodeProperties = Util.toBoolean(config.get("withRelationshipNodeProperties"));
3437
}
3538

3639
private Config toDriverConfig(Map<String, Object> driverConfMap) {
@@ -117,4 +120,8 @@ public Map<String, Object> getLocalParams() {
117120
public Map<String, Object> getRemoteParams() {
118121
return remoteParams;
119122
}
123+
124+
public boolean isWithRelationshipNodeProperties() {
125+
return withRelationshipNodeProperties;
126+
}
120127
}
