|
| 1 | +== `apoc.diff.graphs` procedure |
| 2 | + |
| 3 | +The procedure accepts two string arguments, `source` and `dest`, representing the two queries to compare, |
| 4 | +and an optional `config` map as a third parameter. |
| 5 | + |
| 6 | +The procedure compares the `source` and `dest` graphs and reports differences in: |
| 7 | + |
| 8 | +* total node count |
| 9 | +* node count per label |
| 10 | +* total relationship count |
| 11 | +* relationship count per type |
| 12 | + |
| 13 | +For each node in the `source` graph with a given label, the procedure looks for the same node in the `dest` graph, either by key or by internal id, and, if it is found: |
| 14 | +* compare all labels |
| 15 | +* compare all properties |
| 16 | + |
| 17 | +Note that node lookup relies on existing constraints to find an equivalent node. To look up a node by internal id instead, set `findById: true` in the config (see below). |
| 18 | + |
| 19 | +For each relationship in the `source` graph, the procedure takes its start and end nodes and checks whether the `dest` |
| 20 | +graph contains a relationship with the same properties and the same start and end nodes. |
| 21 | + |
| 22 | + |
| 23 | +The procedure supports the following `config` parameters: |
| 24 | + |
| 25 | +.config parameters |
| 26 | +[opts=header] |
| 27 | +|=== |
| 28 | +| name | type | default | description |
| 29 | +| findById | boolean | false | find nodes by internal id instead of using existing constraints |
| 30 | +| relsInBetween | boolean | false | if enabled, considers the other terminal nodes when the query returns relationships and start or end nodes |
| 31 | +| boltConfig | Map | {} | additional configuration passed to `apoc.bolt.load` when `type: URL` is used (see the `target parameters` table below) |
| 32 | +| source | Map | {} | see below |
| 33 | +| dest | Map | {} | see below |
| 34 | +|=== |
| 35 | + |
| 36 | +The `source` and `dest` maps apply to the first and second procedure arguments, respectively. They can have the following keys: |
| 37 | + |
| 38 | +.source/dest parameters |
| 39 | +[opts=header] |
| 40 | +|=== |
| 41 | +| name | type | default | description |
| 42 | +| target | Map | {} | see below |
| 43 | +| params | Map | {} | to pass additional query params |
| 44 | +|=== |
| 45 | + |
| 46 | +The `target` param accepts: |
| 47 | + |
| 48 | +.target parameters |
| 49 | +[opts=header] |
| 50 | +|=== |
| 51 | +| name | type | default | description |
| 52 | +| type | Enum[URL, DATABASE] | `URL` | where to run the query: on an external Bolt URL, via the `apoc.bolt.load` procedure (`URL`), or on another database in the same instance (`DATABASE`) |
| 53 | +| value | String | {} | with `type: "URL"`, the Bolt URL; with `type: "DATABASE"`, the target database name |
| 54 | +|=== |
| 55 | + |
| 56 | + |
| 57 | +=== Usage examples |
| 58 | + |
| 59 | +Given this dataset in the `neo4j` database: |
| 60 | +[source,cypher] |
| 61 | +---- |
| 62 | +CREATE CONSTRAINT IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE; |
| 63 | +CREATE (m:Person {name: 'Michael Jordan', age: 54}); |
| 64 | +CREATE (q:Person {name: 'Tom Burton', age: 23}) |
| 65 | +CREATE (p:Person {name: 'John William', age: 22}) |
| 66 | +CREATE (q)-[:KNOWS{since:2016, time:time('125035.556+0100')}]->(p); |
| 67 | +---- |
| 68 | + |
| 69 | + |
| 70 | +We can compare two sets of nodes in the same database: |
| 71 | + |
| 72 | +[source,cypher] |
| 73 | +---- |
| 74 | +CALL apoc.diff.graphs("MATCH (start:Person) WHERE start.age < $age RETURN start", "MATCH (start:Person) WHERE start.age > $age RETURN start", {source: {params: {age: 25}}, dest: {params: {age: 25}}}) |
| 75 | +---- |
| 76 | + |
| 77 | +.Results |
| 78 | +[opts="header"] |
| 79 | +|=== |
| 80 | +| difference | entityType | id | sourceLabel | destLabel | source | dest |
| 81 | +| "Total count" | "Node" | null | null | null | 2 | 1 |
| 82 | +| "Count by Label" | "Node" | null | null | null |{"Person": 2 } | {"Person": 1 } |
| 83 | +| "Destination Entity not found" | "Node" | 1 | "Person" | null | {"name": "Tom Burton" } | null |
| 84 | +| "Destination Entity not found" | "Node" | 2 | "Person" | null | {"name": "John William" } | null |
| 85 | +|=== |
| 86 | + |
| 87 | + |
| 88 | + |
| 89 | +If we create another dataset in a new `secondDb` database: |
| 90 | +[source,cypher] |
| 91 | +---- |
| 92 | +CREATE CONSTRAINT IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE; |
| 93 | +CREATE (m:Person:Other {name: 'Michael Jordan', age: 54}), |
| 94 | + (n:Person {name: 'Tom Burton', age: 47}), |
| 95 | + (q:Person:Other {name: 'Jerry Burton', age: 23}), |
| 96 | + (p:Person {name: 'Jack William', age: 22}), |
| 97 | + (q)-[:KNOWS{since:1999, time:time('125035.556+0100')}]->(p); |
| 98 | +---- |
| 99 | + |
| 100 | +We can execute, in the `neo4j` database: |
| 101 | +[source,cypher] |
| 102 | +---- |
| 103 | +CALL apoc.diff.graphs("MATCH p = (start:Person)-[rel:KNOWS]->(end) RETURN start, rel, end", |
| 104 | + "MATCH p = (start)-[rel:KNOWS]->(end) RETURN start, rel, end", |
| 105 | + {dest: {target: {type: "DATABASE", value: "secondDb"}}}) |
| 106 | +---- |
| 107 | + |
| 108 | +.Results |
| 109 | +[opts="header"] |
| 110 | +|=== |
| 111 | +| difference | entityType | id | sourceLabel | destLabel | source | dest |
| 112 | +| "Destination Entity not found" | "Node" | 1 | "Person" | null | {"name": "Tom Burton" } | null |
| 113 | +| "Destination Entity not found" | "Node" | 2 | "Person" | null |{"name": "John William"}| null |
| 114 | +| "Destination Entity not found" | "Relationship" | 0 | "KNOWS" | null | {"start":{"name":"Tom Burton"},"end":{"name":"John William"},"properties":{"time":"12:50:35.556000000+01:00","since":2016}} | null |
| 115 | +|=== |
| 116 | + |
| 117 | + |
| 118 | +Conversely, we can compare the two datasets starting from the `secondDb` database: |
| 119 | + |
| 120 | +[source,cypher] |
| 121 | +---- |
| 122 | +CALL apoc.diff.graphs("MATCH (node:Person) RETURN node", |
| 123 | + "MATCH (node:Person) RETURN node", |
| 124 | + {dest: {target: {type: "DATABASE", value: "neo4j"}}}) |
| 125 | +---- |
| 126 | + |
| 127 | +.Results |
| 128 | +[opts="header"] |
| 129 | +|=== |
| 130 | +| difference | entityType | id | sourceLabel | destLabel | source | dest |
| 131 | +| "Total count" | "Node" | null | null | null | 6 | 3 |
| 132 | +| "Count by Label" | "Node" | null | null | null |{"Person": 4, "Other": 2 } | {"Person": 3 } |
| 133 | +| "Different Labels" | "Node" | 0 | "Person" | "Person" | ["Other", "Person"] | ["Person"] |
| 134 | +| "Different Properties" | "Node" | 1 | "Person" | "Person" |{"age": 47 } | {"age": 23 } |
| 135 | +| "Destination Entity not found" | "Node" | 2 | "Person" | null | {"name": "Jerry Burton" } | null |
| 136 | +| "Destination Entity not found" | "Node" | 7 | "Person" | null | {"name": "Jack William" } |null |
| 137 | +|=== |
| 138 | + |
| 139 | + |
| 140 | + |
| 141 | +If we create another DBMS instance with the same dataset as `secondDb`, we can compare the two graphs using `apoc.bolt.load`: |
| 142 | + |
| 143 | +[source,cypher] |
| 144 | +---- |
| 145 | +CALL apoc.diff.graphs("MATCH p = (start:Person)-[rel:KNOWS]->(end) RETURN start, rel, end", "MATCH p = (start)-[rel:KNOWS]->(end) RETURN start, rel, end", {dest: {target: {type: "URL", value: "<MY_BOLT_URL>"}}}) |
| 146 | +---- |
| 147 | + |
| 148 | +.Results |
| 149 | +[opts="header"] |
| 150 | +|=== |
| 151 | +| difference | entityType | id | sourceLabel | destLabel | source | dest |
| 152 | +| "Destination Entity not found" | "Node" | 1 | "Person" | null | {"name": "Tom Burton" } | null |
| 153 | +| "Destination Entity not found" | "Node" | 2 | "Person" | null |{"name": "John William"}| null |
| 154 | +| "Destination Entity not found" | "Relationship" | 0 | "KNOWS" | null | {"start":{"name":"Tom Burton"},"end":{"name":"John William"},"properties":{"time":"12:50:35.556000000+01:00","since":2016}} | null |
| 155 | +|=== |
| 156 | + |
| 157 | + |
| 158 | +To point to a `secondDestDb` database in a remote `target` instance, we can use the `boltConfig` parameter, which forwards additional parameters to `apoc.bolt.load(url, query, params, <boltConfig>)`. |
| 159 | +In this case we pass the `databaseName`: |
| 160 | + |
| 161 | +[source,cypher] |
| 162 | +---- |
| 163 | +CALL apoc.diff.graphs("MATCH p = (start:Person)-[rel:KNOWS]->(end) RETURN start, rel, end", "MATCH p = (start)-[rel:KNOWS]->(end) RETURN start, rel, end", {boltConfig: {databaseName: "secondDestDb"}, dest: {target: {type: "URL", value: "bolt://neo4j:apoc@localhost:7687"}}}) |
| 164 | +---- |
| 165 | + |
| 166 | +The result is the same as above, provided the dataset is the same. |
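
The `findById` option listed in the config table is not exercised by the examples in this section. As a minimal sketch, assuming the `neo4j` and `secondDb` databases created earlier and run from the `neo4j` database, it could be enabled like this:

[source,cypher]
----
// Sketch only: match nodes by internal id instead of by the `name` constraint.
// The queries and the target database are reused from the examples in this section.
CALL apoc.diff.graphs("MATCH (node:Person) RETURN node",
  "MATCH (node:Person) RETURN node",
  {findById: true, dest: {target: {type: "DATABASE", value: "secondDb"}}})
----

Because nodes are then paired by internal id rather than by key, the rows returned depend on the ids assigned in each database, so no expected output is shown here.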