Updated NeptuneSchema.js to account for multi-label nodes to prevent duplicated nodes and edges (aws#125)
Multi-label nodes caused duplicated edges. The function findFromAndToLabels(edgeStructure) looks at a specific edge type in the graph database to see what kinds of nodes it connects, and builds a list of all the from-to label pairs for that relationship. It used nested for loops to go through every combination of the fromLabel and toLabel arrays. When nodes had multiple labels, the same label combination could be produced more than once, and each occurrence added a separate entry to edgeStructure.directions even though the relationship between the node types was the same.
The function getNodeNames() had a similar problem. It gets all the node types in the graph database by querying for all nodes and their labels, then adds each label to the schema structure. The original code processed node labels without checking for duplicates, so the same node label could be added to schema.nodeStructures multiple times and appear more than once in the generated schema.
The fix creates a new empty Set each time either function is called and records every processed edge direction or node label in it, so an entry is added only if it has not already been processed.
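The Set-based check described above could look roughly like the sketch below. It is not the code from this commit: collectDirections, collectNodeLabels, and the shape of their inputs are hypothetical stand-ins for the query results that findFromAndToLabels() and getNodeNames() process, and the composite key format is an assumption.

```javascript
// Hypothetical helper (not from the commit): collect unique from/to label
// pairs for one edge type. Each row mimics a query result holding the label
// arrays of both endpoints.
function collectDirections(rows) {
    const seen = new Set(); // fresh Set on every call
    const directions = [];
    for (const { fromLabel, toLabel } of rows) {
        for (const from of fromLabel) {
            for (const to of toLabel) {
                // JSON encoding keeps the key unambiguous for any label text
                const key = JSON.stringify([from, to]);
                if (seen.has(key)) continue; // pair already recorded
                seen.add(key);
                directions.push({ from, to });
            }
        }
    }
    return directions;
}

// Hypothetical helper (not from the commit): add each node label once,
// no matter how many multi-label nodes carry it.
function collectNodeLabels(labelLists) {
    const seen = new Set(); // fresh Set on every call
    const nodeStructures = [];
    for (const labels of labelLists) {
        for (const label of labels) {
            if (seen.has(label)) continue; // label already in the schema
            seen.add(label);
            nodeStructures.push({ label, properties: [] });
        }
    }
    return nodeStructures;
}

// Example input: two rows that overlap on the Person -> Company pair.
const sampleRows = [
    { fromLabel: ['Person', 'Employee'], toLabel: ['Company'] },
    { fromLabel: ['Person'], toLabel: ['Company'] },
];
const sampleDirections = collectDirections(sampleRows);
const sampleNodes = collectNodeLabels([['Person', 'Employee'], ['Person'], ['Company']]);
```

Because the Set is created inside each function, state never leaks between calls, which matches the description of creating a new empty set on every invocation.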
@@ -181,11 +188,20 @@ async function findFromAndToLabels(edgeStructure) {
    const query = `MATCH (from)-[r:${sanitize(edgeStructure.label)}]->(to) WITH from, to LIMIT $sample RETURN DISTINCT labels(from) as fromLabel, labels(to) as toLabel`;
    loggerDebug(`Retrieving incoming and outgoing labels for edge ${edgeStructure.label} with limit ${SAMPLE}`, {toConsole: true});