Commit 8fa4dae
Optimize find_last_node
The optimization transforms an O(N*M) algorithm into an O(N+M) algorithm by replacing repeated linear searches with a single set-based lookup.
**Key Changes:**
1. **Pre-compute edge sources**: Creates a set `{e["source"] for e in edges}` containing all edge source IDs (O(M) time)
2. **Replace nested loop with set lookup**: Changes from checking `all(e["source"] != n["id"] for e in edges)` for each node to a simple `n["id"] not in edge_sources` check (O(1) per node vs O(M) per node)
3. **Early return optimization**: Uses explicit loop with early return instead of generator expression with `next()`
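Putting the three changes together, the optimized function likely looks something like the sketch below. The signature and the `None` fallback are assumptions, since the diff body is not shown — only the quoted expressions are taken from the commit description.

```python
def find_last_node(nodes, edges):
    """Return the first node that never appears as an edge source (a sink)."""
    # Change 1: pre-compute all edge source IDs once -- O(M)
    edge_sources = {e["source"] for e in edges}
    # Changes 2 and 3: O(1) membership test per node, with an early return
    for n in nodes:
        if n["id"] not in edge_sources:
            return n
    return None
```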
**Why It's Faster:**
The original code had O(N*M) complexity: for each of the N nodes, it scanned all M edges to check whether the node appears as a source, for N*M operations in total. The optimized version builds the edge-sources set once (M operations) and then performs N constant-time lookups, totaling N+M operations.
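Judging from the expressions quoted in the change list above, the original implementation was shaped roughly like this (a reconstruction for illustration, not the verbatim code):

```python
def find_last_node(nodes, edges):
    # For every node, scan every edge looking for it as a source: O(N*M).
    # next() with a default returns the first match, or None if there is none.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )
```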
**Performance Impact:**
The 170x speedup (from 81.8ms to 477µs) demonstrates the dramatic improvement, especially evident in the large-scale test cases. The optimization excels when:
- **Large edge counts**: More edges make the set pre-computation cost worthwhile
- **Many nodes to check**: Linear scanning becomes expensive with more nodes
- **Dense graphs**: When most nodes are sources, early termination is less likely in the original approach
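A rough micro-benchmark along these lines shows the gap. The data is synthetic and the timings vary by machine, so the numbers will not match the 81.8ms/477µs figures quoted above; the point is the relative difference on a dense graph where only one sink exists.

```python
import timeit

N, M = 1000, 1000
nodes = [{"id": i} for i in range(N)]
# Every node except the last appears as a source, so node N-1 is the only sink
# and the linear-scan version terminates as late as possible.
edges = [{"source": i % (N - 1), "target": N - 1} for i in range(M)]

def slow(nodes, edges):
    # Original shape: one full edge scan per node -- O(N*M)
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )

def fast(nodes, edges):
    # Optimized shape: one set build plus O(1) lookups -- O(N+M)
    sources = {e["source"] for e in edges}
    for n in nodes:
        if n["id"] not in sources:
            return n
    return None

t_slow = timeit.timeit(lambda: slow(nodes, edges), number=5)
t_fast = timeit.timeit(lambda: fast(nodes, edges), number=5)
print(f"slow: {t_slow:.4f}s  fast: {t_fast:.4f}s")
```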
This optimization is particularly valuable for graph-analysis workloads, where finding sink nodes (nodes with no outgoing edges) is a common operation on larger datasets.

1 parent e776522
1 file changed: +6 −1 lines