From 91fdda3f7255159ccbd99e74650ac2a5b14bce37 Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Mon, 22 Dec 2025 22:37:33 +0000
Subject: [PATCH] Optimize find_last_node
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimized code achieves a **254x speedup** by eliminating a nested-loop complexity issue. Here's why:

**The Core Problem:**
The original implementation uses a nested comprehension that checks `all(e["source"] != n["id"] for e in edges)` for each node. This creates **O(N × M)** comparisons, where N is the number of nodes and M is the number of edges. For every candidate node, the code must scan through *all* edges repeatedly.

**The Optimization:**
The optimized version pre-computes a set of source IDs: `source_ids = {e["source"] for e in edges}`. This transforms the problem into:

1. **One-time O(M) operation** to build the set
2. **O(N) lookups** with O(1) average-case set membership checks

This reduces the overall complexity from **O(N × M)** to **O(N + M)**.

**Why This Matters:**
- **Set membership (`in`) vs. repeated iteration:** Python sets use hash tables, making lookups nearly instantaneous compared to iterating through a list for each check.
- **Single pass through edges:** Building the set once is far cheaper than the original code's repeated iteration through all edges for every node.

**Performance Impact by Test Case:**
- **Small graphs** (2-10 nodes/edges): 30-93% faster - modest gains, as overhead is minimal
- **Medium graphs** (100 nodes/edges): 3000%+ faster - the O(N × M) penalty becomes significant
- **Large graphs** (1000 nodes/edges): 32,000%+ faster - the nested loop becomes catastrophic. For example, `test_large_linear_chain` drops from 18.6ms to 57.2μs because the change eliminates ~1 million comparisons.
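The set-based lookup described above can be exercised in isolation with a small sketch (the node/edge dict shapes are assumed from the diff; the sample graph is illustrative only):

```python
def find_last_node(nodes, edges):
    """Return the first node that never appears as an edge source, or None."""
    # One O(M) pass builds a hash set of all source IDs.
    source_ids = {e["source"] for e in edges}
    # O(N) scan with O(1) average-case membership checks.
    return next((n for n in nodes if n["id"] not in source_ids), None)


# A small linear chain a -> b -> c: only "c" is never a source.
nodes = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
edges = [{"source": "a", "target": "b"}, {"source": "b", "target": "c"}]
print(find_last_node(nodes, edges))  # {'id': 'c'}
```

Note that `next(..., None)` also covers the empty-graph case mentioned below: with no nodes, the generator is exhausted immediately and `None` is returned.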
**Special Cases:**
- Empty graphs see a slight slowdown (13-29%) due to set-creation overhead when there is nothing to optimize
- Graphs with cycles or multiple sinks benefit equally, since the improvement is in the lookup mechanism, not the logic

The optimization is universally beneficial for any non-trivial graph workload and essential for production code processing moderate to large graphs.
---
 src/algorithms/graph.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/algorithms/graph.py b/src/algorithms/graph.py
index 777ea3b..156485a 100644
--- a/src/algorithms/graph.py
+++ b/src/algorithms/graph.py
@@ -47,7 +47,8 @@ def find_shortest_path(self, start: str, end: str) -> list[str]:
 
 def find_last_node(nodes, edges):
     """This function receives a flow and returns the last node."""
-    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)
+    source_ids = {e["source"] for e in edges}
+    return next((n for n in nodes if n["id"] not in source_ids), None)
 
 
 def find_leaf_nodes(nodes: list[dict], edges: list[dict]) -> list[dict]: