From 89554a87d16cd179a335046ac99ef8cc057dc8ac Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Wed, 30 Jul 2025 02:37:44 +0000
Subject: [PATCH] =?UTF-8?q?=E2=9A=A1=EF=B8=8F=20Speed=20up=20function=20`f?=
 =?UTF-8?q?ind=5Flast=5Fnode`=20by=2013,550%?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimization transforms an O(n*m) algorithm into an O(n+m) algorithm by eliminating redundant work through preprocessing.

**Key Optimization: Set-based Preprocessing**

The original code uses a nested loop structure: for each node, it checks all edges to see whether any edge has that node as a source. This yields O(n*m) time complexity, where n is the number of nodes and m is the number of edges.

The optimized version preprocesses all edge sources into a set (`sources = {e["source"] for e in edges}`), then performs an O(1) average-case set membership check (`n["id"] not in sources`) for each node. This reduces the overall complexity to O(n+m).

**Specific Changes:**

1. **Preprocessing step**: Creates a set of all source node IDs from the edges in a single pass
2. **Lookup optimization**: Replaces the `all(e["source"] != n["id"] for e in edges)` check with a fast set membership test
3. **Eliminates nested iteration**: The original code had to iterate through all edges for every node candidate

**Why This Creates a Massive Speedup:**

- Set membership lookup is O(1) on average, versus an O(m) linear search through the edges
- The O(m) preprocessing cost is paid only once, not n times
- The line profiler shows the original code spent 100% of its time in the nested loop, while the optimized version splits its time between preprocessing (57.6%) and the main loop (42.4%)

**Test Case Performance Patterns:**

- **Linear chains and large graphs show dramatic improvements** (19,000%+ speedup): these benefit most because they have high edge counts relative to the final result
- **Small graphs with few edges show modest improvements** (25-130% speedup): the preprocessing overhead is more noticeable, but set lookup is still faster
- **Empty cases show a slight regression** (10% slower): the preprocessing step adds overhead when there are no edges to process

The optimization is particularly effective for graph analysis scenarios where edge density is high relative to the number of sink nodes (nodes with no outgoing edges).
---
 src/dsa/nodes.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/dsa/nodes.py b/src/dsa/nodes.py
index 521d24e..800e4d4 100644
--- a/src/dsa/nodes.py
+++ b/src/dsa/nodes.py
@@ -5,7 +5,8 @@
 # derived from https://github.com/langflow-ai/langflow/pull/5261
 def find_last_node(nodes, edges):
     """This function receives a flow and returns the last node."""
-    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)
+    sources = {e["source"] for e in edges}
+    return next((n for n in nodes if n["id"] not in sources), None)
 # Function to find all leaf nodes (nodes with no outgoing edges)
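
The patched function can be exercised end-to-end with a minimal sketch. The node and edge dict shapes below follow the keys the function actually reads (`"id"` on nodes, `"source"` on edges); the `"target"` key is an assumption added for illustration only:

```python
# Optimized find_last_node as it appears in the patch to src/dsa/nodes.py.
def find_last_node(nodes, edges):
    """This function receives a flow and returns the last node."""
    sources = {e["source"] for e in edges}  # one O(m) pass over the edges
    # O(1) average-case membership test per node instead of an O(m) scan.
    return next((n for n in nodes if n["id"] not in sources), None)

# A linear chain a -> b -> c: only "c" never appears as an edge source,
# so it is the last (sink) node of the flow.
nodes = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
edges = [
    {"source": "a", "target": "b"},
    {"source": "b", "target": "c"},
]

print(find_last_node(nodes, edges))  # {'id': 'c'}
print(find_last_node(nodes, []))     # {'id': 'a'} (no edges: every node is a sink; first wins)
print(find_last_node([], []))        # None (no nodes at all)
```

This also illustrates the "empty cases" regression noted above: with no edges, the set build is pure overhead, but the result is unchanged.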