Commit 8fa4dae
Optimize find_last_node
The optimization transforms an O(N*M) algorithm into an O(N+M) algorithm by replacing repeated linear searches with a single set-based lookup.
**Key Changes:**
1. **Pre-compute edge sources**: Creates a set `{e["source"] for e in edges}` containing all edge source IDs (O(M) time)
2. **Replace nested loop with set lookup**: Changes from checking `all(e["source"] != n["id"] for e in edges)` for each node to a simple `n["id"] not in edge_sources` check (O(1) per node vs O(M) per node)
3. **Early return optimization**: Uses explicit loop with early return instead of generator expression with `next()`
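Putting the three changes together, the optimized function likely looks something like the sketch below. The signature and the `None` fallback are assumptions, since the diff body is not shown — only the quoted expressions are taken from the commit description.

```python
def find_last_node(nodes, edges):
    """Return the first node that never appears as an edge source (a sink)."""
    # Change 1: pre-compute all edge source IDs once -- O(M)
    edge_sources = {e["source"] for e in edges}
    # Changes 2 and 3: O(1) membership test per node, with an early return
    for n in nodes:
        if n["id"] not in edge_sources:
            return n
    return None
```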
**Why It's Faster:**
The original code had O(N*M) complexity: for each of the N nodes, it scanned all M edges to check whether the node appears as a source, for N*M operations in total. The optimized version builds the edge-sources set once (M operations) and then performs N constant-time lookups, totaling N+M operations.
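Judging from the expressions quoted in the change list above, the original implementation was shaped roughly like this (a reconstruction for illustration, not the verbatim code):

```python
def find_last_node(nodes, edges):
    # For every node, scan every edge looking for it as a source: O(N*M).
    # next() with a default returns the first match, or None if there is none.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )
```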
**Performance Impact:**
The 170x speedup (from 81.8ms to 477µs) demonstrates the dramatic improvement, especially evident in the large-scale test cases. The optimization excels when:
- **Large edge counts**: More edges make the set pre-computation cost worthwhile
- **Many nodes to check**: Linear scanning becomes expensive with more nodes
- **Dense graphs**: When most nodes are sources, early termination is less likely in the original approach
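A rough micro-benchmark along these lines shows the gap. The data is synthetic and the timings vary by machine, so the numbers will not match the 81.8ms/477µs figures quoted above; the point is the relative difference on a dense graph where only one sink exists.

```python
import timeit

N, M = 1000, 1000
nodes = [{"id": i} for i in range(N)]
# Every node except the last appears as a source, so node N-1 is the only sink
# and the linear-scan version terminates as late as possible.
edges = [{"source": i % (N - 1), "target": N - 1} for i in range(M)]

def slow(nodes, edges):
    # Original shape: one full edge scan per node -- O(N*M)
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )

def fast(nodes, edges):
    # Optimized shape: one set build plus O(1) lookups -- O(N+M)
    sources = {e["source"] for e in edges}
    for n in nodes:
        if n["id"] not in sources:
            return n
    return None

t_slow = timeit.timeit(lambda: slow(nodes, edges), number=5)
t_fast = timeit.timeit(lambda: fast(nodes, edges), number=5)
print(f"slow: {t_slow:.4f}s  fast: {t_fast:.4f}s")
```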
This optimization is particularly valuable for graph-analysis workloads, where finding sink nodes (nodes with no outgoing edges) is a common operation on larger datasets.

1 parent e776522
1 file changed: +6 −1 lines