⚡️ Speed up function `find_last_node` by 16,584% #195

codeflash-ai · 2025-12-22T22:41:26Z

📄 16,584% (165.84x) speedup for `find_last_node` in `src/algorithms/graph.py`

⏱️ Runtime : 65.2 milliseconds → 391 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a roughly 166x speedup by eliminating the redundant nested iteration over edges and replacing it with set-based lookups.

Key Optimizations

1. Early Return for Empty Edges
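The optimization adds a fast-path check: if there are no edges, it returns the first node immediately (or None when the node list is also empty). This is safe because all() over an empty iterable is True, so the original code would return the first node anyway. Skipping the generator machinery on these edgeless inputs gives a 157-314% speedup.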
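A minimal sketch of why the fast path preserves behavior, assuming the original is exactly the quoted all(...) one-liner (the real body of src/algorithms/graph.py is not shown here; find_last_node_original and empty_edges_fast_path are hypothetical names):

```python
def find_last_node_original(nodes, edges):
    # Reconstruction of the original one-liner quoted in this PR (assumption:
    # the real body in src/algorithms/graph.py may differ in details).
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def empty_edges_fast_path(nodes):
    # Hypothetical helper: the value the early return must produce when edges
    # is empty. next(iter(...), None) also handles an empty node list.
    return next(iter(nodes), None)


# all() over an empty iterable is True, so with no edges every node passes the
# original filter and next() yields the first node (or None if there are none).
assert all([]) is True
for sample in ([], [{"id": 1}], [{"id": 1}, {"id": 2}]):
    assert find_last_node_original(sample, []) is empty_edges_fast_path(sample)
```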

2. Set-Based Source Lookup
The critical optimization replaces the nested all(e["source"] != n["id"] for e in edges) check with:

First, build a set of all source node IDs: sources = {e["source"] for e in edges}
Then use fast set membership: n["id"] not in sources

This reduces the complexity from O(nodes × edges) to O(nodes + edges), since set membership tests take O(1) time on average.
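Putting both steps together, a minimal sketch of what the optimized function might look like, assuming nodes and edges are lists of dicts with "id" and "source" keys as in the regression tests below. This is not the code from the PR branch; find_last_node_optimized is a hypothetical name.

```python
from typing import Any, Iterable, Mapping, Optional, Sequence

Node = Mapping[str, Any]
Edge = Mapping[str, Any]


def find_last_node_optimized(nodes: Iterable[Node], edges: Sequence[Edge]) -> Optional[Node]:
    """Return the first node that is not the source of any edge, or None."""
    # Fast path: with no edges every node qualifies, so return the first one
    # (or None when there are no nodes at all).
    if not edges:
        return next(iter(nodes), None)
    # One pass over the edges: O(edges). Assumes node ids are hashable.
    sources = {e["source"] for e in edges}
    # One average-O(1) membership test per node, stopping at the first hit: O(nodes).
    return next((n for n in nodes if n["id"] not in sources), None)


nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
print(find_last_node_optimized(nodes, edges))  # {'id': 3}
```

One design caveat: the set requires hashable ids. The original only compares ids with !=, so it would accept unhashable ids such as lists, whereas this version raises TypeError for them. The integer and string ids used in the regression tests are unaffected.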

Why This Works

The original code iterates through all edges for each node to check whether that node is a source. With 1000 nodes and 1000 edges, this performs up to 1 million comparisons.

The optimized version builds the source set once (about 1000 insertions), then performs one average-case O(1) lookup per node (1000 × O(1)), for roughly 2000 operations in total: up to a 500x reduction in work.
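The 1 million figure is an upper bound: all() stops at the first edge whose source matches, so on a 1000-node linear chain the original does about half that. The hypothetical helpers below instrument both strategies on that chain, counting comparisons for the nested scan and set insertions plus lookups for the optimized version (a sketch that assumes the logic described above, not the PR's code):

```python
def count_nested_comparisons(nodes, edges):
    """Count the != comparisons made by the original nested all(...) scan."""
    count = 0
    for n in nodes:
        is_source = False
        for e in edges:
            count += 1
            if e["source"] == n["id"]:  # all() stops at the first False
                is_source = True
                break
        if not is_source:
            return count  # next() stops at the first terminal node
    return count


def count_set_operations(nodes, edges):
    """Count set insertions plus membership tests in the set-based version."""
    sources = set()
    count = 0
    for e in edges:
        sources.add(e["source"])
        count += 1
    for n in nodes:
        count += 1
        if n["id"] not in sources:
            return count
    return count


# Same shape as test_large_linear_chain: 0 -> 1 -> ... -> 999 (999 edges).
chain_nodes = [{"id": i} for i in range(1000)]
chain_edges = [{"source": i, "target": i + 1} for i in range(999)]
print(count_nested_comparisons(chain_nodes, chain_edges))  # 500499
print(count_set_operations(chain_nodes, chain_edges))      # 1999
```

So on this input the reduction is roughly 250x in operation count, in the same range as the measured ~320x wall-clock gain on the chain test.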

Performance Impact by Test Type

Empty edges cases: 157-314% faster (early return optimization)
Small graphs (2-10 nodes): 22-94% faster (the cost of building the set is still outweighed by avoiding the nested scan)
Large linear chains (1000 nodes): 32,000%+ faster (quadratic → linear complexity)
Large cycle graphs: 32,496% faster (must check all nodes, massive iteration savings)
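Star graphs: 87-93% faster (node 1 is already terminal, so both versions make roughly one pass over the edges; the gain is a constant factor rather than a complexity change)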

The optimization particularly excels when there are many nodes and edges, as it eliminates the quadratic blowup that occurs in the nested iteration pattern of the original implementation.
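To reproduce the shape of these numbers locally, a rough timing sketch using the standard-library timeit module. Both function bodies are reconstructions (assumptions), the graph builders mirror the chain and star regression tests, and absolute timings will vary by machine and Python version:

```python
import timeit


def find_last_node_original(nodes, edges):
    # Reconstructed nested scan (assumption).
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def find_last_node_optimized(nodes, edges):
    # Reconstructed set-based version (assumption).
    if not edges:
        return next(iter(nodes), None)
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def make_chain(n):
    # 0 -> 1 -> ... -> n-1, as in test_large_linear_chain
    nodes = [{"id": i} for i in range(n)]
    edges = [{"source": i, "target": i + 1} for i in range(n - 1)]
    return nodes, edges


def make_star(n):
    # 0 -> every other node, as in test_large_star_graph
    nodes = [{"id": i} for i in range(n)]
    edges = [{"source": 0, "target": i} for i in range(1, n)]
    return nodes, edges


def compare(name, nodes, edges, number=5):
    # Both versions must agree before they are timed.
    assert find_last_node_original(nodes, edges) is find_last_node_optimized(nodes, edges)
    t_orig = timeit.timeit(lambda: find_last_node_original(nodes, edges), number=number)
    t_opt = timeit.timeit(lambda: find_last_node_optimized(nodes, edges), number=number)
    print(f"{name}: original {t_orig / number * 1e6:.1f} us, "
          f"optimized {t_opt / number * 1e6:.1f} us, ratio {t_orig / t_opt:.1f}x")
    return t_orig, t_opt


compare("chain, 1000 nodes", *make_chain(1000))
compare("star, 1000 nodes", *make_star(1000))
```

The chain ratio is expected to be large and the star ratio small, matching the quadratic-versus-constant-factor split described above.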

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 40 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%
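The report compares outputs of the original and optimized code on generated inputs. A randomized differential check in the same spirit could look like the sketch below; it is not Codeflash's own harness, and the function bodies and random_graph are assumptions:

```python
import random


def find_last_node_original(nodes, edges):
    # Reconstructed nested scan (assumption).
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def find_last_node_optimized(nodes, edges):
    # Reconstructed set-based version (assumption).
    if not edges:
        return next(iter(nodes), None)
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def random_graph(rng, max_nodes=20, max_edges=40):
    n = rng.randint(0, max_nodes)
    nodes = [{"id": i} for i in range(n)]
    # Endpoints range slightly past the last node id, so some edges reference
    # missing nodes, like test_edges_with_no_matching_nodes.
    m = rng.randint(0, max_edges)
    edges = [{"source": rng.randint(0, n + 2), "target": rng.randint(0, n + 2)} for _ in range(m)]
    return nodes, edges


def differential_check(trials=1000, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducible runs
    for _ in range(trials):
        nodes, edges = random_graph(rng)
        # Identity, not just equality: both must pick the very same node object.
        assert find_last_node_optimized(nodes, edges) is find_last_node_original(nodes, edges), (nodes, edges)
    return trials


print(differential_check())  # 1000
```

The random graphs cover empty node lists, empty edge lists, self-loops, and cycles, which are the same categories the generated regression tests exercise.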

🌀 Generated Regression Tests

from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# 1. BASIC TEST CASES


def test_single_node_no_edges():
    # One node, no edges; node should be returned as last node
    nodes = [{"id": 1, "label": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.17μs -> 333ns (250% faster)


def test_two_nodes_one_edge():
    # Two nodes, one edge from 1 to 2; 2 should be last node
    nodes = [{"id": 1, "label": "A"}, {"id": 2, "label": "B"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.96μs -> 1.17μs (67.8% faster)


def test_three_nodes_linear_chain():
    # Three nodes in a chain: 1->2->3; 3 should be last node
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.33μs -> 1.29μs (80.7% faster)


def test_multiple_terminal_nodes_returns_first():
    # Two nodes with no outgoing edges; should return the first found
    nodes = [{"id": 1}, {"id": 2}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.21μs -> 333ns (263% faster)


def test_cycle_graph():
    # Nodes 1->2->3->1 (cycle); all have outgoing edges, so should return None
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.29μs -> 1.33μs (71.9% faster)


# 2. EDGE TEST CASES


def test_empty_nodes_list():
    # No nodes; should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 750ns -> 292ns (157% faster)


def test_edges_with_no_matching_nodes():
    # Edges reference node ids not in nodes; all nodes have no outgoing edges
    nodes = [{"id": 10}, {"id": 20}]
    edges = [{"source": 99, "target": 10}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.46μs -> 1.12μs (29.7% faster)


def test_node_with_self_loop():
    # Node with edge to itself; should not be a terminal node
    nodes = [{"id": 5}]
    edges = [{"source": 5, "target": 5}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.38μs -> 1.12μs (22.2% faster)


def test_disconnected_nodes():
    # Some nodes are disconnected (no edges at all)
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.17μs (57.2% faster)


def test_multiple_edges_from_one_node():
    # One node with multiple outgoing edges, others are terminal
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.92μs -> 1.25μs (53.4% faster)


def test_node_with_incoming_but_no_outgoing():
    # Node with only incoming edges should be terminal
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.12μs (62.9% faster)


def test_nodes_with_duplicate_ids():
    # Duplicate ids in nodes; should return first one with no outgoing edges
    nodes = [{"id": 1}, {"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.21μs -> 333ns (263% faster)


def test_edge_with_extra_keys():
    # Edge contains extra keys; function should ignore them
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2, "weight": 5}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.17μs (57.2% faster)


def test_node_with_non_integer_id():
    # Node id is a string
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.88μs -> 1.17μs (60.7% faster)


def test_edge_with_none_source():
    # Edge with source None; should not match any node id
    nodes = [{"id": 1}]
    edges = [{"source": None, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 1.12μs (40.7% faster)


# 3. LARGE SCALE TEST CASES


def test_large_linear_chain():
    # Large chain of 1000 nodes; last node should be terminal
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 18.3ms -> 57.0μs (32077% faster)


def test_large_star_graph():
    # Node 0 points to all others; all others are terminal nodes
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": 0, "target": i} for i in range(1, N)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 39.3μs -> 20.4μs (92.8% faster)


def test_large_disconnected_graph():
    # 1000 nodes, no edges; first node is terminal
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.29μs -> 333ns (288% faster)


def test_large_cycle_graph():
    # 1000 nodes in a cycle; all have outgoing edges, so None
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": (i + 1) % N} for i in range(N)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 18.4ms -> 56.4μs (32496% faster)


def test_large_graph_with_multiple_terminals():
    # 1000 nodes, edges from 0 to 1..998; node 999 is isolated, node 1 is the first terminal
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": 0, "target": i} for i in range(1, N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 38.8μs -> 20.4μs (90.4% faster)


# Additional edge case for performance and correctness


def test_large_graph_with_sparse_edges():
    # 1000 nodes, 10 deterministic edges from the even ids 0..18; node 0 is a source, so node 1 is the first terminal
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": (i + 1) % N} for i in range(0, 20, 2)]  # 10 edges
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.29μs -> 1.79μs (28.0% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# -----------------------
# BASIC TEST CASES
# -----------------------


def test_single_node_no_edges():
    # One node, no edges: should return the node itself
    nodes = [{"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.21μs -> 292ns (314% faster)


def test_two_nodes_one_edge():
    # Two nodes, one edge from node 1 to node 2: node 2 is last
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.79μs -> 1.17μs (53.6% faster)


def test_multiple_nodes_linear_chain():
    # Three nodes in a chain: 1 -> 2 -> 3, last node is 3
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.17μs -> 1.25μs (73.3% faster)


def test_multiple_nodes_multiple_endpoints():
    # 1 -> 2, 1 -> 3; both 2 and 3 are endpoints, should return first found (2)
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.79μs -> 1.21μs (48.3% faster)


# -----------------------
# EDGE TEST CASES
# -----------------------


def test_empty_nodes_and_edges():
    # No nodes, no edges: should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 791ns -> 291ns (172% faster)


def test_nodes_with_no_matching_edges():
    # Nodes exist, but edges refer to non-existent sources/targets
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 3, "target": 4}]
    # Both nodes have no outgoing edges, so function returns first node (1)
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.38μs -> 1.12μs (22.2% faster)


def test_cycle_graph():
    # A cycle: 1 -> 2 -> 3 -> 1, so no node is a 'last' node (all have outgoing edges)
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.21μs -> 1.33μs (65.6% faster)


def test_disconnected_nodes():
    # Some nodes are not connected at all
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
    edges = [{"source": 1, "target": 2}]
    # Node 2 has only an incoming edge and nodes 3 and 4 have none; function returns the first terminal node (2)
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.67μs -> 1.12μs (48.2% faster)


def test_multiple_edges_per_node():
    # Node 1 has multiple outgoing edges; node 4 is the only endpoint
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 1, "target": 3},
        {"source": 2, "target": 4},
        {"source": 3, "target": 4},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.67μs -> 1.38μs (94.0% faster)


def test_node_with_self_loop():
    # Node 1 has a self-loop, node 2 is endpoint
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.71μs -> 1.17μs (46.5% faster)


def test_duplicate_node_ids():
    # Duplicate node IDs: both copies of id 1 have an outgoing edge, so node 2 is returned
    nodes = [{"id": 1}, {"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.96μs -> 1.21μs (62.2% faster)


def test_non_integer_node_ids():
    # Node IDs are strings
    nodes = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
    edges = [{"source": "a", "target": "b"}, {"source": "b", "target": "c"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.38μs -> 1.29μs (83.8% faster)


def test_edges_with_extra_keys():
    # Edges have extra keys, should be ignored
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2, "weight": 10}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.75μs -> 1.17μs (50.0% faster)


def test_nodes_with_extra_keys():
    # Nodes have extra keys, should be ignored
    nodes = [{"id": 1, "label": "A"}, {"id": 2, "label": "B"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.75μs -> 1.17μs (50.1% faster)


# -----------------------
# LARGE SCALE TEST CASES
# -----------------------


def test_large_linear_chain():
    # 1000 nodes in a chain: 0 -> 1 -> ... -> 999
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 18.3ms -> 56.5μs (32349% faster)


def test_large_star_graph():
    # Node 0 points to all others, all others are endpoints
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": 0, "target": i} for i in range(1, N)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 38.3μs -> 20.5μs (87.0% faster)


def test_large_disconnected_graph():
    # 500 nodes in a chain, 500 isolated nodes
    N = 500
    nodes = [{"id": i} for i in range(N * 2)]
    edges = [{"source": i, "target": i + 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 4.57ms -> 28.0μs (16182% faster)


def test_large_complete_graph():
    # Every node points to every other node (except self), so no last node
    N = 30  # N * (N - 1) = 870 edges, keep small for performance
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": j} for i in range(N) for j in range(N) if i != j]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 457μs -> 20.9μs (2089% faster)


def test_large_graph_with_multiple_endpoints():
    # Two chains of 500 nodes each, last node of each chain is endpoint
    N = 500
    nodes = [{"id": f"A{i}"} for i in range(N)] + [{"id": f"B{i}"} for i in range(N)]
    edges = [{"source": f"A{i}", "target": f"A{i+1}"} for i in range(N - 1)] + [
        {"source": f"B{i}", "target": f"B{i+1}"} for i in range(N - 1)
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 4.97ms -> 79.2μs (6177% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-find_last_node-mjhqpyu3 and push.


codeflash-ai bot requested a review from KRRT7 December 22, 2025 22:41

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 22, 2025

KRRT7 closed this Dec 23, 2025

codeflash-ai bot deleted the codeflash/optimize-find_last_node-mjhqpyu3 branch December 23, 2025 05:48
