@codeflash-ai codeflash-ai bot commented Oct 13, 2025

📄 62% (0.62x) speedup for Graph.topologicalSort in code_to_optimize/topological_sort.py

⏱️ Runtime: 2.31 milliseconds → 1.43 milliseconds (best of 17 runs)

📝 Explanation and details

The optimization achieves a 62% speedup by eliminating an expensive O(n) list operation that was called once for every vertex.

Key optimization:

  • Replaced stack.insert(0, v) with stack.append(v) + final stack.reverse(): The original code used insert(0, v) to prepend elements to the stack, which requires shifting all existing elements and costs O(n) time per insertion. With 7,163 calls to this operation (as shown in the profiler), this became a significant bottleneck consuming 16.6% of the function's time.

  • Changed boolean comparisons: Replaced visited[i] == False with not visited[i] for slightly cleaner, more Pythonic code.

Why this works:
The topological sort must output vertices in reverse post-order. Rather than building that order directly with expensive prepends, the optimized version records vertices in post-order with amortized O(1) appends and reverses the list once at the end. list.reverse() is O(n) but runs only once, whereas the original performed an O(n) insert for every vertex, so the total cost of the stack operations drops from O(n²) to O(n).
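The append-then-reverse idea can be sketched as follows. This is a minimal illustration, not the actual contents of code_to_optimize/topological_sort.py: the class name SketchGraph and the method names _visit and topological_sort are hypothetical, and the sort ID returned by the real method is omitted.

```python
from collections import defaultdict


class SketchGraph:
    """Hypothetical stand-in for the PR's Graph class (not the actual source)."""

    def __init__(self, vertices):
        self.graph = defaultdict(list)  # adjacency list: u -> [v, ...]
        self.V = vertices

    def _visit(self, v, visited, stack):
        visited[v] = True
        for neighbor in self.graph[v]:
            if not visited[neighbor]:
                self._visit(neighbor, visited, stack)
        # Post-order: v is recorded only after all of its descendants.
        stack.append(v)  # amortized O(1); stands in for stack.insert(0, v)

    def topological_sort(self):
        visited = [False] * self.V
        stack = []
        for i in range(self.V):
            if not visited[i]:
                self._visit(i, visited, stack)
        # One O(n) pass turns post-order into reverse post-order.
        stack.reverse()
        return stack


# 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
g = SketchGraph(4)
g.graph[0].extend([1, 2])
g.graph[1].append(3)
g.graph[2].append(3)
order = g.topological_sort()
print(order)  # [0, 2, 1, 3]
```

Prepending each vertex with insert(0, v) would yield the same list, which is why the two variants are interchangeable for this algorithm.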

Performance characteristics:
The optimization pays off most on larger graphs (the large-scale test cases use roughly 1,000 nodes), where the quadratic cost of repeated front insertions becomes apparent. For smaller graphs, the improvement is still measurable but less dramatic.
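A rough way to observe this scaling outside the PR is to time the two list-building strategies in isolation. The sketch below is an assumption-laden micro-benchmark: the helper names build_with_insert and build_with_append_reverse are made up for illustration, and the timings it prints depend on the machine, so no specific numbers are claimed.

```python
import timeit


def build_with_insert(n):
    out = []
    for v in range(n):
        out.insert(0, v)  # shifts every existing element: O(len(out))
    return out


def build_with_append_reverse(n):
    out = []
    for v in range(n):
        out.append(v)  # amortized O(1)
    out.reverse()  # single O(n) pass at the end
    return out


# Both strategies produce the same list.
same = build_with_insert(1000) == build_with_append_reverse(1000)

for n in (100, 1_000, 10_000):
    t_insert = timeit.timeit(lambda: build_with_insert(n), number=20)
    t_append = timeit.timeit(lambda: build_with_append_reverse(n), number=20)
    print(f"n={n:>6}: insert(0) {t_insert:.4f}s, append+reverse {t_append:.4f}s")
```

The gap between the two columns should widen as n grows, since the insert-based version does work proportional to n² overall while the append-based version stays linear.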

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 83 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 2 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import uuid
from collections import defaultdict

# imports
import pytest  # used for our unit tests
from code_to_optimize.topological_sort import Graph

# unit tests

# Helper function to check if a list is a valid topological sort of the given graph
def is_valid_topological_sort(graph_edges, sort):
    # Collect every node mentioned by an edge or present in the candidate order
    nodes = set(sort)
    for u, v in graph_edges:
        nodes.add(u)
        nodes.add(v)
    # All nodes must be present in sort exactly once
    # (checked first so the position lookups below cannot raise KeyError)
    if set(sort) != nodes or len(sort) != len(nodes):
        return False
    position = {node: idx for idx, node in enumerate(sort)}
    # For each edge u -> v, u must appear before v in sort
    for u, v in graph_edges:
        if position[u] >= position[v]:
            return False
    return True

# ----------- BASIC TEST CASES -----------

def test_empty_graph():
    # Test with zero vertices (empty graph)
    g = Graph(0)
    result, sort_id = g.topologicalSort()

def test_single_node_graph():
    # Test with one node and no edges
    g = Graph(1)
    result, sort_id = g.topologicalSort()

def test_two_nodes_no_edges():
    # Test with two nodes and no edges
    g = Graph(2)
    result, sort_id = g.topologicalSort()

def test_linear_chain():
    # 0 -> 1 -> 2 -> 3
    g = Graph(4)
    g.graph[0].append(1)
    g.graph[1].append(2)
    g.graph[2].append(3)
    result, sort_id = g.topologicalSort()

def test_branching_graph():
    # 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
    g = Graph(4)
    g.graph[0].extend([1,2])
    g.graph[1].append(3)
    g.graph[2].append(3)
    result, sort_id = g.topologicalSort()

def test_disconnected_components():
    # 0 -> 1, 2 -> 3
    g = Graph(4)
    g.graph[0].append(1)
    g.graph[2].append(3)
    result, sort_id = g.topologicalSort()

# ----------- EDGE TEST CASES -----------

def test_cycle_detection_behavior():
    # 0 -> 1 -> 2 -> 0 (cycle)
    g = Graph(3)
    g.graph[0].append(1)
    g.graph[1].append(2)
    g.graph[2].append(0)
    # The algorithm does not detect cycles, so it will produce a result
    # But the result will not be a valid topological sort
    result, sort_id = g.topologicalSort()

def test_graph_with_isolated_nodes():
    # 0 -> 1, 2 (isolated), 3 (isolated)
    g = Graph(4)
    g.graph[0].append(1)
    result, sort_id = g.topologicalSort()

def test_graph_with_self_loop():
    # 0 -> 0 (self loop)
    g = Graph(1)
    g.graph[0].append(0)
    result, sort_id = g.topologicalSort()

def test_graph_with_multiple_edges():
    # 0 -> 1 (twice)
    g = Graph(2)
    g.graph[0].append(1)
    g.graph[0].append(1)
    result, sort_id = g.topologicalSort()

def test_graph_with_non_sequential_nodes():
    # 0 -> 2, 1 -> 3
    g = Graph(4)
    g.graph[0].append(2)
    g.graph[1].append(3)
    result, sort_id = g.topologicalSort()

# ----------- LARGE SCALE TEST CASES -----------

def test_large_linear_chain():
    # Chain of 1000 nodes: 0->1->2->...->999
    N = 1000
    g = Graph(N)
    for i in range(N-1):
        g.graph[i].append(i+1)
    result, sort_id = g.topologicalSort()

def test_large_branching_tree():
    # Binary tree, 10 levels (2^10-1 = 1023 nodes)
    N = 1023
    g = Graph(N)
    for i in range(N):
        left = 2*i + 1
        right = 2*i + 2
        if left < N:
            g.graph[i].append(left)
        if right < N:
            g.graph[i].append(right)
    result, sort_id = g.topologicalSort()
    # Each parent before its children
    for i in range(N):
        left = 2*i + 1
        right = 2*i + 2
        if left < N:
            pass
        if right < N:
            pass

def test_large_disconnected_graph():
    # 10 chains of 100 nodes each, total 1000 nodes
    N = 1000
    g = Graph(N)
    for chain in range(10):
        base = chain*100
        for i in range(99):
            g.graph[base + i].append(base + i + 1)
    result, sort_id = g.topologicalSort()
    # Each chain's nodes must be in order
    for chain in range(10):
        base = chain*100
        for i in range(99):
            pass

def test_large_sparse_graph():
    # 1000 nodes, only 10 edges forming the short chain 0->1->...->10
    N = 1000
    g = Graph(N)
    edges = [(i, i+1) for i in range(10)]
    for u, v in edges:
        g.graph[u].append(v)
    result, sort_id = g.topologicalSort()
    # For each edge, u before v
    for u, v in edges:
        pass

def test_large_dense_graph():
    # 50 nodes, every node points to all nodes with higher index
    N = 50
    g = Graph(N)
    for i in range(N):
        for j in range(i+1, N):
            g.graph[i].append(j)
    result, sort_id = g.topologicalSort()

# ----------- DETERMINISM AND OUTPUT PROPERTIES -----------

def test_sort_id_uniqueness():
    # Test that sort_id is unique for each call
    g = Graph(2)
    ids = set()
    for _ in range(10):
        _, sort_id = g.topologicalSort()
        ids.add(sort_id)

def test_sort_id_type_and_format():
    # Test that sort_id is a valid UUID string
    g = Graph(1)
    _, sort_id = g.topologicalSort()
    import uuid
    try:
        uuid_obj = uuid.UUID(sort_id)
    except ValueError:
        pytest.fail("Sort ID is not a valid UUID string")

# ----------- MUTATION TESTING PROTECTION -----------

def test_mutation_protection():
    # Changing the order of stack insertion should fail this test
    # 0 -> 1 -> 2
    g = Graph(3)
    g.graph[0].append(1)
    g.graph[1].append(2)
    result, sort_id = g.topologicalSort()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import uuid
from collections import defaultdict

# imports
import pytest  # used for our unit tests
from code_to_optimize.topological_sort import Graph

# unit tests

# Helper function to check if a list is a valid topological sort for a given graph
def is_valid_topological_sort(vertices, edges, order):
    """Returns True if 'order' is a valid topological sort for the given graph."""
    # Every vertex 0..vertices-1 must appear exactly once
    if len(order) != vertices or set(order) != set(range(vertices)):
        return False
    pos = {v: i for i, v in enumerate(order)}
    for u, vs in edges.items():
        for v in vs:
            # u must come strictly before v; a self-loop can never satisfy this
            if pos[u] >= pos[v]:
                return False
    return True

# ------------------ Basic Test Cases ------------------

def test_empty_graph():
    # Test with 0 vertices (empty graph)
    g = Graph(0)
    result, sorting_id = g.topologicalSort()

def test_single_node_graph():
    # Test with 1 vertex and no edges
    g = Graph(1)
    result, sorting_id = g.topologicalSort()

def test_two_nodes_no_edges():
    # Test with 2 vertices and no edges
    g = Graph(2)
    result, sorting_id = g.topologicalSort()

def test_simple_chain():
    # Test with a simple chain: 0 -> 1 -> 2
    g = Graph(3)
    g.graph[0].append(1)
    g.graph[1].append(2)
    result, sorting_id = g.topologicalSort()

def test_simple_branch():
    # Test with a small branching graph: 0 -> 1, 0 -> 2
    g = Graph(3)
    g.graph[0].append(1)
    g.graph[0].append(2)
    result, sorting_id = g.topologicalSort()

def test_disconnected_components():
    # Test with 4 nodes, two disconnected chains: 0 -> 1, 2 -> 3
    g = Graph(4)
    g.graph[0].append(1)
    g.graph[2].append(3)
    result, sorting_id = g.topologicalSort()

# ------------------ Edge Test Cases ------------------

def test_graph_with_cycle():
    # Test with a cycle: 0 -> 1 -> 2 -> 0
    g = Graph(3)
    g.graph[0].append(1)
    g.graph[1].append(2)
    g.graph[2].append(0)
    # The algorithm does not detect cycles, but we can check that
    # the output is not a valid topological sort
    result, sorting_id = g.topologicalSort()

def test_graph_with_self_loop():
    # Test with a self-loop: 0 -> 0
    g = Graph(1)
    g.graph[0].append(0)
    result, sorting_id = g.topologicalSort()

def test_graph_with_multiple_edges():
    # Test with multiple edges between same nodes: 0 -> 1 (twice)
    g = Graph(2)
    g.graph[0].append(1)
    g.graph[0].append(1)
    result, sorting_id = g.topologicalSort()

def test_graph_with_no_edges():
    # Test with 5 nodes, no edges
    g = Graph(5)
    result, sorting_id = g.topologicalSort()

def test_graph_with_isolated_nodes():
    # Test with 5 nodes, some isolated: 0->1, 2 isolated, 3->4
    g = Graph(5)
    g.graph[0].append(1)
    g.graph[3].append(4)
    result, sorting_id = g.topologicalSort()

def test_graph_with_reverse_edges():
    # Test with edges in reverse order: 1->0, 2->1, 3->2
    g = Graph(4)
    g.graph[1].append(0)
    g.graph[2].append(1)
    g.graph[3].append(2)
    result, sorting_id = g.topologicalSort()

def test_graph_with_duplicate_vertices():
    # Vertices are indexed 0..V-1, so duplicates are impossible; note that 1 -> 2 and 2 -> 1 form a cycle
    g = Graph(3)
    g.graph[0].append(1)
    g.graph[1].append(2)
    g.graph[2].append(1)  # 2 -> 1
    result, sorting_id = g.topologicalSort()

# ------------------ Large Scale Test Cases ------------------

def test_large_linear_chain():
    # Test with 1000 nodes in a linear chain: 0->1->2->...->999
    N = 1000
    g = Graph(N)
    for i in range(N-1):
        g.graph[i].append(i+1)
    result, sorting_id = g.topologicalSort()

def test_large_branching_graph():
    # Test with 1000 nodes, each node 0..998 points to 999
    N = 1000
    g = Graph(N)
    for i in range(N-1):
        g.graph[i].append(N-1)
    result, sorting_id = g.topologicalSort()

def test_large_disconnected_graph():
    # Test with 1000 nodes, 10 chains of 100 nodes each
    N = 1000
    g = Graph(N)
    for chain_start in range(0, N, 100):
        for i in range(chain_start, chain_start+99):
            g.graph[i].append(i+1)
    result, sorting_id = g.topologicalSort()
    # Each chain must be ordered
    for chain_start in range(0, N, 100):
        for i in range(chain_start, chain_start+99):
            pass

def test_large_graph_with_no_edges():
    # Test with 1000 nodes, no edges
    N = 1000
    g = Graph(N)
    result, sorting_id = g.topologicalSort()

def test_large_graph_with_multiple_edges():
    # Test with 1000 nodes, node 0 points to all others
    N = 1000
    g = Graph(N)
    for i in range(1, N):
        g.graph[0].append(i)
    result, sorting_id = g.topologicalSort()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from code_to_optimize.topological_sort import Graph

def test_Graph_topologicalSort():
    Graph.topologicalSort(Graph(1))
🔎 Concolic Coverage Tests and Runtime

To edit these changes, run git checkout codeflash/optimize-Graph.topologicalSort-mgpn9bye and push.

@codeflash-ai codeflash-ai bot requested a review from KRRT7 October 13, 2025 21:27
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 13, 2025
@KRRT7 KRRT7 closed this Oct 13, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-Graph.topologicalSort-mgpn9bye branch October 13, 2025 21:28