170 changes: 170 additions & 0 deletions docs/blog/fixing-import-loops.mdx
@@ -0,0 +1,170 @@
---
title: "Identifying Import Loops in Pytorch"
icon: "arrows-rotate"
iconType: "solid"
description: "Untangling module initialization patterns"
---

<Frame caption="Import loops in pytorch/torchgen/model.py">
<iframe
width="100%"
height="500px"
scrolling="no"
src={`https://www.codegen.sh/embedded/graph/?id=8b575318-ff94-41f1-94df-6e21d9de45d1&zoom=1&targetNodeName=model`}
className="rounded-xl"
style={{
backgroundColor: "#15141b",
}}
></iframe>
</Frame>

Debugging import loops can be a challenge, especially in large codebases. However, Codegen's visualization tools let us identify these loops and fix them deterministically, at scale.

In this post, we'll show how we used Codegen to find the import loops in the [PyTorch](https://github.com/pytorch/pytorch) codebase, zero in on one occurrence that could affect initialization patterns, and propose a fix.

<Info>
You can find the complete example code in our [examples repository](https://github.com/codegen-sh/codegen-examples/tree/main/examples/removing_import_loops_in_pytorch).
</Info>


## What Are Import Loops and Why Do They Matter?

Import loops (or circular dependencies) occur when two or more Python modules depend on each other, creating a cycle in the import graph. For example:

```python
# module_a.py
from module_b import function_b

# module_b.py
from module_a import function_a
```
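With both imports at module level, importing either module trips over the other while it is still half-initialized. On recent CPython versions the failure looks roughly like this (exact wording varies by version):

```
>>> import module_a
ImportError: cannot import name 'function_a' from partially initialized module 'module_a'
(most likely due to a circular import)
```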

While Python can handle some import cycles through its import machinery, they can lead to several issues:
- Runtime errors and import deadlocks
- Harder-to-maintain code
- Initialization order problems
- Increased cognitive load for developers

However, not all import cycles are problematic! Some cycles using dynamic imports can work perfectly fine:

<Frame>
<img src="/images/valid-import-loop.png" alt="Valid import loop example" />
</Frame>
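To make this concrete, here is a minimal sketch (hypothetical modules, not PyTorch code) of a cycle that stays harmless because one side defers its import until call time:

```python
# module_a.py
from module_b import make_b  # static; fine because module_b's top level never imports module_a

def make_a():
    return "a" + make_b()


# module_b.py
def make_b():
    return "b"

def describe_a():
    # Dynamic import: module_a is only resolved when describe_a() is called,
    # long after both modules have finished initializing, so the cycle is harmless.
    from module_a import make_a
    return make_a()
```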

## Investigating Import Loops in PyTorch

Using Codegen, we discovered several import cycles in PyTorch's codebase.

```python
import networkx as nx

# `codebase` is a Codegen Codebase object already loaded with the PyTorch repository
G = nx.MultiDiGraph()

# Add all edges to the graph
for imp in codebase.imports:
if imp.from_file and imp.to_file:
edge_color = "red" if imp.is_dynamic else "black"
edge_label = "dynamic" if imp.is_dynamic else "static"

# Store the import statement and its metadata
G.add_edge(
imp.to_file.filepath,
imp.from_file.filepath,
color=edge_color,
label=edge_label,
is_dynamic=imp.is_dynamic,
import_statement=imp, # Store the whole import object
key=id(imp.import_statement),
)
# Find strongly connected components
cycles = [scc for scc in nx.strongly_connected_components(G) if len(scc) > 1]

print(f"🔄 Found {len(cycles)} import cycles:")
for i, cycle in enumerate(cycles, 1):
print(f"\nCycle #{i}:")
print(f"Size: {len(cycle)} files")

# Create subgraph for this cycle to count edges
cycle_subgraph = G.subgraph(cycle)

# Count total edges
total_edges = cycle_subgraph.number_of_edges()
print(f"Total number of imports in cycle: {total_edges}")

# Count dynamic and static imports separately
dynamic_imports = sum(1 for u, v, data in cycle_subgraph.edges(data=True) if data.get("color") == "red")
static_imports = sum(1 for u, v, data in cycle_subgraph.edges(data=True) if data.get("color") == "black")

print(f"Number of dynamic imports: {dynamic_imports}")
print(f"Number of static imports: {static_imports}")
```
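To decide where a cycle should be broken, it helps to see the concrete import statements along its edges. Here is a small follow-up sketch that reuses the `G` and `cycles` built above:

```python
def describe_cycle(G, cycle):
    """Print every import edge inside one strongly connected component."""
    for src, dst, data in G.subgraph(cycle).edges(data=True):
        kind = data.get("label", "static")
        imp = data.get("import_statement")  # the Codegen import object stored when the edge was added
        print(f"  {src} -> {dst} [{kind}]: {imp}")

for i, cycle in enumerate(cycles, 1):
    print(f"\nCycle #{i}")
    describe_cycle(G, cycle)
```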

Here is one example visualized:

<Frame caption="Import loops in pytorch/torchgen/model.py">
<iframe
width="100%"
height="500px"
scrolling="no"
src={`https://www.codegen.sh/embedded/graph/?id=8b575318-ff94-41f1-94df-6e21d9de45d1&zoom=1&targetNodeName=model`}
className="rounded-xl"
style={{
backgroundColor: "#15141b",
}}
></iframe>
</Frame>

While these loops might initially raise concerns, PyTorch uses dynamic imports intentionally here, so the cycles are managed deliberately and don't cause runtime initialization issues.

Now that we can visualize the loops, we can tell whether a given cycle is valid by checking the `is_dynamic` property on each import. As long as at least one edge inside the strongly connected component is dynamic, the cycle generally avoids runtime initialization conflicts.
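A rough check along those lines, reusing the graph from earlier (the safety heuristic here is ours, not a Codegen API):

```python
def cycle_is_probably_safe(G, cycle):
    """Treat a cycle as safe if at least one import edge inside it is dynamic."""
    return any(
        data.get("is_dynamic", False)
        for _, _, data in G.subgraph(cycle).edges(data=True)
    )

for i, cycle in enumerate(cycles, 1):
    status = "likely safe (contains a dynamic edge)" if cycle_is_probably_safe(G, cycle) else "worth a closer look"
    print(f"Cycle #{i}: {status}")
```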

However, one import loop we found was particularly interesting: the one between [`flex_decoding.py`](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/kernel/flex_decoding.py) and [`flex_attention.py`](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/kernel/flex_attention.py):

<Frame>
    <img src="/images/problematic-import-loop.png" alt="Invalid import loop example" />
</Frame>

`flex_decoding.py` imports from `flex_attention.py` *twice*: once dynamically and once at the top level.

Although this may not cause immediate runtime issues, mixing dynamic and static imports of the same module can lead to unpredictable behavior.
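Schematically, the pattern looks like this (a hypothetical reduction with made-up helper names, not the actual PyTorch source):

```python
# flex_decoding.py (schematic)
from .flex_attention import shared_helper  # static: evaluated while flex_decoding is initializing

def build_decoding_kernel(*args, **kwargs):
    # Dynamic: the same module is imported again at call time.
    from .flex_attention import other_helper
    return other_helper(shared_helper(*args, **kwargs))
```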


Thus, we propose the following refactoring using Codegen:


### Move Shared Code to a Separate `utils.py` File

```python
# Create new utils file
utils_file = codebase.create_file("torch/_inductor/kernel/flex_utils.py")

# Get the two files involved in the import cycle
decoding_file = codebase.get_file("torch/_inductor/kernel/flex_decoding.py")
attention_file = codebase.get_file("torch/_inductor/kernel/flex_attention.py")
attention_file_path = "torch/_inductor/kernel/flex_attention.py"
decoding_file_path = "torch/_inductor/kernel/flex_decoding.py"

# Track symbols to move
symbols_to_move = set()

# Find imports from flex_attention in flex_decoding
for imp in decoding_file.imports:
if imp.from_file and imp.from_file.filepath == attention_file_path:
# Get the actual symbol from flex_attention
if imp.imported_symbol:
symbols_to_move.add(imp.imported_symbol)

# Move identified symbols to utils file
for symbol in symbols_to_move:
symbol.move_to_file(utils_file)

print(f"🔄 Moved {len(symbols_to_move)} symbols to flex_utils.py")
for symbol in symbols_to_move:
print(symbol.name)
```

Running this codemod moves the shared symbols into the new `flex_utils.py` file and updates the imports in both files to point to it, eliminating the mixed static/dynamic import pattern that could otherwise cause issues down the line.
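To confirm the fix, one option is to rebuild the import graph and check that the two files no longer share a strongly connected component. A sketch, where `rebuild_import_graph` is a hypothetical stand-in for re-running the graph-building snippet from earlier:

```python
G_after = rebuild_import_graph(codebase)  # hypothetical helper: re-runs the earlier graph-building code

remaining = [
    scc
    for scc in nx.strongly_connected_components(G_after)
    if len(scc) > 1 and decoding_file_path in scc and attention_file_path in scc
]
assert not remaining, "flex_decoding / flex_attention cycle still present"
print("✅ Cycle between flex_decoding.py and flex_attention.py resolved")
```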


## Conclusion

Import loops are a common challenge in large Python codebases, but with the right tools and strategies they can be managed effectively. With Codegen, regardless of repository size, you can gain new insight into your codebase and perform deterministic manipulations that save developer hours.

Want to try it yourself? Check out our [complete example](https://github.com/codegen-sh/codegen-examples/tree/main/examples/removing_import_loops_in_pytorch) of fixing import loops using Codegen.
2 changes: 1 addition & 1 deletion docs/mint.json
@@ -161,7 +161,7 @@
},
{
"group": "Blog",
"pages": ["blog/posts", "blog/act-via-code"]
"pages": ["blog/posts", "blog/act-via-code", "blog/fixing-import-loops"]
},
{
"group": "API Reference",