Blog for Import loops #207
Merged
Commits
1f38233 docs for import loops
62adc74 uploaded graphs
84cb7a3 linting
9b8371e graph centering + spelling
0ce4b28 Merge branch 'develop' into docs-for-import-loops
486e5e7 adding blog
2ea53e1 :wq
77ec2a2 Merge branch 'develop' into blog-for-import-loops
8b0c9d7 done
b7ef609 done import loops
4249adb Merge branch 'develop' into blog-for-import-loops
652948b added links (tawsifkamal)
@@ -0,0 +1,170 @@
---
title: "Identifying Import Loops in PyTorch"
icon: "arrows-rotate"
iconType: "solid"
description: "Untangling module initialization patterns"
---

<Frame caption="Import loops in pytorch/torchgen/model.py">
  <iframe
    width="100%"
    height="500px"
    scrolling="no"
    src={`https://www.codegen.sh/embedded/graph/?id=8b575318-ff94-41f1-94df-6e21d9de45d1&zoom=1&targetNodeName=model`}
    className="rounded-xl"
    style={{
      backgroundColor: "#15141b",
    }}
  ></iframe>
</Frame>

Debugging import loops can be a challenge, especially when they occur in large codebases. Codegen, however, lets us identify these loops with its visualization tools and fix them deterministically and at scale.

Here, we'll show how we used Codegen to identify import loops in the [PyTorch](https://github.com/pytorch/pytorch) codebase, find one occurrence that could impact initialization patterns, and propose a fix using the suggested strategies.

<Info>
  You can find the complete example code in our [examples repository](https://github.com/codegen-sh/codegen-examples/tree/main/examples/removing_import_loops_in_pytorch).
</Info>

## What Are Import Loops and Why Do They Matter?

Import loops (or circular dependencies) occur when two or more Python modules depend on each other, creating a cycle in the import graph. For example:

```python
# module_a.py
from module_b import function_b


# module_b.py
from module_a import function_a
```

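To make the failure concrete: if a third module triggers this cycle, Python raises an `ImportError` because one of the modules is still only partially initialized when the other asks for its symbols. A minimal sketch of what happens (the message below is typical of CPython 3.8+):

```python
# main.py (hypothetical entry point)
import module_a

# Executing module_a starts module_b, which immediately tries to pull
# function_a back out of the half-initialized module_a, so Python fails with:
#
#   ImportError: cannot import name 'function_a' from partially initialized
#   module 'module_a' (most likely due to a circular import)
```
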
While Python can handle some import cycles through its import machinery, they can lead to several issues:
- Runtime errors and import deadlocks
- Harder-to-maintain code
- Initialization order problems
- Increased cognitive load for developers

However, not all import cycles are problematic! Some cycles using dynamic imports can work perfectly fine:

<Frame>
  <img src="/images/valid-import-loop.png" alt="Valid import loop example" />
</Frame>

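For instance, deferring one side of the cycle to call time breaks the deadlock: by the time the function actually runs, both modules have finished initializing. A minimal sketch of the pattern (not taken from PyTorch itself):

```python
# module_a.py
def function_a():
    # Dynamic import: resolved at call time, after both modules are fully loaded
    from module_b import function_b
    return function_b() + 1


# module_b.py
from module_a import function_a  # static import is safe here: module_a only defines a function

def function_b():
    return 41
```
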
## Investigating Import Loops in PyTorch

Using Codegen, we discovered several import cycles in PyTorch's codebase. The snippet below builds a directed import graph with `networkx` and reports every strongly connected component that spans more than one file:

```python
import networkx as nx

# `codebase` is assumed to be a Codegen Codebase object loaded with pytorch/pytorch
G = nx.MultiDiGraph()

# Add all edges to the graph
for imp in codebase.imports:
    if imp.from_file and imp.to_file:
        edge_color = "red" if imp.is_dynamic else "black"
        edge_label = "dynamic" if imp.is_dynamic else "static"

        # Store the import statement and its metadata
        G.add_edge(
            imp.to_file.filepath,
            imp.from_file.filepath,
            color=edge_color,
            label=edge_label,
            is_dynamic=imp.is_dynamic,
            import_statement=imp,  # Store the whole import object
            key=id(imp.import_statement),
        )

# Find strongly connected components
cycles = [scc for scc in nx.strongly_connected_components(G) if len(scc) > 1]

print(f"🔄 Found {len(cycles)} import cycles:")
for i, cycle in enumerate(cycles, 1):
    print(f"\nCycle #{i}:")
    print(f"Size: {len(cycle)} files")

    # Create subgraph for this cycle to count edges
    cycle_subgraph = G.subgraph(cycle)

    # Count total edges
    total_edges = cycle_subgraph.number_of_edges()
    print(f"Total number of imports in cycle: {total_edges}")

    # Count dynamic and static imports separately
    dynamic_imports = sum(1 for u, v, data in cycle_subgraph.edges(data=True) if data.get("color") == "red")
    static_imports = sum(1 for u, v, data in cycle_subgraph.edges(data=True) if data.get("color") == "black")

    print(f"Number of dynamic imports: {dynamic_imports}")
    print(f"Number of static imports: {static_imports}")
```

Here is one example visualized:

<Frame caption="Import loops in pytorch/torchgen/model.py">
  <iframe
    width="100%"
    height="500px"
    scrolling="no"
    src={`https://www.codegen.sh/embedded/graph/?id=8b575318-ff94-41f1-94df-6e21d9de45d1&zoom=1&targetNodeName=model`}
    className="rounded-xl"
    style={{
      backgroundColor: "#15141b",
    }}
  ></iframe>
</Frame>

While these loops might initially raise concerns, PyTorch manages them intentionally through dynamic imports to prevent runtime initialization issues.

Now that we can visualize the loops, we can tell whether an import cycle is valid by checking the `import_symbol.is_dynamic` property. As long as at least one edge inside the strongly connected component is dynamic, the cycle avoids most runtime-conflict errors.

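Building on the graph from the detection snippet above, a quick way to separate the benign cycles from the suspicious ones is to check whether each strongly connected component contains at least one dynamic edge (variable names follow the earlier example):

```python
for i, cycle in enumerate(cycles, 1):
    cycle_subgraph = G.subgraph(cycle)
    # Benign cycles contain at least one dynamic (call-time) import edge
    has_dynamic_edge = any(
        data.get("is_dynamic") for _, _, data in cycle_subgraph.edges(data=True)
    )
    verdict = "likely benign (contains a dynamic import)" if has_dynamic_edge else "worth a closer look"
    print(f"Cycle #{i}: {verdict}")
```
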
However, we found one particularly interesting import loop between [`flex_decoding.py`](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/kernel/flex_decoding.py) and [`flex_attention.py`](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/kernel/flex_attention.py):

<Frame>
  <img src="/images/problematic-import-loop.png" alt="Invalid import loop example" />
</Frame>

`flex_decoding.py` imports from `flex_attention.py` *twice*: once dynamically and once at the top level.

Although this may not cause immediate runtime issues, mixing dynamic and static imports from the same module can lead to unpredictable behavior.

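Schematically, the problematic shape is one file importing the same module both statically and dynamically. Reusing the toy modules from earlier (the real symbol names in PyTorch differ):

```python
# module_a.py
from module_b import function_b  # static, top-level import of module_b

def function_a():
    # Dynamic import of the *same* module, deferred to call time
    from module_b import another_function
    return function_b() + another_function()
```
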
Thus, we propose the following refactoring using Codegen:

### Move Shared Code to a Separate `flex_utils.py` File

```python
# Create new utils file
utils_file = codebase.create_file("torch/_inductor/kernel/flex_utils.py")

# Get the two files involved in the import cycle
decoding_file = codebase.get_file("torch/_inductor/kernel/flex_decoding.py")
attention_file = codebase.get_file("torch/_inductor/kernel/flex_attention.py")
attention_file_path = "torch/_inductor/kernel/flex_attention.py"
decoding_file_path = "torch/_inductor/kernel/flex_decoding.py"

# Track symbols to move
symbols_to_move = set()

# Find imports from flex_attention in flex_decoding
for imp in decoding_file.imports:
    if imp.from_file and imp.from_file.filepath == attention_file_path:
        # Get the actual symbol from flex_attention
        if imp.imported_symbol:
            symbols_to_move.add(imp.imported_symbol)

# Move identified symbols to utils file
for symbol in symbols_to_move:
    symbol.move_to_file(utils_file)

print(f"🔄 Moved {len(symbols_to_move)} symbols to flex_utils.py")
for symbol in symbols_to_move:
    print(symbol.name)
```

Running this codemod moves the shared symbols into a separate `flex_utils.py` and updates the imports in both files to point to the newly created module, resolving the mixed static/dynamic import before it can cause issues later on.

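As a sanity check, the same cycle detection from earlier can be re-run on the modified codebase to confirm the two kernel files no longer appear together in a cycle (assuming the import graph `G` has been rebuilt after the move):

```python
import networkx as nx

# `G` is the import graph rebuilt on the modified codebase (see the detection snippet above)
remaining_cycles = [scc for scc in nx.strongly_connected_components(G) if len(scc) > 1]
offending = [
    scc
    for scc in remaining_cycles
    if "torch/_inductor/kernel/flex_decoding.py" in scc
    and "torch/_inductor/kernel/flex_attention.py" in scc
]
print(f"Cycles still containing both flex kernel files: {len(offending)}")
```
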
## Conclusion

Import loops are a common challenge in large Python codebases, but with the right tools and strategies they can be managed effectively. Using Codegen, you can gain new insight into a codebase of any size and perform deterministic manipulations that save developer hours.

Want to try it yourself? Check out our [complete example](https://github.com/codegen-sh/codegen-examples/tree/main/examples/removing_import_loops_in_pytorch) of fixing import loops using Codegen.