@codegen-sh

@codegen-sh codegen-sh bot commented May 11, 2025

User description

This PR adds three Python scripts to help organize your codebase structure programmatically:

  1. organize_codebase.py - A general-purpose script that analyzes file contents and categorizes them based on patterns
  2. organize_specific_codebase.py - A script specifically tailored to organize the codebase structure shown in the screenshot
  3. organize_with_codegen_sdk.py - An advanced script that uses the Codegen SDK to move symbols between files and automatically update imports

Usage

All scripts can be run in "dry run" mode first to see the planned changes without actually moving files:

python organize_codebase.py /path/to/your/codebase

To actually execute the changes:

python organize_codebase.py /path/to/your/codebase --execute

The Codegen SDK script provides the most robust solution as it properly handles imports and dependencies when moving files.
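The dry-run-by-default behaviour described above could be wired up roughly as follows. This is a minimal sketch, not the code from the PR: `plan_moves`, `apply_moves`, and the rule that sends `*_analyzer.py` files to an `analyzers/` folder are hypothetical; only the positional path argument and the `--execute` flag come from the usage lines.

```python
import argparse
import shutil
from pathlib import Path


def plan_moves(root: Path) -> list[tuple[Path, Path]]:
    """Hypothetical planner: send every top-level *_analyzer.py file to analyzers/."""
    return [
        (path, root / "analyzers" / path.name)
        for path in sorted(root.glob("*_analyzer.py"))
    ]


def apply_moves(moves, execute: bool) -> list[str]:
    """Describe each move; touch the filesystem only when execute is True."""
    log = []
    for src, dst in moves:
        if execute:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
        prefix = "MOVED" if execute else "WOULD MOVE"
        log.append(f"{prefix} {src.name} -> {dst.parent.name}/{dst.name}")
    return log


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Organize a codebase (sketch).")
    parser.add_argument("codebase", type=Path, help="path to the codebase root")
    parser.add_argument("--execute", action="store_true",
                        help="actually move files; without it, only print the plan")
    args = parser.parse_args(argv)
    for line in apply_moves(plan_moves(args.codebase), args.execute):
        print(line)
    return 0
```

Calling `main(["/path/to/your/codebase"])` only prints the plan, while adding `"--execute"` performs the moves. Making the destructive path opt-in means a forgotten flag cannot move files.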



Description by Korbit AI

What change is being made?

Add three Python scripts to organize a codebase: one using categorization by functionality, another for a specific structure, and a third using the Codegen SDK for automated restructuring and import updates.

Why are these changes being made?

These scripts are introduced to facilitate codebase organization by categorizing files based on content and filenames, achieving specific directory structures, and leveraging the Codegen SDK for automated file moves and import adjustments. This approach aims to enhance maintainability and accessibility of the codebase, ensuring that related code files are logically grouped and easily navigable.



PR Type

Enhancement, Other


Description

  • Introduces a comprehensive suite of codebase analysis and organization tools, including:

    • Unified codebase analyzer modules with plugin-based architecture for extensible analysis (code quality, dependency, PR comparison, etc.).
    • Advanced code quality analyzer for detecting dead code, complexity, style, and maintainability issues.
    • Dependency analysis with graph-based detection of import relationships, circular dependencies, and module coupling.
    • Codebase visualizer supporting multiple visualization types (call/dependency graphs, complexity heatmaps, PR comparisons) with CLI and export options.
    • Standardized data models for analysis results and issue types, enabling consistent reporting and downstream consumption.
    • Automated and manual issue resolution management system with support for auto-fix, suggestions, and tracking.
    • Enhanced context and utility modules for codebase selection, documentation extraction, and advanced graph manipulation.
    • Adds three Python scripts for programmatic codebase organization, including a general-purpose organizer, a structure-specific organizer, and an advanced organizer using the Codegen SDK for automated symbol moves and import updates.
  • All major modules provide command-line interfaces for ease of use and integration.

  • Lays the foundation for scalable, automated, and maintainable codebase analysis and restructuring workflows.


Changes walkthrough 📝

Relevant files
Enhancement
11 files
codebase_analyzer.py
Add comprehensive codebase and PR analyzer using Codegen SDK

codegen-on-oss/codegen_on_oss/analyzers/codebase_analyzer.py

  • Introduces a comprehensive codebase and PR analyzer leveraging the
    Codegen SDK.
  • Implements static and advanced graph-based analysis for codebases and
    pull requests, including dead code, parameter issues, error handling,
    call site compatibility, import/inheritance issues, code complexity,
    and more.
  • Provides detailed comparison between base and PR codebases, including
    dependency and API changes.
  • Includes a command-line interface supporting various analysis types
    and output formats (JSON, HTML, console).
  • +1901/-0
code_quality.py
Add code quality analyzer for dead code and complexity

codegen-on-oss/codegen_on_oss/analyzers/code_quality.py

  • Adds a code quality analyzer module for detecting dead code,
    complexity, style, and maintainability issues.
  • Implements detection for unused variables, functions, classes,
    imports, and various code smells.
  • Calculates cyclomatic complexity, maintainability index, and flags
    problematic patterns.
  • Integrates with an issue collection system and supports filtering and
    categorization of issues.
  • +1102/-0
current_code_codebase.py
Add codebase selection and documentation object utilities

codegen-on-oss/codegen_on_oss/analyzers/current_code_codebase.py

  • Provides utilities for initializing and selecting a codebase from a
    repository path.
  • Adds functions to import all modules from a directory and collect
    documented objects using decorators.
  • Supplies convenience functions to retrieve both the codebase and its
    documented objects for further analysis or documentation.
  • +230/-0 
unified_analyzer.py
Add unified codebase analyzer with plugin architecture and CLI

codegen-on-oss/codegen_on_oss/analyzers/unified_analyzer.py

  • Introduces a new comprehensive module for unified codebase analysis.
  • Implements a plugin-based architecture with analyzer plugins for code
    quality and dependency analysis.
  • Provides a main class (UnifiedCodeAnalyzer) that supports codebase
    initialization, PR analysis, and multiple analysis types.
  • Adds command-line interface for running analyses and outputting
    results in JSON or HTML format.
  • +1633/-0
codebase_visualizer.py
Add codebase visualizer module with multiple visualization types and CLI

codegen-on-oss/codegen_on_oss/analyzers/codebase_visualizer.py

  • Adds a new module for visualizing codebase structure, dependencies,
    and analysis results.
  • Supports multiple visualization types (call graph, dependency graph,
    blast radius, complexity heatmap, PR comparison, etc.).
  • Integrates with codebase analyzer and context to generate
    visualizations as JSON, PNG, SVG, or DOT files.
  • Provides a command-line interface for generating visualizations based
    on user input.
  • +1561/-0
analysis_result.py
Add analysis result data models for standardized output

codegen-on-oss/codegen_on_oss/analyzers/models/analysis_result.py

  • Introduces data models for representing and serializing analysis
    results.
  • Defines dataclasses for various analysis result types (code quality,
    dependencies, PR analysis, security, performance).
  • Provides methods for converting results to/from dictionaries and
    files, and for merging results.
  • Standardizes the structure of analysis outputs for downstream
    consumption.
  • +311/-0 
analyzer.py
Unified codebase analyzer with plugin architecture and reporting

codegen-on-oss/codegen_on_oss/analyzers/analyzer.py

  • Introduces a comprehensive analyzer module for codebase analysis.
  • Implements a plugin-based architecture with registry for analysis
    types (code quality, dependency).
  • Provides a unified AnalyzerManager class to coordinate analysis,
    manage issues, and generate reports in multiple formats (JSON, HTML,
    console).
  • Includes command-line interface for running analyses on local or
    remote repositories.
  • +911/-0 
dependencies.py
Dependency analysis with graph-based detection and reporting

codegen-on-oss/codegen_on_oss/analyzers/dependencies.py

  • Adds a dependency analysis module for codebases.
  • Implements detection of import relationships, circular dependencies,
    module coupling, and external dependencies.
  • Builds and analyzes import, module, call, and class hierarchy graphs
    using networkx.
  • Integrates with issue tracking and provides structured analysis
    results.
  • +860/-0 
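Circular dependency detection on an import graph can be illustrated with a depth-first search. The walkthrough says the module builds its graphs with networkx; the sketch below deliberately swaps in a dependency-free DFS over a plain dict so it runs anywhere, and the function name and input shape are assumptions.

```python
def find_import_cycles(imports: dict[str, set[str]]) -> list[list[str]]:
    """Return one cycle (as a module path ending where it started) per back edge."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {module: WHITE for module in imports}
    stack = []
    cycles = []

    def visit(module):
        color[module] = GRAY
        stack.append(module)
        for dep in sorted(imports.get(module, ())):
            if dep not in color:
                continue  # external dependency, not part of the analyzed codebase
            if color[dep] == GRAY:
                # dep is already on the current path, so this edge closes a cycle
                cycles.append(stack[stack.index(dep):] + [dep])
            elif color[dep] == WHITE:
                visit(dep)
        stack.pop()
        color[module] = BLACK

    for module in sorted(imports):
        if color[module] == WHITE:
            visit(module)
    return cycles
```

For example, `find_import_cycles({"a": {"b"}, "b": {"c"}, "c": {"a"}})` reports the path `a -> b -> c -> a`. The recursion depth grows with the longest import chain, so a very deep graph would need an iterative variant.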
context_codebase.py
Advanced codebase context for graph analysis and diffs

codegen-on-oss/codegen_on_oss/analyzers/context_codebase.py

  • Introduces an enhanced CodebaseContext class for advanced codebase
    graph manipulation.
  • Provides methods for building, modifying, and comparing codebase
    graphs, including PR and diff support.
  • Supports directory/file/symbol management, dependency computation, and
    transaction handling.
  • Integrates with SDK and caches nodes for performance.
  • +912/-0 
resolution_manager.py
Automated and manual issue resolution management system

codegen-on-oss/codegen_on_oss/analyzers/resolution/resolution_manager.py

  • Implements a resolution manager for code issues detected during
    analysis.
  • Supports strategies for auto-fix, suggestion, manual, and ignore for
    various issue types.
  • Provides methods to generate, apply, validate, rollback, and ignore
    resolutions.
  • Tracks resolution status and history for issues.
  • +761/-0 
issue_types.py
Standardized issue types and enums for analyzers

codegen-on-oss/codegen_on_oss/analyzers/issue_types.py

  • Defines enums for analysis types, issue severity, and issue
    categories.
  • Introduces a dataclass for representing issues with
    serialization/deserialization support.
  • Standardizes issue representation across analyzers.
  • +82/-0   
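An issue dataclass with round-trip serialization might look like the sketch below. The field names and `IssueSeverity.WARNING` are taken from the `Issue(...)` call quoted later in this thread; the other severity members, their string values, and the `to_dict`/`from_dict` method names are assumptions rather than the contents of `issue_types.py`.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class IssueSeverity(Enum):
    # Only WARNING appears in this thread; the other members are assumed.
    ERROR = "error"
    WARNING = "warning"
    INFO = "info"


@dataclass
class Issue:
    file: str
    message: str
    severity: IssueSeverity
    line: Optional[int] = None
    symbol: Optional[str] = None
    suggestion: Optional[str] = None

    def to_dict(self) -> dict:
        """Serialize to plain types so the result can be dumped as JSON."""
        data = asdict(self)
        data["severity"] = self.severity.value
        return data

    @classmethod
    def from_dict(cls, data: dict) -> "Issue":
        """Rebuild an Issue from the output of to_dict()."""
        fields = dict(data)
        fields["severity"] = IssueSeverity(fields["severity"])
        return cls(**fields)
```

Storing the enum by value keeps the serialized form readable and lets `from_dict` reject unknown severities with a `ValueError`.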
Miscellaneous
1 file
__init__.py
Context submodule initialization and exports

codegen-on-oss/codegen_on_oss/analyzers/context/__init__.py

  • Adds an __init__ file for the context submodule.
  • Imports and exposes context classes for codebase, file, and function.
  • Documents the purpose of the context module.
  • +16/-0   
Additional files
27 files
test.yml +17/-7
README.md +248/-0
__init__.py +93/-0
analyzer_manager.py +452/-0
api.py +597/-0
base_analyzer.py +351/-0
code_quality_analyzer.py +530/-0
codebase_context.py +694/-0
codebase.py +465/-0
file.py +332/-0
function.py +336/-0
__init__.py +179/-0
dependency_analyzer.py +484/-0
error_analyzer.py +418/-0
issue_analyzer.py +213/-0
issues.py +493/-0
__init__.py [link]
__init__.py [link]
snapshot_manager.py +780/-0
__init__.py [link]
analysis_visualizer.py +522/-0
code_visualizer.py +504/-0
codebase_visualizer.py +426/-0
visualizer.py +337/-0
organize_codebase.py +234/-0
organize_specific_codebase.py +156/-0
organize_with_codegen_sdk.py +203/-0

    @korbit-ai

    korbit-ai bot commented May 11, 2025

    By default, I don't review pull requests opened by bots. If you would like me to review this pull request anyway, you can request a review via the /korbit-review command in a comment.

    @sourcery-ai

    sourcery-ai bot commented May 11, 2025

    Reviewer's Guide

    This pull request introduces three Python scripts to automate and standardize codebase organization, offering both general-purpose and codebase-specific solutions, including advanced support for symbol-level moves and import updates via the Codegen SDK.

    File-Level Changes

    Change Details Files
    Added a general-purpose script for categorizing and organizing Python files based on content and filename patterns.
    • Implements file categorization using regex patterns for common codebase concerns (e.g., analyzers, context, issues).
    • Builds a dependency graph to inform categorization when patterns are ambiguous.
    • Supports dry run and execution modes for safe file moves.
    organize_codebase.py
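Pattern-based categorization of the kind described for organize_codebase.py can be sketched as below. The regexes, category names, and the filename fallback are hypothetical, since the script's real patterns are not shown in this thread, and the dependency-graph tie-breaking mentioned above is left out.

```python
import re

# Hypothetical category patterns; the real script's patterns are not shown here.
CATEGORY_PATTERNS = {
    "analyzers": re.compile(r"class\s+\w*Analyzer\b|def\s+analyze_\w+"),
    "context": re.compile(r"class\s+\w*Context\b"),
    "issues": re.compile(r"class\s+\w*Issue\w*\b|IssueSeverity"),
}


def categorize(filename: str, content: str) -> str:
    """Pick the category with the most pattern hits; fall back to the filename."""
    hits = {name: len(p.findall(content)) for name, p in CATEGORY_PATTERNS.items()}
    best = max(hits, key=hits.get)
    if hits[best] > 0:
        return best
    # No content match: look for the singular category name in the filename.
    for name in CATEGORY_PATTERNS:
        if name.rstrip("s") in filename.lower():
            return name
    return "uncategorized"
```

When two categories tie on hit count, the one listed first in `CATEGORY_PATTERNS` wins. That is the ambiguous case where the reviewed script reportedly consults its dependency graph instead.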
    Added a script tailored to organize a specific codebase structure as defined by a fixed plan.
    • Defines a static organization plan mapping files to target folders based on a known structure.
    • Moves files according to the plan, with fallback categorization for uncategorized files.
    • Supports dry run and execution modes.
    organize_specific_codebase.py
    Added an advanced organization script leveraging the Codegen SDK to move symbols between files and update imports automatically.
    • Uses Codegen SDK to move symbols (not just files) and update all relevant imports.
    • Implements the same static organization plan as the specific script, but with symbol-level granularity.
    • Handles uncategorized files with filename-based heuristics and provides robust import management.
    organize_with_codegen_sdk.py

    @coderabbitai

    coderabbitai bot commented May 11, 2025

    Important

    Review skipped

    Bot user detected.

    To trigger a single review, invoke the @coderabbitai review command.

    You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


    @codegen-sh
    Author

    codegen-sh bot commented May 11, 2025

    Hey! 👋 I see one of the checks failed. I am on it! 🫡


    @codegen-sh
    Author

    codegen-sh bot commented May 11, 2025

    🧩 View PR #96 on GitHub 🧩

    I've created a new PR with the codebase organization scripts. The previous PR (#95) was failing CI checks due to permission issues with the bot user. This new PR contains the same code but should pass the checks properly.

    The PR includes three Python scripts for organizing codebases: organize_codebase.py, organize_specific_codebase.py, and organize_with_codegen_sdk.py.


    @Zeeeepa Zeeeepa marked this pull request as ready for review May 12, 2025 11:20
    @korbit-ai

    korbit-ai bot commented May 12, 2025

    By default, I don't review pull requests opened by bots. If you would like me to review this pull request anyway, you can request a review via the /korbit-review command in a comment.

    @Zeeeepa Zeeeepa merged commit 595115b into develop May 12, 2025
    11 of 17 checks passed
    @codiumai-pr-agent-free

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Exception Handling

    The code has broad exception handling that catches all exceptions and continues execution. This could mask critical errors and make debugging difficult.

    result = {
        "long_chains": [],
        "critical_paths": []
    }
    
    # Find long dependency chains
    for node in context.nodes:
        if not hasattr(node, 'name'):
            continue
    
        # Skip non-symbol nodes
        if not isinstance(node, Symbol):
            continue

    Missing Error Handling

    Several network operations and graph analysis functions lack proper error handling, which could lead to silent failures or incomplete analysis results.

    except nx.NetworkXNoPath:
        pass

    Resource Management

    The code creates temporary directories for cloning repositories but doesn't explicitly clean them up, which could lead to resource leaks.

    tmp_dir = tempfile.mkdtemp(prefix="codebase_analyzer_")
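One way to address the cleanup concern is to let a context manager own the directory. The sketch below assumes the analysis can be passed in as a callable; `analyze_in_temp_dir` is a hypothetical helper and the clone step is only indicated by a comment. `tempfile.TemporaryDirectory` is standard library and removes the directory even when the body raises.

```python
import tempfile
from pathlib import Path


def analyze_in_temp_dir(analyze) -> object:
    """Run analyze(path) inside a temporary directory that is always removed."""
    with tempfile.TemporaryDirectory(prefix="codebase_analyzer_") as tmp_dir:
        # A real analyzer would clone the repository into tmp_dir here.
        return analyze(Path(tmp_dir))
```

If the directory has to outlive a single call, an alternative is to keep `mkdtemp` and pair it with `shutil.rmtree` in a `finally` block or an explicit cleanup method.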

    @codiumai-pr-agent-free

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category    Suggestion    Impact
    Possible issue
    Fix node existence check

    The function checks if node exists in the graph but then uses node_id (the
    memory address) as the actual node identifier. This mismatch can cause incorrect
    node existence checks. Update the check to use node_id.

    codegen-on-oss/codegen_on_oss/analyzers/codebase_visualizer.py [161-174]

     def _add_node(self, node: Any, **attrs):
         """
         Add a node to the visualization graph with attributes.
         
         Args:
             node: Node object to add
             **attrs: Node attributes
         """
    -    # Skip if node already exists
    -    if self.graph.has_node(node):
    -        return
    -    
         # Generate node ID (memory address for unique identification)
         node_id = id(node)
    +    
    +    # Skip if node already exists
    +    if self.graph.has_node(node_id):
    +        return
    Suggestion importance[1-10]: 8


    Why: The suggestion correctly identifies an inconsistency. The graph nodes are added using node_id = id(node) as the identifier (line 191). Therefore, checking for node existence with self.graph.has_node(node) (line 170) is incorrect and should be self.graph.has_node(node_id). This fixes a potential bug where node existence might be wrongly reported.
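The point of the suggestion is that the existence check and the storage must use the same key. The sketch below shows that invariant on a hypothetical dict-backed stand-in for the visualizer's graph. It is not the networkx-based code under review.

```python
class VisualizationGraph:
    """Hypothetical stand-in for the visualizer's graph, keyed by id(node)."""

    def __init__(self):
        self._nodes = {}

    def has_node(self, node_id: int) -> bool:
        return node_id in self._nodes

    def add_node(self, node, **attrs) -> bool:
        """Add node once; return False if it was already present."""
        node_id = id(node)
        # Check with the same key that is used for storage.
        if self.has_node(node_id):
            return False
        self._nodes[node_id] = {"name": getattr(node, "name", repr(node)), **attrs}
        return True

    def __len__(self) -> int:
        return len(self._nodes)
```

Note that `id()` values are only unique among objects that are alive at the same time, so this keying is safe only while the graph's source objects are kept referenced.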

    Medium
    General
    Improve error detection logic

    The check for 'try' in the function source is too simplistic and can lead to
    false positives. A function might have 'try' blocks that don't cover all 'await'
    expressions. Use a more precise check to verify that each 'await' is within a
    try block.

    codegen-on-oss/codegen_on_oss/analyzers/codebase_analyzer.py [1741-1751]

     # Check for error handling in async functions
     if hasattr(func, 'is_async') and func.is_async and hasattr(func, 'source'):
    -    if 'await' in func.source and 'try' not in func.source:
    +    # More precise check for awaits without try blocks
    +    source_lines = func.source.split('\n')
    +    has_unprotected_awaits = False
    +    in_try_block = False
    +    
    +    for line in source_lines:
    +        if 'try:' in line:
    +            in_try_block = True
    +        elif 'except' in line or 'finally:' in line:
    +            continue
    +        elif line.strip().startswith(('def ', 'class ', 'if ', 'else:', 'elif ')):
    +            in_try_block = False
    +            
    +        if 'await' in line and not in_try_block:
    +            has_unprotected_awaits = True
    +            break
    +            
    +    if has_unprotected_awaits:
             self.issues.append(Issue(
                 file=file_path,
                 line=func.line if hasattr(func, 'line') else None,
                 message=f"Async function {func_name} has awaits without try/except",
                 severity=IssueSeverity.WARNING,
                 symbol=func_name,
                 suggestion="Add error handling for await expressions"
             ))
    Suggestion importance[1-10]: 5


    Why: The current string-based check ('await' in func.source and 'try' not in func.source) for unprotected await expressions is simplistic and can lead to inaccuracies. Suggesting a more precise analysis to ensure each await is within a try block is a valid improvement for the heuristic's accuracy, even if the provided improved_code has its own flaws.

    Low
