@chalmerlowe
Collaborator

Follows PR #2291 and should be merged only after that PR is merged.

This PR adds the CodeAnalyzer class, which is a node visitor that traverses an AST and extracts structured information about classes, methods, and their arguments.

Includes:

  • generate.py file
  • the CodeAnalyzer class
  • several helper functions
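
For readers unfamiliar with the pattern, here is a minimal sketch of a node visitor that collects class, method, and argument names. It is not the CodeAnalyzer from this PR: the class name SketchAnalyzer, the dictionary keys, and the sample source are all assumptions made for illustration.

```python
import ast


class SketchAnalyzer(ast.NodeVisitor):
    """Hypothetical visitor for illustration; not the PR's CodeAnalyzer."""

    def __init__(self):
        self.structure = []  # one dict per class found
        self._current_class_info = None

    def visit_ClassDef(self, node):
        # Remember the outer context so nested classes restore it afterwards.
        outer = self._current_class_info
        self._current_class_info = {"class_name": node.name, "methods": []}
        self.structure.append(self._current_class_info)
        self.generic_visit(node)
        self._current_class_info = outer

    def visit_FunctionDef(self, node):
        # Only record functions that appear while a class context is active.
        if self._current_class_info is not None:
            args = [a.arg for a in node.args.args]
            self._current_class_info["methods"].append(
                {"method_name": node.name, "args": args}
            )
        # No generic_visit here, so nested functions are not recorded as methods.


source = '''
class Table:
    def insert(self, row, retry=None):
        pass
'''

analyzer = SketchAnalyzer()
analyzer.visit(ast.parse(source))
```

After the visit, analyzer.structure holds one entry for Table with a single insert method and its positional argument names.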

@chalmerlowe chalmerlowe requested review from a team as code owners September 15, 2025 13:35
@chalmerlowe chalmerlowe requested review from logachev and removed request for a team September 15, 2025 13:35
@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Sep 15, 2025
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Sep 15, 2025
@chalmerlowe chalmerlowe changed the title feat: Add AST analysis utilities feat: microgen - add AST analysis utilities Sep 15, 2025
@chalmerlowe chalmerlowe added this to the µgen PoC milestone Sep 16, 2025
@chalmerlowe chalmerlowe self-assigned this Sep 16, 2025
Base automatically changed from feat/migrate-init to feat-adds-method-partials September 16, 2025 14:22
@chalmerlowe
Collaborator Author

For clarity:

  1. The GitHub Actions are being used to help ensure that unit tests pass.
  2. The KOKORO tests are failing. This is a known problem and will be dealt with in a separate PR. It should not affect merging into the autogen dev branch.


def _add_attribute(self, attr_name: str, attr_type: str | None = None):
    """Adds a unique attribute to the current class context."""
    if self._current_class_info:
Contributor

Should we raise an error if self._current_class_info turns out to be false, instead of doing nothing silently?

Collaborator Author

The current structure of the code is this:

def visit_Assign(self, node):
    # ... other logic ...
    if self._current_class_info:
        # ... determine attr_name, attr_type ...
        self._add_attribute(attr_name, attr_type)
    self.generic_visit(node)
 
def visit_AnnAssign(self, node):
    # ... other logic ...
    if self._current_class_info:
        # ... determine attr_name, attr_type ...
        self._add_attribute(attr_name, attr_type)
    self.generic_visit(node)

Given this structure:

  1. The if self._current_class_info: check inside _add_attribute is redundant. It
    will always be truthy because the callers guarantee it.
  2. There is no need to raise an error for a missing class context within _add_attribute, as the situation is prevented by the calling methods.

I removed the check inside of _add_attribute. This makes _add_attribute cleaner and more focused, relying on the contract established by its callers. I updated the docstring to reflect this assumption.
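
A minimal sketch of the resulting shape, assuming a simplified visitor: the callers own the class-context check and _add_attribute documents it as a precondition. The class name AttributeSketch, the "attributes" key, and the uniqueness test are illustrative assumptions, not the code in this PR.

```python
import ast


class AttributeSketch(ast.NodeVisitor):
    """Hypothetical visitor showing the caller-guarantees-context contract."""

    def __init__(self):
        self.classes = []
        self._current_class_info = None

    def visit_ClassDef(self, node):
        outer = self._current_class_info
        self._current_class_info = {"class_name": node.name, "attributes": []}
        self.classes.append(self._current_class_info)
        self.generic_visit(node)
        self._current_class_info = outer

    def _add_attribute(self, attr_name, attr_type=None):
        """Adds a unique attribute to the current class context.

        Assumes the caller has already verified that a class context is active.
        """
        attrs = self._current_class_info["attributes"]
        if not any(a["name"] == attr_name for a in attrs):
            attrs.append({"name": attr_name, "type": attr_type})

    def visit_AnnAssign(self, node):
        # The guard lives in the caller, so _add_attribute needs no check.
        if self._current_class_info and isinstance(node.target, ast.Name):
            self._add_attribute(node.target.id, ast.unparse(node.annotation))
        self.generic_visit(node)


source = "x: int = 1\nclass A:\n    y: str = 'a'\n    y: str = 'b'\n"
sketch = AttributeSketch()
sketch.visit(ast.parse(source))
```

The module-level annotation x is skipped by the caller's guard, and the repeated y is recorded once.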


all_class_keys.append(key)

# Skip filling details if not needed for the dictionary.
Contributor

This check seems redundant

Collaborator Author

Good catch. Removed.

# Determine if the path is a file or directory and process accordingly
if os.path.isfile(path) and path.endswith(".py"):
    structure, _, _ = parse_file(path)
    process_structure(structure)
Contributor

For my education, why is file_name omitted here?

Collaborator Author

Thanks for asking! Seems like a reasonable question.

The list_code_objects() function is designed to return a well-structured dictionary with clear keys, whether it is analyzing a single file or an entire directory.

This library can be run automatically OR interactively. It also has the ability to be run against a single file OR a directory full of files. Looking at both cases:

  1. if os.path.isfile(path) and path.endswith(".py"):

    • This block executes when the user provides a path to a single Python file.
    • process_structure(structure) is called without the file_name argument.
    • Why? Since we are only analyzing one file, any class name found is unique to
      that file. There's no need to disambiguate it with the filename in the output
      keys. The key in process_structure will just be class_info["class_name"].
  2. elif os.path.isdir(path):

    • This block executes when the user provides a path to a directory.
    • The code iterates through all .py files within that directory.
    • process_structure(structure, file_name=os.path.basename(file_path)) is called
      for each file.
    • Why? When scanning multiple files, it's possible to encounter classes with the
      same name in different files. To prevent these from clobbering each other in
      the results dictionary and to make the output clear, the file_name is used to
      make the key unique. The key in process_structure becomes
      f"{class_info['class_name']} (in {file_name})".

In essence: The file_name argument is only provided when processing a directory to
ensure that class names in the output dictionary are unique, even if the same class
name appears in multiple files. When processing a single file, this disambiguation
is not necessary.
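
The keying behavior described above can be sketched roughly as follows. This is a simplified stand-in, not the function in this PR: the explicit results parameter and the sample class data are assumptions added so the example runs on its own.

```python
def process_structure(results, structure, file_name=None):
    """Hypothetical stand-in: store each class under a (possibly qualified) key."""
    for class_info in structure:
        key = class_info["class_name"]
        if file_name:
            # Directory mode: qualify the key so same-named classes coexist.
            key = f"{key} (in {file_name})"
        results[key] = class_info


# Two files that each define a class named Client (made-up sample data).
client_a = [{"class_name": "Client", "methods": ["query"]}]
client_b = [{"class_name": "Client", "methods": ["load"]}]

# Without file names, the second Client overwrites the first.
without_names = {}
process_structure(without_names, client_a)
process_structure(without_names, client_b)

# With file names, both entries survive under distinct keys.
with_names = {}
process_structure(with_names, client_a, file_name="a.py")
process_structure(with_names, client_b, file_name="b.py")
```

In the unqualified case only the last Client remains, which is the clobbering that the directory branch avoids.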

Base automatically changed from feat-adds-method-partials to autogen September 18, 2025 19:33
@chalmerlowe chalmerlowe merged commit 5259312 into autogen Sep 22, 2025
25 checks passed
@chalmerlowe chalmerlowe deleted the feat/add-ast-utilities branch September 22, 2025 18:28