feat: microgen - add AST analysis utilities #2292
Migrates the empty __init__.py file to the microgenerator package.
Introduces the CodeAnalyzer class and helper functions for parsing Python code using the ast module. This provides the foundation for understanding service client structures.
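A visitor of this kind might look something like the minimal sketch below, which assumes the standard ast.NodeVisitor pattern. The output shape (class_name, methods, method_name, args) is illustrative only and is not taken from the PR's actual data model. The PR description says CodeAnalyzer extracts classes, methods, and their arguments, and the sketch does the same in the simplest way.

```python
from __future__ import annotations

import ast
from typing import Any


class CodeAnalyzer(ast.NodeVisitor):
    """Hypothetical sketch: collects classes, their methods, and method arguments."""

    def __init__(self) -> None:
        self.structure: list[dict[str, Any]] = []
        self._current_class_info: dict[str, Any] | None = None

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        class_info: dict[str, Any] = {"class_name": node.name, "methods": []}
        self.structure.append(class_info)
        # Save and restore the enclosing context so nested definitions are handled.
        previous = self._current_class_info
        self._current_class_info = class_info
        self.generic_visit(node)
        self._current_class_info = previous

    def _visit_function(self, node: ast.FunctionDef | ast.AsyncFunctionDef) -> None:
        if self._current_class_info is not None:
            # Positional-only, regular, and keyword-only parameter names;
            # *args and **kwargs are ignored in this sketch.
            params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            self._current_class_info["methods"].append(
                {"method_name": node.name, "args": [p.arg for p in params]}
            )
        # Functions nested inside a method are not methods of the class.
        previous = self._current_class_info
        self._current_class_info = None
        self.generic_visit(node)
        self._current_class_info = previous

    visit_FunctionDef = _visit_function
    visit_AsyncFunctionDef = _visit_function


source = '''
class Client:
    def __init__(self, project, *, credentials=None):
        self.project = project

    async def list_items(self, page_size):
        def helper(x):
            return x
        return helper(page_size)
'''

analyzer = CodeAnalyzer()
analyzer.visit(ast.parse(source))
print(analyzer.structure)
```

The nested helper inside list_items is not reported as a method, because the class context is cleared while a function body is being visited.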
scripts/microgenerator/generate.py
    def _add_attribute(self, attr_name: str, attr_type: str | None = None):
        """Adds a unique attribute to the current class context."""
        if self._current_class_info:
Should we raise an error if self._current_class_info turns out to be false, instead of doing nothing silently?
The current structure of the code is this:
    def visit_Assign(self, node):
        # ... other logic ...
        if self._current_class_info:
            # ... determine attr_name, attr_type ...
            self._add_attribute(attr_name, attr_type)
        self.generic_visit(node)

    def visit_AnnAssign(self, node):
        # ... other logic ...
        if self._current_class_info:
            # ... determine attr_name, attr_type ...
            self._add_attribute(attr_name, attr_type)
        self.generic_visit(node)
Given this structure:
- The if self._current_class_info: check inside _add_attribute is redundant. The condition will always hold because the callers guarantee it.
- There is no need to raise an error for a missing class context within _add_attribute, because the calling methods prevent that situation.
I removed the check inside of _add_attribute. This makes _add_attribute cleaner and more focused, relying on the contract established by its callers. I updated the docstring to reflect this assumption.
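To illustrate the contract described above (the callers check for a class context and _add_attribute assumes it), here is a minimal, self-contained sketch. AttributeCollector and its attributes dict are hypothetical names, since the real visitor's data model is not shown in this thread, and method bodies are deliberately skipped so that only class-body assignments are recorded.

```python
from __future__ import annotations

import ast


class AttributeCollector(ast.NodeVisitor):
    """Hypothetical sketch: records class-level attributes and their annotations."""

    def __init__(self) -> None:
        self.attributes: dict[str, list[tuple[str, str | None]]] = {}
        self._current_class_info: dict | None = None

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        previous = self._current_class_info
        self._current_class_info = {"class_name": node.name, "attributes": []}
        self.attributes[node.name] = self._current_class_info["attributes"]
        self.generic_visit(node)
        self._current_class_info = previous

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # Method bodies are skipped; this sketch only records class-body names.
        pass

    visit_AsyncFunctionDef = visit_FunctionDef

    def _add_attribute(self, attr_name: str, attr_type: str | None = None) -> None:
        """Adds a unique attribute to the current class context.

        Precondition: the caller has already verified that a class context exists.
        """
        attributes = self._current_class_info["attributes"]
        if all(name != attr_name for name, _ in attributes):
            attributes.append((attr_name, attr_type))

    def visit_Assign(self, node: ast.Assign) -> None:
        if self._current_class_info:  # the caller owns the context check
            for target in node.targets:
                if isinstance(target, ast.Name):
                    self._add_attribute(target.id)
        self.generic_visit(node)

    def visit_AnnAssign(self, node: ast.AnnAssign) -> None:
        if self._current_class_info and isinstance(node.target, ast.Name):
            self._add_attribute(node.target.id, ast.unparse(node.annotation))
        self.generic_visit(node)


source = '''
TIMEOUT = 30

class Settings:
    retries: int = 3
    region = "us"
    retries = 5
'''

collector = AttributeCollector()
collector.visit(ast.parse(source))
print(collector.attributes)  # {'Settings': [('retries', 'int'), ('region', None)]}
```

The module-level TIMEOUT = 30 is ignored without error because visit_Assign never calls _add_attribute outside a class. The trade-off of dropping the internal check is that a future caller that forgets the guard fails loudly with a TypeError (subscripting None) instead of silently doing nothing.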
scripts/microgenerator/generate.py
    all_class_keys.append(key)

    # Skip filling details if not needed for the dictionary.
This check seems redundant.
Good catch. Removed.
    # Determine if the path is a file or directory and process accordingly
    if os.path.isfile(path) and path.endswith(".py"):
        structure, _, _ = parse_file(path)
        process_structure(structure)
For my education, why is file_name omitted here?
Thanks for asking! Seems like a reasonable question.
The list_code_objects() function is designed to return a well-structured dictionary with clear keys, whether it's analyzing a single file or an entire directory.
This library can be run automatically OR interactively. It can also be run against a single file OR a directory full of files. Looking at both cases:
- if os.path.isfile(path) and path.endswith(".py"):
  - This block executes when the user provides a path to a single Python file.
  - process_structure(structure) is called without the file_name argument.
  - Why? Since we are only analyzing one file, any class name found is unique to that file. There's no need to disambiguate it with the filename in the output keys. The key in process_structure will just be class_info["class_name"].
- elif os.path.isdir(path):
  - This block executes when the user provides a path to a directory.
  - The code iterates through all .py files within that directory.
  - process_structure(structure, file_name=os.path.basename(file_path)) is called for each file.
  - Why? When scanning multiple files, it's possible to encounter classes with the same name in different files. To prevent these from clobbering each other in the results dictionary and to make the output clear, the file_name is used to make the key unique. The key in process_structure becomes f"{class_info['class_name']} (in {file_name})".
In essence: The file_name argument is only provided when processing a directory to
ensure that class names in the output dictionary are unique, even if the same class
name appears in multiple files. When processing a single file, this disambiguation
is not necessary.
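The single-file vs. directory dispatch and the key disambiguation described above can be sketched as follows. This is a minimal illustration under stated assumptions: parse_classes is a hypothetical stand-in for parse_file (whose other two return values aren't shown in this thread), and the directory scan is assumed to be non-recursive, so basenames cannot repeat. The f-string uses single quotes inside the braces; reusing double quotes there, as in f"{class_info["class_name"]} ...", is only valid on Python 3.12 and later.

```python
from __future__ import annotations

import ast
import os
import tempfile


def parse_classes(file_path: str) -> list[dict]:
    """Hypothetical stand-in for parse_file(): one dict per top-level class."""
    with open(file_path, encoding="utf-8") as f:
        tree = ast.parse(f.read())
    return [{"class_name": n.name} for n in tree.body if isinstance(n, ast.ClassDef)]


def list_code_objects(path: str) -> dict[str, dict]:
    results: dict[str, dict] = {}

    def process_structure(structure: list[dict], file_name: str | None = None) -> None:
        for class_info in structure:
            # Qualify the key with the file name only when one is supplied,
            # i.e. when scanning a directory.
            key = class_info["class_name"]
            if file_name:
                key = f"{class_info['class_name']} (in {file_name})"
            results[key] = class_info

    if os.path.isfile(path) and path.endswith(".py"):
        process_structure(parse_classes(path))
    elif os.path.isdir(path):
        # Assumption: only the top level of the directory is scanned.
        for name in sorted(os.listdir(path)):
            file_path = os.path.join(path, name)
            if os.path.isfile(file_path) and name.endswith(".py"):
                process_structure(
                    parse_classes(file_path), file_name=os.path.basename(file_path)
                )
    return results


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.py", "b.py"):
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write("class Client:\n    pass\n")
    dir_keys = sorted(list_code_objects(tmp))
    file_keys = sorted(list_code_objects(os.path.join(tmp, "a.py")))

print(dir_keys)   # ['Client (in a.py)', 'Client (in b.py)']
print(file_keys)  # ['Client']
```

Without the file_name qualifier, the two Client classes in the directory case would share one key and the second would overwrite the first.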


Follows: PR #2291 (should be merged after that PR).
This PR adds the CodeAnalyzer class, which is a node visitor that traverses an AST and extracts structured information about classes, methods, and their arguments.
Includes: