Merged
118 changes: 118 additions & 0 deletions scripts/microgenerator/generate.py
@@ -492,3 +492,121 @@ def analyze_source_files(

    return parsed_data, all_imports, all_types, request_arg_schema


# =============================================================================
# Section 3: Code Generation
# =============================================================================

def _generate_import_statement(
    context: List[Dict[str, Any]], key: str, path: str
) -> str:
    """Generates a formatted import statement from a list of context dictionaries.

    Args:
        context: A list of dictionaries containing the data.
        key: The key to extract from each dictionary in the context.
        path: The base import path (e.g., "google.cloud.bigquery_v2.services").

    Returns:
        A formatted, multi-line import statement string.
    """
    names = sorted(list(set([item[key] for item in context])))
    names_str = ",\n ".join(names)
    return f"from {path} import (\n {names_str}\n)"


def generate_code(config: Dict[str, Any], analysis_results: tuple) -> None:
Contributor

A tuple input makes it hard to verify in review that the fields are in the correct order. Have you considered using a frozen data class? Or, if positional access is required, a named tuple?

Collaborator Author

@chalmerlowe chalmerlowe Sep 25, 2025

When the code first started, we were only passing one item, which became two in a tuple and then three and is now four items.

I agree, it is time to move it to a more robust solution. Not all the parts that will end up being affected by this move are in this PR, so I would much prefer to merge all the outstanding PRs before doing too many changes to logic, etc.

This is all microgenerator code so no customers are gonna see this OR interact with it, just us devs, but there are better approaches that will make our lives easier in the long run.

I will defer this to the TODO list hosted internally at b/445158219 for now.
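For reference, the two options raised in this thread could look roughly like the sketch below. The class names `AnalysisResults` and `FrozenAnalysisResults` are hypothetical, and the field types are assumptions because the diff does not show what `analyze_source_files` actually returns beyond the four names. The named tuple keeps the existing four-way unpacking working, while the frozen data class gives attribute access only.

```python
from dataclasses import dataclass
from typing import Any, Dict, NamedTuple


class AnalysisResults(NamedTuple):
    """Hypothetical named container; still unpacks positionally like a tuple."""

    parsed_data: Dict[str, Any]
    all_imports: Any  # concrete type is not shown in this diff
    all_types: Any  # concrete type is not shown in this diff
    request_arg_schema: Dict[str, Any]


@dataclass(frozen=True)
class FrozenAnalysisResults:
    """Hypothetical frozen alternative; attribute access only, no unpacking."""

    parsed_data: Dict[str, Any]
    all_imports: Any
    all_types: Any
    request_arg_schema: Dict[str, Any]


# Made-up sample values, only to show that existing unpacking keeps working.
results = AnalysisResults(
    parsed_data={"SomeClient": {}},
    all_imports=set(),
    all_types=set(),
    request_arg_schema={},
)
data, all_imports, all_types, request_arg_schema = results
```

With the named tuple, `generate_code` could keep its current first line unchanged and migrate to `analysis_results.parsed_data` style access later.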

    """
    Generates source code files using Jinja2 templates.
    """
    data, all_imports, all_types, request_arg_schema = analysis_results
    project_root = config["project_root"]
    config_dir = config["config_dir"]

    templates_config = config.get("templates", [])
    for item in templates_config:
        template_path = os.path.join(config_dir, item["template"])
        output_path = os.path.join(project_root, item["output"])
Contributor

[nit, optional] The pathlib.Path's / operator is a little less verbose and seems to be preferred for new code.

Collaborator Author

We added Path to this section, but:

  • right now the function in utils.py that ends up using this wants a string so... we convert the whole concatenated Path object to a str()
  • Why not just update the utils.py file to take a Path OR a {string, Path}?
  • We can't run tests without multiple files and edits that are spread across two or three PRs that have not been merged yet, so I have no confidence that all the stars will align, AND I did not want to try temporary workarounds just to test this update. PR #2307 includes some, but not all, of the necessary changes, including tests that are specific to utils.py.

I will add an item to the TODO list hosted internally at b/445158219 to ensure that we circle back and clean up the os vs Path situation. I feel like there are prolly a couple other nooks and crannies where Path would be a better long-term solution.
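A minimal sketch of the approach described in this reply: join with the `/` operator, then convert back to `str` for a consumer that still expects strings. The helper name `build_paths` is hypothetical and is not part of the PR; only the config keys come from the diff.

```python
import os
from pathlib import Path
from typing import Dict, Tuple


def build_paths(config: Dict[str, str], item: Dict[str, str]) -> Tuple[str, str]:
    """Hypothetical helper: join with Path's / operator, hand back plain strings."""
    template_path = Path(config["config_dir"]) / item["template"]
    output_path = Path(config["project_root"]) / item["output"]
    # str() keeps compatibility with helpers that currently expect a string path.
    return str(template_path), str(output_path)


# Made-up sample values for illustration only.
sample_config = {"config_dir": "cfg", "project_root": "repo"}
sample_item = {"template": "client.py.j2", "output": "client.py"}
template_str, output_str = build_paths(sample_config, sample_item)
```

If the consuming function were later changed to accept `os.PathLike`, the `str()` calls could be dropped without touching the call sites.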


        template = utils.load_template(template_path)
        methods_context = []
        for class_name, methods in data.items():
            for method_name, method_info in methods.items():
                context = {
                    "name": method_name,
                    "class_name": class_name,
                    "return_type": method_info["return_type"],
                }
Comment on lines +559 to +563
Contributor

Thoughts on using a data class for this instead of a dictionary?

Collaborator Author

I will look this over and consider whether it should be modified in a future PR. Right now, for an alpha release to see what works and what doesn't, a very small dict is probably a reasonable conveyance in a microgenerator. Also added this to the TODO list for tracking.
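For tracking purposes, a data class version of the per-method context might look like the sketch below. The class name `MethodContext` is hypothetical; the field names mirror the dictionary keys used in this diff, and the types of the optional fields are assumptions. `dataclasses.asdict` converts an instance back to a plain dict for any code that still indexes by key.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class MethodContext:
    """Hypothetical replacement for the per-method context dictionary."""

    name: str
    class_name: str
    return_type: str
    # Filled in later only when a matching request class is found.
    request_class_full_name: Optional[str] = None
    request_id_args: Any = None  # schema shape is not shown in this diff
    # Filled in once the service names have been generated.
    service_name: Optional[str] = None
    service_module_name: Optional[str] = None


# Made-up sample values for illustration only.
ctx = MethodContext(name="get_thing", class_name="ThingClient", return_type="Thing")
ctx.service_name = "ThingService"
as_dict = asdict(ctx)
```

The class is left mutable here because the existing code adds the service fields after the context is first built; a frozen variant would need those values at construction time.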


                # Infer the request class and find its schema.
                inferred_request_name = name_utils.method_to_request_class_name(
                    method_name
                )

                # Check for a request class name override in the config.
                method_overrides = (
                    config.get("filter", {}).get("methods", {}).get("overrides", {})
                )
                if method_name in method_overrides:
                    inferred_request_name = method_overrides[method_name].get(
                        "request_class_name", inferred_request_name
                    )

                fq_request_name = ""
                for key in request_arg_schema.keys():
                    if key.endswith(f".{inferred_request_name}"):
                        fq_request_name = key
                        break

                # If found, augment the method context.
                if fq_request_name:
                    context["request_class_full_name"] = fq_request_name
                    context["request_id_args"] = request_arg_schema[fq_request_name]

                methods_context.append(context)
Comment on lines 557 to 574
Contributor

With several nested loops and if statements, I'm having some trouble following along today. Maybe worth adding some private helper methods.

Collaborator Author

We pulled out two chunks of processing and created two helper functions, which definitely makes the code a bit easier to parse.

I think we might be able to do a bit more, but gonna hold off until all the things are merged and working before pushing my luck.


        # Prepare imports for the template
        services_context = []
        client_class_names = sorted(
            list(set([m["class_name"] for m in methods_context]))
        )

        for class_name in client_class_names:
            service_name_cluster = name_utils.generate_service_names(class_name)
            services_context.append(service_name_cluster)

        # Also need to update methods_context to include the service_name and module_name
        # so the template knows which client to use for each method.
        class_to_service_map = {s["service_client_class"]: s for s in services_context}
        for method in methods_context:
            service_info = class_to_service_map.get(method["class_name"])
            if service_info:
                method["service_name"] = service_info["service_name"]
                method["service_module_name"] = service_info["service_module_name"]

        # Prepare new imports
        service_imports = [
            _generate_import_statement(
                services_context,
                "service_module_name",
                "google.cloud.bigquery_v2.services",
            )
        ]

        # Prepare type imports
        type_imports = [
            _generate_import_statement(
                services_context, "service_name", "google.cloud.bigquery_v2.types"
            )
        ]

        final_code = template.render(
            service_name=config.get("service_name"),
            methods=methods_context,
            services=services_context,
            service_imports=service_imports,
            type_imports=type_imports,
            request_arg_schema=request_arg_schema,
        )

        utils.write_code_to_file(output_path, final_code)