Open
Description
The graph.json file is rewritten on disk every time a new dependency is added to the graph.
The constant disk writes reduce the overall performance of the bootstrapping process, especially when bootstrapping a large number of dependencies. A possible solution is to apply the changes to the in-memory graph object while iterating over the dependencies, and then write the result to disk just once, after the graph has been updated with all of them.
The code for reference:
def bootstrap(self, req: Requirement, req_type: RequirementType) -> Version:
    logger.info(f"bootstrapping {req} as {req_type} dependency of {self.why[-1:]}")
    constraint = self.ctx.constraints.get_constraint(req.name)
    if constraint:
        logger.info(
            f"incoming requirement {req} matches constraint {constraint}. Will apply both."
        )
    source_url, resolved_version = self.resolve_version(
        req=req,
        req_type=req_type,
    )
    pbi = self.ctx.package_build_info(req)
    self._add_to_graph(req, req_type, resolved_version, source_url)
    ...
    for dep in self._sort_requirements(install_dependencies):
        with req_ctxvar_context(dep):
            try:
                self.bootstrap(req=dep, req_type=RequirementType.INSTALL)
            except Exception as err:
                raise ValueError(f"could not handle {self._explain}") from err
The bootstrap function calls itself recursively for all dependencies, and within the bootstrap function we run _add_to_graph:
# fromager/bootstrapper.py
def _add_to_graph(
    self,
    req: Requirement,
    req_type: RequirementType,
    req_version: Version,
    download_url: str,
) -> None:
    if req_type == RequirementType.TOP_LEVEL:
        return
    _, parent_req, parent_version = self.why[-1] if self.why else (None, None, None)
    pbi = self.ctx.package_build_info(req)
    # Update the dependency graph after we determine that this requirement is
    # useful but before we determine if it is redundant so that we capture all
    # edges to use for building a valid constraints file.
    self.ctx.dependency_graph.add_dependency(
        parent_name=canonicalize_name(parent_req.name) if parent_req else None,
        parent_version=parent_version,
        req_type=req_type,
        req=req,
        req_version=req_version,
        download_url=download_url,
        pre_built=pbi.pre_built,
    )
    self.ctx.write_to_graph_to_file()
And the _add_to_graph function calls write_to_graph_to_file, which is defined as:
# src/fromager/context.py
def write_to_graph_to_file(self):
    with self.graph_file.open("w", encoding="utf-8") as f:
        self.dependency_graph.serialize(f)