Avoid frequent writes to graph.json during bootstrap #735

@MichaelYochpaz

The graph.json file is rewritten on disk every time a new dependency is added to the graph.

These constant disk writes slow down the bootstrapping process, especially with a large number of dependencies, because the whole graph is serialized again on every addition. A possible solution is to apply the changes only to the in-memory graph while iterating over the dependencies, and then write the result to disk just once, after the graph has been updated with all of them.

The code for reference:

  def bootstrap(self, req: Requirement, req_type: RequirementType) -> Version:
      logger.info(f"bootstrapping {req} as {req_type} dependency of {self.why[-1:]}")
      constraint = self.ctx.constraints.get_constraint(req.name)
      if constraint:
          logger.info(
              f"incoming requirement {req} matches constraint {constraint}. Will apply both."
          )

      source_url, resolved_version = self.resolve_version(
          req=req,
          req_type=req_type,
      )
      pbi = self.ctx.package_build_info(req)

      self._add_to_graph(req, req_type, resolved_version, source_url)

...

      for dep in self._sort_requirements(install_dependencies):
          with req_ctxvar_context(dep):
              try:
                  self.bootstrap(req=dep, req_type=RequirementType.INSTALL)
              except Exception as err:
                  raise ValueError(f"could not handle {self._explain}") from err

The bootstrap function calls itself recursively for every install dependency, and each call runs _add_to_graph:

# fromager/bootstrapper.py

  def _add_to_graph(
      self,
      req: Requirement,
      req_type: RequirementType,
      req_version: Version,
      download_url: str,
  ) -> None:
      if req_type == RequirementType.TOP_LEVEL:
          return

      _, parent_req, parent_version = self.why[-1] if self.why else (None, None, None)
      pbi = self.ctx.package_build_info(req)
      # Update the dependency graph after we determine that this requirement is
      # useful but before we determine if it is redundant so that we capture all
      # edges to use for building a valid constraints file.
      self.ctx.dependency_graph.add_dependency(
          parent_name=canonicalize_name(parent_req.name) if parent_req else None,
          parent_version=parent_version,
          req_type=req_type,
          req=req,
          req_version=req_version,
          download_url=download_url,
          pre_built=pbi.pre_built,
      )
      self.ctx.write_to_graph_to_file()

_add_to_graph in turn calls write_to_graph_to_file on every invocation. That method rewrites the whole file:

# src/fromager/context.py

    def write_to_graph_to_file(self):
        with self.graph_file.open("w", encoding="utf-8") as f:
            self.dependency_graph.serialize(f)
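
To make the proposal concrete, here is a minimal sketch of the "update in memory, write once" approach. It is not fromager's actual code: `DependencyGraph`, `WorkContext`, `bootstrap_all`, and the `write_count` counter are simplified, hypothetical stand-ins. Only the method name `write_to_graph_to_file` and its body mirror the excerpt above.

```python
import json
from pathlib import Path
from typing import Iterable, TextIO


class DependencyGraph:
    """Hypothetical stand-in for the real dependency graph."""

    def __init__(self) -> None:
        self.edges: list[dict[str, str]] = []

    def add_dependency(self, parent: str, child: str) -> None:
        # In-memory update only; nothing touches the disk here.
        self.edges.append({"parent": parent, "child": child})

    def serialize(self, f: TextIO) -> None:
        json.dump(self.edges, f, indent=2)


class WorkContext:
    """Hypothetical stand-in for the context object that owns graph.json."""

    def __init__(self, graph_file: Path) -> None:
        self.graph_file = graph_file
        self.dependency_graph = DependencyGraph()
        self.write_count = 0  # illustration only: counts full rewrites

    def write_to_graph_to_file(self) -> None:
        # Same shape as the method quoted above: rewrite the whole file.
        with self.graph_file.open("w", encoding="utf-8") as f:
            self.dependency_graph.serialize(f)
        self.write_count += 1


def bootstrap_all(ctx: WorkContext, deps: Iterable[tuple[str, str]]) -> None:
    """Add every (parent, child) edge in memory, then write the file once."""
    try:
        for parent, child in deps:
            ctx.dependency_graph.add_dependency(parent, child)
    finally:
        # Single write, also reached when an exception aborts the loop,
        # so a partial graph is still saved for debugging.
        ctx.write_to_graph_to_file()
```

With this shape, adding N dependencies costs one serialization instead of N. The `try`/`finally` is one possible way to keep a partial graph.json on disk when bootstrapping fails with an exception. It does not cover a hard kill of the process, which the current write-on-every-add behavior does.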
