
Implement virtual disk based workspace snapshotter #108

@hach-que

Description


Given a source control history that looks something like this (whether a project or Unreal Engine itself):

```mermaid
gitGraph
   commit
   commit
   commit
   commit
   branch pr-1
   checkout pr-1
   commit
   commit
   checkout main
   commit
   commit
```

For each commit we can have a virtual disk (VHD, sparseimage or remote ZFS dataset) that contains the checked out version of that commit (including all submodules). When we want to check out a new commit, rather than cloning the repository fresh, we instantiate from the closest snapshot and then get the repository up-to-date by just checking out the target commit.
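The "how far is this snapshot from the target commit" measurement can come straight from Git: `git diff --name-only` lists each changed path once, regardless of how many commits touched it. A minimal sketch (the helper names are illustrative, not part of UET):

```python
import subprocess

def changed_file_count(repo_path: str, from_commit: str, to_commit: str) -> int:
    """Weight of moving a checkout between two commits: the number of
    files that differ. `git diff --name-only A B` lists each changed
    path exactly once, so repeated edits to one file still cost 1."""
    out = subprocess.run(
        ["git", "-C", repo_path, "diff", "--name-only", from_commit, to_commit],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_paths(out)

def count_paths(diff_output: str) -> int:
    # One path per non-empty line of `git diff --name-only` output.
    return len([line for line in diff_output.splitlines() if line.strip()])
```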

In addition, for each commit we can also snapshot the working directory of successful builds, and track those off the commits as well:

```mermaid
flowchart TD;
  commit1 -->|weight 1| commit2;
  commit2 -->|weight 2| commit3;
  commit3 -->|weight 1| commit4;
  commit4 -->|weight 4| commit5;
  commit5 -->|weight 2| commit6;

  commit4 -->|weight 40| commit4build1[commit4: Assemble Project];
  commit4 -->|weight 50| commit4build2[commit4: Build Editor Win64];

  commit5 -->|weight N/A| commit5build1[commit5: Assemble Project];
  commit5 -->|weight N/A| commit5build2[commit5: Build Editor Win64];

  commit4build1 -->|weight 10| commit5build1;
  commit4build2 -->|weight 12| commit5build2;
```

The weight here measures how costly it is to get from the previous state to the new state, and is used to determine what node is truly "closest" to the target. For Git commits the weight would be the number of files to checkout, while for builds the weight would be the approximate time to build (from no intermediate artifacts). In some cases the weight may not be available because we didn't go from the source state to the target state and thus haven't measured it, even though the target state is a child of the parent state. In the example above, the commit5 builds used the commit4 builds as the starting snapshot and then did git checkout to get the working directory up-to-date, instead of starting at the commit5 checkout and then doing a full build with no intermediate artifacts.
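Finding the node that is truly "closest" over this weighted graph is an ordinary shortest-path problem. A minimal sketch, assuming the graph is available as a dict of directed edges, is to run Dijkstra backwards from the target until we pop a state that already has a disk image:

```python
import heapq

def cheapest_source(edges, snapshots, target):
    """Given directed weighted edges {(src, dst): weight}, the set of
    states that already have a disk image, and a target state, return
    (total_weight, start_state) for the cheapest route to the target,
    or None if no snapshotted state can reach it.

    Runs Dijkstra from the target over reversed edges; the first
    snapshotted state popped from the heap is the cheapest source."""
    rev = {}
    for (src, dst), w in edges.items():
        rev.setdefault(dst, []).append((src, w))
    heap = [(0, target)]
    best = {target: 0}
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        if node in snapshots:
            return cost, node
        for src, w in rev.get(node, []):
            new_cost = cost + w
            if new_cost < best.get(src, float("inf")):
                best[src] = new_cost
                heapq.heappush(heap, (new_cost, src))
    return None
```

Using the commit chain from the diagram above, with only a commit4 disk image on hand, the route to commit6 costs 4 + 2 = 6.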

When we want to produce a target build, such as commit6: Assemble Project, we first attempt to find the closest commit (lowest weight from a commit state to our target commit, with weight calculated from files changed where we don't already have snapshots for the target commit). We might not have a commit6 checkout disk image at this point, but we still track the commit and the weight to move between states even before the disk image has been made, so that the information is available for the next step.

We then find any previous builds of the same name (but different commit), such as ____: Assemble Project. We compute two distances:

  • The weight to go from ____: Assemble Project's checkout disk image to our target commit's disk image, which is:
  • If the target commit's disk image doesn't already exist, the number of files changed (note that if the same file is updated 5 commits in a row, the cumulative weight here is just 1, because it's still only one file changed to go from A->B), representing us making a new commit disk image and running Git checkout to bring it up to date.
    • Plus the estimated weight of a fresh build (we can use the last recorded commit -> full build weight for this).
  • The weight to go from ____: Assemble Project build disk image to our target disk image, where the estimated weight is a heuristic based on how many compilations we think we'll need to do, plus a heuristic on how many files we'll need to clean out of the workspace. The number of compilations is not strictly files changed * some time heuristic - we should ideally use the "include dependency graph" that UBT already calculates and use that to figure out what build artifacts will be re-built based on the files that have changed since the original build disk image. There is a law of diminishing returns here - eventually you'll change enough files that you're rebuilding the whole program anyway, in which case it's cheaper to start from a fresh workspace and get the source code up to date, than to start with build artifacts from a previous commit, clean things out and then do an incremental build.
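The choice between the two routes above comes down to a cost comparison. A minimal sketch, assuming all weights are already estimated in one abstract cost unit; the recompile and clean-out estimates would come from the heuristics described above (e.g. UBT's include dependency graph), which this sketch does not implement:

```python
def incremental_or_fresh(files_changed: int,
                         fresh_build_weight: float,
                         est_recompile_weight: float,
                         clean_weight: float):
    """Return ("fresh", cost) or ("incremental", cost), whichever is
    cheaper. All arguments are in the same abstract cost unit.

    - fresh route: make a new checkout disk image (cost ~ files
      changed) then do a full build with no intermediate artifacts.
    - incremental route: start from a previous build disk image, clean
      out stale files, and recompile only what the changed files
      invalidate. Past the point of diminishing returns this exceeds
      the fresh route and we fall back to it."""
    fresh_cost = files_changed + fresh_build_weight
    incremental_cost = est_recompile_weight + clean_weight
    if incremental_cost < fresh_cost:
        return ("incremental", incremental_cost)
    return ("fresh", fresh_cost)
```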

Snapshotting and disk images on various platforms

  • On Windows we can use VHDX files with differencing disks for snapshots. We'll need to periodically merge differencing disks so that the disk chain doesn't end up too huge. We'll need to do that based on which parent snapshots haven't been used directly on their own for a while. We will also need a system service that can do the create/mount/etc of VHDX files because it requires Administrator permission.
  • On macOS we can use sparseimages without shadow files. Instead when we want to create a new "differencing disk", we use APFS copy-on-write so the new disk image only consumes space for the actual differences.
  • On remote ZFS we use ZFS datasets and snapshots as usual.
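For the ZFS and macOS cases the underlying CLI operations are standard; a sketch of the command lines involved (dataset and image names are placeholders), built as argument lists so a caller can hand them to a process runner:

```python
def zfs_snapshot_cmd(dataset: str, name: str) -> list:
    # `zfs snapshot <dataset>@<name>` creates a read-only snapshot.
    return ["zfs", "snapshot", "%s@%s" % (dataset, name)]

def zfs_clone_cmd(dataset: str, snap: str, clone_dataset: str) -> list:
    # `zfs clone <dataset>@<snap> <clone>` creates a writable
    # copy-on-write clone that only consumes space for differences.
    return ["zfs", "clone", "%s@%s" % (dataset, snap), clone_dataset]

def apfs_cow_copy_cmd(src_image: str, dst_image: str) -> list:
    # On APFS, `cp -c` uses clonefile(2): the copy shares blocks with
    # the original until either side is modified, which is how we get
    # a "differencing disk" without shadow files.
    return ["cp", "-c", src_image, dst_image]
```

The Windows VHDX operations are deliberately omitted here since, as noted above, they have to go through an elevated system service.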

Storing the graph

We don't want to run a central service on each machine to manage this state - the more code that lives in a central service, the more manual updates people have to do whenever that service code changes.

Instead we should be able to use an SQLite database (which should allow for multi-process writing), and have UET processes coordinate the graph as they operate on it. We need to keep in mind that multiple UET processes might be trying to reach a target build state and only have them record target disks as available starting points once the build has been successful.
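A minimal sketch of what the SQLite side might look like, using WAL journaling for multi-process access; the schema and column names here are assumptions for illustration, not a committed design:

```python
import sqlite3

def open_graph(path: str) -> sqlite3.Connection:
    """Open (and if needed create) the snapshot graph database.
    WAL mode allows concurrent readers alongside a single writer, so
    multiple UET processes can coordinate through short transactions."""
    conn = sqlite3.connect(path, timeout=30.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("""CREATE TABLE IF NOT EXISTS node (
        id TEXT PRIMARY KEY,              -- e.g. 'commit5' or 'commit5: Assemble Project'
        disk_image TEXT,                  -- path/dataset of the disk, NULL if not made yet
        ready INTEGER NOT NULL DEFAULT 0  -- 1 only after a successful build
    )""")
    conn.execute("""CREATE TABLE IF NOT EXISTS edge (
        src TEXT NOT NULL,
        dst TEXT NOT NULL,
        weight REAL,                      -- NULL when we never measured this transition
        PRIMARY KEY (src, dst)
    )""")
    conn.commit()
    return conn

def mark_ready(conn: sqlite3.Connection, node_id: str, disk_image: str) -> None:
    # Record a disk as an available starting point only once the
    # build that produced it has succeeded.
    with conn:
        conn.execute("UPDATE node SET disk_image = ?, ready = 1 WHERE id = ?",
                     (disk_image, node_id))
```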
