@Byron Byron commented Aug 6, 2025

Figure out which V3 primitive is feasible next. Answer: stashing

It seems like any mutation that affects the worktree needs to deal with uncommitted changes first, so stashing should be a primitive.
Let's validate and research, and also see how the V3 version can benefit the current code. Answer: it should be possible to backport stashing, assuming that the merges and the 'move tree between workspaces' functionality are effectively just a way to apply a stash, rather than something that relies on side effects to do its work while applying a stash at the same time.

Something that's always an obstacle right now is conflicted commits. Could they be gotten rid of first? There is no need to do that just yet, but it's needed for proper rebasing. Rebasing definitely needs to know about 'hidden' conflicts in commits to work correctly, but with a new system at least there is no chance of getting meta-trees into the workspace.

Tasks

  • basic tree API that could do the trick
  • basic stash-commit API (including enumeration or 'stash for ref' query)
  • raw snapshot of select worktree changes
    • no error on empty snapshot (to allow snapshotting only refs for instance)
    • apply snapshot
  • snapshots for index
    • keep original information with tree-changes (to avoid duplicate worktree-status) to prepare for complete index snapshots.
    • store conflicts
    • assure the current conflicted file (if available) is also picked up
    • test for index with conflicts and other changes

Skipped Follow-Up PRs

  • commit
    • snapshot commits
    • list snapshots as part of the RefInfo workspace
  • auto-testing of backward compatibility
  • try to backport using the new snapshots to current apply at least, maybe more
    • Important(performance): disable rename tracking for snapshot worktree changes.
    • enable these to succeed where previously they might fail due to the worktree changes cherry-pick failing
    • consider update_workspace as well, maybe some of the merge-conflicts can then be prevented?

Follow-Up PR

Shortcomings

  • As the final update of worktrees is delegated to a git2 reset/checkout, there is no support for configured worktree filters like git-lfs, and it's unlikely that submodules work like we'd want or as is intuitive.

Future Directions

How else can the snapshotting be used for value extraction?

  1. extend its capabilities to be at the core of a new oplog implementation (that is workspace/graph aware)
  2. build undo/redo based on new oplog (where store-snapshots are completely optional)

Research

My thoughts

What to do next?

  • anything that mutates the worktree needs stash handling
    • stashes can be ignored if there is nothing to stash, but requiring a clean worktree seems too limiting
  • stashes don't work with conflicted Git indices, so conflicts wouldn't be a problem. We can also ignore GitButler conflict commits as long as we prohibit referring/pointing to them.
  • conflict commits need to be redone to be 'safe', and I think this can happen once edit-mode is redone.
    • conflicts need metadata per commit (as opposed to stashes that are per ref), which can motivate commit-metadata support, which in turn motivates switching away from vb.toml even though it could be abused to do commit-metadata as well. All that is for another day.
    • conflicts can't be recorded/handled for the workspace commit right now even though there is no technical reason for it. It's more of a UX problem as it needs conflict resolutions to be stored in the right stack.

Thus, proper stash handling enables apply, unapply, reordering and commit manipulation. Committing itself is the only operation that only needs rebasing, which is also required by update worktree.

So key-capabilities are:

  • stashing - needed for all workspace/worktree mutations, but particularly apply and unapply
    • it can also be backported, with some effort, which should help existing code to be less error prone in the light of uncommitted changes.
    • after apply and unapply (without rebasing support).
      • apply would need another iteration to help with conflicts - can rebasing help then? Or can it be applied with conflicts, assuming our workspace-commit-with-conflicts handling is good enough? Could workspace conflict resolution not just be 'show conflicts' and users resolve and commit them to stacks of their choice? Or should stacks be re-based one commit at a time to record conflicts, to be resolved one at a time?
  • rebasing with merge-commit support (needs more research though to support rebasing the entire workspace)
    • also conflict-resolutions stored in merge-commits would not be picked up when re-merging. So these would have to be cherry-picked for all merge-commits, except for our workspace commit which can't currently handle conflicts anyway. In theory, our conflict handling would be good enough to even deal with conflicts that happen when rebasing merge conflicts.

General Ideas: Stack

How to create a stash?

  • ideally, we only stash what we have to, i.e. where a reset or checkout would overwrite files
  • we should not pick up untracked files unless we have to
    • it's fine to fail if one would be overwritten, but is too large to pick up.
  • stashes should have labels so we know where they come from. Some contain assigned changes, some are the unassigned/uncommitted worktree changes, and some are created by edit-mode.
  • Stashes should be part of the Graph and part of the workspace projection for easy access and query.
  • The index must be an explicit consideration, particularly if for some reason there are conflicts. To be a Gitizen would mean to not flatten what's in the index into a tree.
    • However, for all I can tell, Git would fail to stash the index if it has conflicts, so we can fail as well (-> fail on conflicts, but keep index tree).
    • Let's keep 1 commit + tree per stash, so our stash-trees have two entries, worktree and index, with the actual trees below. For simplicity, omit the index if it's the same as the worktree.
  • Not doing any stashing unless necessary is incredibly important for performance, and is absolutely possible as well. After all, we need one worktree status to know what to look out for, and with that we can work until the operation is complete.
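
The "only stash what we have to" idea above can be sketched as a set intersection: the paths reported dirty by the single worktree status, intersected with the paths a checkout from the old tree to the new tree would write or delete. This is a minimal sketch under stated assumptions, not the actual implementation: `FlatTree`, `paths_changed_by_checkout` and `paths_to_stash` are hypothetical names, and trees are modeled as flat `path -> object id` maps instead of real Git trees.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model: a Git tree flattened to `path -> object id`.
pub type FlatTree = BTreeMap<String, String>;

/// Paths whose entry differs between `old` and `new`, i.e. the files a
/// reset or checkout from `old` to `new` would write or delete.
pub fn paths_changed_by_checkout(old: &FlatTree, new: &FlatTree) -> BTreeSet<String> {
    old.keys()
        .chain(new.keys())
        .filter(|path| old.get(*path) != new.get(*path))
        .cloned()
        .collect()
}

/// Of all `dirty` paths (modified or untracked, taken from one worktree
/// status), keep only those the checkout would overwrite.
/// An empty result means no stash is needed at all.
pub fn paths_to_stash(dirty: &BTreeSet<String>, old: &FlatTree, new: &FlatTree) -> BTreeSet<String> {
    let touched = paths_changed_by_checkout(old, new);
    dirty.intersection(&touched).cloned().collect()
}
```

With this model, an untracked file is only picked up when the new tree contains a file at the same path, and a dirty file that the checkout leaves alone is never stashed.
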

Stashing the index

Git creates a separate commit for the index and writes the index as tree. Deepwiki Details
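
For comparison, the layout proposed in the notes above (one commit and one tree per stash, with `worktree` and `index` entries, the index omitted when it equals the worktree, and failure on a conflicted index) could look roughly like this. It is a sketch only: `TreeId`, `StashTree`, `StashError` and `plan_stash_tree` are hypothetical names, and the tree ids are plain strings instead of real object ids.

```rust
/// Hypothetical stand-in for a Git tree object id.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TreeId(pub String);

/// The top-level tree of a stash commit, as proposed in the notes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StashTree {
    pub worktree: TreeId,
    /// `None` when the index tree is identical to the worktree tree.
    pub index: Option<TreeId>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StashError {
    /// Mirrors the "fail on conflicts" idea from the notes.
    IndexHasConflicts,
}

/// Decide which entries the stash tree gets.
pub fn plan_stash_tree(
    worktree: TreeId,
    index: TreeId,
    index_has_conflicts: bool,
) -> Result<StashTree, StashError> {
    if index_has_conflicts {
        return Err(StashError::IndexHasConflicts);
    }
    // For simplicity, omit the index if it is the same as the worktree.
    let index = if index == worktree { None } else { Some(index) };
    Ok(StashTree { worktree, index })
}

impl StashTree {
    /// Entry names and ids as they would appear in the stash tree.
    pub fn entries(&self) -> Vec<(&'static str, &TreeId)> {
        let mut out = vec![("worktree", &self.worktree)];
        if let Some(index) = &self.index {
            out.push(("index", index));
        }
        out
    }
}
```

Note that the task list at the top mentions storing conflicts in index snapshots, so rejecting a conflicted index reflects only this research note, not necessarily the final behavior.
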

How to restore a stash?

Right now git2 is used to reset or checkout, often with force, but applying stashes should be safe, ideally, while also dealing with conflicts responsibly. It's probably OK to keep using git2 for that, as long as the index or tree to checkout is correct. Untracked files should never be accidentally considered added, so Git won't remove them.

What about ignored files? They aren't ever seen but may be removed as they are expendable.

To restore a stash, in any case there needs to be a cherry-pick/three-way merge between the old HEAD, the new HEAD and the tree representing the worktree. This seems to be naturally catered to by creating a commit on top of old HEAD, while associating the stash with a namespaced reference of the same name as the one that pointed to old HEAD at the time of stash creation. This means only one stash per ref, unless the next stash commits can sit on top of the previous stash and on top of the old HEAD commit, using a 'fake merge'.

Conflicts when applying a stash need special consideration, and should probably be rejected by default to let the user select a policy, i.e. apply with markers, auto-resolve with ours or theirs (stash or workspace), or cancel.
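
The restore step described above (a three-way merge between old HEAD, new HEAD and the stashed worktree tree, rejecting conflicts unless a policy is chosen) can be sketched at the level of whole files. This is an illustration under assumptions, not the actual code: `FlatTree`, `ConflictPolicy`, `Conflicts` and `apply_stash` are hypothetical names, trees are flat `path -> object id` maps, and the "apply with markers" policy is left out because it needs a content-level merge.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model: a Git tree flattened to `path -> object id`.
pub type FlatTree = BTreeMap<String, String>;

/// What to do when both the workspace and the stash changed a path.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ConflictPolicy {
    /// Reject by default so the caller can ask the user for a policy.
    #[default]
    Reject,
    PreferStash,
    PreferWorkspace,
}

/// The conflicting paths, in sorted order.
#[derive(Debug, PartialEq, Eq)]
pub struct Conflicts(pub Vec<String>);

/// Three-way merge: `base` is the old HEAD tree, `new_head` the tree the
/// workspace moved to, `stash` the stashed worktree tree.
pub fn apply_stash(
    base: &FlatTree,
    new_head: &FlatTree,
    stash: &FlatTree,
    policy: ConflictPolicy,
) -> Result<FlatTree, Conflicts> {
    let paths: BTreeSet<&String> = base.keys().chain(new_head.keys()).chain(stash.keys()).collect();
    let mut merged = FlatTree::new();
    let mut conflicts = Vec::new();
    for path in paths {
        let (b, ours, theirs) = (base.get(path), new_head.get(path), stash.get(path));
        let pick = if ours == theirs || theirs == b {
            ours // both agree, or the stash did not touch this path
        } else if ours == b {
            theirs // only the stash changed this path
        } else {
            match policy {
                ConflictPolicy::Reject => {
                    conflicts.push(path.clone());
                    continue;
                }
                ConflictPolicy::PreferStash => theirs,
                ConflictPolicy::PreferWorkspace => ours,
            }
        };
        // `None` means the path is deleted in the merged result.
        if let Some(id) = pick {
            merged.insert(path.clone(), id.clone());
        }
    }
    if conflicts.is_empty() { Ok(merged) } else { Err(Conflicts(conflicts)) }
}
```

Returning the full list of conflicting paths instead of stopping at the first one would let a UI present all of them before the user picks a policy.
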

How it is handled now

  • create_wd_tree() to get a Git tree for the worktree
    • ignores files larger than a given limit (unless disabled)
    • takes conflicted files as is (but won't be able to capture the index itself)
  • Edit Mode
    • uses create_wd_tree() to keep the state
    • then either checks out the tree directly when aborting (wiping all changes that might now be there)
    • or cherry-picks the changes between the old (initial) and the new (with edited commit integrated) workspace, and the stashed changes, to take the result as the new tree.
  • Unapply Stack
    • the create_wd_tree() function is used to incorporate the worktree changes into the computation to unapply a stack.
      • it seems like separating these concerns entirely will be less error prone, i.e. there will be no conflicts due to anything in the worktree (especially given the merge-auto-resolution which changes things).
    • Assigned Changes (per Stack) are committed
      • this could be a stash on top of the stack branch, which would be visible and the UI could specifically understand it
  • update_uncommited_changes
    • Called from every function that updates the workspace commit, like 'update workspace', commit reordering, etc.

Copilot (Sonnet)

I'll help you find where the backend handles uncommitted files/worktree changes before altering the worktree. Let me search through the codebase to identify where stashing or similar operations are implemented.

The search results look very promising! I found significant information about how GitButler handles uncommitted changes before altering the worktree. Let me search for more specific details about stashing and look at the gitbutler-edit-mode crate that showed up in the results.
Now let me look at the mod.rs file which had interesting comments about stashing:
Let me search for more specific stashing-related functions:
Let me look at the create_wd_tree function which seems to be the core function for creating a tree from worktree changes:
Let me look at the repository extension that implements the core create_wd_tree function:
Let me look at the workspace handling functions and see how they use this stashing mechanism:
Now let me look at the actual implementation of update_uncommited_changes_with_tree to understand how uncommitted changes are handled:
Now let me search for specific operations that would trigger this stashing behavior, like branch switching:
Let me look for the unapply_stack function to see how it handles uncommitted changes:
Now let me look at the actual unapply_stack function implementation to see how it handles uncommitted changes:
Now let me examine where the actual worktree operations happen when switching branches or workspace states. Let me search for the workspace base change handling:
Let me look at the update_workspace_commit function which seems to be central to how workspace updates are handled:

Perfect! Now I have a comprehensive understanding of how GitButler handles uncommitted files/worktree changes before altering the worktree. Let me provide you with a detailed summary.

How GitButler Handles Uncommitted Files/Worktree Changes

Based on my analysis of the GitButler codebase, here's how the backend handles uncommitted files and worktree changes before altering the worktree:

1. Core Stashing Mechanism via create_wd_tree()

The main mechanism GitButler uses is the create_wd_tree() function found in:

  • repository_ext.rs (interface)
  • lib.rs (implementation)

This function creates a Git tree object containing all uncommitted changes in the project, including:

  • Modified files in the worktree
  • Files in the index that are considered conflicted
  • Untracked files (up to a size limit)

2. Stashing Strategy in Edit Mode

When entering edit mode, GitButler uses a sophisticated stashing approach in lib.rs:

fn commit_uncommited_changes(ctx: &CommandContext) -> Result<()> {
    let repository = ctx.repo();
    let uncommited_changes = repository.create_wd_tree(0)?;
    repository.reference(UNCOMMITTED_CHANGES_REF, uncommited_changes.id(), true, "")?;
    Ok(())
}

The uncommitted changes are stored as a Git reference at refs/gitbutler/edit-uncommitted-changes.

3. Branch/Stack Operations

When unapplying stacks (the equivalent of switching away from a branch), GitButler:

  1. Commits any assigned changes first as a "WIP Assignments" commit to preserve them
  2. Performs a three-way merge to safely extract the worktree changes:
    • ours: Current working directory tree (cwdt)
    • theirs: Workspace base tree
    • base: Head of the branch being unapplied

From branch_removal.rs:

let cwdt = repo.create_wd_tree(0)?.id().to_gix();
let workspace_base = gix_repo.find_commit(workspace_base(self.ctx, perm.read_permission())?)?
    .tree_id()?;
let stack_head = gix_repo.find_real_tree(&stack.head_oid(&gix_repo)?, Default::default())?;

let mut merge = gix_repo.merge_trees(
    stack_head,
    cwdt,
    workspace_base,
    gix_repo.default_merge_labels(),
    merge_options,
)?;

4. Workspace State Management

GitButler uses a sophisticated workspace state management system in branch_trees.rs:

  • update_uncommited_changes(): Updates uncommitted changes when the workspace state changes
  • move_tree_between_workspaces(): Moves changes from one workspace state to another using Git's merge algorithm
  • WorkspaceState: Captures snapshots of the workspace at different points in time

5. Where This Happens

The stashing/uncommitted change handling is triggered in several key operations:

  1. Branch switching/unapplying: Via unapply_stack() operations
  2. Edit mode entry/exit: Via enter_edit_mode() and save_and_return_to_workspace()
  3. Workspace updates: Via update_workspace_commit() after any stack modifications
  4. Tree manipulations: Before operations like removing changes from commits

6. Key Advantages of This Approach

  • No traditional Git stash: GitButler doesn't use git stash but rather its own tree-based approach
  • Conflict resolution: Uses merge strategies that favor the working directory changes when conflicts occur
  • Granular control: Can handle partial file changes (hunks) and move them between different workspace states
  • Integration with virtual branches: The stashing is deeply integrated with GitButler's virtual branch system

7. Storage Locations

  • Uncommitted changes are stored as Git tree objects and references
  • Edit mode stashes: refs/gitbutler/edit-uncommitted-changes
  • Workspace metadata: .git/workspace file for previous branch information
  • Stack assignments: Managed through the virtual branches state system

This approach allows GitButler to safely preserve and restore uncommitted changes while switching between different workspace configurations, without relying on traditional Git stashing mechanisms.

Other

Carryover

Information to transfer between PRs just to not forget about these. Maybe offload into an issue one day once it's clear it won't be forgotten.

Performance Tasks

  • 🏎️ Expensive computations only for a single stack (the one that updated)
  • 🏎️ Use per-commit metadata to avoid recomputing changeset IDs for each commit, enabling processing more and more commits with the 1s compute budget.
  • 🏎️ graph-based merge-base computation would be much faster if a bitmap was used. Could be intrinsic or separate, with intrinsic certainly being better.
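
To illustrate the bitmap idea from the last item, here is a minimal sketch of merge-base computation on a toy graph, using one ancestor bitmap per commit. It is not the graph code of this PR: `Bits`, `ancestor_bitmaps` and `merge_bases` are hypothetical names, commits are plain indices that must already be topologically sorted (parents before children), and the bitmaps live in a separate structure instead of being intrinsic to the graph.

```rust
/// A fixed-size bitset with one bit per commit.
#[derive(Clone)]
pub struct Bits(Vec<u64>);

impl Bits {
    fn new(len: usize) -> Self {
        Bits(vec![0; (len + 63) / 64])
    }
    fn set(&mut self, i: usize) {
        self.0[i / 64] |= 1u64 << (i % 64);
    }
    fn get(&self, i: usize) -> bool {
        self.0[i / 64] & (1u64 << (i % 64)) != 0
    }
    fn or_assign(&mut self, other: &Bits) {
        for (a, b) in self.0.iter_mut().zip(&other.0) {
            *a |= *b;
        }
    }
    fn and(&self, other: &Bits) -> Bits {
        Bits(self.0.iter().zip(&other.0).map(|(a, b)| a & b).collect())
    }
}

/// `parents[i]` lists the parents of commit `i`. Every parent index must be
/// smaller than `i`. Bit `j` of the result for `i` is set if `j` is `i`
/// itself or one of its ancestors.
pub fn ancestor_bitmaps(parents: &[Vec<usize>]) -> Vec<Bits> {
    let mut maps: Vec<Bits> = Vec::with_capacity(parents.len());
    for (i, commit_parents) in parents.iter().enumerate() {
        let mut bits = Bits::new(parents.len());
        bits.set(i);
        for &p in commit_parents {
            assert!(p < i, "commits must be topologically sorted");
            bits.or_assign(&maps[p]);
        }
        maps.push(bits);
    }
    maps
}

/// All best common ancestors of `a` and `b`: common ancestors that are not
/// themselves an ancestor of another common ancestor.
pub fn merge_bases(maps: &[Bits], a: usize, b: usize) -> Vec<usize> {
    let common = maps[a].and(&maps[b]);
    let n = maps.len();
    (0..n)
        .filter(|&c| common.get(c))
        .filter(|&c| !(0..n).any(|d| d != c && common.get(d) && maps[d].get(c)))
        .collect()
}
```

This naive form needs one bit per pair of commits, so a real implementation would likely restrict the bitmaps to the segment of the graph being traversed.
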

Not to forget

  • ⚠️Right now there is no occasion where branch-metadata would be deleted, so it's likely to go stale. Ideally we will be able to delete it as soon as it leaves our 'sphere of influence', but at the latest once the local branch doesn't exist anymore. Probably best to have a GC/cleanup step of sorts.
  • ⚠️ single-branch mode currently doesn't limit itself to only show what's not reachable by extra-target
  • There might be an issue with the way it uses searches - a tip with a search might be blocked at an existing commit, discovered by a tip with a different search, and even though the thing it searches is reachable through that, it stops looking.
  • the number of commits that remotes are ahead of their local branch doesn't always seem to match Git (particularly when there are a lot of them)
  • ⚠️ current implementation supports multiple workspaces in theory, but it's not tested with them as the underlying ref-metadata is still VB-toml. So before supporting this, we probably already want to have migrated away from vb.toml, to then port the ref-metadata to something that can support more workspaces (also for testing).
  • ⚠️ we probably don't correctly handle workspaces that include other workspaces.
  • ⚠️ we probably don't handle dot-repositories correctly, by merit of not really having them in mind. At least they shouldn't be in the way of handling normal remotes.

@github-actions github-actions bot added the rust Pull requests that update Rust code label Aug 6, 2025
@Byron Byron changed the title stashing (TBD) V3 stashing Aug 6, 2025
Byron added 7 commits August 17, 2025 14:54
@Byron Byron marked this pull request as ready for review August 17, 2025 12:57
@Byron Byron enabled auto-merge (squash) August 17, 2025 12:57
@Byron Byron merged commit eb053ce into gitbutlerapp:master Aug 17, 2025
20 of 21 checks passed
@Byron Byron deleted the next branch August 17, 2025 13:15
This was referenced Aug 18, 2025