refactor:improve checkpoint and ensure gc to improve disk space #4494

ijin · 2025-06-10T11:45:05Z

Related GitHub Issue

Closes #3391
Closes #3348
Closes #3080
Closes #3695

Description

Builds upon #3695

Discussion: #3695 (comment)

Garbage collection and perf improvements
Upgrade to git 2.49.0 (Windows only) for additional benefits

Test Procedure

Unit tests added
Sanity test on local

Type of Change

🐛 Bug Fix: Non-breaking change that fixes an issue.
✨ New Feature: Non-breaking change that adds functionality.
💥 Breaking Change: Fix or feature that would cause existing functionality to not work as expected.
♻️ Refactor: Code change that neither fixes a bug nor adds a feature.
💅 Style: Changes that do not affect the meaning of the code (white-space, formatting, etc.).
📚 Documentation: Updates to documentation files.
⚙️ Build/CI: Changes to the build process or CI configuration.
🧹 Chore: Other changes that don't modify src or test files.

Important

Refactor ShadowCheckpointService to enhance garbage collection and checkpoint handling, improving disk space management and performance.

Behavior:
- Refactor ShadowCheckpointService to improve garbage collection and disk space management.
- Introduce tryRepack() to handle repository repacking with platform-specific commands.
- Implement periodic garbage collection in saveCheckpoint() and after branch deletion in deleteBranch().
- Cache nested git directory paths in findAndCacheNestedGitRepoPaths() to avoid repeated scans.
- Modify stageAll() to use git status for precise file staging.
Tests:
- Add unit tests in ShadowCheckpointService.test.ts to verify new garbage collection and checkpoint behaviors.
- Mock executeRipgrep to simulate nested git directory detection.
- Test handling of new, deleted, and ignored files in checkpoints.
Misc:
- Upgrade to Git 2.49.0 on Windows for enhanced repack capabilities.
- Add static logger support in ShadowCheckpointService.
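The periodic garbage collection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the `PeriodicGc` class, the `GitRunner` callback, and the `interval` parameter are hypothetical names, and `git gc --prune=now` is just one plausible command to run. Injecting the git runner keeps the scheduling logic testable without touching a real repository.

```typescript
// Hypothetical sketch: PeriodicGc, GitRunner and `interval` are illustrative
// names, not identifiers taken from ShadowCheckpointService.
type GitRunner = (args: string[]) => void

class PeriodicGc {
  private readonly runGit: GitRunner
  private readonly interval: number
  private checkpointsSinceGc = 0

  // `interval` is an assumed knob: run gc once every `interval` checkpoints.
  constructor(runGit: GitRunner, interval: number) {
    this.runGit = runGit
    this.interval = interval
  }

  // Call after each saved checkpoint; returns true when gc was triggered.
  onCheckpointSaved(): boolean {
    this.checkpointsSinceGc++
    if (this.checkpointsSinceGc < this.interval) {
      return false
    }
    this.collect()
    return true
  }

  // Deleting a branch tends to leave unreachable objects, so collect right away.
  onBranchDeleted(): void {
    this.collect()
  }

  private collect(): void {
    this.checkpointsSinceGc = 0
    try {
      this.runGit(["gc", "--prune=now"])
    } catch {
      // In this sketch a failed gc is ignored so it never blocks a checkpoint.
    }
  }
}
```

A caller would pass a runner that shells out to git in the shadow repository; a test can pass a function that only records the arguments it receives.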

This description was created automatically for 2d8d2d3 and updates as commits are pushed.

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

adamhill · 2025-06-10T15:36:32Z

Awesome! I have been wondering for a long time whether git prune alone could reduce our checkpoint size enough. I guess we need the tag team of repack plus repo surgery with ripgrep.

Do you have any idea how much repo space this saves us: 20, 30, 50%? We might be able to find people with really large repos to be guinea pigs if you want a big stress test (unless you already have them yourself and that is why you made the PR :-) ) 🦘 🤟

samhvw8 · 2025-06-10T17:07:24Z

@adamhill I think the problem is that we don't need to track every file; we only need to track the files that are going to change, which would be better. You can see that the checkpoint uses git add . and we could change that to a git add limited to those files.
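A minimal sketch of that idea, assuming the changed paths are read from `git status --porcelain` (v1 format, without `-z`). The `planStaging` function and `StagePlan` type are hypothetical names for illustration. The parser is deliberately simplified: it does not unquote paths that git wraps in double quotes, and it treats any `D` status as a deletion.

```typescript
// Hypothetical helper: turns `git status --porcelain` output into the list of
// paths to stage, instead of staging everything with `git add .`.
interface StagePlan {
  add: string[] // new, modified, or renamed paths
  deleted: string[] // paths removed from the index or worktree
}

function planStaging(porcelain: string): StagePlan {
  const plan: StagePlan = { add: [], deleted: [] }
  for (const line of porcelain.split("\n")) {
    // Each entry is "XY PATH": two status characters, a space, then the path.
    if (line.length < 4) continue
    const status = line.slice(0, 2)
    let path = line.slice(3)

    // "!!" marks ignored files (only listed when --ignored is passed).
    if (status === "!!") continue

    // Renames and copies are reported as "old -> new"; keep the new path.
    const arrow = path.indexOf(" -> ")
    if ((status[0] === "R" || status[0] === "C") && arrow !== -1) {
      path = path.slice(arrow + 4)
    }

    if (status.includes("D")) {
      plan.deleted.push(path)
    } else {
      plan.add.push(path)
    }
  }
  return plan
}
```

The resulting lists could then be passed to git as explicit pathspecs after a `--` separator, so only files that actually changed are touched.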

ijin · 2025-06-12T15:52:47Z

@adamhill Running git gc for my 2.1GB repo reduced .git from 198MB to 150MB, so about 25%. But I guess it really depends on your project. You can try it out yourself!

daniel-lxs

Hey @ijin, thank you for taking over this PR. I left some questions and suggestions.

My main concern about this new implementation is finding a way to test it correctly. It might also be a good idea to split the garbage collection functionality and the stageAll change-tracking method into separate PRs so each is easier to test.

I'm curious to know what you think of this!