torchforge repo takes 400MB+, mostly from ./.git/objects/pack, which needs to be cleaned up #540

@wukaixingxp

πŸ› Describe the bug

I noticed that a checkout of torchforge main takes 422MB, of which .git accounts for about 416MB. Taking a closer look, all 416MB sit in ./.git/objects/pack, which needs to be cleaned up. The largest blobs in the history are .whl files, and I wonder whether all of them are needed.

(base) [[email protected] /data/users/kaiwu/torchforge (main)]$ du -h
0	./.git/branches
64K	./.git/hooks
4.0K	./.git/info
416M	./.git/objects/pack
0	./.git/objects/info
416M	./.git/objects
4.0K	./.git/refs/heads
0	./.git/refs/tags
4.0K	./.git/refs/remotes/origin
4.0K	./.git/refs/remotes
8.0K	./.git/refs
4.0K	./.git/logs/refs/remotes/origin
4.0K	./.git/logs/refs/remotes
4.0K	./.git/logs/refs/heads
8.0K	./.git/logs/refs
12K	./.git/logs
416M	./.git
12K	./.github/ISSUE_TEMPLATE
16K	./.github/packaging
20K	./.github/workflows
48K	./.github
80K	./.meta/mast
80K	./.meta
60K	./apps/grpo
24K	./apps/sft
84K	./apps
4.0K	./assets
16K	./docs/source/_static
56K	./docs/source/tutorial_sources/zero-to-forge
64K	./docs/source/tutorial_sources
148K	./docs/source
160K	./docs
20K	./scripts
16K	./src/forge/actors/trainer
80K	./src/forge/actors
88K	./src/forge/controller/service
144K	./src/forge/controller
76K	./src/forge/data/datasets
116K	./src/forge/data
12K	./src/forge/data_models
12K	./src/forge/losses
92K	./src/forge/observability
52K	./src/forge/util
524K	./src/forge
524K	./src
4.6M	./tests/assets/c4_test
4.6M	./tests/assets
8.0K	./tests/integration_tests/fixtures
56K	./tests/integration_tests
12K	./tests/sandbox/rl_trainer
4.0K	./tests/sandbox/toy_rl/toy_metrics
36K	./tests/sandbox/toy_rl
16K	./tests/sandbox/vllm
12K	./tests/sandbox/weight_sync
76K	./tests/sandbox
96K	./tests/unit_tests/datasets
12K	./tests/unit_tests/losses
56K	./tests/unit_tests/observability
24K	./tests/unit_tests/rl
40K	./tests/unit_tests/util
316K	./tests/unit_tests
5.1M	./tests
422M	.

(base) [[email protected] /data/users/kaiwu/torchforge (main)]$ git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
  sed -n 's/^blob //p' | \
  sort -k2 -n | \
  tail -20
574f901466037707fd22844a6eb413c7cc74ce82 427839 assets/wheels/torchtitan-0.1.0-py3-none-any.whl
cc61d6db1a09082da5a15b27c4466ac2bccd80ea 431297 assets/wheels/torchtitan-0.1.0-py3-none-any.whl
4c2b5c4b353017aeacf1fc39a24f9f9027dd78e8 455093 uv.lock
6af30c4174e8a00a5933a72b3c888852284b4038 556499 main/_static/styles/bootstrap.css.map
076f70f804862fa34e0a0e11abb03cb51ba2cf2a 597002 uv.lock
bfabf06069bd848b759b33868163d169752a22b7 1075886 coding_example.ipynb
de4b034e55b1043bad1fef92b94f2ace758d869b 1489215 main/_static/vendor/fontawesome/6.5.2/js/all.min.js
3dac4f760dd949a4bbd4d0080a11ea122d90f831 4750981 tests/assets/c4_test/data.json
aff4e08058ded2dd33afcbba0778705354c341ac 19381717 apps/grpo/coding_example.ipynb
52287b4cccbe83fef51ba408ff3d1d224eade0bd 23350743 assets/wheels/monarch_no_torch-0.1.0.dev20250815-py3-none-any.whl
4d3eaeb363f51099f3a77f41cd3b9ec0a6825514 24182865 assets/ci/monarch_no_torch-0.1.0.dev20250826-py3-none-any.whl
38187ec4b5a21572f0b91b415a731320b131631b 33463995 assets/wheels/monarch-0.0.1-cp310-cp310-linux_x86_64.whl
34af6194016602fc7ba50873ee3954658742bfd7 34355346 assets/ci/monarch_no_torch-0.1.0.dev20251010-py3-none-any.whl
2e5d7741ab2ca321d03dc4485dcb988068634113 39439972 assets/wheels/monarch-0.0.1-cp310-cp310-linux_x86_64.whl
e6901b3f3e0ad924e687476880ff00b2be4b2d69 41903392 assets/wheels/monarch-0.0.1-cp310-cp310-linux_x86_64.whl
d5ea2de18f5856f3e7275913c4a964c819ab809a 45091866 assets/wheels/monarch-0.0.1-cp310-cp310-linux_x86_64.whl
146e04a275793b950f899c75c743057f00d66f15 45909255 assets/wheels/monarch-0.0.1-cp310-cp310-linux_x86_64.whl
b5a86f5f658532d1cfb04dd6a3d7108cdef00d67 46132887 assets/wheels/monarch-0.0.1-cp310-cp310-linux_x86_64.whl
a704f8703919d22146a64690c250666f013fb49b 46731916 assets/wheels/monarch-0.0.1-cp310-cp310-linux_x86_64.whl
7182c00e17ad65072b3079f00cb29e710ff387e8 46909372 assets/wheels/monarch-0.0.1-cp310-cp310-linux_x86_64.whl
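
The same path shows up several times in the list above because each entry is a different committed version of that file. Totaling the sizes per path may make it clearer which files dominate the history. Here is a rough sketch, assuming input lines in the `<sha> <size> <path>` format that the pipeline above produces; `sum_blob_sizes_by_path` is a made-up helper name, not a git command:

```shell
# Sum blob sizes per path from "<sha> <size> <path>" lines on stdin.
# Prints "<total bytes><TAB><number of versions><TAB><path>", largest total first.
# sum_blob_sizes_by_path is a hypothetical helper name used only for illustration.
sum_blob_sizes_by_path() {
  awk '
    {
      size = $2
      path = $0
      sub(/^[^ ]+ [^ ]+ /, "", path)   # drop sha and size, keep the path (may contain spaces)
      total[path] += size
      count[path] += 1
    }
    END {
      for (p in total) printf "%.0f\t%d\t%s\n", total[p], count[p], p
    }
  ' | sort -k1,1nr
}

# Usage (run inside the repository):
#   git rev-list --objects --all | \
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
#     sed -n 's/^blob //p' | \
#     sum_blob_sizes_by_path | head -20
```

Note that `%(objectsize)` reports the uncompressed size of each blob, so the totals are only an approximation of what the pack file stores; `%(objectsize:disk)` could be substituted to get on-disk sizes. Also, the `du` output above shows `./assets` at 4.0K with no `wheels` or `ci` subdirectory, which suggests these wheels are no longer in the working tree on main and live only in history.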

Versions

main branch
