# git-subtrac: all your git submodules in one place

git-subtrac is a helper tool that makes it easier to keep track of your
git submodule contents. It collects the entire contents of the entire
history of all your submodules (recursively) into a separate git branch,
which can be pushed, pulled, forked, and merged however you want.

## Quick start

git-subtrac is a git extension written in the Go language using the lovely
[go-git](https://github.com/src-d/go-git) library. If you have a Go compiler
set up, you can install the tool like this:

    go install github.com/apenwarr/git-subtrac

This will drop the compiled program into Go's bin directory, often
`$HOME/go/bin`. As long as that's in your `$PATH`, you will then be able to
run commands like `git subtrac <whatever>`. (With recent Go releases, which
default to module mode, you may need to append `@latest` to the package
path.)

To collect the submodule history for all your branches into new tracking
branches, just do this (from inside the git repository you want to affect):

    git subtrac update

If your repository is incomplete (i.e. you have old submodule links that no
longer exist, because the submodules have since been rebased or something),
this might give an error about missing commits. You can work around it with
the `--auto-exclude` option:

    git subtrac --auto-exclude update

Anyway, `git subtrac update` will generate, for each branch, a new branch
with the `.trac` extension, referencing the complete history of all its
submodule references. The easiest way to use this branch is as follows:

1. Push the tracking branch (e.g. `master.trac`) into the same upstream
   repository (GitHub or whatever) as the branch (e.g. `master`) that it tracks.

2. Edit your `.gitmodules` file to change all the `url = <whatever>` lines
   to just `url = .` (a single dot). This tells git to always fetch the
   submodule contents from the same repo as your parent module.

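As a rough sketch of step 2, git's own config editor can rewrite the URLs for you instead of editing the file by hand. The helper name `rewrite_submodule_urls` is made up for this example, and the remote name `origin` in the trailing comments is an assumption; adjust both to match your setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of git-subtrac): set every submodule URL
# in ./.gitmodules to "." so submodules are fetched from the parent's repo.
rewrite_submodule_urls() {
    # List keys such as "submodule.<name>.url", then overwrite each value.
    git config -f .gitmodules --name-only --get-regexp '^submodule\..*\.url$' |
    while read -r key; do
        git config -f .gitmodules "$key" .
    done
}

# Typical use, from the top of the superproject
# (the remote name "origin" is an assumption):
#   rewrite_submodule_urls
#   git submodule sync
#   git commit -m 'Fetch submodules from this repository' .gitmodules
#   git push origin master master.trac
```

Only the `url` keys are touched; `path` and any other per-submodule settings are left alone.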

End users downloading your project don't need to have git-subtrac installed,
unless they make their own changes to submodules. All the `git submodule`
commands continue to work exactly the same way as before.

## What problem are we solving?

git submodules have been very complicated and hard to use since the earliest
days of git, and have not improved much. The complications stem from some
major design issues:

1. The contents of submodules come from *other* git repositories, which can
   move around or be rebased/deleted/etc. without warning, causing your
   own repository to no longer be usable. At the very least, using multiple
   upstream repositories makes it hard to fork a project and make
   wide-ranging changes: you have to fork every sub-repo that you change,
   then update `.gitmodules` links, tell everyone how to obtain your changes,
   etc. As a result, submodules end up forcing people to centralize on a
   single upstream repository.

2. Submodule links inside your git repository are links to commits, not
   trees. This is incredibly powerful ("this directory is provided by the
   contents of commit X of project P") because it acts like you copied the
   contents of one project into a subdir of another, but you have the complete
   history of the subproject still intact. Unfortunately, this power turns
   git's already-complicated history management (which few people understand)
   into something exponentially more complex. The subproject has (we hope) a
   history that always moves forward (commits always being added on top). And
   your main project has a history that moves forward. But your *link* to the
   subproject can move backward! You can make a new commit to your parent
   project that moves the subproject from commit X to an *earlier* commit X-2
   instead of a *later* commit X+2. As a result, all of git's normal fork,
   merge, revert, stash, etc. algorithms are useless for submodules. And sure
   enough, nobody has ever updated them to work well with submodules.

3. Submodules are optional. An early design goal was to make it so that some
   people accessing a project might not have access to all the commits in all
   the subprojects. To make that work, all of git was designed to not mess
   with submodule links when you do regular operations, and to not abort when
   submodule links go missing. So it's super easy to make a mistake like
   forgetting to push your changes to a submodule when you push a change to
   the parent project, thus making it so nobody else can check out your
   parent code anymore.

git-subtrac can't solve all the problems created by these complexities, but
it tries to narrow the set of constraints to make some of them easier. In
particular, we put your submodule contents in the same repository as your
parent project, eliminating all the problems from #1. And we make it easy
to be sure you've pushed all your submodules correctly (just regenerate and
push the trac branch), which greatly reduces #3.

We'll have to look somewhere else for a solution to #2. My earlier tool,
[git-subtree](https://github.com/git/git/blob/master/contrib/subtree/git-subtree.txt),
takes a different approach, avoiding submodule links entirely and merging
submodule content directly into the parent repo. This solves all three of
the problems above, but it makes submitting your changes back upstream more
complex, because you have to split them out again and regenerate the
original submodule history. It also gets confused when you move the linked
subtrees around in your repository, because they just look like files being
moved around, not the history of those files. Still, you might like
git-subtree too. There are some discussions around about the pros and cons
of submodules vs subtrees, such as [this pretty good one from Atlassian](https://www.atlassian.com/git/tutorials/git-subtree).

## Design: How does it actually work?

git-subtrac borrows some tricks from my earlier git-subtree project, but
uses them with a more submodule-centric design.

The main thing you need to know is that, unlike every other kind of git
object, a "submodule" reference from a git tree does *not* cause the
referenced object (a commit) to be included in git object packs, or pushed
along with your branch. In other words, when you add a file (blob) or
subdirectory (tree) to your project, and then commit it to a branch, and
then push that branch, the commit is never sent *without* also including a
copy of the trees and blobs it references. But if your tree links to a
commit - which is all a submodule is, a tree linking to a commit instead of
to another tree or blob - then git push does *not* package up the subcommit
or anything underneath it. It assumes you will push that commit yourself.
And that's where all the problems come from.

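You can see this asymmetry for yourself in a throwaway repository. The following is only a sketch: the paths `README` and `vendor/lib` and the `demo` identity are invented for the example, and the stand-in submodule commit is created directly in the same object store to keep things short.

```shell
#!/bin/sh
# Sketch: a tree entry of type "commit" (a gitlink) is not followed when git
# walks the objects reachable from a commit.
set -eu

# Throwaway identity so the demo needs no user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .

# A stand-in submodule commit (empty tree, no parents).
sub=$(git commit-tree "$(git mktree </dev/null)" -m 'submodule commit')

# Parent tree: one ordinary file plus one gitlink (mode 160000) to $sub.
blob=$(echo hello | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob",README
git update-index --add --cacheinfo 160000,"$sub",vendor/lib
parent=$(git commit-tree "$(git write-tree)" -m 'parent commit')

# Lists a "blob" entry for README and a "commit" entry for vendor/lib.
git ls-tree -r "$parent"

# Objects reachable from $parent: the blob is listed, the submodule
# commit is not.
reachable=$(git rev-list --objects "$parent")
echo "$reachable"
```

The object walk behind `git rev-list --objects` is roughly what decides what a push has to carry, which is why the linked commit gets left behind.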

The second thing you need to know is that if a commit (say, X) is referenced
as the parent of another commit (say, Y), then when you push commit Y
somewhere, it will always make sure X is there too (either by checking it
already exists, or packaging it up along with Y). This is true recursively,
so if you push the head of your commit history somewhere, then all the
commits it was based on - and all the trees and blobs referred to by those
commits - are pushed along with it. This is normal, of course; you'd be
pretty surprised if that didn't work.

Also relevant is that a commit can have more than one parent commit. When
you make a merge, your "merge commit" has at least two parents. It's not
too commonly used, but a single commit can have as many parents as you want.
You can merge a bunch of branches together in one shot.

So here's what we do: git-subtrac looks through all the trees of all the
commits in your main project, and finds all the submodule links (ie. links
to commit objects). Then it creates a new, parallel history, where for each
commit, the "parent commits" of that commit include not just the "real"
parent(s), but also the commits referred to inside the trees of that commit.
In other words, if you have a commit Y that is based on X, and Y's
filesystem contains submodule links to A, B, and C, then we produce a new
commit Y+, which has parents X+, A, B, and C. X+ is generated the same way,
by appending the submodule links referred to by X.

This way, when you push Y+ to a git server somewhere, it will include A, B,
and C... and all their trees, files, and parents, exactly like you might
have expected in the first place.

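To make the shape of that parallel history concrete, here is a hand-rolled sketch using plain git plumbing. It is not git-subtrac's implementation: the `gitlinks` and `plus` helpers, the paths `liba`, `libb` and `libc`, the `demo` identity, and the use of an empty tree and a placeholder message for the synthetic commits are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: reproduces the *shape* of the subtrac history by hand.
# The empty tree and the messages used for X+ and Y+ are placeholders.
set -eu

# Fixed identity and date, so the throwaway repo needs no user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE='946684800 +0000' GIT_COMMITTER_DATE='946684800 +0000'

cd "$(mktemp -d)"
git init -q .
empty_tree=$(git mktree </dev/null)

# Three stand-in submodule commits.
A=$(git commit-tree "$empty_tree" -m A)
B=$(git commit-tree "$empty_tree" -m B)
C=$(git commit-tree "$empty_tree" -m C)

# X links only to A; Y (based on X) links to A, B and C.
git update-index --add --cacheinfo 160000,"$A",liba
X=$(git commit-tree "$(git write-tree)" -m X)
git update-index --add --cacheinfo 160000,"$B",libb
git update-index --add --cacheinfo 160000,"$C",libc
Y=$(git commit-tree "$(git write-tree)" -p "$X" -m Y)

# Print the commit ids that a commit's tree links to (gitlink entries).
gitlinks() {
    git ls-tree -r "$1" | awk '$2 == "commit" { print $3 }'
}

# plus <commit> [<synthetic parent>]: the real parent is swapped for its
# synthetic twin, then every submodule link is appended as an extra parent.
plus() {
    parents=""
    if [ -n "${2:-}" ]; then parents="-p $2"; fi
    for link in $(gitlinks "$1"); do parents="$parents -p $link"; done
    # $parents is intentionally unquoted so it splits into separate arguments.
    git commit-tree "$empty_tree" $parents -m "subtrac sketch of $1"
}

Xplus=$(plus "$X")
Yplus=$(plus "$Y" "$Xplus")

git log -1 --format='Y+ parents: %P' "$Yplus"
```

In this toy repository, `git merge-base --is-ancestor "$C" "$Yplus"` succeeds while the same check against `$Y` fails, which is the whole trick: C is reachable through Y+'s parent list even though Y's tree link alone would not carry it.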
### Features of the subtrac history branch

There are a few subtleties here that make the results extra nice. First of
all, the way we generate the synthetic commits is completely stable: anyone
seeing a commit Y (and a copy of its entire history, including all its
submodule links and their trees, blobs, and histories) can reproduce exactly
the same Y+, including the modification dates, parents, and even the commit
hashes.

But it goes even further. Someone who doesn't have commit Y, but does have
commit X, can generate exactly the same X+ as the person who generates Y+
from Y. (Remember that Y+ depends on X+.) If you have a subset of the
history of Y, you can generate a perfect subset of the history of Y+.

This is really important: it means that two different people, given the same
input, can always regenerate the same git-subtrac output. As long as you
have all the objects, there's no danger in deleting and regenerating the
subtrac branch from scratch.

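The reason this can work at all is that a git commit id is just a hash of the commit's contents: tree, parents, author, committer, dates, and message. As an illustration (not git-subtrac's actual code), the sketch below builds the same commit twice from identical inputs and compares the ids; the `make_commit` helper, the `demo` identity and the timestamps are all made up for the example.

```shell
#!/bin/sh
# Sketch: identical commit contents always hash to the same commit id.
set -eu

cd "$(mktemp -d)"
git init -q .
tree=$(git mktree </dev/null)

# make_commit <date>: hypothetical helper; every other input is held constant.
make_commit() {
    GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
    GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com \
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
        git commit-tree "$tree" -m 'synthetic commit'
}

first=$(make_commit '946684800 +0000')
second=$(make_commit '946684800 +0000')   # same inputs as $first
other=$(make_commit '946771200 +0000')    # one day later

if [ "$first" = "$second" ]; then echo "same inputs, same id"; fi
if [ "$first" != "$other" ]; then echo "different date, different id"; fi
```

So a regenerated branch lands on the same hashes as long as every field of each synthetic commit is derived from data already in the repository, never from the current time or the person running the tool.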

Secondly, git-subtrac trims redundancy out of the `.trac` branch as far as
possible. It only generates a new Y+ for commit Y (based on X) if the set of
submodule links is different from X. If the submodules haven't changed, then
commit Y+ *is* X+ (not a new commit). This minimizes the changes to your
`.trac` branch across new commits, rebases, etc.

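A minimal sketch of that check, for the single-parent case only: compare the set of commit ids linked from a commit's tree with the set linked from its parent's tree. The `gitlinks` and `decide` helpers, the paths `lib` and `NOTES`, and the `demo` identity are hypothetical names for this example; how git-subtrac itself handles merges and other corner cases is not covered here.

```shell
#!/bin/sh
# Sketch: decide whether a commit needs a new synthetic commit by comparing
# its set of submodule links with its parent's. Single-parent case only.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
empty_tree=$(git mktree </dev/null)
A=$(git commit-tree "$empty_tree" -m A)
B=$(git commit-tree "$empty_tree" -m B)

# X links lib -> A; Y only adds a file; Z moves lib to B.
git update-index --add --cacheinfo 160000,"$A",lib
X=$(git commit-tree "$(git write-tree)" -m X)
blob=$(echo note | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob",NOTES
Y=$(git commit-tree "$(git write-tree)" -p "$X" -m Y)
git update-index --add --cacheinfo 160000,"$B",lib
Z=$(git commit-tree "$(git write-tree)" -p "$Y" -m Z)

# Sorted, de-duplicated list of commit ids linked from a commit's tree.
gitlinks() {
    git ls-tree -r "$1" | awk '$2 == "commit" { print $3 }' | sort -u
}

# decide <commit> <parent>: prints "reuse" when the link sets match
# (Y+ would simply be X+), or "new" when a fresh synthetic commit is needed.
decide() {
    if [ "$(gitlinks "$1")" = "$(gitlinks "$2")" ]; then
        echo reuse
    else
        echo new
    fi
}

echo "Y: $(decide "$Y" "$X")"   # prints "Y: reuse"
echo "Z: $(decide "$Z" "$Y")"   # prints "Z: new"
```

Because ordinary file changes never alter the link set, most commits on a typical branch fall into the "reuse" case.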

As a result, we get the following features for free:

- If two people differently extend commit X, for example by producing commit
  Y and commit Z on top of X, then their subtrac commits Y+ and Z+ will both
  have X+ as a parent. When you then merge Y and Z together (into, say,
  commit YZ), the resulting YZ+ commit will also look like a git merge of Y+
  and Z+. That merge could be produced by git just as easily as by
  git-subtrac. The main outcome is that you should always be able to push a
  newly generated `.trac` branch to upstream without using a "force push",
  because it should always look like a "fast forward" of the previous
  branch, even though it's actually been regenerated from scratch every time.

- Imagine that commit X of our project depends on commit N of a submodule.
  For some reason commit N doesn't work out very well, and in commit Y we
  rewind the submodule to commit M (one commit before N) while we wait for
  the submodule people to fix it. In our fork of the submodule, we then
  realize we want to apply a cherry pick of some other patch Q, so the
  `master` branch of our forked submodule looks like M+Q. If we push that
  new master branch to our real submodule repo, it would create a surprising
  problem: our superproject's commit Y works fine, because it links to M+Q.
  But if someone then wants to rewind to an earlier version of our
  superproject (e.g. using `git bisect`) and try commit X again, it won't
  work! The submodule repo no longer contains N at all, because we rewound
  the submodule's master branch.

  In this case, git-subtrac helps a lot. Since X+ includes N, and Y+
  includes M+Q, *both* sets of submodule links are included in the `.trac`
  branch. We end up successfully tracking "the history of history" of the
  submodules.

- Similarly, a common use of submodules is to maintain patch queues for
  sending upstream. That is, we pull in some version of a submodule project
  (a library), and start working on improving it, testing our patches with
  our superproject (an app) until we're sure they really work. When we're
  feeling confident, we want to rebase the subproject to the latest version
  and apply our patches on top before sending them to the subproject's
  maintainer. In a traditional submodule setup, we'd have to be very careful
  not to lose the *old* history when we rebase the master branch of the
  submodule. With git-subtrac, this all works exactly like you want it to:
  both the old and the new patch histories are available when you need them.

- Forking a superproject (e.g. on GitHub) now carries all its submodule
  history along with it, making pull requests very easy. (You can submit a
  pull request for the `.trac` branch if you want to share your submodule
  patches that way, I guess.)

## Questions, comments?

Please try it out! git-subtrac is fairly new, so I apologize for any bugs
you might run into.

You can email me at [email protected]. I don't have as much free time as I
used to, but I'll try to respond. If you fix a bug, please send a pull
request on GitHub.
