| 1 | +# git-subtrac: all your git submodules in one place |
| 2 | + |
| 3 | +git-subtrac is a helper tool that makes it easier to keep track of your |
| 4 | +git submodule contents. It collects the entire contents of the entire |
| 5 | +history of all your submodules (recursively) into a separate git branch, |
| 6 | +which can be pushed, pulled, forked, and merged however you want. |
| 7 | + |
| 8 | +## Quick start |
| 9 | + |
| 10 | +git-subtrac is a git extension written in the Go language using the lovely |
| 11 | +[go-git](https://github.com/src-d/go-git) library. If you have a Go compiler |
| 12 | +set up, you can install the tool like this: |
| 13 | + |
| 14 | + go install github.com/apenwarr/git-subtrac |
| 15 | + |
| 16 | +This will drop the compiled program into Go's bin directory, often |
| 17 | +`$HOME/go/bin`. As long as that's in your `$PATH`, you will then be able to |
| 18 | +run commands like `git subtrac <whatever>`. |
| 19 | + |
| 20 | +To collect the submodule history for all your branches into new tracking |
| 21 | +branches, just do this (from inside the git repository you want to affect): |
| 22 | + |
| 23 | + git subtrac update |
| 24 | + |
| 25 | +If your repository is incomplete (ie. you have old submodule links that no |
| 26 | +longer exist, because the submodules have since been rebased or something), |
| 27 | +this might give an error about missing commits. You can work around it with |
| 28 | +the `--auto-exclude` option: |
| 29 | + |
| 30 | + git subtrac --auto-exclude update |
| 31 | + |
| 32 | +Anyway, `git subtrac update` will generate, for each branch, a new branch |
| 33 | +with the `.trac` extension, referencing the complete history of all its |
| 34 | +submodule references. The easiest way to use this branch is as follows: |
| 35 | + |
| 36 | +1. Push the tracking branch (eg. `master.trac`) into the same upstream |
| 37 | +repository (github or whatever) as the branch (eg. `master`) that it tracks. |
| 38 | + |
| 39 | +2. Edit your `.gitmodules` file to change all the `url = <whatever>` lines |
| 40 | +to just `url = .` (a single dot). This tells git to always fetch the |
| 41 | +submodule contents from the same repo as your parent module. |
| 42 | + |
| 43 | +End users downloading your project don't need to have git-subtrac installed, |
| 44 | +unless they make their own changes to submodules. All the `git submodule` |
| 45 | +commands continue to work exactly the same way as before. |
| 46 | + |
| 47 | +## What problem are we solving? |
| 48 | + |
| 49 | +git submodules have been very complicated and hard to use since the earliest |
| 50 | +days of git, and have not improved much. The complications stem from some |
| 51 | +major design issues: |
| 52 | + |
| 53 | +1. The contents of submodules come from *other* git repositories, which can |
| 54 | + move around or be rebased/deleted/etc without warning, causing your |
| 55 | + own repository to no longer be usable. At the very least, using multiple |
| 56 | + upstream repositories makes it hard to fork a project and make |
| 57 | + wide-ranging changes: you have to fork every sub-repo that you change, |
| 58 | +   then update `.gitmodules` links, tell everyone how to obtain your changes, |
| 59 | + etc. As a result, submodules end up forcing people to centralize on a |
| 60 | + single upstream repository. |
| 61 | + |
| 62 | +2. Submodule links inside your git repository are links to commits, not |
| 63 | + trees. This is incredibly powerful ("this directory is provided by the |
| 64 | + contents of commit X of project P") because it acts like you copied the |
| 65 | + contents of one project into a subdir of another, but you have the complete |
| 66 | + history of the subproject still intact. Unfortunately, this power turns |
| 67 | + git's already-complicated history management (which few people understand) |
| 68 | + into something exponentially more complex. The subproject has (we hope) a |
| 69 | + history that always moves forward (commits always being added on top). And |
| 70 | + your main project has a history that moves forward. But your *link* to the |
| 71 | + subproject can move backward! You can make a new commit to your parent |
| 72 | + project that moves the subproject from commit X to an *earlier* commit X-2 |
| 73 | + instead of a *later* commit X+2. As a result, all of git's normal fork, |
| 74 | + merge, revert, stash, etc algorithms are useless for submodules. And sure |
| 75 | + enough, nobody has ever updated them to work well with submodules. |
| 76 | + |
| 77 | +3. Submodules are optional. An early design goal was to make it so that some |
| 78 | + people accessing a project might not have access to all the commits in all |
| 79 | + the subprojects. To make that work, all of git was designed to not mess |
| 80 | + with submodule links when you do regular operations, and to not abort when |
| 81 | + submodule links go missing. So it's super easy to make a mistake like |
| 82 | +   forgetting to push your changes to a submodule when you push a change to |
| 83 | + the parent project, thus making it so nobody else can check out your |
| 84 | + parent code anymore. |
| 85 | + |
| 86 | +git-subtrac can't solve all the problems created by these complexities, but |
| 87 | +it tries to narrow the set of constraints to make some of them easier. In |
| 88 | +particular, we put your submodule contents in the same repository as your |
| 89 | +parent project, eliminating all the problems from #1. And we make it easy |
| 90 | +to be sure you've pushed all your submodules correctly (just regenerate and |
| 91 | +push the trac branch), which greatly reduces #3. |
| 92 | + |
| 93 | +We'll have to look somewhere else for a solution to #2. My earlier tool, |
| 94 | +[git-subtree](https://github.com/git/git/blob/master/contrib/subtree/git-subtree.txt), |
| 95 | +takes a different approach, avoiding submodule links entirely and merging |
| 96 | +submodule content directly into the parent repo. This solves all three of |
| 97 | +the problems above, but it makes submitting your changes back upstream more |
| 98 | +complex, because you have to split them out again and regenerate the |
| 99 | +original submodule history. It also gets confused when you move the linked |
| 100 | +subtrees around in your repository, because they just look like files being |
| 101 | +moved around, not the history of those files. Still, you might like |
| 102 | +git-subtree too. There are some discussions around about the pros and cons |
| 103 | +of submodules vs subtrees, such as [this pretty good one from Atlassian](https://www.atlassian.com/git/tutorials/git-subtree). |
| 104 | + |
| 105 | +## Design: How does it actually work? |
| 106 | + |
| 107 | +git-subtrac borrows some tricks from my earlier git-subtree project, but |
| 108 | +uses them with a more submodule-centric design. |
| 109 | + |
| 110 | +The main thing you need to know is that, unlike every other kind of git |
| 111 | +object, a "submodule" reference from a git tree does *not* cause the |
| 112 | +referenced object (a commit) to be included in git object packs, or pushed |
| 113 | +along with your branch. In other words, when you add a file (blob) or |
| 114 | +subdirectory (tree) to your project, and then commit it to a branch, and |
| 115 | +then push that branch, the commit is never sent *without* also including a |
| 116 | +copy of the trees and blobs it references. But if your tree links to a |
| 117 | +commit - which is all a submodule is, a tree linking to a commit instead of |
| 118 | +to another tree or blob - then git push does *not* package up the subcommit |
| 119 | +or anything underneath it. It assumes you will push that commit yourself. |
| 120 | +And that's where all the problems come from. |
| 121 | + |
| 122 | +The second thing you need to know is that if a commit (say, X) is referenced |
| 123 | +as the parent of another commit (say, Y), then when you push commit Y |
| 124 | +somewhere, it will always make sure X is there too (either by checking it |
| 125 | +already exists, or packaging it up along with Y). This is true recursively, |
| 126 | +so if you push the head of your commit history somewhere, then all the |
| 127 | +commits it was based on - and all the trees and blobs referred to by those |
| 128 | +commits - are pushed along with it. This is normal, of course; you'd be |
| 129 | +pretty surprised if that didn't work. |
| 130 | + |
| 131 | +Also relevant is that a commit can have more than one parent commit. When |
| 132 | +you make a merge, your "merge commit" has at least two parents. It's not |
| 133 | +too commonly used, but a single commit can have as many parents as you want. |
| 134 | +You can merge a bunch of branches together in one shot. |
| 135 | + |
| 136 | +So here's what we do: git-subtrac looks through all the trees of all the |
| 137 | +commits in your main project, and finds all the submodule links (ie. links |
| 138 | +to commit objects). Then it creates a new, parallel history, where for each |
| 139 | +commit, the "parent commits" of that commit include not just the "real" |
| 140 | +parent(s), but also the commits referred to inside the trees of that commit. |
| 141 | +In other words, if you have a commit Y that is based on X, and Y's |
| 142 | +filesystem contains submodule links to A, B, and C, then we produce a new |
| 143 | +commit Y+, which has parents X+, A, B, and C. X+ is generated the same way, |
| 144 | +by appending the submodule links referred to by X. |
| 145 | + |
| 146 | +This way, when you push Y+ to a git server somewhere, it will include A, B, |
| 147 | +and C... and all their trees, files, and parents, exactly like you might |
| 148 | +have expected in the first place. |
| 149 | + |
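The parent-list construction described above can be sketched with a toy model. This is only an illustration: the `Commit` struct and `TracParents` function are hypothetical names, not git-subtrac's actual code or the go-git API, and short strings stand in for real commit hashes.

```go
package main

import "fmt"

// Commit is a toy stand-in for a git commit (hypothetical model):
// the IDs of its real parents and the submodule commits its tree links to.
type Commit struct {
	Parents  []string
	Sublinks []string
}

// TracParents returns the parent list of the synthetic commit "id+":
// the synthetic version of each real parent, then every submodule link.
func TracParents(repo map[string]Commit, id string) []string {
	c := repo[id]
	out := make([]string, 0, len(c.Parents)+len(c.Sublinks))
	for _, p := range c.Parents {
		out = append(out, p+"+")
	}
	return append(out, c.Sublinks...)
}

func main() {
	repo := map[string]Commit{
		"X": {Sublinks: []string{"A"}},
		"Y": {Parents: []string{"X"}, Sublinks: []string{"A", "B", "C"}},
	}
	fmt.Println(TracParents(repo, "Y")) // prints [X+ A B C]
	fmt.Println(TracParents(repo, "X")) // prints [A]
}
```

Because A, B, and C are ordinary parents of Y+, pushing Y+ drags them along by the same rule that makes git push X when you push Y.
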
| 150 | +### Features of the subtrac history branch |
| 151 | + |
| 152 | +There are a few subtleties here that make the results extra nice. First of |
| 153 | +all, the way we generate the synthetic commits is completely stable: anyone |
| 154 | +seeing a commit Y (and a copy of its entire history, including all its |
| 155 | +submodule links and their trees, blobs, and histories) can reproduce exactly |
| 156 | +the same Y+, including the modification dates, parents, and even the commit |
| 157 | +hashes. |
| 158 | + |
| 159 | +But it goes even further. Someone who doesn't have commit Y, but does have |
| 160 | +commit X, can generate exactly the same X+ as the person who generates Y+ |
| 161 | +from Y. (Remember that Y+ depends on X+.) If you have a subset of the |
| 162 | +history of Y, you can generate a perfect subset of the history of Y+. |
| 163 | + |
| 164 | +This is really important: it means that two different people, given the same |
| 165 | +input, can always regenerate the same git-subtrac output. As long as you |
| 166 | +have all the objects, there's no danger in deleting and regenerating the |
| 167 | +subtrac branch from scratch. |
| 168 | + |
| 169 | +Secondly, git-subtrac trims redundancy out of the `.trac` branch as far as |
| 170 | +possible. It only generates a new Y+ for commit Y (based on X) if the set of |
| 171 | +submodule links differs from X's. If the submodules haven't changed, then |
| 172 | +commit Y+ *is* X+ (not a new commit). This minimizes the changes to your |
| 173 | +`.trac` branch across new commits, rebases, etc. |
| 174 | + |
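The redundancy rule can be sketched on a linear history. This is a simplified illustration under stated assumptions: `Commit`, `sameLinks`, and `TracHeads` are hypothetical names, the history has no merges, the first commit always gets a synthetic commit, and submodule links are compared as ordered lists.

```go
package main

import "fmt"

// Commit is a toy commit (hypothetical model): an ID plus the submodule
// commits linked from its tree.
type Commit struct {
	ID       string
	Sublinks []string
}

// sameLinks reports whether two link lists are identical.
func sameLinks(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// TracHeads walks a linear history, oldest first, and returns the name of
// the synthetic commit that tracks each commit. A commit whose links match
// its parent's reuses the parent's synthetic commit.
func TracHeads(history []Commit) []string {
	heads := make([]string, 0, len(history))
	for i, c := range history {
		if i > 0 && sameLinks(c.Sublinks, history[i-1].Sublinks) {
			heads = append(heads, heads[i-1])
			continue
		}
		heads = append(heads, c.ID+"+")
	}
	return heads
}

func main() {
	history := []Commit{
		{ID: "X", Sublinks: []string{"A"}},
		{ID: "Y", Sublinks: []string{"A"}},      // unchanged, so Y+ is X+
		{ID: "Z", Sublinks: []string{"A", "B"}}, // changed, so Z+ is new
	}
	fmt.Println(TracHeads(history)) // prints [X+ X+ Z+]
}
```
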
| 175 | +As a result, we get the following features for free: |
| 176 | + |
| 177 | +- If two people extend commit X in different ways, for example by producing |
| 178 | +  commit Y and commit Z on top of X, then their subtrac commits Y+ and Z+ |
| 179 | +  will both have X+ as a parent. When you then merge Y and Z together (into, say, |
| 180 | + commit YZ), the resulting YZ+ commit will also look like a git merge of Y+ |
| 181 | + and Z+. That merge could be produced by git just as easily as by |
| 182 | + git-subtrac. The main outcome is that you should always be able to push a |
| 183 | + newly generated `.trac` branch to upstream without using a "force push", |
| 184 | + because it should always look like a "fast forward" of the previous |
| 185 | + branch, even though it's actually been regenerated from scratch every time. |
| 186 | + |
| 187 | +- Imagine that commit X of our project depends on commit N of a submodule. |
| 188 | + For some reason commit N doesn't work out very well, and in commit Y we |
| 189 | + rewind the submodule to commit M (one commit before N) while we wait for |
| 190 | + the submodule people to fix it. In our fork of the submodule, we then |
| 191 | + realize we want to apply a cherry pick of some other patch Q, so the |
| 192 | + 'master' branch of our forked submodule looks like M+Q. If we push that |
| 193 | + new master branch to our real submodule repo, it would create a surprising |
| 194 | +  problem: our superproject's commit Y works fine, because it links to M+Q. |
| 195 | + But if someone then wants to rewind to an earlier version of our |
| 196 | + superproject (eg. using `git bisect`) and try commit X again, it won't |
| 197 | + work! The submodule repo no longer contains N at all, because we rewound |
| 198 | + the submodule's master branch. |
| 199 | + |
| 200 | + In this case, git-subtrac helps a lot. Since X+ includes N, and Y+ |
| 201 | + includes M+Q, *both* sets of submodule links are included in the `.trac` |
| 202 | + branch. We end up successfully tracking "the history of history" of the |
| 203 | + submodules. |
| 204 | + |
| 205 | +- Similarly, a common use of submodules is to maintain patch queues for |
| 206 | + sending upstream. That is, we pull in some version of a submodule project |
| 207 | + (a library), and start working on improving it, testing our patches with |
| 208 | + our superproject (an app) until we're sure they really work. When we're |
| 209 | + feeling confident, we want to rebase the subproject to the latest version |
| 210 | + and apply our patches on top before sending them to the subproject's |
| 211 | + maintainer. In a traditional submodule setup, we'd have to be very careful |
| 212 | + not to lose the *old* history when we rebase the master branch of the |
| 213 | + submodule. With git-subtrac, this all works exactly like you want it to: |
| 214 | + both the old and the new patch histories are available when you need them. |
| 215 | + |
| 216 | +- Forking a superproject (eg. on github) now carries all its submodule |
| 217 | + history along with it, making pull requests very easy. (You can submit a |
| 218 | +  pull request for the `.trac` branch if you want to share your submodule |
| 219 | + patches that way, I guess.) |
| 220 | + |
| 221 | +## Questions, comments? |
| 222 | + |
| 223 | +Please try it out! git-subtrac is fairly new, so I apologize for any bugs |
| 224 | +you might run into. |
| 225 | + |
| 226 | +You can email me at [email protected]. I don't have as much free time as I |
| 227 | +used to, but I'll try to respond. If you fix a bug, please send pull |
| 228 | +requests on github. |