Phillip Cloud edited this page Jun 21, 2013 · 4 revisions

One common mistake that first-time contributors and people without much git experience make is merging upstream commits into their own branch (myself [@cpcloud] included). This confuses the contributor and clutters the history. The best way to avoid it is to keep your branch current with upstream by rebasing onto the latest upstream changes rather than merging them in. This may sound like a pain, but 1) it will get your contribution merged faster by the pandas core developers and 2) it's really not that bad once you get used to it. git makes most things very explicit and thus forces you to make sure you know exactly what you are doing.

The best way to do this is roughly the following (assuming you've already cloned the pandas repository):

If you've just cloned it and done nothing else with git, add the pandas repository to your list of remotes under the name upstream:

git remote add upstream git://github.com/pydata/pandas

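Note: You might need to replace git:// with https:// if you're behind a firewall that doesn't let the git:// protocol through. (GitHub has since stopped serving the unauthenticated git:// protocol altogether, so https:// is now required in any case.)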
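A minimal sketch of the remote setup, run in a throwaway repository created with `mktemp -d` so it touches nothing real (the temporary directory is an assumption of this example; in an actual pandas clone you would run only the `git remote` commands). It adds the remote, switches it to https:// as the note above suggests, and lists the result:

```shell
set -e
# Throwaway repository standing in for your pandas clone (sandbox only)
repo=$(mktemp -d)
cd "$repo"
git init -q

# Register the pandas repository under the name "upstream"
git remote add upstream git://github.com/pydata/pandas
# If git:// is blocked, point the same remote at the https:// URL instead
git remote set-url upstream https://github.com/pydata/pandas

# Show every remote with its fetch and push URLs
git remote -v
```

`git remote -v` should list `upstream` twice, once for fetch and once for push. Nothing is downloaded at this point; adding a remote only records its URL in `.git/config`.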

Now that you have the upstream remote, you can fetch (but not merge) the latest snapshot of the pandas git repository:

git fetch upstream
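To see that fetching really does not merge anything, here is a self-contained sketch in a temporary directory. A local repository named `upstream` stands in for pandas on GitHub, and the author identity exported at the top is a placeholder so `git commit` works without any global config; both are assumptions of the example. After the fetch your branch is untouched, and `HEAD..upstream/master` lists the commits waiting upstream:

```shell
set -e
# Placeholder identity so commits work in a clean environment
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=you@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=you@example.com

sandbox=$(mktemp -d)
cd "$sandbox"
# Local stand-in for the pandas repository
git init -q -b master upstream
git -C upstream commit -q --allow-empty -m "initial"

# Your clone (origin would normally be your GitHub fork)
git clone -q upstream mine
cd mine
git remote add upstream "$sandbox/upstream"

# Upstream gains a commit after you cloned
git -C "$sandbox/upstream" commit -q --allow-empty -m "upstream change"

# Download the new commits without touching your branch
git fetch -q upstream
# Commits on upstream/master that your branch does not have yet
git --no-pager log --oneline HEAD..upstream/master
```

In a real clone the same `git log --oneline HEAD..upstream/master` after `git fetch upstream` is a quick way to check whether a rebase is needed at all.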

Now, make your changes.

Finally, when you're ready to submit a pull request on GitHub, run the following commands:

git fetch upstream
git rebase upstream/master

This replays your commits on top of the latest pandas master (upstream/master), or does nothing if your branch already contains every commit on upstream/master.
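The fetch-then-rebase step can be rehearsed end to end in a temporary directory. A local repository stands in for pandas and the exported identity is a placeholder; the branch name `my-feature` and the file names are hypothetical. The final log shows your commit sitting directly on top of `upstream/master` with no merge commit:

```shell
set -e
# Placeholder identity so commits work in a clean environment
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=you@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=you@example.com

sandbox=$(mktemp -d)
cd "$sandbox"
# Local stand-in for the pandas repository
git init -q -b master upstream
echo "base" > upstream/README
git -C upstream add README
git -C upstream commit -q -m "initial"

git clone -q upstream mine
cd mine
git remote add upstream "$sandbox/upstream"

# Your work on a (hypothetical) feature branch
git checkout -q -b my-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "add feature"

# Meanwhile upstream moves on
echo "other" > "$sandbox/upstream/other.txt"
git -C "$sandbox/upstream" add other.txt
git -C "$sandbox/upstream" commit -q -m "upstream change"

# The two commands from the text
git fetch -q upstream
git rebase -q upstream/master

# Linear history: "add feature" now sits on top of "upstream change"
git --no-pager log --oneline --graph
```

Running `git merge upstream/master` instead would have created a merge commit on your branch, which is exactly the clutter that rebasing avoids.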

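Occasionally this will lead to merge conflicts, which you must resolve before submitting a PR. Most of the time a conflict can be resolved by looking at the sections git marks in each conflicting file (git diff shows your version and the upstream version side by side) and keeping the changes that you want; then stage the file with git add and run git rebase --continue. Keep this in mind when you get the itch to do a PEP8 hurricane across many different files in the repository: sweeping style-only changes touch many lines and make conflicts with everyone else's work much more likely.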
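Here is a sketch of a rebase conflict and its resolution, run in a throwaway directory with a local stand-in for pandas and a placeholder identity (both assumptions of the example). Your branch and upstream edit the same line of a hypothetical `example.py`; the rebase stops, you write the version you want, stage it, and continue:

```shell
set -e
# Placeholder identity so commits work in a clean environment
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=you@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=you@example.com

sandbox=$(mktemp -d)
cd "$sandbox"
# Local stand-in for the pandas repository, with a hypothetical file
git init -q -b master upstream
echo "x = 1" > upstream/example.py
git -C upstream add example.py
git -C upstream commit -q -m "initial"

git clone -q upstream mine
cd mine
git remote add upstream "$sandbox/upstream"

# You change the line on your branch...
git checkout -q -b my-feature
echo "x = 2" > example.py
git commit -q -am "my change"
# ...and upstream changes the same line
echo "x = 3" > "$sandbox/upstream/example.py"
git -C "$sandbox/upstream" commit -q -am "upstream change"

git fetch -q upstream
# The rebase stops at the conflict; "|| true" keeps set -e from exiting here
git rebase upstream/master || true

git status --short        # the conflicted file is listed as "UU example.py"
git --no-pager diff       # shows the <<<<<<< / ======= / >>>>>>> markers

# Resolve: keep your version, stage it, and let the rebase finish
echo "x = 2" > example.py
git add example.py
GIT_EDITOR=true git rebase --continue   # GIT_EDITOR=true keeps the commit message as-is
```

Keeping `x = 2` is arbitrary here; in practice you keep whichever lines, or combination of lines, are correct. If a rebase goes badly, `git rebase --abort` returns the branch to where it was before the rebase started.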

You are now ready to submit your PR!
