|
1 | | -# Contributing to data.table |
| 1 | +Filing issues |
| 2 | +------------- |
2 | 3 |
|
3 | | -Contribution guidelines are on [**Contributing page**](https://github.com/Rdatatable/data.table/wiki/Contributing) on the project's wiki: |
| 4 | +- Please read and follow all the instructions at **[Support](https://github.com/Rdatatable/data.table/wiki/Support)** before filing; e.g. **check [NEWS](https://github.com/Rdatatable/data.table/blob/master/NEWS.md)** first and **search existing [Issues](https://github.com/Rdatatable/data.table/issues)**. |
| 5 | +- One issue for one purpose. Don't report more than one bug or request several features in the same issue. |
| 6 | +- Feel free to add reactions to existing issues that are important to you. We monitor this and it helps us prioritize where to devote our efforts! We expect [this issue](https://github.com/Rdatatable/data.table/issues/3189) to be evergreen. |
4 | 7 |
|
5 | | -* so it can be revised easily in one place by anyone |
6 | | -* so that revsions do not trigger Continuous Integration checks |
| 8 | +**Filing issues is contributing. Thank you!** |
| 9 | + |
| 10 | +Pull Requests (PRs) |
| 11 | +------------------- |
| 12 | + |
| 13 | +If you are not fixing an open issue and you are confident, you do not need to file a new issue before submitting the PR. It's easier for us to accept and merge a self-contained PR with everything in one place. If discussion is needed, it can be done in the PR. However, **the PR's status must be passing tests before we will start to look at it**. So, before you spend the time getting to that stage, it may save you time to create an issue first and start a discussion to see if your idea would be accepted in principle. If you are going to spend more than a day on the PR, creating an issue first lets other people know you are working on it to save duplicating effort. |
| 14 | + |
| 15 | +1. Every new feature or bug fix must have one or more new tests in [`inst/tests/tests.Rraw`](https://github.com/Rdatatable/data.table/blob/master/inst/tests/tests.Rraw); see below for a bit more on how this file works. You _must please_ check that the tests fail _without_ the fix, since the build system only checks that the new test passes _with_ the fix, which is not sufficient. For example, run your new test with the current DEV version and verify that it actually fails. The test's comment should contain `#<issue number>` so we can quickly find the issue in future. |
| 16 | +2. Unless the change is trivial (e.g. typo fix) there must be a new entry in [NEWS](https://github.com/Rdatatable/data.table/blob/master/NEWS.md). Please use the name of the user-visible function at the start to aid users quickly scanning the news item, explain the feature/bug, and thank issue/pr contributors by name. Follow the prevailing style at the top of the file; e.g. "fread with X in Y circumstance would error/segfault, [#123](issue link). Thanks to _(them)_ for reporting and _(me)_ for fixing". These are the release notes that others quickly skim and search so please use relevant helpful keywords with that in mind. If the problem was an error/warning/message, please include the error/warning/message in the news item so that folk searching for it will have a better chance of finding the news item, and to make the news item more specific. Bug fixes are under the bug fixes section heading so there is no need to include words such as "is fixed" in the first sentence because that is implicit. Please link to the issue(s) not to the PR (unless there is just a PR and no issue); if folk are interested in the detail of the fix they can get to the PR from the issue. Again: please follow the prevailing style of news items. Doing so makes it much easier and faster to review and merge. |
| 17 | +3. Please create the PR against the `master` branch. You can do that by forking the repository, creating a new branch for your feature/bugfix in the forked project, and then using that as a base for your pull requests. After your first successful merged PR you will very likely be invited to be a project member. This will allow you to create your next branch directly in the project which is easier and more convenient than forking, both for you and for Rdatatable's maintainers. Working on _branches_ on this (Rdatatable) project will _not_ affect the core code, so you can feel free to experiment as a project member; the core code is on the `master` branch, and only Matt & Arun can push/merge code there. Remember to do `git pull upstream your_branch` (where `upstream` is the name of the remote for `Rdatatable/data.table` seen in `git remote -v`) each time you want to add something; this will keep your local branch up to date with remote, in case anyone makes commits you don't yet have locally. This will reduce the number of merge conflicts you will need to deal with. Do not use `git rebase` on a branch where other users are pushing. |
| 18 | +4. Just one feature/bugfix per PR please. Small changes are easier to review and accept than big sweeping changes. Sometimes big sweeping changes are needed and we just have to discuss those case by case. |
| 19 | +5. You do not need to separate formatting-only changes. Just make the format changes in your PR. When the PR is passing tests and we look at the PR's unified diff, we will subtract the formatting-only changes and make those to master directly for you. That will reduce your PR to logic changes only so that it can be more easily reviewed now and easier to look back on in future. |
| 20 | +6. GitHub enables us to squash commits together when merging, so you don't have to squash yourself. |
| 21 | +7. Your pull request's description is the place to put any benchmark results and be a bit more verbose than the entry in the NEWS file, if you think that's appropriate. Include text "Closes #ISSUE_NUM" (case insensitive but the space must be present) for GitHub to link and close the corresponding issue when the PR is merged. If multiple issues are being closed, add that many "Closes #ISSUE" lines. |
| 22 | +8. Ensure that all tests pass by typing `test.data.table()` after installing your branch. It's also better to `R CMD check --as-cran` against your branch source package archive `.tar.gz` file. You may want to add `--no-manual`, `--no-build-vignettes` or `--ignore-vignettes` (R 3.3.0+) options to reduce dependencies required to perform check. PRs with failed tests can't be merged and it is hard to debug every PR and explain why it fails and how to fix it. The lesser the feedback required, the faster it is likely to be merged. Matt has added his dev cycle script [here](https://github.com/Rdatatable/data.table/blob/master/.dev/cc.R) and Pasha has added a Makefile [here](https://github.com/Rdatatable/data.table/blob/master/Makefile). The `.rprofile` setup for how to use Matt's dev cycle script can be found [here](https://github.com/Rdatatable/data.table/issues/5131). |
| 23 | + |
| 24 | +Example of a good pull request: [PR#2332](https://github.com/Rdatatable/data.table/pull/2332). It has a NEWS entry. It passed existing tests and added a new one. One test was removed but the PR description clearly explained why upfront (without us having to ask). Benchmark results were included, which made the need for the change compelling. We didn't need to run anything ourselves. Everything was including in one PR in one place. In short, it was a pleasure to review and merge. |
| 25 | + |
| 26 | +### Coding Style |
| 27 | + |
| 28 | +A few minor points of style that you should adhere to in your PR: |
| 29 | + |
| 30 | +- Use `=` for assignment (not `<-`); see [here](https://github.com/Rdatatable/data.table/pull/3582#discussion_r287075480), [here](https://github.com/Rdatatable/data.table/issues/3590#issuecomment-495776200), [here](https://stackoverflow.com/questions/1741820/what-are-the-differences-between-and-in-r#comment14293879_1742591) and [here](https://twitter.com/kdpsinghlab/status/1044568690346393606) |
| 31 | +- Spacing |
| 32 | + + 2-space indentation |
| 33 | + + No trailing white space |
| 34 | + + In tests use long lines and vertical alignment to easily spot small differences |
| 35 | + + Argument spacing style: `fun(arg1=value1, arg2=val2, ...)` |
| 36 | + + Add a whitespace between `if` and opening bracket, also before opening curly bracket: `if (condition) {` |
| 37 | +- Use `L` suffix for integer; i.e. `x[1L]` not `x[1]` (to save coercion) |
| 38 | +- Use `stop(domain=NA, gettextf(fmt, arg1, arg2, ..., domain="R-data.table"))` not `stop(paste(...))` or `stop(sprintf(...))` to facilitate translation as per [Writing R Extensions#1.7 point 2](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Diagnostic-messages) |
| 39 | + |
| 40 | +### Translations |
| 41 | + |
| 42 | +`data.table` offers some translations for non-native English speakers (currently, it's limited to just Mandarin). For development, there's a few minor points: |
| 43 | + |
| 44 | + - You _don't_ need to update the `po` files in your PR. We update these in one PR before each release. This makes it much easier for translators to update the messages in one go, rather than greenlighting each PR. |
| 45 | + - For R code, when writing error messages (or `warning`s, or `message`s), use a single string where possible. Bad: `stop("This is ", "an error")` vs. Good: `stop("This is an error")`. If you need to construct the error message from several parts, use `gettextf` (and rarely, `ngettext`). Bad: `stop("Error in column ", j, ". Please fix."))` vs. Good: `stop(domain=NA, gettextf("Error in column %d. Please fix.", j, domain="R-data.table"))`. Note the usage of `domain` in both `stop()` and `gettextf()`. More uncommonly, when using `cat()`, always use `gettextf`. |
| 46 | + - For C code, any `char` array that will be shown to users should be wrapped in `_()` (which marks the array for translation). Bad: `error("Error in column %d. Please fix.", j);` vs. Good: `error(_("Error in column %d. Please fix."), j);`. |
| 47 | + |
| 48 | +### Testing |
| 49 | + |
| 50 | +`data.table` uses a series of unit tests to exhibit code that is expected to work. These are primarily stored in [`inst/tests/tests.Rraw`](https://github.com/Rdatatable/data.table/blob/master/inst/tests/tests.Rraw). They come primarily from two places -- when new features are implemented, the author constructs minimal examples demonstrating the expected common usage of said feature, including expected failures/invalid use cases (e.g., the [initial assay of `fwrite` includes 28 tests](https://github.com/Rdatatable/data.table/blob/master/inst/tests/tests.Rraw#L9123-L9245)). Second, when kind users such as yourself happen upon some aberrant behavior in their everyday use of `data.table` (typically, some edge case that slipped through the cracks in the coding logic of the original author). We try to be thorough -- for example there were initially [141 tests of `split.data.table`](https://github.com/Rdatatable/data.table/blob/master/inst/tests/tests.Rraw#L8493-L8952), and that number has since grown! |
| 51 | + |
| 52 | +When you file a pull request, you should add some tests to this file with this in mind -- for new features, try to cover possible use cases extensively (we use [Codecov](https://codecov.io/gh/Rdatatable/data.table) to make it a bit easier to see how well you've done to minimally cover any new code you've added); for bug fixes, include a minimal version of the problem you've identified and write a test to ensure that your fix indeed works, and thereby guarantee that your fix continues to work as the codebase is further modified in the future. We encourage you to scroll around in `tests.Rraw` a bit to get a feel for the types of examples that are being created, and how bugs are tested/features evaluated. |
| 53 | + |
| 54 | +What numbers should be used for new tests? Numbers should be new relative to current master at the time of your PR. If another PR is merged before yours, then there may be a conflict, but that is no problem, as [Matt will fix the test numbers when he merges your PR](https://github.com/Rdatatable/data.table/pull/4731#issuecomment-768858134). |
| 55 | + |
| 56 | +#### Using `test` |
| 57 | + |
| 58 | +See [`test` function manual](https://rdatatable.gitlab.io/data.table/reference/test.html). |
| 59 | + |
| 60 | +**References:** If you are not sure how to issue a PR, but would like to contribute, these links should help get you started: |
| 61 | + |
| 62 | +1. **[How to Github: Fork, Branch, Track, Squash and Pull request](https://gun.io/blog/how-to-github-fork-branch-and-pull-request/)**. |
| 63 | +2. **[Squashing Github pull requests into a single commit](http://eli.thegreenplace.net/2014/02/19/squashing-github-pull-requests-into-a-single-commit)**. |
| 64 | +3. **[Github help](https://help.github.com/articles/using-pull-requests/)** - you'll need the *fork and pull* model. |
| 65 | + |
| 66 | +Minimal first time PR |
| 67 | +--------------------- |
| 68 | + |
| 69 | +``` |
| 70 | +$ cd /tmp # or anywhere safe to play |
| 71 | +$ git config --global core.autocrlf false # Windows-only preserve \n in test data |
| 72 | +$ git clone https://github.com/Rdatatable/data.table.git |
| 73 | +$ cd data.table |
| 74 | +$ R CMD build . |
| 75 | +$ R CMD check data.table_1.11.5.tar.gz |
| 76 | +... |
| 77 | +Status: OK |
| 78 | +``` |
| 79 | +Congratulations - you've just compiled and tested the very latest version of data.table in development. Everything looks good. Now make your changes. Using an editor of your choice, edit the appropriate `.R`, `.md`, `NEWS` and `tests.Rraw` files. Test your changes : |
| 80 | +``` |
| 81 | +$ R CMD build . |
| 82 | +$ R CMD check data.table_1.11.5.tar.gz |
| 83 | +``` |
| 84 | +or if your OS supports makefile |
| 85 | +``` |
| 86 | +$ make build && make check |
| 87 | +``` |
| 88 | +Fix the problems and repeat the `build` and `check` steps until you get `Status: OK`. |
| 89 | +Now commit the change and push. Since this is a first time PR and you're not a project member, this step should automatically ask you if you wish to fork the project. Say 'yes'. If that's not the case, please edit this wiki page to show what exactly happens for non project members. |
| 90 | +``` |
| 91 | +$ git commit -am "Added/fixed something in somewhere" |
| 92 | +$ git push |
| 93 | +``` |
| 94 | +After your first successful non-trivial PR you'll likely be invited to be a project member. When you're a member, before making your changes, you would create a branch first : |
| 95 | +``` |
| 96 | +$ git checkout -b my_new_branch |
| 97 | +``` |
| 98 | +and then the `commit` and `push` shown above would push to the branch in the main project. The next time you refresh the GitHub page in your browser, a button appears which you can click to create the PR from the branch. And that's all there is to it. |
| 99 | + |
| 100 | +#### Translations |
| 101 | + |
| 102 | +`data.table` offers some translations so that our users can get feedback (errors, warnings, verbose messages) in their native language. Currently we only support Mandarin Chinese. |
| 103 | + |
| 104 | +The data for these translations lives in the `po` folder. You do not need to make any changes here for your PR -- translations are updated in a batch before each CRAN release. |
| 105 | + |
| 106 | +If you are writing R code, please avoid breaking up strings into multiple parts (i.e., wide lines are OK for error messages). |
| 107 | + |
| 108 | +If you are writing C code, please wrap your message strings (`char` arrays) with `_()`, e.g. `error(_("step %d failed"), ii)` instead of `error("step %d failed", ii)`. Common functions for this are `error`, `warning`, and `Rprintf`. |
| 109 | + |
| 110 | +data.table developer utilities |
| 111 | +------------------------------ |
| 112 | + |
| 113 | +There are few utilities that some of regular `data.table` contributors are using. They are very helpful to speed up testing new changes, and to improve productivity in general. You can read more in [.dev/README.md](https://github.com/Rdatatable/data.table/blob/master/.dev/README.md). |
| 114 | + |
| 115 | +(advanced) Tips for your dev environment |
| 116 | +----------------------------- |
| 117 | + |
| 118 | +### `pre-commit` hook |
| 119 | + |
| 120 | +[`git` hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) are a great way to try and catch common code linting-style problems on your local machine before pushing to GitHub. |
| 121 | + |
| 122 | +The sample pre-commit hook will already do some useful checks for you, e.g. making sure there are no non-ASCII filenames and checking if there are any `git` conflict markers (like `<<<<<<<`) left over in your code. |
| 123 | + |
| 124 | +This can be turned on with `mv .git/hooks/pre-commit.sample .git/hooks/pre-commit` from the `data.table` top-level directory on your clone. |
| 125 | + |
| 126 | +Other helpful tests for common mistakes: |
| 127 | + |
| 128 | +- `grep -Fr "browser()" .` - remove `browser()` calls that were inserted during debugging |
| 129 | + |
| 130 | +We may eventually implement some [pre-receive hooks](https://help.github.com/en/enterprise/2.15/admin/developer-workflow/creating-a-pre-receive-hook-script) to formalize some of these requirements. |
0 commit comments