|
17 | 17 | "- Learn the importance of documenting the commands and processes used in analysis scripts for reproducibility\n",
|
18 | 18 | "- Understand the use of comments to explain the rationale behind code and improve readability without redundancy\n",
|
19 | 19 | "- Understand how to set and organize working directories to maintain a clean and navigable project structure\n",
|
20 |
| - "- Establish and maintain consistent coding conventions to improve code readability and mtainability\n", |
| 20 | + "- Establish and maintain consistent coding conventions to improve code readability and maintainability\n", |
21 | 21 | "- Learn to run and validate scripts from start to finish to ensure completeness and reproducibility\n",
|
22 | 22 | "\n",
|
23 | 23 | "## Becoming a proficient coder\n",
|
|
26 | 26 | "\n",
|
27 | 27 | "When you create your scripts, there are a number of common conventions you should consider:\n",
|
28 | 28 | "\n",
|
29 |
| - "- *Commenting (#).* When R sees a # it ignores anything that comes after in until a new line is started. So you can use to add any comments for the human reader as opposed to the computer. Because the R commands are written for the computer to read, a script containing just R code is not the easiest for a human to read. For these reason we need to add comments. There is an art to commenting effectively, often less is more. Comments should not just repeat what the code is doing they should be used to explain the reasoning behind various choices and explain things that are not obvious. \n", |
| 29 | + "- *Commenting (#).* When R sees a `#`, it ignores everything after it on that line. You can therefore use `#` to add comments for the human reader rather than the computer. Because R commands are written for the computer to read, a script containing only R code is not easy for a human to follow, which is why we add comments. There is an art to commenting effectively, and often less is more: comments should not just repeat what the code is doing but should explain the reasoning behind your choices and anything that is not obvious. \n", |
30 | 30 | "\n",
|
31 | 31 | "- *Set the working directory.* Set it with a command at the top of your script (for example, `setwd()`) so that when you reopen the script, you know where everything is.\n",
|
32 | 32 | "\n",
|
|
35 | 35 | "\n",
|
36 | 36 | "- *One script per job.* It is very tempting to just add to the end of an existing script, but it is more efficient and effective to limit each script to a single task. This not only makes scripts easier to navigate but also stops an error or bug in one task from affecting everything else downstream. \n",
|
37 | 37 | "\n",
|
38 |
| - "- *Don't hoard your workspace* It can be really tempting to save everything you have ever done, so you can trace back any mistakes. But an chaotic environment is hard and confusing to navigate. Identify what you really need to keep, a well maintained script should mean you can easily recreate your analysis and debug that way rather than save all the stages of the analysis.\n", |
| 38 | + "- *Don't hoard your workspace.* It can be really tempting to save everything you have ever done so that you can trace back any mistakes. But a chaotic environment is hard and confusing to navigate. Identify what you really need to keep. A well-maintained script should mean you can easily recreate your analysis and debug it that way, rather than saving every stage of the analysis.\n", |
39 | 39 | "\n",
|
40 | 40 | "- *Outline.* Use the outline feature in RStudio to apply a consistent structure to all your scripts.\n",
|
41 | 41 | "\n",
|
42 |
| - "- *The devil is in the details* Compared to other languages R error messages can be informative, try reading them and looking for key words to indicate what the problem might be. \n", |
| 42 | + "- *The devil is in the detail.* Compared to those of other languages, R error messages can be informative; read them carefully and look for keywords that indicate what the problem might be. \n", |
43 | 43 | "\n",
|
44 |
| - "- *Google it out* If you want to do something complicated chances are somebody else has tried before. Google for solutions to your problems. If there is no solution, use Stackoverflow.\n", |
| 44 | + "- *Google it out.* If you want to do something complicated, chances are somebody else has tried before, so search online for solutions to your problem. If you cannot find one, ask on Stack Overflow.\n", |
45 | 45 | "\n",
|
46 | 46 | "\n",
|
47 |
| - "- *Create a pseudocode.* Start your script by setting up the titles of your sections. Then progresively, populate the sections with subtitles and lastly, fill out your code with commands. Normally, I would add the sections: Set up, Data, Data Cleaning, Data analysis, Data plotting, and Wrap up. \n", |
| 47 | + "- *Write pseudocode first.* Begin your script by setting up the titles of your sections. Then progressively populate the sections with subtitles, and lastly, fill them in with commands. I normally use the sections Set up, Data, Data cleaning, Data analysis, Data plotting, and Wrap up. \n", |
48 | 48 | "\n",
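A minimal R sketch of a script skeleton that follows several of the conventions above: section headers that the RStudio outline picks up (RStudio treats a comment line ending in four or more dashes as a section), comments that explain *why* rather than *what*, and a working directory set at the top. The project folder, the file name and the use of the built-in `mtcars` data are assumptions for illustration only; a temporary directory stands in for a real project path so the sketch runs on any machine.

```r
# Set up ----------------------------------------------------------------
# Hypothetical project folder: swap in the path to your own project.
# tempdir() is used here only so the sketch runs on any machine.
project_dir <- file.path(tempdir(), "my_analysis")
dir.create(project_dir, showWarnings = FALSE)
setwd(project_dir)

# Data ------------------------------------------------------------------
# mtcars ships with R, so this sketch needs no data file.
car_data <- mtcars

# Data cleaning ---------------------------------------------------------
# Keep only the columns used below, so later steps are easier to check.
car_data <- car_data[, c("mpg", "wt", "cyl")]

# Data analysis ---------------------------------------------------------
# Question: does fuel economy (mpg) fall as weight (wt) rises?
fit <- lm(mpg ~ wt, data = car_data)

# Data plotting ---------------------------------------------------------
# Write the plot to a file rather than the screen so the whole script
# can also run non-interactively from top to bottom.
pdf("mpg_vs_wt.pdf")
plot(mpg ~ wt, data = car_data)
abline(fit)
invisible(dev.off())

# Wrap up ---------------------------------------------------------------
# Print the R and package versions used, as a record for later.
sessionInfo()
```

Because each section header ends in a run of dashes, the sections appear in RStudio's document outline and can be folded, which makes the "titles first, code later" approach easy to follow.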
|
49 | 49 | "## Structuring your data and analysis \n",
|
50 | 50 | "\n",
|
|
84 | 84 | "## Organizing your working directory\n",
|
85 | 85 | "\n",
|
86 | 86 | "Using a consistent folder structure across your projects will\n",
|
87 |
| - "help keep things organized, and will also make it easy find/file things in the\n", |
| 87 | + "help keep things organized, and will also make it easier to find/file things in the\n", |
88 | 88 | "future. This can be especially helpful when you have multiple projects. In\n",
|
89 | 89 | "general, you may create directories (folders) for **scripts**, **data**, and\n",
|
90 | 90 | "**documents**.\n",
|
91 | 91 | "\n",
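One way to create the three suggested folders (**scripts**, **data** and **documents**) from R is with `dir.create()`, sketched below. The project location is an assumption; a temporary directory is used so the example is safe to run, and in practice you would run it from your own project's root folder.

```r
# Hypothetical project root; replace with your own project folder.
project_root <- file.path(tempdir(), "my_project")

for (folder in c("scripts", "data", "documents")) {
  # recursive = TRUE also creates project_root if it does not exist yet;
  # showWarnings = FALSE keeps reruns quiet when a folder already exists.
  dir.create(file.path(project_root, folder), recursive = TRUE,
             showWarnings = FALSE)
}

# Check what was created.
list.files(project_root)
```

Running this at the start of every new project gives each one the same layout, so files are always where you expect them.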
|
92 |
| - " - **`data/`** Use this folder to store your raw data and intermediate\n", |
| 92 | + " - **`data/`** Use this folder to store the raw data and intermediate\n", |
93 | 93 | " datasets you may create for a particular analysis. For the sake\n",
|
94 | 94 | " of transparency and [provenance](https://en.wikipedia.org/wiki/Provenance),\n",
|
95 | 95 | " you should *always* keep a copy of your raw data accessible and do as much\n",
|
|
113 | 113 | "# Performing reproducible analyses\n",
|
114 | 114 | "\n",
|
115 | 115 | "Once you are happy with your script or analysis, it is highly recommended that you run the whole script from top to bottom in one execution.\n",
|
116 |
| - "This ensures that your script is complete and that your record of what you did is accurate. It can be very easy when developing a script by copying and pasting chunks of code, to forget to record something, or to run commands out of order. This may have an effect on the final output, and it may be impossible to work out what happened or why, or you may even not be aware of the effect. For this reason, we recommend rerunning your script at the end. If something goes wrong, and you get an error or a warning you know, you've missed a step out or need to fix it. \n", |
| 116 | + "This ensures that your script is complete and that your record of what you did is accurate. When developing a script by copying and pasting chunks of code, it is very easy to forget to record something or to run commands out of order. This may affect the final output, and it may be impossible to work out what happened or why; you may not even be aware of the effect. For this reason, we recommend rerunning your script at the end. If something goes wrong and you get an error or a warning, you know you've missed a step or need to fix something. \n", |
117 | 117 | "\n",
|
118 | 118 | "Most importantly, if you detect a mistake further down the line, it will be much easier to troubleshoot because you know that your script is a genuine\n",
|
119 | 119 | "reflection of what you did to the data, rather than an incomplete record of some of the steps you ran, in some order. \n",
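One way to do this from R, sketched under stated assumptions: run the script in a fresh R process with `Rscript`, so nothing left over in your interactive session can stand in for a step the script forgot. The script name and its contents here are hypothetical stand-ins written to a temporary file; point `script` at your own file instead.

```r
# Hypothetical stand-in script; in practice, point `script` at your own file.
script <- file.path(tempdir(), "analysis.R")
writeLines(c(
  "x <- c(2, 4, 6)",
  "stopifnot(length(x) == 3)",
  "cat('mean:', mean(x), '\\n')"
), script)

# Rscript ships with R; R.home("bin") locates the copy belonging to the
# R installation that is currently running.
rscript <- file.path(R.home("bin"), "Rscript")

# Run the whole script, top to bottom, in a brand-new R process. Objects
# from the current session are not visible there, so a step that was only
# ever run interactively shows up as an error instead of silently working.
status <- system2(rscript, shQuote(script))

# 0 means the script reached the end without an error.
status
```

From a terminal, the equivalent is `Rscript path/to/your_script.R`. A non-zero status means the script stopped with an error before reaching the end.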
|
|