Home
This wiki is intended to introduce everyone to technology or tools that are integral to our workflow. It describes our research infrastructure in terms of three types of issues:
- Code
  - organize it, write it, run it, track it
- Collaboration
  - assigning tasks, sharing code, reviewing code, reporting results
- Computing
  - geeky details
We organize the project as a series of tasks, so our organization of
code and data takes a task-based perspective. After writing code, we
automate its execution via make. We track our code (and the rest of
the project) using Git, a version control system. Collaboration occurs
via issue/task assignments, pull requests, and logbook entries that
share research designs and results.
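To make the automation idea concrete, here is a minimal Python sketch of roughly the rule `make` applies when deciding whether to rerun a step: rebuild a target if it is missing or older than any of its inputs. The function name `needs_rebuild` and the file names are hypothetical, chosen only for illustration. In practice we rely on `make` itself rather than code like this.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(target, inputs):
    """Return True if target is missing or older than any of its inputs."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(path.stat().st_mtime > target_mtime for path in inputs)


# Demonstration with hypothetical file names in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw_data.csv"
    out = Path(tmp) / "cleaned.csv"
    raw.write_text("id,value\n1,10\n")
    before_build = needs_rebuild(out, [raw])  # output does not exist yet

    out.write_text("id,value\n1,10\n")
    os.utime(raw, (1_000, 1_000))  # pretend the input was last modified at t=1000
    os.utime(out, (2_000, 2_000))  # pretend the output was built later, at t=2000
    after_build = needs_rebuild(out, [raw])

    os.utime(raw, (3_000, 3_000))  # the raw data is updated after the build
    after_update = needs_rebuild(out, [raw])

print(before_build, after_build, after_update)  # prints: True False True
```

The timestamps are set by hand so that the example does not depend on how quickly the files are written.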
Use a good text editor like SublimeText, Atom, or VSCode to write code, slides, and papers. Word processors aren’t text editors. Your text editor should, at minimum, offer you syntax highlighting, tab autocomplete, and multiple selection. We recommend VSCode, which supports Git, remote development via SSH, and a GitHub extension.
Our approach assumes that you’ll use Unix/Linux/MacOSX. Plain-text social science lives at the *nix command line. Gentzkow and Shapiro: “The command line is our means of implementing tools.” Per Janssens (2014): “the command line is: agile, augmenting, scalable, extensible, and ubiquitous.” Here are four intros to the Linux shell:
- https://ryanstutorials.net/linuxtutorial/
- http://swcarpentry.github.io/shell-novice/
- Grant McDermott’s “Learning to love the shell” via his Data science for economists
- William E. Shotts, Jr.’s “Learning the Shell”
Getting started at the command line can be a little overwhelming, but it’s well worth it. While you can use GUI apps to interact with most of our workflow (e.g., GitHub Desktop), automation of some key parts relies on shell scripts. See logbook entry entry:unixshelltips for a haphazard collection of shell tips.
Beyond *nix, the rest of the research workflow is language-agnostic: it applies to everything from Stata to Julia. In fact, the task-based approach naturally facilitates using different languages for different tasks.
I have five criteria in mind when evaluating a research workflow:
- Replicability
  - Can the research results be reproduced starting from the raw data?
- Portability
  - If I install a fresh copy of the project on a new computer, what are the startup costs before I can run the code?
- Modularity
  - Can a coauthor work on a task using the provided inputs without having to look upstream at the code that produced those inputs?
- Dependencies
  - In the event of a data update, how do you know which pieces of code need to be run (and in what order)?
- History
  - If results have changed, can I discern the relevant code changes and their authors?
After reading the rest of this wiki, you should be able to say how our workflow answers each of these questions.
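The dependencies question can be illustrated with a small Python sketch. It assumes a hypothetical task graph (the task names below are invented for illustration, not taken from any of our projects) and uses the standard library's `graphlib.TopologicalSorter` to list which tasks to rerun, and in what order, after one task's output changes. This is only a sketch of the idea, not how our workflow implements it.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the upstream tasks it depends on.
TASKS = {
    "clean_data": {"download_raw"},
    "estimate_model": {"clean_data"},
    "make_figures": {"estimate_model"},
    "summary_stats": {"clean_data"},
    "paper": {"make_figures", "summary_stats"},
}


def downstream_of(changed, tasks):
    """Return every task that depends, directly or indirectly, on `changed`."""
    affected = set()
    frontier = {changed}
    while frontier:
        # Tasks with at least one dependency in the current frontier.
        frontier = {t for t, deps in tasks.items() if deps & frontier} - affected
        affected |= frontier
    return affected


def rerun_order(changed, tasks):
    """List the tasks to rerun after `changed` is updated, upstream first."""
    affected = downstream_of(changed, tasks) | {changed}
    full_order = TopologicalSorter(tasks).static_order()
    return [task for task in full_order if task in affected]


print(rerun_order("make_figures", TASKS))  # prints: ['make_figures', 'paper']
```

Calling `rerun_order("clean_data", TASKS)` returns `clean_data` first and `paper` last. The relative order of independent tasks such as `estimate_model` and `summary_stats` is not guaranteed.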
If you just joined our team, welcome. A few suggestions about getting started:
- Be prepared to get stumped and make mistakes. Ask the coauthors and RAs for help. We’ve all been there. Do not struggle alone in silence.
- An existing project is a trove of material demonstrating how we work in practice. Ask a coauthor to add you to a repository so that you can read through others’ pull request reviews.
- Some RAs think mastering command-line Git is better than using a GUI app: “using the command line for Git forces you to know and understand the Git commands you issue.”
- Read the Makefiles in an existing project to learn how Make works.
- ChatGPT is very good at both explaining shell scripts and composing new ones based on precise instructions. ChatGPT is a tool that you should leverage. You are responsible for your outputs.
- Try to apply our computing tools and workflow to research substance that you’ve already mastered. Were you an RA? Did you write a thesis? Take that data and code and get it on GitHub, take your Word document and write it in LaTeX, write a Makefile to replace your master.R, and so forth.
- We hope that one of your first assigned tasks will be “refactoring” code (improving code without creating new functionality). Because the existing code already has valid inputs and outputs, this assignment will provide a complete description of the pre-requisites and a clear benchmark by which your work will be evaluated. Look for the “refactoring/rewriting” label on GitHub issues.
- We don’t do pair programming, but you should aim to work on code live in front of other team members during your early weeks. They’ll notice shortcuts and tools you’re failing to deploy as you work.
Adapted by Levi Crews from the project template developed by Jonathan Dingel.