Levi Crews edited this page Oct 29, 2024 · 10 revisions

🚧 WARNING: Under Construction

Overview

This wiki introduces the technologies and tools that are integral to our workflow. It describes our research infrastructure in terms of three types of issues:

  • Code: organize it, write it, run it, track it
  • Collaboration: assigning tasks, sharing code, reviewing code, reporting results
  • Computing: geeky details

We organize the project as a series of tasks, so our organization of code and data takes a task-based perspective. After writing code, we automate its execution via make. We track our code (and the rest of the project) using Git, a version control system. Collaboration occurs via issue/task assignments, pull requests, and logbook entries that share research designs and results.
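As a rough illustration of what automating execution with Make buys us, here is a simplified Python sketch of the rule Make applies to each step: rerun it when its output is missing or older than any of its inputs. This is not how Make is implemented, and the function name `needs_rebuild` and the file names in the usage note are hypothetical.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite.

    Simplified sketch of Make's timestamp rule; assumes every
    prerequisite file already exists on disk.
    """
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # Any input modified after the output means the output is out of date.
    return any(os.path.getmtime(path) > target_time for path in prerequisites)
```

For example, `needs_rebuild("output/table.tex", ["input/data.csv", "code/table.py"])` would report whether a (hypothetical) table needs to be regenerated after either its data or its code changed.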

Use a good text editor like SublimeText, Atom, or VSCode to write code, slides, and papers. Word processors aren’t text editors. Your text editor should, at minimum, offer you syntax highlighting, tab autocomplete, and multiple selection. We recommend VSCode, which supports Git, remote development via SSH, and a GitHub extension.

Our approach assumes that you’ll use Unix/Linux/MacOSX. Plain-text social science lives at the *nix command line. As Gentzkow and Shapiro put it: “The command line is our means of implementing tools.” According to Janssens (2014): “the command line is: agile, augmenting, scalable, extensible, and ubiquitous.” Here are four intros to the Linux shell:

Getting started at the command line can be a little overwhelming, but it’s well worth it. While you can use GUI apps to interact with most of our workflow (e.g., GitHub Desktop), automation of some key parts relies on shell scripts. See logbook entry [entry:unixshelltips] for a haphazard collection of shell tips.

Beyond *nix, the rest of the research workflow is language-agnostic: it applies to everything from Stata to Julia. In fact, the task-based approach naturally facilitates using different languages for different tasks.

I have five criteria in mind when evaluating a research workflow:

  • Replicability: Can the research results be reproduced starting from the raw data?
  • Portability: If I install a fresh copy of the project on a new computer, what are the startup costs before I can run the code?
  • Modularity: Can a coauthor work on a task using the provided inputs without having to look upstream at the code that produced those inputs?
  • Dependencies: In the event of a data update, how do you know which pieces of code need to be run (and in what order)?
  • History: If results have changed, can I discern the relevant code changes and their authors?

After reading the rest of this wiki, you should be able to say how our workflow answers each of these questions.
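To make the Dependencies question concrete, here is a minimal Python sketch of the bookkeeping involved: given a record of which task reads which upstream output, find everything downstream of a changed input and list it in a valid execution order. The task names and the `tasks_to_rerun` helper are hypothetical, and this illustrates the idea only, not how our projects actually track dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the upstream tasks whose
# outputs it reads. These names are illustrative only.
UPSTREAM = {
    "clean_data": {"raw_data"},
    "summary_stats": {"clean_data"},
    "estimation": {"clean_data"},
    "figures": {"estimation"},
    "paper": {"summary_stats", "figures"},
}


def tasks_to_rerun(upstream, changed):
    """Return every task downstream of `changed`, upstream tasks first."""
    # Invert the graph so we can walk from a task to the tasks that use it.
    downstream = {}
    for task, deps in upstream.items():
        for dep in deps:
            downstream.setdefault(dep, set()).add(task)

    # Collect everything reachable from the changed node.
    affected = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        for child in downstream.get(node, ()):
            if child not in affected:
                affected.add(child)
                stack.append(child)

    # A topological order of the full graph, filtered to affected tasks,
    # guarantees each task runs after the tasks it depends on.
    order = TopologicalSorter(upstream).static_order()
    return [task for task in order if task in affected]
```

With this graph, `tasks_to_rerun(UPSTREAM, "estimation")` returns `["figures", "paper"]`, while a change to `"raw_data"` returns all five tasks, starting with `"clean_data"` and ending with `"paper"`.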

Getting started

If you just joined our team, welcome. A few suggestions about getting started:

  • Be prepared to get stumped and make mistakes. Ask the coauthors and RAs for help. We’ve all been there. Do not struggle alone in silence.
  • An existing project is a trove of material demonstrating how we work in practice. Ask a coauthor to add you to a repository so that you can read through others’ pull request reviews.
  • Some RAs think mastering command-line Git is better than using a GUI app: “using the command line for Git forces you to know and understand the Git commands you issue.”
  • Read the Makefiles in an existing project to learn how Make works.
  • ChatGPT is very good at both explaining shell scripts and composing new ones based on precise instructions. It is a tool that you should leverage, but you remain responsible for your outputs.
  • Try to apply our computing tools and workflow to research substance that you’ve already mastered. Were you an RA? Did you write a thesis? Take that data and code and get it on GitHub, take your Word document and write it in LaTeX, write a Makefile to replace your master.R, and so forth.
  • We hope that one of your first assigned tasks will be “refactoring” code (improving code without creating new functionality). Because the existing code already has valid inputs and outputs, this assignment will provide a complete description of the prerequisites and a clear benchmark by which your work will be evaluated. Look for the “refactoring/rewriting” label on GitHub issues.
  • We don’t do pair programming, but you should aim to work on code live in front of other team members during your early weeks. They’ll notice shortcuts and tools you’re failing to deploy as you work.
