Psychology Replication Crisis

Find the story here: cj-robinson.github.io/replication-crisis

This was an assignment for our Data Studio class at Columbia Journalism School's Data Journalism program, intended to think about how to create end-to-end data stories about topics of our choosing on deadline.

Introduction and Goals

This project explores the replication crisis that has gripped the pyschology field for the past 15 years. Attempts to replicate long-held hypotheses were constantly failing due to a confluence of factors explored in the piece. The crisis has spurred a large overhaul in the scientific community as transparency becomes increasingly important.

I've long been equally interested and skeptical of pop pyschology. I (maybe embarrasingly) remember attempting to power pose before interviews after seeing Amy Cuddy's TED Talk, and I was very intrigued reading news about the academic debates surrounding these seemingly 'revolutionary' ideas.

During college, I worked in a pyschology lab attempting to measure how 'priming' participants could change their perceptions, and as I learned more about statistics and testing in work, the more I wanted to know about the publishing system and academia.

My reporting questions were largely centered around:

Just how bad is the replication crisis and how can we measure it?
Does a failed replication have any effect?
How are pyschologists dealing with the fallout?

Tools

Scraping and analysis:

Playwright
BeautifulSoup
NopeCHA
R

Visualizations:

RAWGraphs
R/ggplot
Adobe illustrator
ai2html
Javascript/D3 for selection of HTML elements

Files

analysis/gs_scrape.ipynb - Playwright/BS scraper for Google Scholar

note that due to a nan value in my list of citations, the scraper failed midway through on the first one. instead of rerunning completely, I restarted halfway through the list since the risk of being banned by Google is high

analysis/replication_open_database.Rmd - R Markdown file with code to download the dataset and analyze the data for porting into visualization software
data/gs_scrape.csv and data/gs_scape_2.csv -- the results of the two scrapes of Google Scholar
data/replication_database.csv - original data retrieved from FORRT
data/replication_citation_df.csv - data with scraped google scholar results joined
img - my illustrator files and ai2html outputs, plus a cover photo

Data Collection and Analysis

I downloaded the base dataset using a R package built by FORRT. A different version of this dataset is also available on the website, which is what I built the scraper off of initially.
I took the original references in the dataset and used Playwright to loop through the 400 studies and search Google Scholar with each. I would then scrape the first result entry with BeautifulSoup getting title, abstract and citation.
I broke CAPTCHA's using Jonathan Soma's NopeCHA tutorial, launching Playwright with the extension and my API key
Exploratory data analysis in R and visualizations on the scatter plot of original vs replication effects, total count of replications and success rate by field.
Adobe Illustrator for responsive design in mobile/smaller device sizes
HTML/CSS/JS to make the scrollytelling experience complete, starting with one example of power posing and continuing on to the entire dataset.

Learnings

This was one project where a single line in the piece took the majority of the analysis time. I was hoping to find that successful replications were cited less often than those that were not successfuls since I had seen research showing this relationship, but sadly that was not the case.

Scraping Google was a different beast since they're fairly stingy with their scraping policies. I was first going to use a Python package for scraping the site, but soon realized it would be too risky given the amount of searches I was hoping to do plus the metadata I wanted to scrape. Instead, I manually created the scraper so that I could control CATPCHAs and avoid getting my IP banned, which was a success!

Next time, I'd like to add an interactive element where the reader is able to participate in one of the replicaiton studies and set up a backend to store user data. I'd then love to have user data displayed alongside some granular study data I found (thanks of course to the increased transparency of psychology these days)!

Thank you for visiting!

Feel free to get in touch via email or my personal website, cj-robinson.github.io.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
analysis		analysis
data		data
img		img
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Psychology Replication Crisis

Introduction and Goals

Tools

Files

Data Collection and Analysis

Learnings

Thank you for visiting!

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Psychology Replication Crisis

Introduction and Goals

Tools

Files

Data Collection and Analysis

Learnings

Thank you for visiting!

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages