
Effects of content removal timeliness on illegal content persistence


osome-iu/simsom_removal

 
 


Explore the effects of takedown delay on the persistence of illegal content

This repository contains code to reproduce the results in the paper "Delayed takedown of illegal content on social media makes moderation ineffective".

The model is an extension of SimSoM: A Simulator of Social Media

Overview of the repo

  1. data: contains raw & derived datasets
  2. example: contains a minimal example to start using the SimSoM model. The model was written and tested with Python >= 3.6
  3. experiments: experiment configurations, results, supplementary data, and .ipynb notebooks to produce the figures reported in the paper
  4. libs: contains the extended SimSoM model package that can be imported into scripts
  5. workflow:
    • rules contains scripts to run the experiments
    • scripts contains helper scripts used by rules scripts. These include functions such as network initialization, data parsing, etc.

1. Install SimSoM

We include two ways to set up the environment and install the model.

1.1. Using Make (the simplest way, recommended)

Run make from the project root directory (simsom_removal)

1.2. Using Conda

We use conda, a package manager, to manage the development environment. Please make sure you have conda or mamba installed on your machine

1.2.1. Create the environment with required packages: run conda env create -n simsom -f environment.yml

1.2.2. Install the SimSoM module:

  • activate virtualenv: conda activate simsom
  • run pip install -e ./libs/

2. Plot results from the paper

Run the notebooks in experiments/figures to visualize the experiment results in the paper

The results in the paper are based on averages across 10+ simulation runs. For step 3 below, the shell script is configured to run one simulation for each experiment setting.

To run more simulations, change the NO_RUNS variable in workflow/rules/run_experiment.sh. However, since running multiple simulations takes a long time, we suggest running many of them in parallel. See workflow/rules/run_exps.smk for inspiration on how to do this with the workflow manager Snakemake. Also note that saving all message information as a gzip-compressed file takes about 20-500 megabytes per run.
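Averaging metrics across runs can be sketched as follows. This is a minimal illustration, not the repo's actual aggregation code: the per-run dicts and the metric names ("illegal_prevalence", "reach") are synthetic stand-ins for values that would really be parsed from the files under experiments/results.

```python
import statistics

# Synthetic per-run results; real values come from parsed experiment output.
runs = [
    {"illegal_prevalence": 0.12, "reach": 340},
    {"illegal_prevalence": 0.15, "reach": 310},
    {"illegal_prevalence": 0.10, "reach": 355},
]

def average_metric(runs, key):
    """Mean of one metric across simulation runs."""
    return statistics.mean(run[key] for run in runs)

print(average_metric(runs, "reach"))  # mean reach over the three runs
```

With 10+ runs per setting, the same pattern applies; only the list of per-run records grows.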

3. Reproduce results from scratch

The steps to reproduce the results from scratch, rather than using the provided results in experiments/results, are outlined below. Warning: following these steps will overwrite the contents of experiments/results. All scripts are run from the project root directory, simsom_removal.

3.1. Run experiments

3.1.1. Unzip the data file: unzip data/data.zip -d .

3.1.2. Automatically run all experiments to reproduce the results in the paper by running two commands:

  • make the file executable: chmod +x workflow/rules/run_experiment.sh
  • run the shell script: workflow/rules/run_experiment.sh

This script does two things:

  • Creates configuration folders for all experiments (see experiments/config for the results of this step)
  • Runs the run_exps.py script with an argument specifying the experiment to run:
    • vary_tau: main results
    • vary_group_size: robustness check for varying group sizes
    • vary_illegal_probability: robustness check for varying illegal probabilities
    • vary_network_type: robustness check for varying network structures

3.2. Parse experiment data

We are interested in the prevalence of illegal content and in engagement metrics such as reach and impressions. To aggregate these metrics, parse the experiments' verbose tracking files by running:

  • For reach and impressions: python workflow/scripts/read_data_engagement.py --result_path experiments/<experiment_name> --out_path experiments/results/<experiment_name>
  • For prevalence of illegal content: python workflow/scripts/read_data_illegal_count.py --result_path experiments/<experiment_name> --out_path experiments/results/<experiment_name>
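The kind of work these parsing scripts do can be sketched as below: read a gzip-compressed file of per-message records and tally illegal-content prevalence and impressions. The record schema ("is_illegal", "impressions") is an assumption for illustration and may differ from the actual verbose tracking format.

```python
import gzip
import json
import os
import tempfile

def summarize_messages(path):
    """Tally prevalence of illegal content and total impressions
    from a gzip-compressed file of JSON records, one per line.
    NOTE: the field names are hypothetical, not the repo's real schema."""
    total, illegal, impressions = 0, 0, 0
    with gzip.open(path, "rt") as fh:
        for line in fh:
            record = json.loads(line)
            total += 1
            illegal += record["is_illegal"]
            impressions += record["impressions"]
    prevalence = illegal / total if total else 0.0
    return {"prevalence": prevalence, "impressions": impressions}

# Build a tiny synthetic file to demonstrate.
records = [
    {"is_illegal": 1, "impressions": 5},
    {"is_illegal": 0, "impressions": 12},
]
with tempfile.NamedTemporaryFile(suffix=".jsonl.gz", delete=False) as tmp:
    path = tmp.name
with gzip.open(path, "wt") as fh:
    for r in records:
        fh.write(json.dumps(r) + "\n")

print(summarize_messages(path))  # {'prevalence': 0.5, 'impressions': 17}
os.remove(path)
```

Streaming the file line by line keeps memory flat even for the larger (20-500 MB) runs mentioned above.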

See step 2 above to visualize the newly created results.

Other notes

Data description

The empirical network is created from the Replication Data for: Right and left, partisanship predicts vulnerability to misinformation, where:

  • measures.tab contains user information, i.e., each user's partisanship and misinformation score.
  • anonymized-friends.json is the adjacency list.

We reconstruct the empirical network from the above two files, resulting in data/follower_network.gml. The steps are specified in the script that creates the empirical network.
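The core of such a reconstruction can be sketched as follows. The JSON layout (user mapped to a list of friend IDs) and the edge direction are assumptions for illustration; the real script additionally attaches the partisanship and misinformation attributes from measures.tab and writes data/follower_network.gml.

```python
import json

# Hypothetical adjacency list in the style of anonymized-friends.json.
adjacency_json = '{"u1": ["u2", "u3"], "u2": ["u3"], "u3": []}'
adjacency = json.loads(adjacency_json)

# Flatten into a directed edge list (source user -> listed friend).
edges = [(user, friend)
         for user, friends in adjacency.items()
         for friend in friends]

# Node set: every key plus every friend that appears only as a target.
nodes = sorted(set(adjacency) | {friend for _, friend in edges})

print(len(nodes), len(edges))  # 3 nodes, 3 edges
```

From here, a graph library such as networkx could load the edge list and export GML, matching the data/follower_network.gml output named above.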

Step-by-step instructions and an example of running SimSoM

Check out example to get started.

  • Example of the simulation and results: example/run_simulation.ipynb
