- Clone the project with Git and install its dependencies (see REQUIREMENTS.md). For convenience, a VirtualBox image is provided that automates this step (user `fse2024`, password `pass`). Note that Neo4j downloads as an AppImage on Linux, so you will need to run `chmod +x ./path-to-appimage.AppImage && ./path-to-appimage.AppImage` to launch Neo4j. On the VirtualBox image, the Neo4j AppImage is in the `~/Downloads` folder and the project has been cloned into the `~/pr-issue-topology-project` folder. If you are using the VirtualBox image, run `source venv/bin/activate` to activate the virtual environment that contains all the Python dependencies.
- Download the data dump and import it into Neo4j. A copy of the data dump is available here. To import it, click '+ Add' > 'File' and select the data dump, then click the ellipsis beside the `neo4j.dump` file that appears below in the 'File' section and select 'Create new DBMS from dump'. Make a note of the password you choose, since you will need it later. Click 'Start' beside the new DBMS to start the server. This has already been done in the VirtualBox image, where the password is `testtest`.
- To check that the data was imported correctly, click 'Open' beside the new DBMS. Go into the `cypher_scripts` folder in the project and pick a Cypher query to run. Copy and paste its contents into the query box in Neo4j and click the play button. You should see circles representing PRs and issues; clicking a circle shows more attributes of that particular PR or issue.

- Create a file named `password` in the `generate_neo4j_images` folder containing your Neo4j password. This has already been done in the VirtualBox image.
- To generate images of each workflow type instance, run `python -m generate_neo4j_images.generate_from_neo4j --cypher=cypher_scripts/[query].cypher --name=[name]` from the root of the project folder. The Neo4j DBMS must be running before you run this script. Navigate to `generate_neo4j_images/images/[name you specified]/` to view the images.
- To generate the interactive HTML pages, run `./generate_all_interactive.sh` from the root of the project folder. This builds the visualizations themselves and the per-project explorer pages that were shown to developers in user interviews. Open the `interactive_html/` folder in a browser and follow the link to one of the project pages to view the interactive explorer tool.
- To run any of the one-off data scripts (e.g. to generate the figures in the paper), download `data.zip` and unzip it into the project's `data/` directory, and download `raw_data.zip` and unzip it into the `raw_data/` directory. Then follow the documentation in docs/Statistics-Scripts.md. Each script prints its result to the terminal; for example, running `python -m data_scripts.total_count` from the project root should print `Total number of components: 91126`. (On a single-core machine, `cpu_count() // 2` evaluates to 0 and `Pool` rejects a worker count below 1, so edit any script that errors at the line `with Pool(cpu_count() // 2) as p` to read `with Pool(cpu_count()) as p` instead.)
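The image-generation step renders one query per invocation. To render every query in `cypher_scripts/` in one go, a small driver script could loop over the folder. This is a minimal sketch, not part of the project: `image_commands` and `run_all` are hypothetical helpers, and naming each run after its query file's stem is an assumption.

```python
import subprocess
import sys
from pathlib import Path


def image_commands(project_root: Path) -> list[list[str]]:
    """Build one generate_from_neo4j command per .cypher file (hypothetical helper).

    Assumption: each run is named after the query file's stem, so
    cypher_scripts/foo.cypher ends up in generate_neo4j_images/images/foo/.
    """
    root = Path(project_root)
    commands = []
    # Sorted so the queries are rendered in a stable, predictable order.
    for query in sorted((root / "cypher_scripts").glob("*.cypher")):
        commands.append([
            sys.executable, "-m", "generate_neo4j_images.generate_from_neo4j",
            f"--cypher=cypher_scripts/{query.name}",
            f"--name={query.stem}",
        ])
    return commands


def run_all(project_root: Path) -> None:
    """Run every command from the project root (hypothetical helper).

    The Neo4j DBMS must already be running; check=True stops at the first failure.
    """
    for cmd in image_commands(project_root):
        subprocess.run(cmd, cwd=project_root, check=True)
```

Calling `run_all(Path("."))` from the project root with the virtual environment active would then render every query. Using `sys.executable` keeps the same interpreter, and therefore the same installed dependencies, for every run.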
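The archive step for the data scripts can also be done with Python's standard `zipfile` module. This is a minimal sketch, assuming both archives have been downloaded into the project root; `extract_archives` is a hypothetical helper. It extracts each archive's contents directly into `data/` and `raw_data/` as the step describes, but it does not know whether the archives already contain a top-level folder of the same name, so check the extracted layout against docs/Statistics-Scripts.md.

```python
import zipfile
from pathlib import Path


def extract_archives(project_root: Path) -> None:
    """Unzip data.zip into data/ and raw_data.zip into raw_data/ (hypothetical helper).

    Assumption: both archives sit in the project root. If an archive already
    has a top-level data/ or raw_data/ folder, this would nest it one level deeper.
    """
    root = Path(project_root)
    for archive, target in (("data.zip", "data"), ("raw_data.zip", "raw_data")):
        dest = root / target
        dest.mkdir(exist_ok=True)  # create the target directory if it is missing
        with zipfile.ZipFile(root / archive) as zf:
            zf.extractall(dest)
```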
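The single-core workaround in the last step is needed because `cpu_count() // 2` is 0 on a machine with one core, and `multiprocessing.Pool` raises `ValueError` when asked for fewer than one process. Rather than switching to `cpu_count()`, a script could clamp the worker count to at least one. The sketch below is a hedged alternative, not the project's code; `worker_count` and `square` are hypothetical stand-ins.

```python
from multiprocessing import Pool, cpu_count


def worker_count() -> int:
    """Use half the available cores, but never fewer than one worker.

    Hypothetical helper: Pool(cpu_count() // 2) becomes Pool(0) on a
    single-core machine, which raises ValueError.
    """
    return max(1, cpu_count() // 2)


def square(n: int) -> int:
    # Placeholder work item standing in for a real per-component computation.
    return n * n


if __name__ == "__main__":
    with Pool(worker_count()) as p:
        print(p.map(square, range(5)))  # [0, 1, 4, 9, 16]
```

Clamping keeps the half-the-cores default on multi-core machines while still letting the scripts run on a single core.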