This repository contains the complete replication package for the research paper "Investigating Student Interaction with Competency-Based CS Education". The package includes anonymized data, analysis pipelines, survey instruments, and all code necessary to reproduce the study's findings.
```text
├── data-processing/
│   ├── data-anonymization-pipeline.ipynb   # Data anonymization process (transparency)
│   ├── data-pipeline-paper.ipynb           # Main analysis pipeline (run this)
│   └── data/
│       ├── 00_in/                          # Input directory for raw survey data
│       ├── 01_anonymized/                  # Anonymized datasets (provided)
│       └── 02_output/                      # Generated analysis outputs
├── survey/
│   ├── pre/                                # Pre-survey UI documentation
│   │   ├── 01_CBE.pdf
│   │   ├── 02_Course.pdf
│   │   └── 03_Demographics.pdf
│   └── post/                               # Post-survey UI documentation
│       ├── 01_CBE.pdf
│       ├── 02_Tool.pdf
│       ├── 03_Course.pdf
│       ├── 04_PreSurveyCheck.pdf
│       └── 05_Demographics.pdf
├── requirements.txt                        # Python dependencies
└── README.md                               # This file
```
Prerequisites:
- Python 3.13
- Jupyter Notebook or JupyterLab
To set up the environment:
- Download the repository and the input data files.
- Create and activate a virtual environment:
  ```bash
  python -m venv venv
  source venv/bin/activate
  ```
- Install the required Python packages:
  ```bash
  pip install -r requirements.txt
  ```
- Launch Jupyter. The easiest way to run the notebooks is from an IDE such as Visual Studio Code or JetBrains DataSpell.
To reproduce the study's results, run the main analysis notebook:
- Navigate to `data-processing/`
- Open `data-pipeline-paper.ipynb`
- Run all cells sequentially (Cell → Run All)

Expected runtime: about 5-10 minutes, depending on your system.
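If you prefer to run the notebook without opening an IDE, Jupyter's `nbconvert` tool can execute it headlessly. The sketch below is a minimal helper assuming `jupyter` is installed in the active virtual environment and available on `PATH`; the function names and the 1200-second per-cell timeout are illustrative choices, not part of this repository.

```python
import subprocess
from pathlib import Path


def build_nbconvert_command(notebook: Path, timeout_s: int = 1200) -> list[str]:
    """Build a command that executes every cell and saves outputs into the same file."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        # Per-cell timeout in seconds (illustrative value, not a repository setting).
        f"--ExecutePreprocessor.timeout={timeout_s}",
        notebook.name,
    ]


def run_notebook(notebook: Path, timeout_s: int = 1200) -> None:
    """Execute the notebook from its own folder so relative data/ paths resolve."""
    subprocess.run(
        build_nbconvert_command(notebook, timeout_s),
        cwd=notebook.parent,
        check=True,  # raise CalledProcessError if nbconvert exits non-zero (e.g. a cell fails)
    )


# Hypothetical usage from the repository root:
# run_notebook(Path("data-processing/data-pipeline-paper.ipynb"))
```

Running from the notebook's own directory mirrors what happens when you open it in an IDE, so paths such as `data/01_anonymized/` should resolve the same way.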
The main analysis notebook performs the following steps:
- Data Import: Loads the anonymized datasets from `data/01_anonymized/`
- Data Cleaning: Filters, renames, and preprocesses the data
- User Classification: Groups users by interaction level (No/Low/High)
- Statistical Analysis: Performs descriptive and inferential statistics
- Visualization: Generates publication-ready figures
- Output Generation: Saves results to `data/02_output/`
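The interaction groups come from a median split of competency interactions. One plausible reading, sketched below, labels users without any interactions as "No" and splits the remaining users at their median count. Whether the median is taken over interacting users only, and which side a user exactly at the median falls on, are assumptions here; the notebook's User Classification step is the authoritative definition, and `classify_users` is a hypothetical helper.

```python
from statistics import median


def classify_users(interactions: dict[str, int]) -> dict[str, str]:
    """Label each user 'No', 'Low', or 'High' from a count of competency interactions."""
    active_counts = [n for n in interactions.values() if n > 0]
    # Assumption: the median is computed over users with at least one interaction.
    cutoff = median(active_counts) if active_counts else 0
    groups: dict[str, str] = {}
    for user, n in interactions.items():
        if n == 0:
            groups[user] = "No"
        elif n < cutoff:
            groups[user] = "Low"
        else:
            # Assumption: users exactly at the median are placed in the High group.
            groups[user] = "High"
    return groups
```

For example, with counts `{"u1": 0, "u2": 1, "u3": 3, "u4": 5, "u5": 8}` the median of the four active users is 4, so `u1` is "No", `u2` and `u3` are "Low", and `u4` and `u5` are "High".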
The data-anonymization-pipeline.ipynb notebook is included for full transparency and reproducibility. It demonstrates how raw survey data was anonymized. You do not need to run this notebook unless you are conducting your own experiment with new data.
To use it with your own data:
- Place raw survey exports in `data/00_in/`
- Run `data-anonymization-pipeline.ipynb`
- The anonymized data will be written to `data/01_anonymized/`
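The notebook documents the actual procedure. As a general illustration of one common requirement, the sketch below replaces identifiers with sequential pseudonyms and reuses a single mapping for every table, so that files referring to the same users can still be joined after anonymization. The column name `user_id`, the `user_0001` pseudonym format, and both function names are hypothetical; this is not necessarily the method used in `data-anonymization-pipeline.ipynb`.

```python
from collections.abc import Iterable


def build_pseudonym_map(raw_ids: Iterable[str], prefix: str = "user_") -> dict[str, str]:
    """Map each distinct raw ID to a sequential pseudonym in first-seen order."""
    mapping: dict[str, str] = {}
    for raw in raw_ids:
        if raw not in mapping:
            mapping[raw] = f"{prefix}{len(mapping) + 1:04d}"
    return mapping


def pseudonymize_rows(rows: list[dict[str, str]], column: str,
                      mapping: dict[str, str]) -> list[dict[str, str]]:
    """Return copies of the rows with `column` replaced by its pseudonym."""
    # Reusing one mapping across all tables keeps cross-file joins intact.
    # The mapping links pseudonyms back to people, so it must stay private.
    return [{**row, column: mapping[row[column]]} for row in rows]
```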
Running the main analysis notebook will generate the following files in data/02_output/:
- `science_event_with_comp.csv` - Science events mapped to competencies
- `user_mapping.csv` - User interaction group classifications
- `mastery.pdf` - Mastery score analysis by interaction group
- `exam.pdf` - Exam performance analysis by interaction group
- `perception.pdf` - Student perception survey results (Likert scales)
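After a run, a quick way to confirm that the notebook produced all of the files listed above is to check `data/02_output/` for each expected name. This is a minimal sketch: `missing_outputs` is a hypothetical helper, and treating zero-byte files as missing is an assumption about what counts as a failed write.

```python
from pathlib import Path

# File names as listed in this README; update if the notebook's outputs change.
EXPECTED_OUTPUTS = (
    "science_event_with_comp.csv",
    "user_mapping.csv",
    "mastery.pdf",
    "exam.pdf",
    "perception.pdf",
)


def missing_outputs(output_dir: Path) -> list[str]:
    """Return expected output names that are absent or zero bytes in output_dir."""
    missing = []
    for name in EXPECTED_OUTPUTS:
        path = output_dir / name
        # Assumption: an empty file signals a failed or interrupted write.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Hypothetical usage from the repository root:
# print(missing_outputs(Path("data-processing/data/02_output")) or "all outputs present")
```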
The repository includes the following anonymized datasets:
- System/LMS Data:
  - `competency.csv` - Competency definitions and thresholds
  - `competency_exercise.csv` - Competency-exercise mappings
  - `competency_lecture_unit.csv` - Competency-lecture unit mappings
  - `competency_user.csv` - User competency progress and confidence
  - `exercise.csv` - Exercise definitions
  - `learning_path.csv` - Learning path interactions
  - `lecture.csv` - Lecture definitions
  - `lecture_unit.csv` - Lecture unit definitions
  - `participation.csv` - Student participation data
  - `participant_score.csv` - Exercise scores
  - `participant_score_exam.csv` - Exam scores
  - `science_event.csv` - User interaction events
  - `jhi_user.csv` - User information
  - `tutorial_mapping.csv` - Tutorial mappings
- Survey Data:
  - `lime_survey_pre.csv` - Pre-course survey responses
  - `lime_survey_post.csv` - Post-course survey responses
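To inspect the provided datasets outside the notebook, the sketch below reads every CSV in `data/01_anonymized/` with Python's standard `csv` module and reports row counts. It assumes comma-separated files with a header row; the notebook itself may load the data differently (for example with pandas), and `load_csv_tables` and `summarize` are hypothetical helpers.

```python
import csv
from pathlib import Path


def load_csv_tables(data_dir: Path) -> dict[str, list[dict[str, str]]]:
    """Read each *.csv file in data_dir into a list of row dicts keyed by file stem."""
    tables: dict[str, list[dict[str, str]]] = {}
    for path in sorted(data_dir.glob("*.csv")):
        # Assumption: comma-delimited with a header row. "utf-8-sig" also
        # strips a leading byte-order mark if the export contains one.
        with path.open(newline="", encoding="utf-8-sig") as fh:
            tables[path.stem] = list(csv.DictReader(fh))
    return tables


def summarize(tables: dict[str, list[dict[str, str]]]) -> list[str]:
    """One 'name: N rows' line per table, for a quick sanity check."""
    return [f"{name}: {len(rows)} rows" for name, rows in tables.items()]


# Hypothetical usage from the repository root:
# for line in summarize(load_csv_tables(Path("data-processing/data/01_anonymized"))):
#     print(line)
```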
The survey/ directory contains PDF files showing the exact user interface that participants encountered during the surveys. These serve as documentation of the survey instruments used in the study.
The analysis addresses three main research questions:
- Performance: How does the level of interaction with CBETool relate to student performance on assessments?
- Engagement: How does the level of interaction with CBETool relate to student engagement and motivation?
- Perception: How do students perceive CBETool in terms of usability and learning support?
The analysis uses the following methods:
- Descriptive Statistics: Demographics, mastery scores, exam performance, engagement metrics
- Statistical Tests: Kruskal-Wallis tests, Dunn's post-hoc tests, proportion z-tests
- Visualizations: Half-violin plots, Likert scale charts, distribution analyses
- Effect Sizes: Risk ratios, odds ratios, and other effect size measures
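The Kruskal-Wallis test compares the rank distributions of three or more independent groups, such as the No/Low/High interaction groups. The sketch below computes the H statistic with the usual tie correction to show what the test measures; it omits the p-value (taken from a chi-square distribution with k - 1 degrees of freedom for k groups) and the Dunn post-hoc step. The function names are hypothetical, and a tested library routine such as `scipy.stats.kruskal` is the better choice for real analyses.

```python
from itertools import chain


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..N; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> 1-based mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups: list[float]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more independent samples."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), with R_i the rank sum of group i.
    rank_term = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over runs of tied values.
    tie_counts: dict[float, int] = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction
```

For `[1, 2, 3]` versus `[4, 5, 6]` there are no ties, the rank sums are 6 and 15, and H = 27/7 (about 3.857).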
Additional notes:
- Results should be identical across different runs and systems
- The provided anonymized data ensures consistent results
- All statistical analyses include appropriate multiple comparison corrections
- The analysis pipeline handles missing data appropriately
- User interaction groups are determined by a median split of competency interactions
- All visualizations are saved as publication-ready PDF files
- Statistical significance testing includes both unadjusted and adjusted p-values
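This README does not name the correction method behind the adjusted p-values. Holm's step-down procedure is one common choice, and the sketch below shows how adjusted p-values follow from unadjusted ones under that assumption (rather than, for example, plain Bonferroni). `holm_adjust` is a hypothetical helper; `statsmodels.stats.multitest.multipletests(p, method="holm")` computes the same adjustment.

```python
def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (0-based rank) is multiplied by (m - rank), capped at 1.
        value = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted
```

For three comparisons with unadjusted p-values `[0.01, 0.04, 0.03]`, the adjusted values are `[0.03, 0.06, 0.06]`: the smallest is tripled, the middle one doubled, and the largest is raised to 0.06 to keep the sequence monotone.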
For questions about reproducing the analyses or understanding the data structure, please refer to the detailed documentation within the Jupyter notebooks. Each notebook contains comprehensive markdown cells explaining every step of the analysis process.
The anonymized datasets provided in this replication package contain all information necessary to reproduce the study's findings while protecting participant privacy. Raw survey data is not included to maintain anonymity, but the anonymization process is fully documented for transparency.