Investigating Student Interaction with Competency-Based CS Education - Replication Package

This repository contains the complete replication package for the research paper "Investigating Student Interaction with Competency-Based CS Education". The package includes anonymized data, analysis pipelines, survey instruments, and all code necessary to reproduce the study's findings.

Repository Structure

├── data-processing/
│   ├── data-anonymization-pipeline.ipynb   # Data anonymization process (transparency)
│   ├── data-pipeline-paper.ipynb           # Main analysis pipeline (run this)
│   └── data/
│       ├── 00_in/                          # Input directory for raw survey data
│       ├── 01_anonymized/                  # Anonymized datasets (provided)
│       └── 02_output/                      # Generated analysis outputs
├── survey/
│   ├── pre/                                # Pre-survey UI documentation
│   │   ├── 01_CBE.pdf
│   │   ├── 02_Course.pdf
│   │   └── 03_Demographics.pdf
│   └── post/                               # Post-survey UI documentation
│       ├── 01_CBE.pdf
│       ├── 02_Tool.pdf
│       ├── 03_Course.pdf
│       ├── 04_PreSurveyCheck.pdf
│       └── 05_Demographics.pdf
├── requirements.txt                        # Python dependencies
└── README.md                               # This file

Prerequisites

Python 3.13
Jupyter Notebook or JupyterLab

Installation

Download the repository and the input data files

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate

Install required Python packages:
```
pip install -r requirements.txt
```
Launch Jupyter: It is easiest to use an IDE like Visual Studio Code or JetBrains Dataspell to run the Notebooks.

Usage

Main Analysis (Recommended)

To reproduce the study's results, run the main analysis notebook:

Navigate to data-processing/
Open data-pipeline-paper.ipynb
Run all cells sequentially (Cell → Run All)

Expected runtime: ~5-10 minutes depending on your system.

Data Pipeline Overview

The main analysis notebook performs the following steps:

Data Import: Loads anonymized datasets from data/01_anonymized/
Data Cleaning: Filters, renames, and preprocesses the data
User Classification: Groups users by interaction level (No/Low/High)
Statistical Analysis: Performs descriptive and inferential statistics
Visualization: Generates publication-ready figures
Output Generation: Saves results to data/02_output/

Data Anonymization Pipeline (Optional/Transparency)

The data-anonymization-pipeline.ipynb notebook is included for full transparency and reproducibility. It demonstrates how raw survey data was anonymized. You do not need to run this notebook unless you are conducting your own experiment with new data.

To use it with your own data:

Place raw survey exports in data/00_in/
Run data-anonymization-pipeline.ipynb
Anonymized data will be generated in data/01_anonymized/

Expected Outputs

Running the main analysis notebook will generate the following files in data/02_output/:

Data Files

science_event_with_comp.csv - Science events mapped to competencies
user_mapping.csv - User interaction group classifications

Visualizations

mastery.pdf - Mastery score analysis by interaction group
exam.pdf - Exam performance analysis by interaction group
perception.pdf - Student perception survey results (Likert scales)

Data Description

Anonymized Datasets (`data/01_anonymized/`)

The repository includes the following anonymized datasets:

System/LMS Data:
- competency.csv - Competency definitions and thresholds
- competency_exercise.csv - Competency-exercise mappings
- competency_lecture_unit.csv - Competency-lecture unit mappings
- competency_user.csv - User competency progress and confidence
- exercise.csv - Exercise definitions
- learning_path.csv - Learning path interactions
- lecture.csv - Lecture definitions
- lecture_unit.csv - Lecture unit definitions
- participation.csv - Student participation data
- participant_score.csv - Exercise scores
- participant_score_exam.csv - Exam scores
- science_event.csv - User interaction events
- jhi_user.csv - User information
- tutorial_mapping.csv - Tutorial mappings
Survey Data:
- lime_survey_pre.csv - Pre-course survey responses
- lime_survey_post.csv - Post-course survey responses

Survey Documentation (`survey/`)

The survey/ directory contains PDF files showing the exact user interface that participants encountered during the surveys. These serve as documentation of the survey instruments used in the study.

Research Questions Addressed

The analysis addresses three main research questions:

Performance: How does the level of interaction with CBETool relate to student performance on assessments?
Engagement: How does the level of interaction with CBETool relate to student engagement and motivation?
Perception: How do students perceive CBETool in terms of usability and learning support?

Key Analyses Performed

Descriptive Statistics: Demographics, mastery scores, exam performance, engagement metrics
Statistical Tests: Kruskal-Wallis tests, Dunn's post-hoc tests, proportion z-tests
Visualizations: Half-violin plots, Likert scale charts, distribution analyses
Effect Sizes: Risk ratios, odds ratios, and effect size calculations

Reproducibility Notes

Results should be identical across different runs and systems
The anonymized data provided ensures consistent results
All statistical analyses include appropriate multiple comparison corrections

Technical Notes

The analysis pipeline handles missing data appropriately
User interaction groups are determined by median split of competency interactions
All visualizations are publication-ready PDF format
Statistical significance testing includes both unadjusted and adjusted p-values

Support

For questions about reproducing the analyses or understanding the data structure, please refer to the detailed documentation within the Jupyter notebooks. Each notebook contains comprehensive markdown cells explaining every step of the analysis process.

Data Availability Statement

The anonymized datasets provided in this replication package contain all information necessary to reproduce the study's findings while protecting participant privacy. Raw survey data is not included to maintain anonymity but the anonymization process is fully documented for transparency.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
data-processing		data-processing
survey		survey
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Investigating Student Interaction with Competency-Based CS Education - Replication Package

Repository Structure

Prerequisites

Installation

Usage

Main Analysis (Recommended)

Data Pipeline Overview

Data Anonymization Pipeline (Optional/Transparency)

Expected Outputs

Data Files

Visualizations

Data Description

Anonymized Datasets (`data/01_anonymized/`)

Survey Documentation (`survey/`)

Research Questions Addressed

Key Analyses Performed

Reproducibility Notes

Technical Notes

Support

Data Availability Statement

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Investigating Student Interaction with Competency-Based CS Education - Replication Package

Repository Structure

Prerequisites

Installation

Usage

Main Analysis (Recommended)

Data Pipeline Overview

Data Anonymization Pipeline (Optional/Transparency)

Expected Outputs

Data Files

Visualizations

Data Description

Anonymized Datasets (data/01_anonymized/)

Survey Documentation (survey/)

Research Questions Addressed

Key Analyses Performed

Reproducibility Notes

Technical Notes

Support

Data Availability Statement

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Anonymized Datasets (`data/01_anonymized/`)

Survey Documentation (`survey/`)

Packages