#ReDI #26s_dcp_data_circle #data_project
Original idea proposed by Nelson in 2024 as a student project for the Data Circle course at ReDI School of Digital Integration.
This project applies a structured, end-to-end analytical workflow to the Stack Overflow 2025 Developer Survey dataset to investigate global developer trends. The workflow emphasises statistical rigour, reproducibility, and alignment with industry benchmarks, with the aim of providing actionable insights into the evolving professional landscape, identifying high-value skill sets, and supporting data-driven career positioning.
The dataset is a comprehensive primary source representing the global software development ecosystem: Stack Overflow conducted a structured online survey targeting developers of all experience levels and specialisations.
This case study examines the factors that influence global developer salaries using regression techniques on the 2025 Stack Overflow dataset. The primary objective is to quantify the predictive power of various indicators (age, formal education level, professional experience, developer role, employment status, work environment, geographic location, and related attributes) on annual compensation. The results will serve as a data-driven benchmark that students can use to assess their career prospects and negotiate compensation against objective global market trends.
- Form teams (2-3 students) and assign roles
- Identify required datasets
- Acquire the raw data and verify its completeness
- Set up the project environment, GitHub repository, and documentation
- Select the features needed to answer the research questions
- Clean and standardise raw fields
- Handle missing values, duplicates, and inconsistent formats
- Normalise numeric fields and parse text‑based features
- Produce a first “clean dataset” for exploration
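The cleaning steps above can be sketched in pandas. The column names (`ConvertedCompYearly`, `LanguageHaveWorkedWith`) follow past Stack Overflow survey schemas and are assumptions about the 2025 release; the salary bounds are illustrative, not prescribed:

```python
import pandas as pd

def clean_survey(df: pd.DataFrame) -> pd.DataFrame:
    """First cleaning pass: duplicates, missing values, numeric coercion,
    and parsing of text-based multi-select answers."""
    df = df.drop_duplicates()
    # Drop rows missing the target variable (annual compensation).
    # Column names are assumptions based on earlier survey schemas.
    df = df.dropna(subset=["ConvertedCompYearly"])
    # Coerce compensation to numeric and keep only plausible values
    # (bounds here are illustrative placeholders).
    df["ConvertedCompYearly"] = pd.to_numeric(df["ConvertedCompYearly"], errors="coerce")
    df = df[df["ConvertedCompYearly"].between(1_000, 1_000_000)]
    # Stack Overflow encodes multi-select answers as semicolon-delimited
    # strings; parse them into Python lists for feature engineering.
    df["LanguageHaveWorkedWith"] = df["LanguageHaveWorkedWith"].fillna("").str.split(";")
    return df.reset_index(drop=True)

# Tiny synthetic sample in the assumed schema.
demo = pd.DataFrame({
    "ConvertedCompYearly": ["60000", None, "75000", "60000"],
    "LanguageHaveWorkedWith": ["Python;SQL", "Go", None, "Python;SQL"],
})
cleaned = clean_survey(demo)
```

Running `clean_survey` on the sample drops the exact duplicate row and the row with no compensation, leaving two usable observations.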
- Analyse distributions, outliers, and skewness
- Explore relationships between features
- Generate geographic visualisations
- Document insights that inform feature engineering and modelling
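A minimal sketch of the skewness and outlier checks, using a hypothetical salary sample (compensation data is typically right-skewed, so a log transform usually reduces the skew):

```python
import numpy as np
import pandas as pd

# Hypothetical, illustrative salary sample with one extreme value.
salary = pd.Series([30_000, 45_000, 52_000, 60_000, 75_000, 90_000, 400_000])

# Compare skewness before and after a log transform.
raw_skew = salary.skew()
log_skew = np.log1p(salary).skew()

# Classic 1.5*IQR rule for flagging outliers.
q1, q3 = salary.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = salary[(salary < q1 - 1.5 * iqr) | (salary > q3 + 1.5 * iqr)]
```

Here the 400,000 observation is flagged by the IQR rule, and the log-transformed series is noticeably less skewed than the raw one.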
- Create new features (programming languages, frameworks, tools)
- Finalise modelling dataset with all engineered features
- Split the final dataset into training (75-80% of observations) and test (20-25% of observations) sets
- Train baseline models (e.g., linear regression) for interpretability
- Evaluate performance metrics
- Identify feature gaps or data issues requiring refinement
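A baseline of this kind might look as follows; the data is synthetic with a known linear signal, so it only illustrates the fit-predict-evaluate loop, not real survey results:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Synthetic data with a known linear relationship (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Fit the interpretable baseline on the first 150 rows,
# then evaluate on the held-out 50.
model = LinearRegression().fit(X[:150], y[:150])
pred = model.predict(X[150:])

mae = mean_absolute_error(y[150:], pred)
r2 = r2_score(y[150:], pred)
```

The fitted coefficients (`model.coef_`) recover the generating weights almost exactly, which is the interpretability payoff of the linear baseline.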
- Train ensemble models (e.g., Random Forest, Gradient Boosting, CatBoost)
- Tune hyperparameters using k‑fold cross‑validation
- Compare models on cross‑validated performance metrics
- Select top candidates for final evaluation
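Hyperparameter tuning with k-fold cross-validation can be sketched with scikit-learn's `GridSearchCV`; the grid below is deliberately tiny and the data synthetic, so treat the parameter choices as placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic nonlinear data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=120)

# 5-fold cross-validated search over a small hyperparameter grid.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
    scoring="r2",
)
grid.fit(X, y)

best_params = grid.best_params_      # winning combination
best_score = grid.best_score_        # mean cross-validated R^2
```

`grid.cv_results_` holds the per-fold scores for every combination, which is what the model-comparison step uses.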
- Evaluate final models on the test set
- Interpret model results to identify which features have the strongest impact (SHAP or similar techniques)
- Create visualisations showing feature importance
- Document model strengths, limitations, and interpretability findings
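As one of the "similar techniques" alongside SHAP, scikit-learn's built-in permutation importance gives a model-agnostic importance score; the sketch below uses synthetic data where only the first feature carries signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data: only feature 0 drives the target, the rest are noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the drop in model score;
# informative features cause a large drop.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
```

The informative feature should come out on top of `ranking`; with the real survey data, `result.importances_mean` would be plotted as the feature-importance chart.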
- Build interactive dashboard (Streamlit)
- Include map visualisations, predictions, and feature importance
- Compile full project report (methodology, results, recommendations)
- Deliver model, data dictionary, and reproducibility notes
- Give next‑phase recommendations (e.g., deployment, monitoring, further improvements)