Complex network analysis of the NPM ecosystem: from course project to master's thesis
This repository contains two phases of research on NPM dependency networks:
- Pre-Thesis (Course Project): Initial exploration of ~2,000 packages, introducing a Behavioral Risk Score (BRS) model
- Thesis (Master's Research): Full-scale analysis of the entire NPM registry using complex network theory
pre_thesis/ — Course Project (Completed)
Small-scale study (~2K packages) introducing the Behavioral Risk Score model and initial topological analysis.
Key Contributions:
- Scale-free topology identification
- Bridge nodes and betweenness centrality analysis
- Behavioral Risk Score (BRS) formulation
- Robustness simulation
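
As an illustration of the robustness simulation listed above, the following is a minimal sketch of a targeted-attack experiment: remove the highest-degree packages one at a time and track the size of the largest connected component. It is not the course project's code. The function names, the undirected treatment of dependency edges, the static degree ranking, and the toy package names are all assumptions made for illustration, and it uses only the standard library rather than NetworkX so that it runs without dependencies.

```python
from collections import deque


def largest_component_size(adj, removed):
    """Size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


def targeted_attack(edges, steps):
    """Remove the `steps` highest-degree nodes in turn.

    Returns the largest-component size before any removal and after each one.
    Assumption: edges are treated as undirected, and the degree ranking is
    computed once on the intact graph (not recomputed after each removal).
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    # Highest degree first; ties broken by name for reproducibility.
    order = sorted(adj, key=lambda n: (-len(adj[n]), n))
    removed = set()
    sizes = [largest_component_size(adj, removed)]
    for node in order[:steps]:
        removed.add(node)
        sizes.append(largest_component_size(adj, removed))
    return sizes


# Hypothetical toy graph: (dependent, dependency) pairs around one hub.
toy_edges = [
    ("app-a", "hub"),
    ("app-b", "hub"),
    ("app-c", "hub"),
    ("app-c", "util"),
    ("app-d", "util"),
]
print(targeted_attack(toy_edges, 2))
```

In a scale-free network, the component size typically drops much faster under this degree-ordered removal than under random removal, which is why hub packages matter for ecosystem robustness.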
➡️ View course project details
thesis/ — Master's Thesis (In Progress)
Full-scale complex network analysis of the entire NPM ecosystem.
Research Goals:
- Complete NPM registry coverage (millions of packages)
- Comprehensive complex network metrics
- Scalable graph processing infrastructure
- Advanced vulnerability propagation models
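
For orientation, the simplest possible propagation model is plain reachability: a vulnerability in one package affects every package that depends on it, directly or transitively. The sketch below computes that set with a breadth-first search over reversed dependency edges. It is a baseline under stated assumptions, not the thesis's model. The function name, the dictionary input format, and the toy package names are hypothetical.

```python
from collections import deque


def affected_packages(dependencies, vulnerable):
    """Return every package that transitively depends on `vulnerable`.

    `dependencies` maps a package name to the list of packages it depends on
    (an assumed input format for this sketch).
    """
    # Invert the mapping: dependency -> set of direct dependents.
    dependents = {}
    for pkg, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    affected = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent != vulnerable and parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical toy ecosystem.
toy = {
    "web-app": ["framework"],
    "framework": ["parser"],
    "cli-tool": ["parser"],
    "parser": [],
    "standalone": [],
}
print(sorted(affected_packages(toy, "parser")))
```

This baseline ignores version ranges and assumes every dependent is exposed, so it gives an upper bound on the affected set under those assumptions.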
| Aspect | Pre-Thesis (Course) | Thesis (Master's) |
|---|---|---|
| Scope | ~2,000-20,000 packages | Full NPM registry (millions) |
| Data Source | Ecosyste.ms + BFS crawl | NPM registry dump |
| Metrics | Basic centrality (degree, betweenness) | Full complex network suite |
| Infrastructure | In-memory NetworkX | Distributed/disk-based processing |
| Runtime | Hours | Days/weeks |
| Focus | Risk score model (BRS) | Complex network dynamics |
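
To make the infrastructure row concrete, here is a minimal sketch of one metric computed in a streaming fashion instead of from an in-memory graph: the in-degree histogram, which is the raw input for checking a scale-free degree distribution. The edge-list format (one `dependent dependency` pair per line), the function name, and the file name in the usage comment are assumptions, not part of either project's pipeline.

```python
from collections import Counter


def in_degree_histogram(lines):
    """Map in-degree k to the number of packages with exactly k direct dependents.

    `lines` is any iterable of strings, e.g. an open file, so the edge list
    is never held in memory. Only one counter per depended-upon package is
    kept. Packages with no dependents do not appear in the input as a
    dependency and are therefore not counted.
    """
    in_degree = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        _dependent, dependency = parts
        in_degree[dependency] += 1
    return Counter(in_degree.values())


# Usage with a hypothetical edge-list file:
#     with open("edges.txt") as fh:
#         hist = in_degree_histogram(fh)
sample = ["a hub", "b hub", "c hub", "c util", "", "not a valid line"]
print(dict(in_degree_histogram(sample)))
```

Metrics such as betweenness centrality do not decompose this easily, which is one reason full-registry analysis needs different infrastructure than the in-memory approach.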
cd pre_thesis/analysis
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m notebook  # Open analysis.ipynb

cd thesis
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
# See thesis/README.md for pipeline details

- Pre-Thesis Documentation: Methodology, literature review, case studies
- Thesis Documentation: Advanced methods and full-scale analysis
This project is licensed under the MIT License.