Cryptojacking Detection Validation

Empirical Validation for AI-Based Cloud Cryptojacking Detection: A Systematic Literature Review

Overview

This repository contains the empirical validation code and materials for the systematic literature review paper:

AI-Based Detection of Cloud Cryptojacking: A Systematic Review of Models, Deployment Challenges, and Future Directions
Amitabh Chakravorty, Nelly Elsayed
School of Information Technology, University of Cincinnati

The validation study evaluates representative machine learning models from the reviewed literature using publicly available datasets to assess detection performance, computational cost, and reproducibility challenges in AI-based cryptojacking detection.

Key Findings

Dataset	Best Model	Accuracy	F1-Score	Training Time
DS2OS	Random Forest	99.59%	0.9959	47.95s
NSL-KDD	XGBoost	99.62%	0.9962	54.49s

Important Note: These datasets serve as proxy environments for cloud cryptojacking detection. No publicly available datasets capture genuine cloud VM, container, or Kubernetes telemetry with labeled cryptomining activity—a key finding of our systematic review.

Repository Structure

cryptojacking-validation/
├── README.md                    # This file
├── requirements.txt             # Python dependencies
├── LICENSE                      # MIT License
│
├── notebooks/                   # Jupyter notebooks (run in order)
│   ├── 1_Master.ipynb          # Environment setup & data download
│   ├── 2_Exploration.ipynb     # Dataset exploration & visualization
│   ├── 3_Preprocessing.ipynb   # Data preprocessing & SMOTE
│   └── 4_Models.ipynb          # Model training & evaluation
│
├── data/                        # Data directory (created by notebooks)
│   ├── raw/                    # Original downloaded datasets
│   └── processed/              # Preprocessed numpy arrays
│
├── models/                      # Trained model files (.pkl)
│
├── results/                     # Output files
│   ├── figures/                # Generated visualizations
│   └── metrics/                # Performance metrics (CSV)
│
├── scripts/                     # Utility scripts
│   └── utils.py                # Helper functions
│
└── docs/                        # Additional documentation
    └── METHODOLOGY.md          # Detailed methodology description

Quick Start

Option 1: Google Colab (Recommended)

Open notebooks directly in Google Colab by clicking the badge above
Run notebooks in order: 1_Master.ipynb → 2_Exploration.ipynb → 3_Preprocessing.ipynb → 4_Models.ipynb
You'll need a Kaggle account and API key for data download

Option 2: Local Environment

# Clone the repository
git clone https://github.com/AmitabhCh822/cryptojacking-validation.git
cd cryptojacking-validation

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run notebooks with Jupyter
jupyter notebook

Datasets

DS2OS (Distributed Smart Space Orchestration System)

Source: Kaggle
Samples: 357,952 records
Features: 12 (IoT device telemetry)
Classes: Normal vs. Anomalous (8 attack types)
Imbalance Ratio: 97.2% normal, 2.8% attack

NSL-KDD

Source: UNB CIC
Train Samples: 125,973
Test Samples: 22,544
Features: 41 (network traffic patterns)
Classes: Normal vs. Attack (22 attack categories → binary)
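The collapse from named attack categories to a binary target can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: the helper name `to_binary` is hypothetical, and it assumes the label column holds the string `normal` for benign traffic and an attack name otherwise.

```python
# Hypothetical helper: collapse multi-class NSL-KDD labels to a binary target.
# Assumption: benign rows are labeled "normal"; every other label is an attack.
def to_binary(label: str) -> int:
    """Return 0 for normal traffic and 1 for any attack label."""
    return 0 if label.strip().lower() == "normal" else 1


# A small sample of label strings, not the full set of 22 categories.
labels = ["normal", "neptune", "smurf", "normal", "portsweep"]
binary = [to_binary(label) for label in labels]
print(binary)
```

Normalizing case and whitespace before the comparison guards against small formatting differences between copies of the dataset.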

Models Evaluated

Based on the most frequently reported approaches in the systematic review:

Model	Hyperparameters
Random Forest	n_estimators=100, max_depth=20
XGBoost	n_estimators=100, max_depth=10, lr=0.1
LightGBM	n_estimators=100, max_depth=10
Decision Tree	max_depth=15, min_samples_split=5
K-Nearest Neighbors	n_neighbors=5
Gradient Boosting	n_estimators=100, max_depth=5, lr=0.1
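One way to keep these settings in a single place is a small registry, sketched below. The names `MODEL_SPECS` and `build_model` are hypothetical and not taken from the repository; `lr` is written out as `learning_rate`, the keyword used by scikit-learn, XGBoost, and LightGBM. Any setting the table does not list (for example a random seed) is left at the library default here, which may differ from what the notebooks use.

```python
from importlib import import_module

# Hyperparameters copied from the table above. Each entry is
# (module path, class name, keyword arguments).
MODEL_SPECS = {
    "Random Forest": ("sklearn.ensemble", "RandomForestClassifier",
                      {"n_estimators": 100, "max_depth": 20}),
    "XGBoost": ("xgboost", "XGBClassifier",
                {"n_estimators": 100, "max_depth": 10, "learning_rate": 0.1}),
    "LightGBM": ("lightgbm", "LGBMClassifier",
                 {"n_estimators": 100, "max_depth": 10}),
    "Decision Tree": ("sklearn.tree", "DecisionTreeClassifier",
                      {"max_depth": 15, "min_samples_split": 5}),
    "K-Nearest Neighbors": ("sklearn.neighbors", "KNeighborsClassifier",
                            {"n_neighbors": 5}),
    "Gradient Boosting": ("sklearn.ensemble", "GradientBoostingClassifier",
                          {"n_estimators": 100, "max_depth": 5,
                           "learning_rate": 0.1}),
}


def build_model(name, **overrides):
    """Instantiate a model by name; the library is imported only when needed."""
    module_name, class_name, params = MODEL_SPECS[name]
    cls = getattr(import_module(module_name), class_name)
    return cls(**{**params, **overrides})
```

With the dependencies from requirements.txt installed, `build_model("Random Forest")` would return an unfitted classifier; importing lazily means a missing optional library only fails for the model that needs it.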

Results Summary

Performance Metrics

Dataset	Model	Accuracy	F1-Score	Precision	Recall	Train Time (s)
DS2OS	Random Forest	99.59%	0.9959	0.9959	0.9959	47.95
DS2OS	XGBoost	99.53%	0.9953	0.9953	0.9953	88.71
DS2OS	LightGBM	97.42%	0.9742	0.9746	0.9742	18.67
DS2OS	Gradient Boosting	99.52%	0.9952	0.9952	0.9952	184.90
DS2OS	Decision Tree	99.45%	0.9945	0.9945	0.9945	3.74
DS2OS	KNN	97.47%	0.9745	0.9748	0.9745	48.01
NSL-KDD	Random Forest	99.33%	0.9933	0.9934	0.9933	48.09
NSL-KDD	XGBoost	99.62%	0.9962	0.9962	0.9962	54.49
NSL-KDD	LightGBM	99.47%	0.9947	0.9948	0.9947	24.87
NSL-KDD	Gradient Boosting	99.47%	0.9947	0.9948	0.9947	121.88
NSL-KDD	Decision Tree	99.09%	0.9909	0.9911	0.9909	1.37
NSL-KDD	KNN	96.99%	0.9694	0.9722	0.9694	72.94
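The table does not state how precision, recall, and F1 are averaged across the two classes. The sketch below assumes support-weighted averaging; that is an assumption, not something the source confirms. Under that scheme weighted recall equals accuracy by construction. The function name `weighted_scores` and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    pairs = list(zip(y_true, y_pred))
    scores = {"accuracy": sum(t == p for t, p in pairs) / n,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in pairs)
        predicted = sum(p == cls for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / count
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        scores["precision"] += weight * prec
        scores["recall"] += weight * rec
        scores["f1"] += weight * f1
    return scores


# Toy example: four normal (0) and two attack (1) samples.
scores = weighted_scores([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
print(scores)
```

If the notebooks use a different averaging mode (binary or macro), the reported values would be computed differently from this sketch.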

Key Observations

Class Imbalance Impact: DS2OS required SMOTE (97% normal → 50/50 split) to prevent majority-class bias
Computational Trade-offs: Decision Tree trained fastest (1.37-3.74 s) at slightly lower accuracy; Gradient Boosting trained slowest (122-185 s) for only marginal gains
Cross-Dataset Generalization: Feature space incompatibility (12 vs 41 features) prevented direct model transfer
Reproducibility Challenges: Minor preprocessing differences produced 0.5-2.5% accuracy variations

Preprocessing Pipeline

Raw Data
    │
    ├── Label Encoding (categorical → numeric)
    │
    ├── Stratified Train/Test Split (70/30)
    │
    ├── SMOTE (if imbalance ratio < 0.3)
    │
    └── StandardScaler (zero mean, unit variance)
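The conditional SMOTE step in the diagram can be illustrated with a small check. This sketch assumes the imbalance ratio is defined as the minority class count divided by the majority class count; the source does not define it. The helper names are hypothetical, and the toy labels only mirror the DS2OS class balance rather than the real data.

```python
from collections import Counter

SMOTE_THRESHOLD = 0.3  # threshold taken from the pipeline diagram


def imbalance_ratio(labels):
    """Minority count / majority count (assumed definition of the ratio)."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def needs_smote(labels, threshold=SMOTE_THRESHOLD):
    """True when the class balance falls below the oversampling threshold."""
    return imbalance_ratio(labels) < threshold


# Toy label vectors: one mirrors DS2OS (97.2% normal, 2.8% attack),
# the other is close to balanced.
ds2os_like = [0] * 972 + [1] * 28
near_balanced = [0] * 500 + [1] * 450

print(round(imbalance_ratio(ds2os_like), 4), needs_smote(ds2os_like))
print(round(imbalance_ratio(near_balanced), 4), needs_smote(near_balanced))
```

When the check returns true, the oversampling itself would typically be done with a library such as imbalanced-learn (its `SMOTE` class and `fit_resample` method). Because the diagram places SMOTE after the train/test split, resampling only the training portion would be consistent with it, though the notebooks remain the authority on the exact procedure.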

Citation

If you use this code in your research, please cite:

@article{chakravorty2025cryptojacking,
  title={AI-Based Detection of Cloud Cryptojacking: A Systematic Review of Models, Deployment Challenges, and Future Directions},
  author={Chakravorty, Amitabh and Elsayed, Nelly},
  journal={Journal of Information Security and Applications},
  year={2025},
  publisher={Elsevier}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

University of Cincinnati CECH Impact Accelerator Grant
Canadian Institute for Cybersecurity (NSL-KDD dataset)
DS2OS dataset contributors

Contact

Amitabh Chakravorty - [email protected]
Nelly Elsayed - [email protected]

Note: This repository is part of a systematic literature review. The validation demonstrates that while high accuracy is achievable on proxy datasets, the absence of public cloud-specific cryptojacking datasets remains the field's most critical reproducibility barrier.
