Releases: BrenoFariasdaSilva/DDoS-Detector
DDoS-Detector v11.0 — Unified Logging Infrastructure and Operational Tooling
Version v11.0 focuses on operational robustness and execution transparency, consolidating logging behavior across the project and strengthening the tooling used to execute long-running experiments locally, remotely, and through automated pipelines.
The central addition in this release is a dual-channel Logger utility, designed as a drop-in replacement for sys.stdout and sys.stderr. The logger mirrors runtime output to the terminal—preserving ANSI color sequences when a TTY is available—while simultaneously writing a sanitized, color-free log file suitable for archival, CI pipelines, and post-experiment analysis. Immediate flushing guarantees that logs remain live and inspectable during long executions and background jobs.
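Conceptually, the dual-channel design is a thin wrapper around two streams. The sketch below is illustrative only: the class name, regex, and file handling are assumptions, not the project's actual `Logger.py` code; only the `write`/`flush`/`close` surface matches the description above.

```python
import re
import sys

# Matches ANSI escape sequences (colors, cursor movement) for log sanitization.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

class DualLogger:
    """Mirror writes to a terminal stream and a color-free log file."""

    def __init__(self, log_path, stream=None):
        self.stream = stream if stream is not None else sys.__stdout__
        self.file = open(log_path, "a", encoding="utf-8")

    def write(self, text):
        self.stream.write(text)                 # keep ANSI colors on the terminal
        self.file.write(ANSI_RE.sub("", text))  # strip them for the archive copy
        self.flush()                            # immediate flush keeps logs live

    def flush(self):
        self.stream.flush()
        self.file.flush()

    def close(self):
        self.file.close()

# Usage: route all prints through the logger.
# sys.stdout = DualLogger("experiment.log")
```

Because the object only needs `write`, `flush`, and `close`, it can replace `sys.stdout`/`sys.stderr` without touching the rest of the codebase.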
In addition, v11.0 completes the dataset converter integration, standardizing naming conventions and exposing the converter as a first-class Makefile target. This ensures dataset preparation is reproducible, scriptable, and fully aligned with the project’s automation and execution workflow.
Overall, v11.0 reinforces DDoS-Detector as a research-grade, automation-friendly framework, improving observability, log hygiene, and reliability for reproducible experimental runs.
Changelog
Added
- Dual-Channel Logger Utility (`Logger.py`)
  - Simultaneous colored terminal output and sanitized file logging
  - Automatic removal of ANSI escape sequences in log files
  - Immediate line flushing for real-time log monitoring
  - Minimal API (`write`, `flush`, `close`) for seamless `sys.stdout`/`sys.stderr` replacement
  - Safe for interactive sessions, background jobs, CI pipelines, and Makefile-driven runs
- Makefile Integration for Dataset Conversion
  - Added `dataset_converter` rule for standardized dataset conversion execution
  - Enables consistent, reproducible dataset preparation via automation
Improved
- Dataset Converter Standardization
  - Renamed `dataset_conversor.py` to `dataset_converter.py`
  - Aligned file naming with project conventions and documentation
- Operational Consistency
  - Unified logging behavior across scripts using a centralized utility
  - Clear separation between human-readable terminal output and archival experiment logs
Fixed
- Minor inconsistencies in logging behavior across execution contexts
- Improved reliability of log flushing during long-running and background executions
v11.0 solidifies DDoS-Detector’s position as a robust, transparent, and automation-friendly research framework, enhancing the user experience during extensive experimental runs and ensuring high-quality log management for reproducible science.
Full Changelog: v10.0-dataset_converter.py...v11.0-Logger.py
DDoS-Detector v10.0 — Multi-Dataset Intelligence, Transparent Optimization, and Reproducible Pipelines
Version v10.0 marks a major architectural and experimental evolution of the DDoS-Detector framework, shifting it toward a scalable, transparent, and fully reproducible research platform for large-scale DDoS detection experiments.
This release introduces native multi-dataset support across the entire pipeline, encompassing dataset discovery, preprocessing, feature analysis, model optimization, and result consolidation. The optimization workflow was fundamentally redesigned, replacing opaque automation with explicit, manually controlled grid searches that enable deterministic behavior, fine-grained progress monitoring, and execution-time–aware best-model selection.
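The explicit grid-search loop described above can be sketched in a few lines. The parameter grid, the scoring callback, and the print-based progress are placeholders; the project tracks progress with `tqdm` and scores cross-validated models rather than this toy function.

```python
import itertools
import time

def manual_grid_search(evaluate, param_grid):
    """Explicit per-combination search with deterministic tie-breaking:
    higher score wins; on a score tie, the faster combination wins."""
    keys = sorted(param_grid)
    combos = list(itertools.product(*(param_grid[k] for k in keys)))
    best = None                                # (score, elapsed, params)
    for i, values in enumerate(combos, 1):
        params = dict(zip(keys, values))
        start = time.perf_counter()
        score = evaluate(params)               # e.g. CV weighted F1 for one model
        elapsed = time.perf_counter() - start
        if best is None or (score, -elapsed) > (best[0], -best[1]):
            best = (score, elapsed, params)
        print(f"[{i}/{len(combos)}] {params} -> score={score:.4f} ({elapsed:.3f}s)")
    return best

# Toy scoring function standing in for cross-validated model evaluation.
grid = {"n_estimators": [50, 100], "max_depth": [4, 8]}
best = manual_grid_search(lambda p: p["max_depth"] / p["n_estimators"], grid)
```

Making every combination explicit is what enables the deterministic behavior and per-combination timing the release emphasizes.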
A strong emphasis is placed on experimental traceability and comparability. All major pipelines—including Genetic Algorithm (GA), Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), stacking, and hyperparameter optimization—now automatically record hardware specifications and elapsed execution time directly into generated CSV artifacts, allowing fair cross-machine and cross-experiment analysis.
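Recording hardware metadata and elapsed time alongside results can be approximated with the standard library alone; the column names below are illustrative assumptions, and RAM detection (which typically needs a third-party helper such as `psutil`) is noted in a comment but omitted to stay stdlib-only.

```python
import platform
import time

def hardware_metadata():
    """Collect basic machine info for result CSV columns (illustrative names)."""
    return {
        "cpu": platform.processor() or platform.machine(),
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        # RAM size typically needs psutil.virtual_memory().total;
        # omitted here to keep the sketch dependency-free.
    }

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) for elapsed-time columns."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

row = {"model": "RandomForest", "f1": 0.97}   # hypothetical result row
row.update(hardware_metadata())
_, row["elapsed_s"] = timed(sum, range(1000))
```

Stamping every result row this way is what allows fair cross-machine comparison later.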
Additionally, v10.0 introduces a Multi-Format Dataset Converter, enabling automatic discovery, validation, cleaning, and conversion of datasets across multiple formats while preserving directory structure. Overall, this release transitions DDoS-Detector from a single-dataset, script-oriented tool into a robust, research-grade experimental framework suitable for long-running experiments, remote execution, and reproducible scientific analysis.
Changelog
Added
- Multi-Format Dataset Converter (`dataset_converter.py`)
  - Automatic discovery and conversion of ARFF, CSV, Parquet, and TXT datasets
  - UTF-8 enforcement and lightweight structural cleaning
  - Mirrored output directory structure with disk-space validation
  - CLI interface with configurable input/output paths, formats, and verbosity
- Full Multi-Dataset Support
  - Dataset-level discovery and iteration across stacking, GA, PCA, RFE, and optimization pipelines
  - Centralized `DATASETS` constants shared across modules
- Manual Grid Search Engine
  - Explicit per-combination grid search replacing `GridSearchCV`
  - Global and per-model progress tracking using `tqdm`
  - Per-combination timing and deterministic tie-breaking by execution time
- Hardware & Performance Traceability
  - Automatic extraction of CPU, RAM, and OS information
  - Hardware metadata and elapsed-time columns added to all result CSVs
  - Normalized elapsed-time handling across all pipelines
- Dataset Descriptor Enhancements
  - CSV header compatibility and feature intersection analysis
  - Detection of common and extra features per file
  - Class-aware downsampling utilities
  - Integrated t-SNE generation and visualization for exploratory analysis
- Robust Logging and Remote Execution
  - Per-script log files with centralized log directory management
  - Raw log streaming with post-run color stripping and POSIX-compliant stderr redirection
  - Background execution via `nohup` (Unix) and `start /B` (Windows), resilient to SSH disconnections
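A minimal cross-platform background launcher in this spirit can be sketched with the standard library; the script and log paths below are hypothetical, and the project's actual Makefile rules may differ.

```python
import os
import subprocess
import sys

def background_command(script, log_file):
    """Build an OS-appropriate detached command line (illustrative helper)."""
    if os.name == "nt":
        # 'start /B' runs the process without opening a new console window.
        return f'start /B {sys.executable} {script} > {log_file} 2>&1'
    # nohup keeps the process alive after the SSH session ends;
    # '2>&1' is the POSIX-compliant stderr-to-stdout redirection.
    return f'nohup {sys.executable} {script} > {log_file} 2>&1 &'

def run_in_background(script, log_file):
    subprocess.Popen(background_command(script, log_file), shell=True)

# Example: run_in_background("stacking.py", "Logs/stacking.log")
```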
Improved
- Modularized and clarified optimization workflows across all pipelines
- Centralized and synchronized progress-bar management
- Simplified elapsed-time tracking and verbose reporting
- Improved CSV discovery, validation, and preprocessing logic
- Standardized naming conventions and file organization
- Renamed `dataset_conversor.py` to `dataset_converter.py`
- Cleaned and updated dependency specifications
Fixed
- Corrected global progress-bar counters for accurate execution tracking
- Fixed obsolete arguments and inconsistent verbose messages
- Resolved pandas warnings related to `pd.read_csv(low_memory)`
- Corrected terminal-clearing typo in verbose optimization output
- Improved handling of empty or optional filename filters
v10.0 transforms DDoS-Detector into a comprehensive, scalable, and reproducible research framework, empowering users to conduct large-scale DDoS detection experiments with confidence in their results and methodologies.
Full Changelog: v9.0-hyperparameters_optimization.py...v10.0-dataset_converter.py
DDoS-Detector v9.0 — Feature Selection Consolidation, Stacking Evaluation, and Hyperparameter Optimization Overhaul
Version v9.0 marks a major architectural and experimental milestone for DDoS-Detector, transitioning the project from isolated feature-selection experiments into a fully integrated, reproducible, and scalable machine learning evaluation framework.
This release consolidates Genetic Algorithm (GA), Recursive Feature Elimination (RFE), and Principal Component Analysis (PCA) into a unified workflow and introduces a dedicated hyperparameter optimization pipeline tightly coupled to explainable feature selection. The new architecture enables systematic comparison of individual classifiers, stacking ensembles, and optimized models under strict experimental control, making the framework suitable for thesis-level studies and large-scale benchmarking.
Key highlights of v9.0 include:
- Stacking-Based Evaluation Pipeline:
  - Replacement of legacy combination logic with a dedicated `stacking.py` orchestrator
  - Unified evaluation of individual classifiers and stacked ensembles
  - Seamless integration of GA-, RFE-, and PCA-derived feature subsets
- Hyperparameter Optimization Framework:
  - Introduction of a standalone optimization pipeline using GridSearchCV
  - Optimization driven by GA-selected feature subsets
  - Dataset-isolated results with multi-class weighted F1-score optimization
- Reproducibility and Performance Improvements:
  - Deterministic outputs for GA, PCA, and RFE pipelines
  - Standardized CSV schemas enabling downstream automation
  - Resource-aware execution with explicit control over threads, processes, and memory usage
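The weighted-F1 optimization objective can be reproduced with standard scikit-learn. The toy data and small grid below are stand-ins for a GA-selected feature subset and the project's real search space; only the `GridSearchCV` + `scoring="f1_weighted"` pattern is the technique the release names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy multi-class data standing in for a GA-selected feature subset.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=42)

# Weighted F1 accounts for class imbalance in the multi-class setting.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 8]},
    scoring="f1_weighted",
    cv=3,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```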
Overall, v9.0 lays the foundation for robust experimental studies, ensuring reproducibility, traceability, and extensibility across all stages of DDoS detection model evaluation.
Changelog
Added
- `stacking.py` implementing a full Stacking Classifier Evaluation Pipeline
  - Unified feature ingestion from GA, RFE, and PCA
  - Evaluation support for:
    - Individual classifiers
    - Stacked ensemble classifiers
    - PCA-based feature representations
- Hyperparameter Optimization module (`hyperparameter_optimization.py`)
  - GridSearchCV-based optimization
  - GA-driven feature selection
  - Weighted multi-class F1-score objective
  - Per-dataset result isolation
- PCA object serialization and reuse to avoid redundant computation
- Dataset metadata enrichment via `dataset_descriptor.py`
Improved
- Standardized CSV output schemas across GA, PCA, RFE, stacking, and optimization pipelines
- Deterministic GA outputs consolidated into a single, reproducible CSV
- Enforced deterministic behavior in RFE by removing multiple-run support
- Modularized pipelines with clearer separation of:
  - Data loading and preprocessing
  - Feature subset preparation
  - Model evaluation
  - Result persistence
- Enhanced progress bars with:
  - Generation-level updates
  - Per-individual tracking
  - Dataset-aware descriptions
- Improved PCA handling with:
  - Component validation
  - Cache path standardization
  - Verbose cache-loading feedback
- Removed TelegramBot integration from critical execution paths to improve stability
Fixed
- Missing imports and runtime warnings across GA and stacking pipelines
- Async/sync mismatches and unawaited coroutine warnings
- RFE feature parsing errors caused by CSV string representations
- NumPy-to-Python type serialization issues in CSV outputs
- PCA cache path mismatches between PCA and stacking modules
- Model evaluation failures caused by non-numeric features and undefined variables
Removed / Cleanup
- Deprecated TSNE metrics from dataset descriptors
- Unused libraries, dead code paths, and obsolete visualization logic
- Legacy Makefile targets and redundant dependencies
- Redundant imports and inconsistent code ordering across modules
Breaking Changes
- `combination.py` fully replaced by `stacking.py`
- RFE no longer supports multiple runs (now strictly deterministic)
- GA, PCA, and RFE outputs must follow new mandatory CSV schemas
- PCA cache directory structure has changed
- TelegramBot notifications are no longer part of the default execution flow
Intended Impact
Release v9.0 transforms DDoS-Detector from a collection of loosely coupled feature-selection scripts into a cohesive, research-grade experimental framework, enabling:
- Academic research with strong reproducibility guarantees
- Large-scale classifier and ensemble benchmarking
- Stacking-based DDoS detection studies
- Hyperparameter optimization grounded in explainable feature selection
This version serves as the architectural backbone for subsequent releases, experimental chapters, and advanced case studies.
v9.0 establishes DDoS-Detector as a robust, extensible platform for cutting-edge DDoS detection research, empowering users to conduct systematic evaluations with confidence in their results.
Full Changelog: v8.0-stacking.py...v9.0-hyperparameters_optimization.py
DDoS-Detector v8.0 — Stacking Ensemble Evaluation and Feature Analysis Refactor
Version v8.0 refines and extends the DDoS-Detector project with centralized feature-analysis pipelines and a stacking ensemble classifier evaluation script. This release focuses on unifying the outputs of Genetic Algorithm (GA), Recursive Feature Elimination (RFE), and Principal Component Analysis (PCA) for streamlined dataset preprocessing, feature selection, and classifier evaluation. It also improves reproducibility and memory efficiency across the project.
Key highlights of v8.0 include:
- Stacking Ensemble Evaluation (`stacking.py`):
  - Orchestrates evaluation of individual classifiers and a stacking meta-classifier
  - Automatic loading and sanitization of datasets (NaN/infinite removal)
  - Integration of GA, RFE, and PCA outputs for alternative feature sets
  - Scaling, optional PCA projection, and selective feature subsetting
  - Computes standard metrics (accuracy, precision, recall, F1) plus FPR/FNR and elapsed-time reporting
  - Exports consolidated `Stacking_Classifier_Results.csv` with feature lists and hardware metadata
  - Optional Telegram notifications for evaluation progress
- Feature Analysis Enhancements:
  - Added parallel PCA processing with `ProcessPoolExecutor`
  - Improved `preprocess_dataframe` functions across GA, RFE, and dataset descriptor pipelines
  - Multiple dataset and multiple run support for GA and RFE experiments
  - Centralized t-SNE computation with class-aware sampling and robust memory handling
  - Structured CSV/JSON outputs for GA, RFE, and PCA results
  - Minor performance and memory optimizations throughout feature analysis scripts
- Refactors and Fixes:
  - Converted `telegram_bot.py` to an importable module with a `TelegramBot` class
  - Updated output messages and progress reporting in GA, RFE, PCA, and dataset descriptor scripts
  - Fixed sklearn TSNE iteration argument, non-numeric column handling, and pandas warnings
  - Renamed `combination.py` to `stacking.py` with full orchestration logic
  - Minor bug fixes and improved reproducibility across the codebase
Changelog
Added
- `stacking.py` for stacking ensemble evaluation and CSV/JSON export
- Multiple datasets and multiple run support in GA and RFE
- Parallel PCA processing and robust t-SNE embedding computation
- `#` column tracking for file order in dataset descriptor outputs
- Structured result storage (CSV/JSON) for GA, RFE, PCA
- Optional Telegram notifications integrated with stacking evaluations
Improved
- Refactored `telegram_bot.py` into a `TelegramBot` class
- Preprocessing and memory management across GA, RFE, PCA, and dataset descriptor
- Output messages, progress bars, and logging consistency
- Reusable helper functions for dataset and t-SNE operations
Fixed
- Multiprocessing ValueError in PCA
- TSNE iteration keyword and min-per-class allocation
- Robust FPR/FNR computation in RFE
- Suppressed pandas DtypeWarning during CSV reads
- Minor bug fixes in GA, RFE, PCA, and dataset descriptor scripts
v8.0 marks a significant step towards a more integrated and efficient DDoS-Detector framework, enabling researchers to seamlessly evaluate and compare multiple feature selection techniques and classifiers within a unified pipeline.
Full Changelog: v7.0-telegram_bot.py...v8.0-stacking.py
DDoS-Detector v7.0 — WGAN-GP Completion and Telegram Bot Notifications
Version v7.0 focuses on expanding the automation, monitoring, and communication capabilities of DDoS-Detector. Building upon the previous PCA and WGAN-GP pipelines, this release introduces a fully featured Telegram notification system and finalizes the WGAN-GP module with all core classes and training utilities.
Key highlights of v7.0 include:
- WGAN-GP Enhancements:
  - Completion of generator (`Generator`) and discriminator (`Discriminator`) classes
  - Residual block support (`ResidualBlockFC`) for the generator
  - Full training and evaluation loop (`train` function)
  - Data handling via a `CSVFlowDataset` class
  - Gradient penalty computation for stable GAN training
  - Argument parsing and seeding utilities
  - Synthetic data generation (`generate` function)
- Telegram Bot Notifications (`telegram_bot.py`):
  - Sends messages to a specified Telegram chat using the bot API
  - Splits long messages to comply with Telegram's 4096-character limit
  - Error handling for failed sends
  - Optional integration with sound notifications
  - Configuration loaded from a `.env` file with `TELEGRAM_API_KEY` and `CHAT_ID`
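Splitting long notifications to respect the 4096-character limit is straightforward; the helper below is an illustrative sketch (the actual `telegram_bot.py` implementation may differ, e.g. in how it chooses break points).

```python
TELEGRAM_LIMIT = 4096  # Telegram Bot API maximum message length

def split_message(text, limit=TELEGRAM_LIMIT):
    """Split text into chunks under the limit, preferring newline breaks
    (illustrative helper, not the project's exact implementation)."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)   # last newline that still fits
        if cut <= 0:
            cut = limit                    # no newline available: hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Each returned chunk can then be sent as its own `sendMessage` call.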
- Genetic Algorithm Improvements:
  - Early stopping, caching, and parallelized fitness evaluation
  - Improved progress bar descriptions and execution timing
  - Safe handling of edge cases and dataset splits
  - Vectorized metric aggregation for performance
- Project Infrastructure:
  - Download scripts for datasets
  - `.env` file support for configuration
  - Updated requirements and Python version enforcement (>= 3.12)
  - Makefile and shell scripts corrected for environment consistency
With v7.0, DDoS-Detector provides a full-featured research and experimentation environment, combining feature selection, dimensionality reduction, synthetic data generation, and real-time notifications for enhanced reproducibility and usability.
Changelog
Added
- Full WGAN-GP implementation:
  - `CSVFlowDataset`, `ResidualBlockFC`, `Generator`, `Discriminator` classes
  - `train`, `generate`, `gradient_penalty`, `set_seed`, `parse_args` functions
- Telegram bot notifications (`telegram_bot.py`) with message splitting and error handling
- Early stopping, caching, and parallelization in the genetic algorithm
- Execution timing and improved progress bars in GA loops
- Download-datasets shell script and `.env` support
Improved
- Refactored PCA, main, and genetic algorithm modules for readability, type safety, and vectorization
- Updated progress bars and logging
- Improved CSV and file path handling across modules
Fixed
- Multiprocessing pickle issues in GA evaluation
- Nested multiprocessing warnings with scikit-learn estimators
- Corrected caching, output directories, and dataset paths
- Makefile fixed for Python environment consistency
Removed
- Deprecated GA feature selection functions and redundant comments
Notes / Future Work
- Telegram bot enhancements: image/file sending, retry mechanisms, multiple chat support
- WGAN-GP: learning rate scheduling, multi-GPU support, extended feature importance analysis
- GA: continue improving vectorization and caching for large datasets
v7.0 positions DDoS-Detector as a robust, automated, and user-friendly platform for DDoS detection experimentation, synthetic data generation, and real-time monitoring.
Full Changelog: v6.0-wgangp.py...v7.0-telegram_bot.py
DDoS-Detector v6.0 — PCA Analysis Finalization and Synthetic Data Generation Inspired by DRCGAN
Version v6.0 significantly expands DDoS-Detector beyond feature selection and reduction by introducing deep generative modeling for tabular network traffic data, explicitly inspired by the DRCGAN-based data augmentation strategy proposed by Yue et al. (2025). This release finalizes the PCA-based evaluation pipeline started in the previous version and adds a new, standalone WGAN-GP module for generating high-quality synthetic DDoS flow samples.
The pca.py module reaches functional completeness, delivering a clean, modular pipeline for dimensionality reduction experiments with reproducible metrics, structured outputs, and clear separation of concerns (data loading, scaling, PCA application, evaluation, and persistence).
More importantly, v6.0 marks the first deep-learning–based data augmentation capability in the project. The new wgangp.py module implements a conditional Wasserstein GAN with Gradient Penalty, adopting a residual conditional design inspired by the DRCGAN architecture introduced in:
Yue, Meng; Yan, Huayang; Han, Ruize; Wu, Zhijun.
DAD: Enhancing Multi-Class DDoS Attack Classification using Data Augmentation with DRCGAN.
Proceedings of the 2025 4th International Conference on Big Data, Information and Computer Network (BDICN ’25).
While the original work leverages DRCGAN for class-conditional traffic synthesis to mitigate dataset imbalance, DDoS-Detector adapts these principles to a WGAN-GP formulation, prioritizing training stability, label-conditioned generation, and tabular flow realism (e.g., CICDDoS2019).
With this release, DDoS-Detector evolves from a feature-analysis framework into a full experimentation platform, covering:
- Feature extraction and reduction (RFE, GA, PCA)
- Classical machine learning evaluation
- Synthetic data generation grounded in state-of-the-art GAN-based data augmentation research
Changelog
Added
PCA Analysis Pipeline (pca.py)
- Complete PCA workflow implementation:
  - Safe data loading and cleaning
  - Standardization and train/test split
  - PCA application with configurable components
  - Model evaluation on reduced feature space
- Modular functions: `run_pca_analysis`, `load_and_clean_data`, `scale_and_split`, `apply_pca_and_evaluate`, `print_pca_results`, `save_pca_results`
- Structured and reproducible PCA result export
- Console-friendly result summaries
Synthetic Data Generation via WGAN-GP (wgangp.py)
- Initial implementation of a Conditional WGAN-GP for tabular DDoS datasets
- Data augmentation approach inspired by DRCGAN as proposed in Yue et al. (2025)
- Key capabilities:
  - CSV-based dataset loading with automatic scaling
  - Label encoding for multi-class conditional generation
  - Residual-style conditional generator (DRCGAN-inspired)
  - Wasserstein loss with gradient penalty for training stability
  - Checkpoint saving for generator and discriminator
  - Synthetic sample generation exported to CSV
- CLI-based workflow:
  - Training mode (`--mode train`)
  - Generation mode (`--mode gen`)
- CUDA support with optional CPU fallback
Improved
- Updated `requirements.txt` to include deep learning and PCA-related dependencies
- Library import cleanup and standardization across PCA modules
Notes / Future Work
- WGAN-GP planned extensions:
  - Learning rate scheduling
  - Data quality and statistical validation metrics for generated samples
  - Feature-distribution similarity analysis
  - Multi-GPU support
- PCA pipeline ready for parallel execution and extended model comparisons
v6.0 represents a strategic expansion of DDoS-Detector, introducing synthetic data generation inspired by DRCGAN-based data augmentation alongside mature feature-reduction pipelines, positioning the project for advanced research in DDoS detection, data augmentation, and adversarial learning.
Full Changelog: v5.0-pca.py...v6.0-wgangp.py
DDoS-Detector v5.0 — PCA-Based Feature Extraction and Maturity of the Feature Analysis Stack
Version v5.0 completes the feature-analysis trilogy of DDoS-Detector by introducing a Principal Component Analysis (PCA)–based dimensionality reduction and evaluation pipeline. With this release, the project now supports three complementary feature strategies—RFE, Genetic Algorithms, and PCA—covering deterministic selection, evolutionary optimization, and projection-based reduction.
The new pca.py module provides an end-to-end, reproducible workflow for PCA experimentation, including dataset validation, standardization, configurable component grids, stratified cross-validation, and consolidated metric reporting. This positions DDoS-Detector as a comprehensive experimental framework for studying how different feature-reduction techniques impact DDoS detection performance.
Beyond PCA, v5.0 stabilizes and matures the existing pipelines:
- The Genetic Algorithm workflow is fully consolidated, with improved outputs, metrics reporting, population sweeps, and result structures.
- The RFE module is heavily refactored into a clean, modular design with clearer responsibilities and improved result exports.
- The Dataset Descriptor gains quantitative separability analysis via t-SNE scores and persistent visual outputs.
- The Makefile and README are refactored for cross-platform usability and clearer setup instructions.
This release marks the transition from rapid feature addition to a stable, research-grade experimentation platform.
Changelog
Added
PCA Feature Extraction & Evaluation (pca.py)
- Full PCA-based dimensionality reduction pipeline
- Z-score standardization of numeric features prior to PCA
- Configurable grid of `n_components` values
- 10-fold Stratified Cross-Validation on training data
- Final evaluation on held-out test split
- Aggregated metrics:
  - Accuracy, Precision, Recall, F1-score
  - False Positive Rate (FPR), False Negative Rate (FNR)
- Export of consolidated results to `Feature_Analysis/PCA_Results.csv`
- Console summary with best-configuration selection by CV F1-score
- Designed for extensibility (parallel execution, saved PCA objects)
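A condensed version of such a PCA sweep, using scikit-learn, might look like the following; 5-fold CV and a small `n_components` grid are used here for brevity (the release uses 10-fold), and the synthetic data stands in for a real DDoS dataset.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a numeric DDoS feature matrix.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

results = {}
for n in (2, 5, 10):                                  # n_components grid
    scaler = StandardScaler().fit(X_train)            # z-score before PCA
    Z_train = PCA(n_components=n).fit_transform(scaler.transform(X_train))
    scores = cross_val_score(
        RandomForestClassifier(n_estimators=50, random_state=0),
        Z_train, y_train, cv=5, scoring="f1")
    results[n] = scores.mean()                        # mean CV F1 per grid point

best_n = max(results, key=results.get)                # best configuration by CV F1
print(f"Best n_components: {best_n} (F1={results[best_n]:.3f})")
```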
Dataset Descriptor Enhancements
- Added quantitative t-SNE separability score
- Automatic saving of dataset t-SNE plots
- Improved file discovery logic for batch dataset processing
Improved / Refactored
Genetic Algorithm Pipeline
- Completed functional decomposition of the GA workflow:
  - Dataset loading, splitting, evaluation, and result analysis
- Improved console output and metric reporting
- Consolidated and restructured GA result files
- Added population sweep execution and clearer output summaries
- Fixed dataset path handling and dependency omissions (DEAP)
- Refined minimum population constraints and output directories
RFE Pipeline
- Major internal refactor:
  - Extracted preprocessing, scaling, selection, and metric computation into dedicated functions
- Improved result file structure and console output clarity
- Added utilities for:
  - Top-feature extraction
  - Printing and saving ranked features
- Removed unnecessary feature-name normalization
- Improved robustness in dataset loading and CSV handling
Build, Tooling, and Documentation
- Makefile refactored for cross-platform support (Windows, Linux, macOS)
- Fixed OS-dependent Python command resolution
- README extensively restructured:
  - Clear virtual environment and dependency installation steps
  - Simplified setup and citation sections
  - Updated datasets, results, and project description sections
- Updated `.gitignore` rules and minor cleanup across modules
Fixed
- Incorrect dataset paths in GA execution
- Missing dependencies in `requirements.txt`
- Minor output inconsistencies across GA and RFE pipelines
v5.0 establishes DDoS-Detector as a complete feature-analysis and evaluation framework, enabling systematic comparison between RFE, Genetic Algorithms, and PCA under a unified, reproducible experimental design.
Full Changelog: v4.0-genetic_algorithm.py...v5.0-pca.py
DDoS-Detector v4.0 — Genetic Algorithm Feature Selection and Analysis Expansion
Version v4.0 significantly expands DDoS-Detector’s feature-analysis capabilities by introducing a Genetic Algorithm–based feature selection pipeline. This release completes the evolution started in v3.0, moving beyond deterministic feature ranking (RFE) into stochastic, population-based optimization using DEAP.
With the addition of genetic_algorithm.py, the project now supports multiple, complementary feature-selection strategies under a unified experimental framework. The GA pipeline integrates dataset preprocessing, fitness evaluation with multi-metric outputs, result consolidation, visualization, and optional external notifications, reinforcing the project’s focus on reproducible and extensible research workflows.
In parallel, this release consolidates documentation quality across all major modules (main.py, dataset_descriptor.py, rfe.py) and extends the Makefile to expose the new GA workflow as a first-class execution target.
Changelog
Added
Genetic Algorithm Feature Selection (genetic_algorithm.py)
- DEAP-based binary-mask Genetic Algorithm for feature selection
- End-to-end pipeline:
  - Safe CSV loading and numeric feature filtering
  - Feature scaling and GA population initialization
  - Fitness evaluation using RandomForest (default)
- Multi-metric fitness evaluation:
  - Accuracy, precision, recall, F1-score
  - False Positive Rate (FPR) and False Negative Rate (FNR)
- Consolidated results export to `Feature_Analysis/Genetic_Algorithm_Results.csv`
- Automatic generation of feature statistics and boxplots
- Optional runtime monitoring and Telegram progress notifications
- Designed for extensibility (population sweeps, multiple runs)
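Stripped of DEAP, dataset handling, and model training, the core of a binary-mask GA can be sketched in pure Python. Everything below (the operators, rates, and toy fitness function) is a simplified illustration, not the project's `genetic_algorithm.py`.

```python
import random

def ga_feature_select(n_features, fitness, pop_size=20, generations=15,
                      mut_rate=0.05, seed=42):
    """Simplified binary-mask GA: each individual is a 0/1 list marking
    which features are kept (illustrative, not the DEAP pipeline)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]              # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # single-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_features):            # bit-flip mutation
                if rng.random() < mut_rate:
                    child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness rewarding masks that match a hypothetical informative pattern;
# the real pipeline scores masks by training a classifier on the kept features.
target = [1, 0, 1, 0, 1, 0, 1, 0]
best = ga_feature_select(8, lambda mask: sum(g == t for g, t in zip(mask, target)))
```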
RFE Module Completion (rfe.py)
- Finalized functional implementation:
  - Safe path handling and filename sanitization
  - Top-feature analysis utilities
  - Unified execution via `run_rfe`
- Optional sound feedback on completion
- Fully documented public API
Improved / Refactored
- Standardized and expanded function-level documentation in:
  - `main.py`
  - `dataset_descriptor.py`
  - `rfe.py`
- Improved consistency in module structure and comments
- Makefile extended with a `genetic_algorithm` execution rule
- Documentation cleanup and alignment with current project scope
Project Evolution
- Feature selection elevated to a core research dimension:
  - Deterministic approach: Recursive Feature Elimination (RFE)
  - Stochastic approach: Genetic Algorithm (GA)
- Clear separation of concerns across modules:
  - `main.py`: model training, evaluation, explainability
  - `dataset_descriptor.py`: dataset inspection and compatibility analysis
  - `rfe.py`: deterministic feature ranking
  - `genetic_algorithm.py`: evolutionary feature optimization
- Foundation laid for comparative studies between feature-selection strategies
v4.0 marks a major milestone for DDoS-Detector, transforming it into a feature-selection experimentation platform capable of supporting advanced research in DDoS detection, model optimization, and dataset analysis.
Full Changelog: v3.0-rfe.py...v4.0-genetic_algorithm.py
DDoS-Detector v3.0 — Feature Selection Automation with RFE and Framework Consolidation
Version v3.0 introduces automated feature-selection capabilities to DDoS-Detector, completing the transition from a dataset/model evaluation framework into a broader experimental analysis platform.
This release finalizes and stabilizes the dataset_descriptor.py module introduced in v2.0 and adds a new, self-contained Recursive Feature Elimination (RFE) automation tool (rfe.py). The project now supports dataset characterization, model evaluation, explainability, and feature-selection analysis under a unified structure, with Makefile-driven execution paths for reproducible experimentation.
The addition of RFE establishes the first feature-analysis pipeline in the project, enabling systematic investigation of feature relevance and model performance trade-offs across DDoS datasets.
Changelog
Added
Recursive Feature Elimination Module (rfe.py)
- Automated RFE workflow using `RandomForestClassifier` as the base estimator
- Safe CSV loading with column sanitization and validation
- Numeric feature standardization via Z-score scaling
- Configurable number of selected features
- Comprehensive evaluation metrics:
  - Accuracy, precision, recall, F1-score
  - False Positive Rate (FPR) and False Negative Rate (FNR)
- Structured export of results to `Feature_Analysis/RFE_Run_Results.csv`
- Embedded hardware metadata for reproducibility
- Portable execution with OS-aware behavior (e.g., optional sound feedback)
- Makefile integration for direct execution
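The heart of this workflow maps directly onto scikit-learn's `RFE` wrapper; the snippet below shows the standardize-then-eliminate pattern on synthetic data (feature counts and estimator settings are illustrative, not the module's defaults).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a cleaned, numeric DDoS feature matrix.
X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           random_state=1)
X_scaled = StandardScaler().fit_transform(X)   # Z-score standardization

# Recursively drop the weakest features (by importance) until 5 remain.
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=1),
               n_features_to_select=5).fit(X_scaled, y)
selected = [i for i, keep in enumerate(selector.support_) if keep]
print("Selected feature indices:", selected)
```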
Dataset Descriptor Enhancements (dataset_descriptor.py)
- Completed implementation of all planned functions:
  - File discovery and dataset loading
  - Label detection and feature summarization
  - Missing value and class distribution analysis
  - Dataset report generation and CSV export
- Centralized execution flow via `generate_dataset_report`
- Optional execution feedback (sound notification)
- Verbose output control for debugging and diagnostics
- Comprehensive module-level documentation header
Improved / Refactored
- Refined and standardized headers in `main.py` and `dataset_descriptor.py`
- Updated TODO sections to reflect new project scope
- Makefile extended with RFE execution rule
- Minor refactors to improve readability and documentation consistency
Project Evolution
- Introduction of feature-selection experimentation as a first-class concern
- Clear separation of responsibilities:
  - `main.py`: model training, evaluation, explainability
  - `dataset_descriptor.py`: dataset inspection and compatibility analysis
  - `rfe.py`: feature ranking and selection analysis
- Establishes groundwork for future comparative feature-selection methods (e.g., PCA, GA)
v3.0 marks the expansion of DDoS-Detector into a multi-dimensional research framework, combining dataset analysis, model evaluation, explainability, and feature-selection into a cohesive and reproducible system.
Full Changelog: v2.0-dataset_descriptor.py...v3.0-rfe.py
DDoS-Detector v2.0 — Modular Evaluation Pipeline + Dataset Descriptor & Cross-Dataset Analysis
Description
Version v2.0 is a major functional expansion of DDoS-Detector, transforming the project from a single evaluation script into a structured, end-to-end experimentation framework with dataset introspection and cross-dataset analysis.
This release completes the full modular pipeline introduced in v1.0 by implementing all core functions in main.py, covering dataset loading, preprocessing, model training, evaluation, reporting, and explainability. In addition, it introduces dataset_descriptor.py, a new module dedicated to dataset discovery, metadata extraction, visualization, and cross-dataset compatibility validation.
Together, these changes establish DDoS-Detector as a reproducible research framework capable of both model-centric evaluation and dataset-centric analysis, supporting large-scale experiments across heterogeneous DDoS datasets.
Changelog
Added
Core Evaluation Pipeline (main.py)
- Complete modular implementation of the evaluation workflow:
  - Safe dataset loading (CSV / ARFF) with label auto-detection
  - Feature preprocessing and sanitization
  - Train/test splitting and k-fold cross-validation
  - Model factory and unified training interface
  - Extended metrics generation and aggregation
  - Per-model and per-dataset CSV report exports
- Explainability support:
  - TreeSHAP explanations for tree-based models
  - Generic SHAP explanations
  - LIME explanations
  - Unified multi-method explanation dispatcher
- Utility helpers:
  - Verbose logging control
  - Filepath validation
  - Duration formatting
- Optional execution feedback (sound notification)
- Makefile integration for automated execution targets
Dataset Descriptor Module (dataset_descriptor.py)
- Recursive dataset discovery for CSV-based datasets
- Automatic extraction of dataset metadata:
  - Sample counts, feature counts, numeric feature detection
  - Missing value statistics
  - Label column identification and class distributions
- Optional 2D t-SNE visualization:
  - Class-aware downsampling with minority-class preservation
  - Compatibility handling for sklearn version differences
- Cross-dataset compatibility analysis:
  - Feature union/intersection comparison across dataset groups
  - Normalized reporting with consistent Dataset A/B semantics
- Disk-space validation before large output generation
- Structured per-dataset results directory (`Dataset_Description/`)
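Class-aware downsampling with minority-class preservation can be sketched as follows; the budget-splitting policy and parameter names are illustrative assumptions, not the module's exact logic.

```python
import random
from collections import defaultdict

def class_aware_downsample(rows, labels, max_total, min_per_class=10, seed=0):
    """Downsample to at most max_total rows while keeping at least
    min_per_class samples of every class (illustrative helper, e.g.
    to make t-SNE tractable without dropping minority classes)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    kept = []
    budget = max_total
    # Visit the smallest classes first so minorities claim their share.
    classes = sorted(by_class, key=lambda c: len(by_class[c]))
    for i, cls in enumerate(classes):
        # Split the remaining budget evenly over the remaining classes,
        # but never go below the minority-preservation floor.
        share = max(min_per_class, budget // (len(classes) - i))
        take = min(len(by_class[cls]), share)
        kept.extend((r, cls) for r in rng.sample(by_class[cls], take))
        budget -= take
    return kept
```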
Improved / Refactored
- Refactored `main.py` entry point to orchestrate the full pipeline
- Modularized logic into clearly defined, reusable functions
- Improved Makefile with dataset descriptor execution rules
- Increased robustness against malformed data and version mismatches
Project Evolution
- Transition from a basic framework skeleton (v1.0) to a fully operational experimental platform
- Clear separation between:
  - Model evaluation concerns (`main.py`)
  - Dataset analysis and validation concerns (`dataset_descriptor.py`)
- Established groundwork for scalable experimentation and future CLI-driven execution
v2.0 marks the point where DDoS-Detector becomes a complete research tool, not just an experiment script—supporting reproducible evaluation, explainability, and dataset-level reasoning in a unified framework.
Full Changelog: v1.0-main.py...v2.0-dataset_descriptor.py