Technical University of Munich (TUM) Data Mining Lab Project: Advanced Analytics on the Global Terrorism Database (GTD), 1970-2016
This repository contains a data mining and machine learning analysis of global terrorism incidents from 1970 to 2016. The project uses the Global Terrorism Database (GTD), which records 156,772 incidents worldwide, and applies statistical methods, clustering algorithms, and predictive modeling to describe patterns, trends, and characteristics of terrorist activity.
- Source: National Consortium for the Study of Terrorism and Responses to Terrorism (START)
- Time Period: 1970-2016
- Total Incidents: 156,772
- Attributes: 137 variables
- Geographic Coverage: Global (all continents)
- Update Frequency: Annual
- Temporal: Year, month, day, extended incidents
- Geographic: Country, region, city, coordinates
- Attack Details: Attack type, weapon type, target type
- Casualties: Number killed, wounded, hostages
- Perpetrators: Group names, number of perpetrators
- Textual: Summary, motive, additional notes
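
The variable groups above can be tied to concrete column names. The sketch below is illustrative only: the column names are the ones I understand the GTD codebook to use (check them against `Codebook.pdf`), while `VARIABLE_GROUPS`, `select_group`, and the sample record are hypothetical and not part of this repository.

```python
# Column names as documented in the GTD codebook (verify against Codebook.pdf).
# The grouping and the helper below are illustrative assumptions.
VARIABLE_GROUPS = {
    "temporal": ["iyear", "imonth", "iday", "extended"],
    "geographic": ["country_txt", "region_txt", "city", "latitude", "longitude"],
    "attack": ["attacktype1_txt", "weaptype1_txt", "targtype1_txt"],
    "casualties": ["nkill", "nwound", "nhostkid"],
    "perpetrators": ["gname", "nperps"],
    "textual": ["summary", "motive", "addnotes"],
}


def select_group(record, group):
    """Return only the fields of `record` that belong to `group`.

    Fields that are absent from the record come back as None.
    """
    return {col: record.get(col) for col in VARIABLE_GROUPS[group]}


# Made-up record for illustration only (not a real GTD row).
example = {
    "iyear": 2001, "imonth": 1, "iday": 15, "extended": 0,
    "country_txt": "Exampleland", "nkill": 2, "nwound": 5,
}

print(select_group(example, "temporal"))
print(select_group(example, "casualties"))
```

Keeping the grouping in one dictionary makes it easy to load or inspect one family of variables at a time instead of all 137 columns.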
global-terrorism-analysis/
├── README.md                       # This documentation
├── requirements.txt                # Python dependencies
├── environment.yml                 # Conda environment
├── .gitignore                      # Git ignore rules
│
├── data/                           # Data directory
│   ├── raw/                        # Original datasets
│   │   ├── globalterrorismdb_0616dist.xlsx
│   │   ├── Codebook.pdf
│   │   └── .gitkeep
│   ├── processed/                  # Cleaned datasets
│   │   ├── terrorism.csv
│   │   ├── final_group_names.csv
│   │   ├── terrorism_50_train_test.csv
│   │   ├── terrorism_50_val.csv
│   │   ├── terrorism_red_cat_for_random_forest.csv
│   │   ├── terrorism_red_cat_with_country.csv
│   │   └── .gitkeep
│   └── external/                   # External data sources
│       ├── Fossil Fuels.csv
│       ├── Fuel Imports.csv
│       ├── National Income.csv
│       ├── Population.csv
│       └── .gitkeep
│
├── notebooks/                      # Jupyter notebooks
│   ├── 01_data_exploration/        # EDA and data understanding
│   │   ├── missing_data_analysis.ipynb
│   │   ├── frequency_analysis.ipynb
│   │   ├── heatmaps_analysis.ipynb
│   │   └── text_mining_analysis.ipynb
│   ├── 02_visualization/           # Advanced visualizations
│   │   ├── joint_plots_analysis.ipynb
│   │   ├── missing_data_visualization.ipynb
│   │   ├── group_activity_analysis/
│   │   ├── group_name_research/
│   │   └── hostage_analysis.ipynb
│   ├── 03_clustering/              # Clustering analysis
│   │   ├── Clustering/
│   │   └── terrorist_chapter_analysis/
│   └── 04_prediction/              # Predictive modeling
│       ├── Prediction (nkill + nwound).ipynb
│       ├── Prediction (nkill).ipynb
│       ├── Prediction (nwound).ipynb
│       ├── Prediction group.ipynb
│       ├── Data Cleaning for Prediction [nkill, nwound].ipynb
│       └── Create 50-50 Split.ipynb
│
├── src/                            # Source code
│   ├── data/                       # Data processing modules
│   │   ├── __init__.py
│   │   └── preprocessing.py
│   ├── analysis/                   # Analysis modules
│   │   ├── __init__.py
│   │   └── clustering.py
│   └── utils/                      # Utility functions
│       └── __init__.py
│
├── scripts/                        # R and Python scripts
│   ├── data_processing.R
│   ├── create_dataset_predict_gname.R
│   └── terrorism_red_cat_for_nkill_pred.csv
│
├── reports/                        # Analysis reports
│   ├── figures/                    # Generated figures
│   │   ├── wordclouds/
│   │   ├── .gitkeep
│   │   └── [various PNG files]
│   └── videos/                     # Animated visualizations
│       ├── week2_visualizations/
│       ├── week3_analysis/
│       ├── week4_clustering/
│       └── .gitkeep
│
├── models/                         # Trained models
│   ├── prediction_results/
│   └── .gitkeep
│
├── tests/                          # Test files
│   └── test_data_processing.py
│
└── docs/                           # Documentation
    └── methodology.md
- Missing Data Analysis: Comprehensive assessment of data completeness
- Frequency Analysis: Temporal, geographic, and categorical distributions
- Text Mining: Word frequency analysis and cloud generation
- Data Quality: Validation and cleaning procedures
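
As a rough illustration of the missing-data and frequency steps listed above, the following dependency-free sketch computes the share of missing values per column and a value count for one column. The helper names and the sample rows are hypothetical; the notebooks in `notebooks/01_data_exploration/` may do this differently (for example with pandas).

```python
from collections import Counter

MISSING = (None, "")  # values treated as missing in this sketch


def missing_share(rows, columns):
    """Fraction of rows in which each column is absent, None, or empty."""
    total = len(rows) or 1  # avoid division by zero on empty input
    return {
        col: sum(1 for r in rows if r.get(col) in MISSING) / total
        for col in columns
    }


def frequency(rows, column):
    """Count the non-missing values of one column, most common first."""
    counts = Counter(r[column] for r in rows if r.get(column) not in MISSING)
    return counts.most_common()


# Made-up rows for illustration only.
rows = [
    {"iyear": 2014, "attacktype1_txt": "Bombing/Explosion", "motive": ""},
    {"iyear": 2014, "attacktype1_txt": "Armed Assault", "motive": None},
    {"iyear": 2015, "attacktype1_txt": "Bombing/Explosion", "motive": "stated"},
    {"iyear": 2015, "attacktype1_txt": None, "motive": ""},
]

print(missing_share(rows, ["attacktype1_txt", "motive"]))
print(frequency(rows, "attacktype1_txt"))
```

Reporting missingness per column before any modeling helps decide which of the textual and perpetrator fields are complete enough to use.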
- Joint Plot Analysis: Multi-dimensional data relationships
- Group Activity Analysis: Terrorist organization behavior patterns
- Animated Visualizations: Time-series animations of terrorist activities
- Geographic Analysis: Spatial distribution and patterns
- Terrorist Group Clustering: Unsupervised learning to identify group patterns
- Attack Type Clustering: Categorization of attack methodologies
- Target Type Clustering: Classification of attack targets
- Weapon Type Clustering: Weapon usage pattern analysis
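
To make the clustering idea concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It assumes each group has already been summarized as a numeric profile, here a hypothetical pair (share of bombings, share of armed assaults); the profiles are made up, and the project's own clustering code in `src/analysis/clustering.py` may use a different algorithm or feature set.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on tuples of floats.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical group profiles: (share of bombings, share of armed assaults).
profiles = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.10),  # bombing-heavy groups
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),  # assault-heavy groups
]

centroids, labels = kmeans(profiles, k=2)
print(labels)
```

The number of clusters `k` is a free choice in this sketch; in practice it would be selected with a criterion such as the elbow method or silhouette scores.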
- Casualty Prediction: Models to predict number of killed and wounded
- Group Prediction: Classification of responsible terrorist groups
- Feature Engineering: Creation of predictive variables
- Model Validation: Cross-validation and performance assessment
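
The cross-validation mentioned above can be sketched as follows. This is a generic k-fold loop with a mean-predicting baseline, not the project's actual validation code; `k_fold_indices`, `cross_validate`, and the baseline functions are hypothetical names, and the target values are made up.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(xs, ys, fit, predict, k=5):
    """Mean absolute error, averaged over the k held-out folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        err = sum(abs(predict(model, xs[i]) - ys[i]) for i in test_idx) / len(test_idx)
        fold_errors.append(err)
    return sum(fold_errors) / len(fold_errors)


def fit_mean(xs, ys):
    """Baseline 'model': remember the mean of the training targets."""
    return sum(ys) / len(ys)


def predict_mean(model, x):
    """Baseline prediction: always return the stored training mean."""
    return model


# Made-up data: ten incidents that all have three casualties.
xs = list(range(10))
ys = [3.0] * 10
print(cross_validate(xs, ys, fit_mean, predict_mean, k=5))
```

A baseline like this gives a reference error that any casualty model should beat before its results are reported.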
- Peak Activity: 2014-2015 showed highest terrorist activity
- Seasonal Trends: Certain months and days show higher incident rates
- Long-term Trends: Evolution of terrorist tactics over decades
- Regional Hotspots: Middle East, South Asia, and Africa
- Country Analysis: Iraq, Afghanistan, and Pakistan among the most affected countries
- Urban vs Rural: Concentration of attacks in urban areas
- Weapon Preferences: Explosives and firearms most common
- Target Types: Private citizens, military, and government facilities
- Attack Methods: Bombings and armed assaults dominate
- Major Organizations: ISIS, Taliban, Boko Haram among top groups
- Group Specialization: Different groups favor different attack types
- Geographic Focus: Groups tend to operate in specific regions
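# Data loading and preprocessing
# Assumption: these project helpers are importable from src/data/preprocessing.py
from src.data.preprocessing import load_gtd_data, preprocess_data, engineer_features

data = load_gtd_data('data/raw/globalterrorismdb_0616dist.xlsx')
cleaned_data = preprocess_data(data)
features = engineer_features(cleaned_data)

- Clustering: K-means, hierarchical clustering, DBSCAN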
- Classification: Random Forest, SVM, Neural Networks, XGBoost
- Regression: Linear regression, ensemble methods, deep learning
- Text Mining: TF-IDF, word embeddings, sentiment analysis
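
For the TF-IDF step named in the list above, a minimal textbook version looks like this. It uses the unsmoothed weight tf * log(N / df) with whitespace tokenization; libraries such as scikit-learn apply smoothing and normalization by default, so their numbers will differ. The function name and the example summaries are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up incident summaries for illustration only.
summaries = [
    "assailants detonated explosive device",
    "assailants attacked police checkpoint",
    "explosive device defused",
]

weights = tf_idf(summaries)
top_terms = sorted(weights[0], key=weights[0].get, reverse=True)
print(top_terms)
```

Terms that occur in every document get a weight of zero under this formula, which is why TF-IDF highlights the words that distinguish one summary from the others.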
- Static Plots: Matplotlib, Seaborn, ggplot2
- Interactive Visualizations: Plotly, Bokeh, D3.js
- Animated Charts: Time-series animations, geographic flows
- Dashboard: Streamlit, Dash, or Jupyter widgets
- Group Clusters: Identified distinct terrorist group categories
- Attack Patterns: Clustered similar attack methodologies
- Target Clusters: Grouped similar target types
- Weapon Clusters: Categorized weapon usage patterns
- Casualty Prediction: Achieved significant accuracy in predicting casualties
- Group Classification: Successfully classified responsible groups
- Feature Importance: Identified key predictive variables
- Model Validation: Cross-validated results for reliability
- Common Themes: Identified recurring motives and justifications
- Group Communications: Analyzed group statements and propaganda
- Target Descriptions: Understanding of target selection criteria
- Weapon Descriptions: Analysis of weapon acquisition and usage
- Attack Types Over Time: Evolution of attack methodologies
- Geographic Spread: Global distribution of terrorist activities
- Casualty Trends: Changes in attack lethality over time
- Group Activity: Rise and fall of terrorist organizations
- Regional Analysis: Interactive exploration by geographic region
- Temporal Analysis: Time-based filtering and analysis
- Group Comparison: Side-by-side group behavior analysis
- Attack Type Distribution: Interactive categorization tools
This project was developed collaboratively by multiple contributors from the TUM Data Mining Lab:
- Data Processing: Collaborative effort on data cleaning and preprocessing
- Analysis Development: Joint work on statistical analysis and machine learning
- Visualization Creation: Team effort on charts, graphs, and animations
- Model Development: Collaborative approach to predictive modeling
- Documentation: Shared responsibility for code documentation and analysis
- Python: Version 3.8+ with data science libraries
- R: Version 4.0+ with required packages
- Jupyter Notebook: For interactive analysis
- Git: Version control
# Clone the repository
git clone <repository-url>
cd global-terrorism-analysis
# Create conda environment
conda env create -f environment.yml
conda activate terrorism-analysis
# Or use pip
pip install -r requirements.txt
# Install R packages
R -e "install.packages(c('dplyr', 'tm', 'wordcloud', 'openxlsx', 'ggplot2', 'plotly'), repos = 'https://cloud.r-project.org')"

# Activate environment
conda activate terrorism-analysis
# Start Jupyter notebook
jupyter notebook
# Run data processing (the repository lists this script as scripts/data_processing.R)
Rscript scripts/data_processing.R
# Run specific analysis
jupyter notebook notebooks/01_data_exploration/missing_data_analysis.ipynb

- Data Loading: Excel file parsing and initial validation
- Missing Data Handling: Systematic approach to missing values
- Feature Engineering: Creation of derived variables
- Data Validation: Quality checks and consistency verification
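
A small sketch of the missing-value handling and feature-engineering steps above. It assumes, based on my reading of the GTD codebook, that unknown perpetrator counts are coded as -99 (with -9 used for some other unknowns) and that an unknown month or day is coded as 0; verify this against `Codebook.pdf`. The helper names and the `casualties` feature are illustrative, not the project's confirmed implementation.

```python
UNKNOWN_CODES = {-9, -99}  # assumption: sentinel codes meaning "unknown"


def clean_record(record):
    """Replace sentinel codes with None so they are not mistaken for real values."""
    cleaned = dict(record)
    for field in ("nperps", "nperpcap"):
        if cleaned.get(field) in UNKNOWN_CODES:
            cleaned[field] = None
    for field in ("imonth", "iday"):
        if cleaned.get(field) == 0:  # assumption: 0 marks an unknown month/day
            cleaned[field] = None
    return cleaned


def add_casualties(record):
    """Derived feature: nkill + nwound, or None when either part is missing."""
    nkill, nwound = record.get("nkill"), record.get("nwound")
    enriched = dict(record)
    enriched["casualties"] = (
        None if nkill is None or nwound is None else nkill + nwound
    )
    return enriched


# Made-up record for illustration only.
raw = {"imonth": 0, "iday": 12, "nperps": -99, "nkill": 1, "nwound": 3}
processed = add_casualties(clean_record(raw))
print(processed)
```

Converting sentinel codes to None before computing statistics keeps values such as -99 from distorting means and model inputs.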
- Descriptive Statistics: Comprehensive data summaries
- Correlation Analysis: Relationship identification between variables
- Distribution Analysis: Understanding data distributions
- Outlier Detection: Identification and handling of anomalies
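
One common way to carry out the outlier-detection step is the interquartile-range (IQR) rule, sketched below with the standard library. This is an assumed technique chosen for illustration; the source does not say which method the project uses, and the casualty counts are made up.

```python
import statistics


def iqr_outliers(values, factor=1.5):
    """Return the values lying outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up per-incident fatality counts; one incident is far more lethal.
nkill_values = [0, 1, 1, 2, 2, 3, 3, 4, 150]

print(iqr_outliers(nkill_values))
```

Casualty counts are typically heavily right-skewed, so flagged incidents usually deserve inspection (or a log transform) rather than automatic removal.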
- Data Splitting: Training, validation, and test sets
- Feature Selection: Identification of predictive variables
- Model Training: Multiple algorithm implementation
- Model Evaluation: Performance assessment and comparison
- Hyperparameter Tuning: Optimization of model parameters
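
The data-splitting step could look roughly like this. The file names `terrorism_50_train_test.csv` and `terrorism_50_val.csv` suggest a 50/50 split into a train-test part and a validation part, but the exact procedure in `Create 50-50 Split.ipynb` is not shown here, so treat the fractions, the seed, and the helper name as assumptions.

```python
import random


def split_rows(rows, fraction=0.5, seed=42):
    """Shuffle a copy of `rows` and split it into two parts at `fraction`."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for incident rows; integers keep the example short.
rows = list(range(100))

# First hold out half of the data for final validation ...
train_test, validation = split_rows(rows, fraction=0.5)
# ... then split the remaining half into training and test sets.
train, test = split_rows(train_test, fraction=0.8)

print(len(train), len(test), len(validation))
```

Holding the validation half back until model selection is finished keeps hyperparameter tuning from leaking into the final performance estimate.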
- Terrorism Studies: Understanding terrorist behavior patterns
- Security Analysis: Identifying threat patterns and trends
- Policy Research: Informing counter-terrorism strategies
- Criminology: Understanding criminal organization behavior
- Security Planning: Risk assessment and mitigation strategies
- Intelligence Analysis: Pattern recognition and threat identification
- Resource Allocation: Optimizing security resource deployment
- Early Warning Systems: Predictive threat assessment
- Global Terrorism Database (GTD): National Consortium for the Study of Terrorism and Responses to Terrorism (START)
- Codebook: Comprehensive documentation of all variables and coding schemes
- Data Quality: Regular updates and validation by START researchers
- Country Data: Population and economic indicators
- Fuel Data: Energy and resource information
- Income Data: Economic development metrics
This project is for academic research purposes. Please cite appropriately when using any findings or methodologies from this work.
This repository represents a comprehensive analysis of global terrorism data, combining statistical methods, machine learning techniques, and advanced visualization to understand patterns and trends in terrorist activities worldwide.