
kapoorabhishek24/global-terrorism-analysis


Global Terrorism Analysis: A Comprehensive Data Mining Study

Technical University of Munich (TUM) Data Mining Lab Project
Advanced Analytics on the Global Terrorism Database (GTD), 1970-2016

🎯 Project Overview

This repository contains a comprehensive data mining and machine learning analysis of global terrorism incidents spanning from 1970 to 2016. The project leverages the Global Terrorism Database (GTD) containing over 156,000 terrorist incidents worldwide, applying advanced statistical methods, clustering algorithms, and predictive modeling to understand patterns, trends, and characteristics of terrorist activities.

📊 Dataset Information

Global Terrorism Database (GTD)

  • Source: National Consortium for the Study of Terrorism and Responses to Terrorism (START)
  • Time Period: 1970-2016
  • Total Incidents: 156,772
  • Attributes: 137 variables
  • Geographic Coverage: Global (all continents)
  • Update Frequency: Annual

Key Variables Analyzed

  • Temporal: Year, month, day, and whether the incident extended beyond 24 hours
  • Geographic: Country, region, city, coordinates
  • Attack Details: Attack type, weapon type, target type
  • Casualties: Numbers killed, wounded, and taken hostage
  • Perpetrators: Group names, number of perpetrators
  • Textual: Summary, motive, additional notes

πŸ—οΈ Repository Structure

global-terrorism-analysis/
├── README.md                           # This documentation
├── requirements.txt                    # Python dependencies
├── environment.yml                     # Conda environment
├── .gitignore                          # Git ignore rules
│
├── data/                               # Data directory
│   ├── raw/                           # Original datasets
│   │   ├── globalterrorismdb_0616dist.xlsx
│   │   ├── Codebook.pdf
│   │   └── .gitkeep
│   ├── processed/                     # Cleaned datasets
│   │   ├── terrorism.csv
│   │   ├── final_group_names.csv
│   │   ├── terrorism_50_train_test.csv
│   │   ├── terrorism_50_val.csv
│   │   ├── terrorism_red_cat_for_random_forest.csv
│   │   ├── terrorism_red_cat_with_country.csv
│   │   └── .gitkeep
│   └── external/                      # External data sources
│       ├── Fossil Fuels.csv
│       ├── Fuel Imports.csv
│       ├── National Income.csv
│       ├── Population.csv
│       └── .gitkeep
│
├── notebooks/                         # Jupyter notebooks
│   ├── 01_data_exploration/           # EDA and data understanding
│   │   ├── missing_data_analysis.ipynb
│   │   ├── frequency_analysis.ipynb
│   │   ├── heatmaps_analysis.ipynb
│   │   └── text_mining_analysis.ipynb
│   ├── 02_visualization/             # Advanced visualizations
│   │   ├── joint_plots_analysis.ipynb
│   │   ├── missing_data_visualization.ipynb
│   │   ├── group_activity_analysis/
│   │   ├── group_name_research/
│   │   └── hostage_analysis.ipynb
│   ├── 03_clustering/                 # Clustering analysis
│   │   ├── Clustering/
│   │   └── terrorist_chapter_analysis/
│   └── 04_prediction/                 # Predictive modeling
│       ├── Prediction (nkill + nwound).ipynb
│       ├── Prediction (nkill).ipynb
│       ├── Prediction (nwound).ipynb
│       ├── Prediction group.ipynb
│       ├── Data Cleaning for Prediction [nkill, nwound].ipynb
│       └── Create 50-50 Split.ipynb
│
├── src/                               # Source code
│   ├── data/                          # Data processing modules
│   │   ├── __init__.py
│   │   └── preprocessing.py
│   ├── analysis/                      # Analysis modules
│   │   ├── __init__.py
│   │   └── clustering.py
│   └── utils/                         # Utility functions
│       └── __init__.py
│
├── scripts/                           # R and Python scripts
│   ├── data_processing.R
│   ├── create_dataset_predict_gname.R
│   └── terrorism_red_cat_for_nkill_pred.csv
│
├── reports/                           # Analysis reports
│   ├── figures/                       # Generated figures
│   │   ├── wordclouds/
│   │   ├── .gitkeep
│   │   └── [various PNG files]
│   └── videos/                        # Animated visualizations
│       ├── week2_visualizations/
│       ├── week3_analysis/
│       ├── week4_clustering/
│       └── .gitkeep
│
├── models/                            # Trained models
│   ├── prediction_results/
│   └── .gitkeep
│
├── tests/                            # Test files
│   └── test_data_processing.py
│
└── docs/                             # Documentation
    └── methodology.md

🔬 Research Methodology

Phase 1: Data Exploration & Preprocessing

  • Missing Data Analysis: Comprehensive assessment of data completeness
  • Frequency Analysis: Temporal, geographic, and categorical distributions
  • Text Mining: Word frequency analysis and word-cloud generation
  • Data Quality: Validation and cleaning procedures
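The missing-data and frequency steps above can be sketched in a few lines of pandas. This is a minimal illustration, not the code from the notebooks: it assumes the GTD is already loaded into a DataFrame with the codebook's column names (`iyear`, `region_txt`, and so on), and the helper names `missing_report`, `frequency_table` and `incidents_per_year` are hypothetical.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most incomplete first."""
    report = pd.DataFrame({
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean() * 100,
    })
    return report.sort_values("pct_missing", ascending=False)


def frequency_table(df: pd.DataFrame, column: str) -> pd.Series:
    """Incident counts per category, keeping a separate bucket for missing values."""
    return df[column].value_counts(dropna=False)


def incidents_per_year(df: pd.DataFrame) -> pd.Series:
    """Temporal distribution: number of incidents recorded in each year."""
    return df.groupby("iyear").size()
```

For example, `frequency_table(gtd, "region_txt")` gives the geographic distribution, and the same call on `attacktype1_txt` gives the categorical one.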

Phase 2: Advanced Analytics & Visualization

  • Joint Plot Analysis: Multi-dimensional data relationships
  • Group Activity Analysis: Terrorist organization behavior patterns
  • Animated Visualizations: Time-series animations of terrorist activities
  • Geographic Analysis: Spatial distribution and patterns
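For the group activity analysis, one simple starting point is a year-by-group count table, which can then feed line plots or time-series animations. This is a sketch under stated assumptions: it uses the GTD columns `gname` and `iyear` and the codebook convention that unattributed incidents carry the group name `Unknown`; `top_groups` and `group_activity_by_year` are illustrative names, not functions from this repository.

```python
import pandas as pd


def top_groups(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Most active groups by incident count, ignoring unattributed incidents."""
    known = df[df["gname"] != "Unknown"]
    return known["gname"].value_counts().head(n)


def group_activity_by_year(df: pd.DataFrame, groups) -> pd.DataFrame:
    """Year x group table of incident counts for the selected groups."""
    subset = df[df["gname"].isin(groups)]
    # Rows are years, columns are groups; missing year/group combinations are filled with 0
    return pd.crosstab(subset["iyear"], subset["gname"])
```

A typical call is `group_activity_by_year(gtd, top_groups(gtd).index).plot()`, which draws one line per group over time.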

Phase 3: Clustering Analysis

  • Terrorist Group Clustering: Unsupervised learning to identify group patterns
  • Attack Type Clustering: Categorization of attack methodologies
  • Target Type Clustering: Classification of attack targets
  • Weapon Type Clustering: Weapon usage pattern analysis
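As an illustration of terrorist group clustering, each group can be described by the share of its incidents that fall into each attack type and then clustered with K-means, one of the algorithms listed under Machine Learning Models. This is a hedged sketch rather than the lab's pipeline: the attack-type profile features, the value of `k` and the helper names are assumptions, and the same pattern would apply to weapon or target types by swapping `attacktype1_txt` for `weaptype1_txt` or `targtype1_txt`.

```python
import pandas as pd
from sklearn.cluster import KMeans


def group_attack_profiles(df: pd.DataFrame, min_incidents: int = 1) -> pd.DataFrame:
    """One row per group: fraction of its incidents in each attack type."""
    counts = df["gname"].value_counts()
    keep = counts[counts >= min_incidents].index
    subset = df[df["gname"].isin(keep)]
    # normalize="index" turns each group's counts into proportions summing to 1
    return pd.crosstab(subset["gname"], subset["attacktype1_txt"], normalize="index")


def cluster_groups(profiles: pd.DataFrame, k: int = 3, seed: int = 0) -> pd.Series:
    """Assign each group profile to one of k K-means clusters."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(profiles.to_numpy())
    return pd.Series(labels, index=profiles.index, name="cluster")
```

On the full dataset, raising `min_incidents` (and filtering out `Unknown`) keeps rarely seen groups from contributing noisy one-incident profiles.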

Phase 4: Predictive Modeling

  • Casualty Prediction: Models to predict number of killed and wounded
  • Group Prediction: Classification of responsible terrorist groups
  • Feature Engineering: Creation of predictive variables
  • Model Validation: Cross-validation and performance assessment
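Group prediction and cross-validation could be combined roughly as follows: one-hot encode categorical incident attributes, train a random forest to predict `gname`, and report mean k-fold accuracy. The feature set, the exclusion of `Unknown` labels and the hyperparameters below are assumptions for illustration, not the configuration used in `Prediction group.ipynb`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed feature columns (GTD codebook names)
CATEGORICAL = ["region_txt", "attacktype1_txt", "weaptype1_txt", "targtype1_txt"]
NUMERIC = ["iyear"]


def build_group_classifier(seed: int = 0) -> Pipeline:
    """One-hot encode categorical columns, pass the year through, then fit a random forest."""
    encode = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", "passthrough", NUMERIC),
    ])
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    return Pipeline([("encode", encode), ("forest", forest)])


def evaluate_group_classifier(df: pd.DataFrame, folds: int = 5) -> float:
    """Mean cross-validated accuracy; incidents attributed to 'Unknown' are excluded."""
    labelled = df[df["gname"] != "Unknown"]
    X = labelled[CATEGORICAL + NUMERIC]
    y = labelled["gname"]
    # For classifiers, an integer cv uses stratified k-fold splits
    scores = cross_val_score(build_group_classifier(), X, y, cv=folds, scoring="accuracy")
    return scores.mean()
```

Keeping the encoder inside the `Pipeline` means it is refit on each training fold, so no category information leaks from the held-out fold.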

📈 Key Findings

Temporal Patterns

  • Peak Activity: 2014-2015 showed highest terrorist activity
  • Seasonal Trends: Certain months and days show higher incident rates
  • Long-term Trends: Evolution of terrorist tactics over decades

Geographic Distribution

  • Regional Hotspots: Middle East, South Asia, and Africa
  • Country Analysis: Iraq, Afghanistan, and Pakistan as the most-affected countries
  • Urban vs Rural: Concentration of attacks in urban areas

Attack Characteristics

  • Weapon Preferences: Explosives and firearms most common
  • Target Types: Private citizens, military, and government facilities
  • Attack Methods: Bombings and armed assaults dominate

Group Analysis

  • Major Organizations: ISIS, Taliban, Boko Haram among top groups
  • Group Specialization: Different groups favor different attack types
  • Geographic Focus: Groups tend to operate in specific regions

🛠️ Technical Implementation

Data Processing Pipeline

# Data loading and preprocessing (high-level outline; the helper names are illustrative)
data = load_gtd_data('data/raw/globalterrorismdb_0616dist.xlsx')
cleaned_data = preprocess_data(data)
features = engineer_features(cleaned_data)
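One possible implementation of the three helpers named above, assuming pandas (with `openpyxl` installed for reading `.xlsx`) and the standard GTD column names. The cleaning rules (treating a month or day of 0 as unknown, dropping rows with no casualty counts) and the derived `ncasualties` and `decade` columns are illustrative assumptions, not the repository's documented preprocessing.

```python
import pandas as pd

# Assumed GTD column names covering the "Key Variables Analyzed" groups
KEY_COLUMNS = [
    "iyear", "imonth", "iday", "extended",
    "country_txt", "region_txt", "city", "latitude", "longitude",
    "attacktype1_txt", "weaptype1_txt", "targtype1_txt",
    "nkill", "nwound", "gname",
]


def load_gtd_data(path: str) -> pd.DataFrame:
    """Read only the columns of interest from the raw GTD Excel export."""
    return pd.read_excel(path, usecols=KEY_COLUMNS)


def preprocess_data(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning: mark unknown dates as missing, drop rows without casualty data."""
    df = df.copy()
    date_cols = ["imonth", "iday"]
    # The GTD codebook uses 0 for an unknown month or day
    df[date_cols] = df[date_cols].where(df[date_cols] != 0)
    # Keep incidents where at least one of nkill / nwound is recorded
    return df.dropna(subset=["nkill", "nwound"], how="all")


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add simple derived variables used in later analysis."""
    df = df.copy()
    # Total casualties; a single missing count is treated as 0 here
    df["ncasualties"] = df["nkill"].fillna(0) + df["nwound"].fillna(0)
    df["decade"] = (df["iyear"] // 10) * 10
    return df
```

With these definitions, the three-line outline above runs as written, provided the Excel file is present at `data/raw/globalterrorismdb_0616dist.xlsx`.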

Machine Learning Models

  • Clustering: K-means, hierarchical clustering, DBSCAN
  • Classification: Random Forest, SVM, Neural Networks, XGBoost
  • Regression: Linear regression, ensemble methods, deep learning
  • Text Mining: TF-IDF, word embeddings, sentiment analysis

Visualization Framework

  • Static Plots: Matplotlib, Seaborn, ggplot2
  • Interactive Visualizations: Plotly, Bokeh, D3.js
  • Animated Charts: Time-series animations, geographic flows
  • Dashboard: Streamlit, Dash, or Jupyter widgets

📊 Analysis Results

Clustering Results

  • Group Clusters: Identified distinct terrorist group categories
  • Attack Patterns: Clustered similar attack methodologies
  • Target Clusters: Grouped similar target types
  • Weapon Clusters: Categorized weapon usage patterns

Predictive Model Performance

  • Casualty Prediction: Models for the number killed (nkill), wounded (nwound), and their sum, with per-model results in notebooks/04_prediction
  • Group Classification: Models that classify the responsible group (Prediction group.ipynb)
  • Feature Importance: Identified key predictive variables
  • Model Validation: Cross-validated results for reliability

Text Mining Insights

  • Common Themes: Identified recurring motives and justifications
  • Group Communications: Analyzed group statements and propaganda
  • Target Descriptions: Understanding of target selection criteria
  • Weapon Descriptions: Analysis of weapon acquisition and usage

🎥 Visualizations and Media

Animated Visualizations

  • Attack Types Over Time: Evolution of attack methodologies
  • Geographic Spread: Global distribution of terrorist activities
  • Casualty Trends: Changes in attack lethality over time
  • Group Activity: Rise and fall of terrorist organizations

Interactive Dashboards

  • Regional Analysis: Interactive exploration by geographic region
  • Temporal Analysis: Time-based filtering and analysis
  • Group Comparison: Side-by-side group behavior analysis
  • Attack Type Distribution: Interactive categorization tools

👥 Contributors

This project was developed collaboratively by multiple contributors from the TUM Data Mining Lab:

Areas of Contribution

  • Data Processing: Collaborative effort on data cleaning and preprocessing
  • Analysis Development: Joint work on statistical analysis and machine learning
  • Visualization Creation: Team effort on charts, graphs, and animations
  • Model Development: Collaborative approach to predictive modeling
  • Documentation: Shared responsibility for code documentation and analysis

🚀 Getting Started

Prerequisites

  • Python: Version 3.8+ with data science libraries
  • R: Version 4.0+ with required packages
  • Jupyter Notebook: For interactive analysis
  • Git: Version control

Installation

# Clone the repository
git clone <repository-url>
cd global-terrorism-analysis

# Create conda environment
conda env create -f environment.yml
conda activate terrorism-analysis

# Or use pip
pip install -r requirements.txt

# Install R packages
R -e "install.packages(c('dplyr', 'tm', 'wordcloud', 'openxlsx', 'ggplot2', 'plotly'))"

Running the Analysis

# Activate environment
conda activate terrorism-analysis

# Start Jupyter notebook
jupyter notebook

# Run data processing (R script)
Rscript scripts/data_processing.R

# Run specific analysis
jupyter notebook notebooks/01_data_exploration/missing_data_analysis.ipynb

📚 Methodology Details

Data Preprocessing

  1. Data Loading: Excel file parsing and initial validation
  2. Missing Data Handling: Systematic approach to missing values
  3. Feature Engineering: Creation of derived variables
  4. Data Validation: Quality checks and consistency verification

Statistical Analysis

  1. Descriptive Statistics: Comprehensive data summaries
  2. Correlation Analysis: Relationship identification between variables
  3. Distribution Analysis: Understanding data distributions
  4. Outlier Detection: Identification and handling of anomalies
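Correlation analysis and outlier detection might look like this for the casualty variables. The sketch assumes Spearman rank correlation (less sensitive than Pearson to the long right tail of casualty counts) and the common 1.5 × IQR rule for flagging outliers; it also assumes negative `nperps` values are GTD unknown codes (such as -99) and masks them first. None of these choices is taken from the notebooks.

```python
import pandas as pd


def casualty_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Spearman rank correlation between casualty and perpetrator counts."""
    cols = ["nkill", "nwound", "nperps"]
    # Treat negative values (GTD "unknown" codes) as missing before correlating
    numeric = df[cols].mask(df[cols] < 0)
    return numeric.corr(method="spearman")


def iqr_outliers(values: pd.Series, factor: float = 1.5) -> pd.Series:
    """Boolean mask flagging values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - factor * iqr) | (values > q3 + factor * iqr)
```

Because most incidents have few or no casualties, the IQR rule flags a large number of high-casualty events; whether to drop, cap, or keep them depends on the downstream model.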

Machine Learning Pipeline

  1. Data Splitting: Training, validation, and test sets
  2. Feature Selection: Identification of predictive variables
  3. Model Training: Multiple algorithm implementation
  4. Model Evaluation: Performance assessment and comparison
  5. Hyperparameter Tuning: Optimization of model parameters
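The splitting and tuning steps can be sketched with scikit-learn. Loosely following the 50-50 split suggested by `terrorism_50_train_test.csv` and `terrorism_50_val.csv`, half the data is used for model selection with cross-validated grid search and the other half is held out for a final validation score. The regressor, the parameter grid and the MAE metric are assumptions for illustration; `X` and `y` stand for any numeric feature matrix and a target such as `nkill`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_and_validate(X, y, seed: int = 0):
    """Grid-search a random forest on one half of the data, score it on the other half."""
    # 50-50 split: development half for training/testing, held-out half for validation
    X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.5, random_state=seed)
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
        scoring="neg_mean_absolute_error",
        cv=3,  # train/test folds inside the development half
    )
    search.fit(X_dev, y_dev)
    # The refit best model is evaluated once on data it never saw during tuning
    val_mae = mean_absolute_error(y_val, search.predict(X_val))
    return search.best_params_, val_mae
```

Keeping the validation half out of the grid search gives an estimate that is not biased by the hyperparameter selection itself.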

🔍 Research Applications

Academic Research

  • Terrorism Studies: Understanding terrorist behavior patterns
  • Security Analysis: Identifying threat patterns and trends
  • Policy Research: Informing counter-terrorism strategies
  • Criminology: Understanding criminal organization behavior

Practical Applications

  • Security Planning: Risk assessment and mitigation strategies
  • Intelligence Analysis: Pattern recognition and threat identification
  • Resource Allocation: Optimizing security resource deployment
  • Early Warning Systems: Predictive threat assessment

📄 Data Sources and References

Primary Data Source

  • Global Terrorism Database (GTD): National Consortium for the Study of Terrorism and Responses to Terrorism (START)
  • Codebook: Comprehensive documentation of all variables and coding schemes
  • Data Quality: Regular updates and validation by START researchers

Additional Data Sources

  • Country Data: Population and economic indicators
  • Fuel Data: Energy and resource information
  • Income Data: Economic development metrics

📄 License

This project is for academic research purposes. Please cite appropriately when using any findings or methodologies from this work.


This repository represents a comprehensive analysis of global terrorism data, combining statistical methods, machine learning techniques, and advanced visualization to understand patterns and trends in terrorist activities worldwide.

About

Repository for the Data Mining Lab.

Languages

  • Jupyter Notebook 99.8%
  • Other 0.2%