A comprehensive deep learning framework for detecting fake news using hierarchical Graph Neural Networks (GNNs) and the NELA-GT-2022 dataset. This project implements a multi-level approach that captures relationships between words, sentences, articles, authors, sources, and topics to improve fake news classification accuracy.
The hierarchical GNN architecture processes news articles at multiple levels:
- Word Level: Individual word embeddings and relationships
- Sentence Level: Sentence-level representations and connections
- Article Level: Full article classification with metadata integration
The processing pipeline converts raw news articles into a multi-level graph representation. After sentence segmentation and tokenization, the system generates contextual embeddings and constructs word-, sentence-, and article-level graphs enriched with metadata such as authors, sources, and topics. These hierarchical graph structures are then passed through a GNN encoder, followed by a classification head that predicts whether an article is reliable, unreliable, or mixed. All experiments, metrics, and configurations are automatically tracked using MLflow to ensure full reproducibility.
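The front of this pipeline can be sketched in a few lines; the regex-based splitter and tokenizer below are deliberately naive stand-ins (the actual pipeline would use a proper sentence segmenter and BERT's subword tokenizer):

```python
import re

def segment_sentences(text: str) -> list[str]:
    # Naive split on sentence-final punctuation; a stand-in for a
    # real sentence segmenter (e.g. spaCy).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence: str) -> list[str]:
    # Word/punctuation tokenization; a stand-in for the subword
    # tokenizer used to produce BERT embeddings.
    return re.findall(r"\w+|[^\w\s]", sentence)

article = "Markets fell sharply today. Analysts blamed rising rates!"
sentences = segment_sentences(article)
tokens = [tokenize(s) for s in sentences]
```

Each token list then becomes the node set of one word-level subgraph.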
The model uses a hierarchical Graph Neural Network that mirrors the natural structure of text. Word-level subgraphs capture fine-grained semantic relationships, which are pooled into sentence nodes incorporating linguistic, semantic, and sentiment features. These sentence embeddings are further aggregated into article nodes and combined with metadata nodes in a heterogeneous global graph. A GNN encoder processes this unified structure to learn both local and global patterns, enabling robust classification and improved explainability compared to text-only or flat GNN approaches.
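The word-to-sentence and sentence-to-article aggregation can be illustrated with simple mean pooling over toy 2-dimensional vectors (the real model uses learned, GNN-based pooling over BERT embeddings):

```python
def mean_pool(vectors):
    # Average a list of equal-length feature vectors into one vector.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy word vectors for two sentences (dimension 2 for illustration).
sentence_words = [
    [[1.0, 0.0], [0.0, 1.0]],  # sentence 1: two word nodes
    [[2.0, 2.0]],              # sentence 2: one word node
]
sentence_vecs = [mean_pool(words) for words in sentence_words]  # word -> sentence
article_vec = mean_pool(sentence_vecs)                          # sentence -> article
```

The same two-step aggregation applies at scale: word subgraphs pool into sentence nodes, and sentence nodes pool into the article node that joins the global heterogeneous graph.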
Each level of the hierarchical GNN uses a tailored set of node features designed to capture linguistic structure, semantic meaning, and contextual metadata. Word nodes encode fine-grained token information using BERT embeddings and positional cues, sentence nodes aggregate semantic and structural signals such as sentiment and TF-IDF importance, and article nodes integrate global content with metadata like source, author, and topic. These multi-scale representations enable the model to learn both local dependencies and higher-level relational patterns essential for reliable fake news detection.
Word node features:

| Feature Name | Type | Description |
|---|---|---|
| BERT embedding | Vector | Contextual semantic representation of the token |
| Word position | Float | Normalized index of the word within its sentence |
| Word length | Float | Normalized token length |
Sentence node features:

| Feature Name | Type | Description |
|---|---|---|
| Pooled embedding | Vector | Aggregated representation from the word subgraph |
| Sentence position | Integer | Sentence index within the article |
| Sentence length | Integer | Number of tokens |
| Named entity count | Integer | Number of extracted named entities |
| TF-IDF weight | Float | Importance score based on TF-IDF |
| Sentiment score | Embedding | Sentence sentiment (BERT-based) |
| Headline similarity | Float | Cosine similarity between the sentence and the headline |
Article node features:

| Feature Name | Type | Description |
|---|---|---|
| Article embedding | Vector | Output from the intra-article GNN |
| Publish timestamp | Timestamp | Time of publication |
| Article length | Integer | Number of sentences or tokens |
| Sentiment score | Embedding | Overall article sentiment |
| Topic embedding | Embedding | Topic embedding assigned via BERT |
| Headline embedding | Vector | Embedding of the article headline |
Source node features:

| Feature Name | Type | Description |
|---|---|---|
| Source name | String | For interpretability only (not used in GNN) |
| Article count | Integer | Number of articles from this source in the dataset |
Author node features:

| Feature Name | Type | Description |
|---|---|---|
| Journalist name | String | For interpretability only (not used in GNN) |
| Article count | Integer | Number of articles written |
Topic node features:

| Feature Name | Type | Description |
|---|---|---|
| Topic name | String | For interpretability only (not used in GNN) |
| Article count | Integer | Number of articles assigned to this topic |
| Controversy score | Float | Embedding divergence indicating disagreement / topical controversy |
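As an illustration of how the tabled features combine into a node vector, the sketch below concatenates a pooled embedding with the scalar sentence features; the argument names mirror the sentence table, but the ordering and lack of normalization here are assumptions, not the model's exact layout:

```python
def sentence_feature_vector(pooled_embedding, position, length,
                            entity_count, tfidf, headline_sim, sentiment):
    # Concatenate the pooled word-subgraph embedding with scalar features
    # and the sentiment embedding into one sentence node vector.
    # Names follow the feature table; ordering is illustrative.
    scalars = [float(position), float(length), float(entity_count),
               float(tfidf), float(headline_sim)]
    return list(pooled_embedding) + scalars + list(sentiment)

vec = sentence_feature_vector([0.1, 0.2], position=3, length=18,
                              entity_count=2, tfidf=0.42,
                              headline_sim=0.8, sentiment=[0.9, 0.1])
```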
This project uses the NELA-GT-2022 dataset, a large collection of news articles containing full text, titles, authors, sources, and publication timestamps. Each article is labeled into one of three classes—Reliable (0), Unreliable (1), or Mixed (2)—and includes rich metadata that enables the construction of heterogeneous graphs linking articles to sources, authors, topics, and entities. This structure makes the dataset well-suited for hierarchical and relational modeling approaches like the proposed H-GNN.
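A minimal sketch of reading articles, assuming a SQLite distribution of the dataset; the table and column names (`newsdata`, `source`, `title`, `content`) are illustrative assumptions, and an in-memory database with one toy row stands in for the real file:

```python
import sqlite3

# Assumed schema for illustration only -- check the actual
# NELA-GT-2022 release for the real table and column names.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE newsdata "
            "(source TEXT, title TEXT, content TEXT, author TEXT, published TEXT)")
con.execute("INSERT INTO newsdata VALUES "
            "('example-source', 'Example headline', 'Body text.', "
            "'A. Writer', '2022-01-01')")

# Pull the fields needed to build article nodes and metadata edges.
rows = con.execute("SELECT source, title, content FROM newsdata").fetchall()
```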
The table below compares the proposed hierarchical GNN (H-GNN) against a range of traditional machine learning models, deep learning architectures, graph-based baselines, and a strong LLM setup. The H-GNN achieves the best overall performance with both the highest test accuracy and F1 score, indicating that its hierarchical, metadata-enriched graph representation captures the structure of news articles more effectively than flat text models. Notably, it outperforms CNN/RNN baselines and other GNN variants such as GCN, GraphSAGE, and GAT, as well as the best LLM configuration, highlighting the advantages of explicitly modeling multi-level relationships for fake news detection.
| Model | Test Acc | Test F1 |
|---|---|---|
| Proposed H-GNN | 0.7738 | 0.7753 |
| LLM | 0.4528 | 0.4016 |
| CNN | 0.7074 | 0.7066 |
| RNN | 0.6944 | 0.6941 |
| GCN | 0.5860 | 0.5828 |
| GraphSAGE | 0.6423 | 0.6416 |
| GAT | 0.5783 | 0.5775 |
| Random Forest | 0.5596 | 0.4823 |
| SVM | 0.7043 | 0.7052 |
| Naive Bayes | 0.6474 | 0.6447 |
| Logistic Regression | 0.1500 | 0.1186 |
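Assuming the reported F1 is macro-averaged over the three classes (the unweighted mean of per-class F1 scores), the metric can be computed from scratch as follows:

```python
def macro_f1(y_true, y_pred, classes=(0, 1, 2)):
    # Macro F1: unweighted mean of per-class F1 scores.
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging weights the minority class as heavily as the majority classes, which matters for a dataset where the three reliability labels are not balanced.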
To improve interpretability, the model’s output can be exported into Gephi, where nodes (sentences, metadata, entities) are weighted by their learned importance during prediction. Using Gephi’s layout algorithms and styling tools, this produces an interactive, explorable visualization that reveals which parts of an article and which metadata relationships (e.g., source, author, topics) drive the model’s decision. High-influence nodes are highlighted through color intensity or size, allowing users to quickly identify the most influential semantic and contextual signals behind each prediction. This makes the hierarchical GNN not only accurate, but also transparent and inspectable.
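A minimal sketch of the export step: Gephi's spreadsheet importer accepts node tables with `Id`/`Label` columns and edge tables with `Source`/`Target`/`Weight` columns, so writing two CSVs is enough (in-memory buffers stand in for files, and the weights are made-up examples):

```python
import csv
import io

# Toy nodes (id, label, learned importance) and one weighted edge.
nodes = [("s1", "sentence", 0.82), ("src1", "source", 0.40)]
edges = [("s1", "src1", 0.75)]

node_buf = io.StringIO()
writer = csv.writer(node_buf)
writer.writerow(["Id", "Label", "Weight"])  # extra columns become node attributes
writer.writerows(nodes)

edge_buf = io.StringIO()
writer = csv.writer(edge_buf)
writer.writerow(["Source", "Target", "Weight"])
writer.writerows(edges)
```

In Gephi, the `Weight` attribute can then drive node size or color intensity to surface high-influence nodes.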
The project includes a set of analysis tools designed to help explore the dataset, understand model behavior, and interpret the underlying graph structures. These tools support every stage of the workflow, from inspecting raw data and label distributions to evaluating model performance, visualizing learned attention patterns, and analyzing the topology of the constructed heterogeneous graphs.
This includes exploratory analysis of article characteristics such as length, structure, and linguistic complexity, alongside visualizations of label distribution and class balance. These insights help assess dataset quality, detect biases, and guide preprocessing decisions.
Model analysis focuses on evaluating performance across configurations and baselines, identifying common error patterns, and examining attention weights or feature contributions. This provides a deeper understanding of model behavior and highlights areas for improvement.
Graph analysis explores the structure of the constructed article networks, evaluates node importance through centrality and learned weights, and identifies communities or clusters within the graph. These insights reveal relational patterns that contribute to model predictions and domain understanding.
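For example, degree centrality, one of the simplest centrality measures used in this kind of analysis, scores each node by the fraction of other nodes it connects to; a self-contained sketch over an undirected edge list:

```python
def degree_centrality(edges, n_nodes):
    # Degree centrality: fraction of the other (n_nodes - 1) nodes
    # each node is directly connected to.
    deg = [0] * n_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [d / (n_nodes - 1) for d in deg]

# Toy graph: nodes 0-2 form a triangle, node 3 is isolated.
centrality = degree_centrality([(0, 1), (0, 2), (1, 2)], n_nodes=4)
```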
- `base_structure_enhanced.py` - Core hierarchical GNN implementation with enhanced features
  - `HeteroGNN`: Main heterogeneous GNN model
  - `NamedEntityProcessor`: Extracts and processes named entities
  - `AdvancedFeatureProcessor`: Handles linguistic and statistical features
  - `BERTWordEmbedder`: BERT-based word embeddings
  - `MultiLevelPooling`: Hierarchical pooling mechanisms
- `focused_baseline_models.py` - Traditional ML baselines for comparison
  - Random Forest, SVM, Logistic Regression, Naive Bayes
  - Advanced feature extraction (linguistic, sentiment, readability)
  - Cross-validation and hyperparameter tuning
- `gnn_and_deep_models.py` - Various GNN architectures
  - Graph Convolutional Networks (GCN)
  - GraphSAGE
  - Graph Attention Networks (GAT)
  - Heterogeneous graph implementations
- `run_ablation_study.py` - Comprehensive ablation studies
  - Feature ablation (removing specific features)
  - Node type ablation (removing node types)
  - Model component ablation
  - Performance comparison across configurations
- `full_model_100000.pt` - Model trained on 100k samples
- `full_model_20000.pt` - Model trained on 20k samples
- `dataset_eda.py` - Exploratory Data Analysis
  - Dataset statistics and cleaning
  - Content analysis (word counts, sentence counts)
  - Label distribution analysis
  - Visualization of dataset characteristics
- `performance_analysis.py` - Model performance evaluation
  - Loads trained models and evaluates on test data
  - Generates classification reports and confusion matrices
  - Performance metrics across different configurations
- `focused_failure_analysis.py` - Error analysis and debugging
  - Identifies misclassified samples
  - Analyzes failure patterns
  - Provides insights for model improvement
- `llm_zero_shot_analysis.py` - LLM-based analysis using OpenAI GPT
  - Zero-shot classification with GPT models
  - Comparison with GNN performance
  - Error analysis using LLM reasoning
- `pooling_word_weight_analyzer.py` - Attention and pooling analysis
  - Analyzes word weights in pooling layers
  - Visualizes attention patterns
  - Provides insight into model decision-making
- `gephi_data_export.py` - Graph visualization export
  - Exports graph data to Gephi-compatible formats
  - Node and edge analysis
  - Attention weight visualization
- Python 3.8+
- PyTorch 2.0+
- PyTorch Geometric
- CUDA-compatible GPU (recommended)


