Black Hole Strategy for Graph Sparsification in MOF Networks

Black Hole Strategy: A gravity-inspired graph sparsification approach for Metal-Organic Framework (MOF) networks.

The Black Hole (BH) strategy is a novel graph sparsification technique inspired by the gravitational pull of black holes, which condense matter into highly structured forms. BH retains the most influential nodes and edges in a network using a gravity-like scoring mechanism, preserving critical connections and community structures (detected with the Louvain algorithm). Unlike random pruning, BH yields stable and accurate graph representations even under extreme sparsity, which makes it well suited for downstream learning tasks in sparse regimes.
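To make the gravity analogy concrete, here is a minimal sketch of one possible scoring rule. It assumes (hypothetically) that a node's "mass" is its weighted degree and that a higher edge weight plays the role of a shorter distance, so score(u, v) = mass(u) · mass(v) · w(u, v), echoing Newton's m1·m2/r². This is an illustration, not the scoring used in `bh_sparsification.py`; `gravity_sparsify` and `keep_fraction` are hypothetical names.

```python
from collections import defaultdict

def gravity_sparsify(edges, keep_fraction):
    """Keep the top `keep_fraction` of edges ranked by a gravity-like score.

    edges: list of (u, v, w) tuples with positive similarity weights w.
    Hypothetical score: mass(u) * mass(v) * w, where mass is weighted degree
    and w stands in for the 1/r^2 term of Newton's law.
    """
    if not edges:
        return []
    mass = defaultdict(float)
    for u, v, w in edges:
        mass[u] += w
        mass[v] += w
    # Rank edges from strongest to weakest "gravitational pull".
    scored = sorted(edges, key=lambda e: mass[e[0]] * mass[e[1]] * e[2], reverse=True)
    k = min(len(edges), max(1, round(keep_fraction * len(edges))))
    return scored[:k]

edges = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7), ("C", "D", 0.2), ("D", "E", 0.1)]
print(gravity_sparsify(edges, 0.4))  # keeps [('A', 'B', 0.9), ('A', 'C', 0.8)]
```

Edges attached to low-mass, peripheral nodes (C-D and D-E here) are dropped first. This naive ranking can leave nodes isolated, so a production implementation would likely need additional safeguards.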

About the Project

The Black Hole Strategy is implemented within the MOFGalaxyNet framework, designed for analyzing Metal-Organic Framework (MOF) networks. It leverages weighted edge importance and community detection to sparsify graphs while maintaining structural integrity, outperforming traditional methods like random pruning or edge betweenness centrality.
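For comparison, random pruning simply keeps a uniformly random subset of edges of the same size. A minimal sketch, assuming edges are `(u, v, w)` tuples; `random_prune` is a hypothetical helper and not necessarily how `sparsification_methods.py` implements this baseline:

```python
import random

def random_prune(edges, keep_fraction, seed=0):
    """Keep a uniformly random `keep_fraction` of the edges, reproducibly via `seed`."""
    if not edges:
        return []
    k = min(len(edges), max(1, round(keep_fraction * len(edges))))
    rng = random.Random(seed)  # local RNG so global random state is untouched
    return rng.sample(edges, k)

edges = [(i, i + 1, 1.0) for i in range(10)]
print(random_prune(edges, 0.3, seed=42))
```

Because it ignores edge weights and node degrees, this baseline is as likely to delete a structurally central edge as a peripheral one, which is the behavior the BH score is meant to avoid.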

For the full MOFGalaxyNet code, visit:
MehrdadJalali-AI/MOFGalaxyNet

For more details about the project and my work, visit my personal website:
www.mehrdadjalali.de

Key Features

Gravity-Inspired Sparsification: Uses a scoring mechanism to prioritize influential edges and nodes.
Community Preservation: Integrates the Louvain algorithm for robust community detection.
High Stability in Sparse Regimes: Maintains accuracy on downstream tasks such as GraphSAGE training, even at high sparsity.
Comprehensive Analysis Tools: Includes modules for visualizing MOF properties (e.g., linker and metal distributions, pore-limiting diameters).

Repository Structure

Core Modules

| File | Description |
|---|---|
| `main.py` | Orchestrates the pipeline: data loading, BH sparsification, GraphSAGE training. |
| `data_utils.py` | Loads and preprocesses data, generates RDKit fingerprints, and cleans features. |
| `graphsage_model.py` | Defines and trains the GraphSAGE neural network. |
| `bh_sparsification.py` | Implements the Black Hole sparsification algorithm. |
| `experiment_manager.py` | Manages experiments, logging, checkpoints, and result storage. |
| `sparsification_methods.py` | Provides alternative sparsification methods for comparison (e.g., random pruning). |

Analysis & Visualization Modules

| File | Description |
|---|---|
| `LinkerDistribuation.py` | Visualizes the distribution of organic linkers in MOFs. |
| `MetalDistribuation.py` | Visualizes the distribution of metals in MOFs. |
| `PLDdistribuation.py` | Visualizes pore-limiting diameter (PLD) distributions in MOFs. |
| `analyze_sparsified_graphs_v2.py` | Analyzes properties of sparsified graphs across methods and thresholds. |
| `PlotNetworkParameters.py` | Plots network metrics (e.g., degree distribution, modularity). |
| `PlotParameters.py` | Plots sparsification metrics (accuracy, modularity, runtime). |
| `Plot_Redundency.py` | Visualizes redundancy and overlap metrics across sparsification thresholds. |
| `plot_network_metrics.py` | Generates plots for network metrics (e.g., density, clustering). |
| `Performance_Frugal_overall.py` | Benchmarks and summarizes performance across sparsification methods. |

Data Files

MOFGalaxyNet.csv: Edge list for MOF network (829,300 edges, not included).
MOFCSD.csv: Node features for MOFs (12,561 nodes, not included).
sparsification_performance.csv: Stores evaluation metrics (accuracy, modularity, runtime).
BH.jpg & Animated_BH_txt_shorter.gif: Visuals for README.

Installation

Prerequisites

Python 3.9
Conda
Required packages: pytorch==2.4.0, pandas, numpy, networkx, scikit-learn, rdkit>=2024.03, psutil, tqdm

Setup Instructions

Create a Conda environment:

```
conda create -n bh_env python=3.9
conda activate bh_env
```

Install dependencies:

```
conda install pytorch==2.4.0 pandas numpy networkx scikit-learn rdkit psutil -c pytorch -c conda-forge
pip install tqdm
```

Clone the repository:

```
git clone https://github.com/MehrdadJalali-KIT/black-hole-strategy.git
cd black-hole-strategy
```

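Note: RDKit >=2024.03 deprecates the old Morgan fingerprint functions. Update data_utils.py to use the MorganGenerator API (rdFingerprintGenerator.GetMorganGenerator with GetFingerprintAsNumPy) to avoid deprecation warnings. See Troubleshooting and Known Issues for details.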
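A minimal sketch of the generator-based fingerprinting mentioned in the note, using RDKit's `rdFingerprintGenerator` module. The radius (2) and bit length (1024) are assumptions for illustration; data_utils.py may use different settings, and `morgan_bits` is a hypothetical helper name.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Assumed settings (radius 2, 1024 bits); data_utils.py may use different values.
MORGAN_GEN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=1024)

def morgan_bits(smiles):
    """Return the Morgan fingerprint of `smiles` as a NumPy 0/1 array, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit logs a parse error and returns None for unparsable SMILES
        return None
    return MORGAN_GEN.GetFingerprintAsNumPy(mol)

fp = morgan_bits("c1ccccc1")  # benzene
print(fp.shape, int(fp.sum()))
```

Creating the generator once and reusing it for every molecule avoids rebuilding it inside the per-row loop.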

Usage

Prepare data: Place MOFGalaxyNet.csv, MOFCSD.csv, BH.jpg, and Animated_BH_txt_shorter.gif in the project root.

Run the pipeline:

```
conda activate bh_env
rm -rf __pycache__ *.pyc
python main.py
```

Monitor progress:
```
tail -f bh_evaluation.log
```

Expected Output

Logs: Feature generation (feature matrix of shape [12561, 1031]), training progress (e.g., Epoch X, Loss: Y, Train Accuracy: Z), and test accuracy (typically 0.6–0.8).
Results: Stored in evaluation/threshold_0.90/method_{blackhole,random}/run_0/model_results_with_error_bars.csv.
Run Time: ~10 minutes on a typical CPU (e.g., Apple Silicon).
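A small sketch for locating the result files after a run, assuming only the directory layout stated above; `result_files` is a hypothetical helper, and the threshold is passed as a string so that `0.90` keeps its trailing zero in the path.

```python
from pathlib import Path

def result_files(root="evaluation", threshold="0.90", methods=("blackhole", "random"), run=0):
    """Map each method name to its results CSV path, or None if it has not been written yet."""
    files = {}
    for method in methods:
        path = (Path(root) / f"threshold_{threshold}" / f"method_{method}"
                / f"run_{run}" / "model_results_with_error_bars.csv")
        files[method] = path if path.is_file() else None
    return files

print(result_files())
```

Each existing file can then be opened with `pandas.read_csv` for inspection; its column layout is not documented here.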

Optional: Test Without Edge Weights

To reproduce the behavior of previous versions, which did not use edge weights, set the following in main.py:

```
use_edge_weights = False
```

Then run:

```
python main.py
```
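GraphSAGE's mean aggregator averages neighbor features before combining them with a node's own features. One plausible reading of `use_edge_weights` (an assumption; check `graphsage_model.py`) is that it switches this average between a weighted and a plain mean. The hypothetical NumPy helper `mean_aggregate` below shows the difference for a single aggregation step, without the learned weight matrices and nonlinearity of a full GraphSAGE layer.

```python
import numpy as np

def mean_aggregate(features, edges, use_edge_weights=True):
    """Average each node's neighbor features, optionally weighted by edge weight.

    features: (num_nodes, dim) array; edges: list of undirected (u, v, w) tuples.
    Nodes without neighbors receive a zero vector.
    """
    num_nodes, dim = features.shape
    summed = np.zeros((num_nodes, dim))
    norm = np.zeros(num_nodes)
    for u, v, w in edges:
        w = w if use_edge_weights else 1.0
        summed[u] += w * features[v]
        summed[v] += w * features[u]
        norm[u] += w
        norm[v] += w
    norm[norm == 0] = 1.0  # avoid division by zero for isolated nodes
    return summed / norm[:, None]

x = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
edges = [(0, 1, 3.0), (0, 2, 1.0)]
print(mean_aggregate(x, edges)[0])                          # weighted: [0.5, 1.25]
print(mean_aggregate(x, edges, use_edge_weights=False)[0])  # unweighted: [1.0, 1.5]
```

With weights, node 0's aggregate is pulled toward its strongly connected neighbor (node 1); without them, both neighbors count equally.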

Troubleshooting

Check Logs:
```
tail -f bh_evaluation.log
```
Verify: the feature matrix has shape [12561, 1031], test accuracy is above 0.5, and no feature-shape mismatch errors appear.

Validate Data:

```
import pandas as pd

summary = pd.read_csv('MOFCSD.csv')
print(f"Invalid SMILES: {(summary['linker SMILES'] == 'F[Si](F)(F)(F)(F)F').sum()}")
print(f"Metals: {summary['metal'].value_counts()}")
print(f"NaNs: {summary[['Pore Limiting Diameter', 'Largest Cavity Diameter', 'Largest Free Sphere']].isna().sum()}")
```
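Known Issues states that the invalid SMILES counted by this check is replaced with benzene (`c1ccccc1`) during preprocessing. A minimal pandas sketch of that replacement, assuming the `linker SMILES` column used above; `replace_invalid_smiles` is a hypothetical helper, not the function in data_utils.py.

```python
import pandas as pd

INVALID_SMILES = "F[Si](F)(F)(F)(F)F"
PLACEHOLDER_SMILES = "c1ccccc1"  # benzene

def replace_invalid_smiles(df, column="linker SMILES"):
    """Return a copy of df with the known-invalid linker SMILES replaced by benzene."""
    out = df.copy()  # leave the caller's DataFrame unchanged
    out[column] = out[column].replace(INVALID_SMILES, PLACEHOLDER_SMILES)
    return out

df = pd.DataFrame({"linker SMILES": [INVALID_SMILES, "CCO"]})
print(replace_invalid_smiles(df))
```

One consequence of this choice is that every affected MOF receives the same benzene fingerprint, so these nodes become indistinguishable in the linker features.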

Test Data Loading:

```
import pandas as pd
from data_utils import load_summary_data

edges = pd.read_csv('MOFGalaxyNet.csv')
nodes = pd.concat([edges['source'], edges['target']]).unique()
features_df, summary_data = load_summary_data('MOFCSD.csv', nodes)
print(features_df.shape)  # Should be (12561, 1031)
```

RDKit Version:
```
python -c "from rdkit import __version__; print(__version__)"
```
If <2024.03, update RDKit and modify data_utils.py to use MorganGenerator.

Known Issues

RDKit Deprecation: Newer RDKit versions (>=2024.03) emit deprecation warnings for AllChem.GetMorganFingerprintAsBitVect. Use MorganGenerator instead.
Invalid SMILES: F[Si](F)(F)(F)(F)F in MOFCSD.csv is replaced with c1ccccc1 (benzene) during preprocessing.
Low Accuracy: Accuracy may be lower than in previous versions but should remain above 0.5. If it drops, test with use_edge_weights = False.
Modularity: If the Black Hole partition is invalid, the code may fall back to a Louvain partition (modularity ~0.4075).
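A minimal sketch of such a fallback using NetworkX, assuming communities are represented as a list of node sets; `partition_or_louvain` is a hypothetical helper, not the function in bh_sparsification.py. It treats a partition as valid only if it covers every node exactly once (checked with `nx.community.is_partition`) and otherwise recomputes communities with `nx.community.louvain_communities`, which requires NetworkX 2.8 or newer.

```python
import networkx as nx

def partition_or_louvain(G, partition, seed=0):
    """Return (communities, modularity), falling back to Louvain if `partition` is invalid."""
    if partition and nx.community.is_partition(G, partition):
        communities = partition
    else:
        # Seeded so that the fallback partition is reproducible across runs.
        communities = nx.community.louvain_communities(G, weight="weight", seed=seed)
    return communities, nx.community.modularity(G, communities, weight="weight")

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
                           (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.1)])
print(partition_or_louvain(G, [{0, 1}, {3, 4, 5}]))  # node 2 missing, so Louvain is used
```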

Contributing

Contributions are welcome! Please submit issues or pull requests to:

Fix invalid SMILES in MOFCSD.csv.
Enhance Black Hole partition modularity in bh_sparsification.py.
Optimize GraphSAGE for better accuracy with edge weights.

License

This project is licensed under the MIT License. See LICENSE for details.

Contact

For questions or collaboration, reach out via GitHub or visit my website for more information: www.mehrdadjalali.de

