Black Hole Strategy: A gravity-inspired graph sparsification approach for Metal-Organic Framework (MOF) networks.
The Black Hole (BH) strategy is a novel graph sparsification technique inspired by the gravitational pull of black holes, which condense matter into highly structured forms. BH retains the most influential nodes and edges in a network using a gravity-like scoring mechanism, preserving critical connections and community structures (via the Louvain algorithm). Unlike random pruning, BH ensures stable and accurate graph representations, even under extreme sparsity, making it ideal for downstream learning tasks in sparse regimes.
The Black Hole Strategy is implemented within the MOFGalaxyNet framework, designed for analyzing Metal-Organic Framework (MOF) networks. It leverages weighted edge importance and community detection to sparsify graphs while maintaining structural integrity, outperforming traditional methods like random pruning or edge betweenness centrality.
For the full MOFGalaxyNet code, visit:
MehrdadJalali-AI/MOFGalaxyNet
For more details about the project and my work, visit my personal website:
www.mehrdadjalali.de
- Gravity-Inspired Sparsification: Uses a scoring mechanism to prioritize influential edges and nodes.
- Community Preservation: Integrates the Louvain algorithm for robust community detection.
- High Stability in Sparse Regimes: Maintains graph accuracy for downstream tasks like GraphSAGE training.
- Comprehensive Analysis Tools: Includes modules for visualizing MOF properties (e.g., linker and metal distributions, pore-limiting diameters).
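
The gravity-like scoring described above can be illustrated with a small, dependency-free sketch. It is not the implementation in `bh_sparsification.py`: the score formula (node mass as weighted degree, distance as the inverse of the edge weight), the function name `gravity_sparsify`, and the `keep_fraction` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def gravity_sparsify(edges, keep_fraction=0.10):
    """Keep the highest-scoring fraction of edges (hypothetical scoring rule).

    edges: list of (u, v, weight) tuples with weight > 0 (e.g. a similarity).
    Assumed score: mass_u * mass_v / distance**2, where mass is the weighted
    degree of a node and distance = 1 / weight.
    """
    # Node "mass": sum of the weights of all incident edges.
    mass = defaultdict(float)
    for u, v, w in edges:
        mass[u] += w
        mass[v] += w

    def score(edge):
        u, v, w = edge
        distance = 1.0 / w  # strongly similar nodes are "close"
        return mass[u] * mass[v] / distance ** 2

    # Keep at least one edge, then take the top-scoring ones.
    n_keep = max(1, round(keep_fraction * len(edges)))
    return sorted(edges, key=score, reverse=True)[:n_keep]


# Toy graph: a dense core (A, B, C) with a weakly attached tail (D, E).
toy_edges = [
    ("A", "B", 0.9),
    ("A", "C", 0.8),
    ("B", "C", 0.7),
    ("C", "D", 0.2),
    ("D", "E", 0.1),
]
kept = gravity_sparsify(toy_edges, keep_fraction=0.4)
```

On this toy graph, `kept` holds the two edges between the heaviest nodes, `("A", "B", 0.9)` and `("A", "C", 0.8)`, while the weak tail edges are dropped first.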
| File | Description |
|---|---|
| `main.py` | Orchestrates the pipeline: data loading, BH sparsification, GraphSAGE training. |
| `data_utils.py` | Loads and preprocesses data, generates RDKit fingerprints, and cleans features. |
| `graphsage_model.py` | Defines and trains the GraphSAGE neural network. |
| `bh_sparsification.py` | Implements the Black Hole sparsification algorithm. |
| `experiment_manager.py` | Manages experiments, logging, checkpoints, and result storage. |
| `sparsification_methods.py` | Provides alternative sparsification methods for comparison (e.g., random pruning). |

| File | Description |
|---|---|
| `LinkerDistribuation.py` | Visualizes the distribution of organic linkers in MOFs. |
| `MetalDistribuation.py` | Visualizes the distribution of metals in MOFs. |
| `PLDdistribuation.py` | Visualizes pore-limiting diameter (PLD) distributions in MOFs. |
| `analyze_sparsified_graphs_v2.py` | Analyzes properties of sparsified graphs across methods and thresholds. |
| `PlotNetworkParameters.py` | Plots network metrics (e.g., degree distribution, modularity). |
| `PlotParameters.py` | Plots sparsification metrics (accuracy, modularity, runtime). |
| `Plot_Redundency.py` | Visualizes redundancy and overlap metrics across sparsification thresholds. |
| `plot_network_metrics.py` | Generates plots for network metrics (e.g., density, clustering). |
| `Performance_Frugal_overall.py` | Benchmarks and summarizes performance across sparsification methods. |

- `MOFGalaxyNet.csv`: Edge list for the MOF network (829,300 edges, not included).
- `MOFCSD.csv`: Node features for MOFs (12,561 nodes, not included).
- `sparsification_performance.csv`: Stores evaluation metrics (accuracy, modularity, runtime).
- `BH.jpg` and `Animated_BH_txt_shorter.gif`: Visuals for the README.

- Python 3.9
- Conda
- Required packages: `pytorch==2.4.0`, `pandas`, `numpy`, `networkx`, `scikit-learn`, `rdkit>=2024.03`, `psutil`, `tqdm`

- Create a Conda environment:

      conda create -n bh_env python=3.9
      conda activate bh_env

- Install dependencies:

      conda install pytorch==2.4.0 pandas numpy networkx scikit-learn rdkit psutil -c pytorch -c conda-forge
      pip install tqdm

- Clone the repository:

      git clone https://github.com/MehrdadJalali-KIT/black-hole-strategy.git
      cd black-hole-strategy

Note: For RDKit >=2024.03, update data_utils.py to use MorganGenerator.GetFingerprintAsNumPy to avoid deprecation warnings. See Troubleshooting for details.
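
As a rough illustration of the change this note asks for, the sketch below computes a Morgan fingerprint with RDKit's generator API (`rdFingerprintGenerator.GetMorganGenerator` and `GetFingerprintAsNumPy`). The function name `morgan_fingerprint`, the radius of 2, and the 1024-bit size are assumptions; the values actually used in `data_utils.py` may differ. The benzene fallback mirrors the preprocessing behavior this README describes for invalid SMILES.

```python
def morgan_fingerprint(smiles, radius=2, n_bits=1024, fallback="c1ccccc1"):
    """Return a Morgan fingerprint as a NumPy array (hypothetical helper).

    radius and n_bits are illustrative defaults, not values confirmed
    by the project.
    """
    # Imported lazily so this module can be loaded without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # Parsing failed: fall back to benzene, as the README describes
        # for invalid linker SMILES.
        mol = Chem.MolFromSmiles(fallback)

    # Generator-based API that replaces AllChem.GetMorganFingerprintAsBitVect.
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    return generator.GetFingerprintAsNumPy(mol)
```

With RDKit installed, `morgan_fingerprint("c1ccccc1")` returns a one-dimensional array of length `n_bits`. When fingerprinting many molecules, creating the generator once and reusing it avoids repeated setup.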
- Prepare data: Place `MOFGalaxyNet.csv`, `MOFCSD.csv`, `BH.jpg`, and `Animated_BH_txt_shorter.gif` in the project root.
- Run the pipeline:

      conda activate bh_env
      rm -rf __pycache__ *.pyc
      python main.py

- Monitor progress:

      tail -f bh_evaluation.log

- Logs: Feature generation (`[12561, 1031]`), training progress (e.g., `Epoch X, Loss: Y, Train Accuracy: Z`), and test accuracy (0.6–0.8).
- Results: Stored in `evaluation/threshold_0.90/method_{blackhole,random}/run_0/model_results_with_error_bars.csv`.
- Run Time: ~10 minutes on a typical CPU (e.g., Apple Silicon).

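
To compare methods after a run, the result files can be gathered with a few lines of standard-library Python. This is a sketch that assumes only the directory layout given above; the helper name `collect_results` and its parameters are hypothetical, and no column names are assumed because the CSV schema is not documented here.

```python
import csv
from pathlib import Path


def collect_results(root="evaluation", threshold="0.90", run=0):
    """Map each sparsification method to the rows of its results CSV."""
    results = {}
    base = Path(root) / f"threshold_{threshold}"
    pattern = f"method_*/run_{run}/model_results_with_error_bars.csv"
    for path in sorted(base.glob(pattern)):
        # Directory names look like "method_blackhole" or "method_random".
        method = path.parts[-3].removeprefix("method_")
        with path.open(newline="") as handle:
            # Each row becomes a dict keyed by whatever header the file has.
            results[method] = list(csv.DictReader(handle))
    return results
```

Calling `collect_results()` from the project root returns a dictionary keyed by method name (for example `blackhole` and `random`), which can then be printed or loaded into pandas for side-by-side comparison.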
To match previous versions, edit `main.py`:

    use_edge_weights = False

Then run:

    python main.py

- Check Logs:

      tail -f bh_evaluation.log

  Verify: feature shape `[12561, 1031]`, test accuracy >0.5, no feature shape mismatches.
- Validate Data:

      import pandas as pd

      summary = pd.read_csv('MOFCSD.csv')
      print(f"Invalid SMILES: {(summary['linker SMILES'] == 'F[Si](F)(F)(F)(F)F').sum()}")
      print(f"Metals: {summary['metal'].value_counts()}")
      print(f"NaNs: {summary[['Pore Limiting Diameter', 'Largest Cavity Diameter', 'Largest Free Sphere']].isna().sum()}")

- Test Data Loading:

      import pandas as pd
      from data_utils import load_summary_data

      edges = pd.read_csv('MOFGalaxyNet.csv')
      nodes = pd.concat([edges['source'], edges['target']]).unique()
      features_df, summary_data = load_summary_data('MOFCSD.csv', nodes)
      print(features_df.shape)  # Should be (12561, 1031)

- RDKit Version:

      python -c "from rdkit import __version__; print(__version__)"

  If <2024.03, update RDKit and modify `data_utils.py` to use `MorganGenerator`.

- RDKit Deprecation: `AllChem.GetMorganFingerprintAsBitVect` is deprecated and triggers warnings. Update RDKit to >=2024.03 and use `MorganGenerator`.
- Invalid SMILES: `F[Si](F)(F)(F)(F)F` in `MOFCSD.csv` is replaced with `c1ccccc1` (benzene) during preprocessing.
- Low Accuracy: Accuracy may be lower than in previous versions (>0.5). Test with `use_edge_weights = False`.
- Modularity: The Black Hole partition may fall back to Louvain (modularity ~0.4075) if invalid.

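
When investigating the modularity fallback, it can help to score a partition by hand. The sketch below computes Newman modularity for a weighted, undirected edge list in plain Python and picks between two partitions. The helper names and the validity rule (a partition is treated as valid only if it assigns every node) are assumptions for illustration; the actual check in `bh_sparsification.py` may differ.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of a partition of a weighted, undirected graph.

    edges: iterable of (u, v, weight); community: dict node -> community id.
    """
    total = 0.0                     # total edge weight m
    internal = defaultdict(float)   # weight of edges inside each community
    strength = defaultdict(float)   # summed weighted degree per community
    for u, v, w in edges:
        total += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if total == 0:
        return 0.0
    # Q = sum over communities of (internal / m) - (strength / 2m)^2
    return sum(internal[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)


def choose_partition(edges, candidate, fallback):
    """Use `candidate` if it covers every node, otherwise `fallback`."""
    nodes = {n for u, v, _ in edges for n in (u, v)}
    # Assumed validity rule: every node must have a community assignment.
    return candidate if nodes.issubset(candidate) else fallback


# Two triangles joined by a single bridge edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, partition)
```

For this example `q` equals 5/14 (about 0.357), and placing all six nodes in a single community gives a modularity of 0.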
Contributions are welcome! Please submit issues or pull requests to:
- Fix invalid SMILES in `MOFCSD.csv`.
- Enhance Black Hole partition modularity in `bh_sparsification.py`.
- Optimize GraphSAGE for better accuracy with edge weights.

This project is licensed under the MIT License. See LICENSE for details.
For questions or collaboration, reach out via GitHub or visit my website for more information: www.mehrdadjalali.de
