MehrdadJalali-AI/BlackHole

Black Hole Strategy in MOF Networks

Black Hole Strategy: A gravity-inspired graph sparsification approach for Metal-Organic Framework (MOF) networks.

Black Hole Strategy for Graph Sparsification in MOF Networks

The Black Hole (BH) strategy is a novel graph sparsification technique inspired by the gravitational pull of black holes, which condense matter into highly structured forms. BH retains the most influential nodes and edges in a network using a gravity-like scoring mechanism, preserving critical connections and community structures (via the Louvain algorithm). Unlike random pruning, BH ensures stable and accurate graph representations, even under extreme sparsity, making it ideal for downstream learning tasks in sparse regimes.

Animation of Black Hole Sparsification

About the Project

The Black Hole Strategy is implemented within the MOFGalaxyNet framework, designed for analyzing Metal-Organic Framework (MOF) networks. It leverages weighted edge importance and community detection to sparsify graphs while maintaining structural integrity, outperforming traditional methods like random pruning or edge betweenness centrality.
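To make the idea of gravity-like edge scoring concrete, here is a minimal pure-Python sketch. It is not the code in bh_sparsification.py: the function name gravity_sparsify, the keep_fraction parameter, and the scoring formula (weighted degree as "mass", inverse edge weight as "distance") are all assumptions chosen for illustration.

```python
from collections import defaultdict


def gravity_sparsify(edges, keep_fraction=0.1):
    """Keep the highest-scoring fraction of weighted edges.

    edges: iterable of (u, v, weight) tuples with weight > 0.
    Assumed (hypothetical) score: mass(u) * mass(v) * weight**2, where
    mass is the weighted degree and 1/weight plays the role of distance.
    """
    edges = list(edges)

    # "Mass" of a node = sum of the weights of its incident edges.
    mass = defaultdict(float)
    for u, v, w in edges:
        mass[u] += w
        mass[v] += w

    # Rank edges by the gravity-like score, strongest first.
    ranked = sorted(
        edges,
        key=lambda e: mass[e[0]] * mass[e[1]] * e[2] ** 2,
        reverse=True,
    )
    n_keep = max(1, round(keep_fraction * len(edges)))
    return ranked[:n_keep]


# Toy edge list (made up for this example, not taken from MOFGalaxyNet.csv).
toy_edges = [
    ("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
    ("C", "D", 0.2), ("D", "E", 0.1),
]
kept = gravity_sparsify(toy_edges, keep_fraction=0.4)
print(kept)
```

On the toy graph, the two edges inside the dense, heavily weighted triangle score highest and survive, while the weak peripheral edges are dropped first.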

For the full MOFGalaxyNet code, visit:
MehrdadJalali-AI/MOFGalaxyNet

For more details about the project and my work, visit my personal website:
www.mehrdadjalali.de

Key Features

  • Gravity-Inspired Sparsification: Uses a scoring mechanism to prioritize influential edges and nodes.
  • Community Preservation: Integrates the Louvain algorithm for robust community detection.
  • High Stability in Sparse Regimes: Maintains graph accuracy for downstream tasks like GraphSAGE training.
  • Comprehensive Analysis Tools: Includes modules for visualizing MOF properties (e.g., linker and metal distributions, pore-limiting diameters).

Repository Structure

Core Modules

File                       Description
main.py                    Orchestrates the pipeline: data loading, BH sparsification, GraphSAGE training.
data_utils.py              Loads and preprocesses data, generates RDKit fingerprints, and cleans features.
graphsage_model.py         Defines and trains the GraphSAGE neural network.
bh_sparsification.py       Implements the Black Hole sparsification algorithm.
experiment_manager.py      Manages experiments, logging, checkpoints, and result storage.
sparsification_methods.py  Provides alternative sparsification methods for comparison (e.g., random pruning).
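For comparison, a random-pruning baseline of the kind mentioned for sparsification_methods.py can be sketched in a few lines. The function name random_prune and its parameters are hypothetical and do not reflect the module's actual interface.

```python
import random


def random_prune(edges, keep_fraction=0.1, seed=0):
    """Keep a uniformly random fraction of edges (baseline sketch)."""
    edges = list(edges)
    # Never ask for more edges than exist; keep at least one if possible.
    n_keep = min(len(edges), max(1, round(keep_fraction * len(edges))))
    rng = random.Random(seed)  # local RNG so runs are reproducible
    return rng.sample(edges, n_keep)


# Toy edge list, made up for this example.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
kept_random = random_prune(toy_edges, keep_fraction=0.4, seed=42)
print(kept_random)
```

Unlike a score-based method, this baseline ignores edge weights and node importance, so which edges survive depends only on the seed.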

Analysis & Visualization Modules

File                             Description
LinkerDistribuation.py           Visualizes the distribution of organic linkers in MOFs.
MetalDistribuation.py            Visualizes the distribution of metals in MOFs.
PLDdistribuation.py              Visualizes pore-limiting diameter (PLD) distributions in MOFs.
analyze_sparsified_graphs_v2.py  Analyzes properties of sparsified graphs across methods and thresholds.
PlotNetworkParameters.py         Plots network metrics (e.g., degree distribution, modularity).
PlotParameters.py                Plots sparsification metrics (accuracy, modularity, runtime).
Plot_Redundency.py               Visualizes redundancy and overlap metrics across sparsification thresholds.
plot_network_metrics.py          Generates plots for network metrics (e.g., density, clustering).
Performance_Frugal_overall.py    Benchmarks and summarizes performance across sparsification methods.

Data Files

  • MOFGalaxyNet.csv: Edge list for MOF network (829,300 edges, not included).
  • MOFCSD.csv: Node features for MOFs (12,561 nodes, not included).
  • sparsification_performance.csv: Stores evaluation metrics (accuracy, modularity, runtime).
  • BH.jpg & Animated_BH_txt_shorter.gif: Visuals for README.

Installation

Prerequisites

  • Python 3.9
  • Conda
  • Required packages: pytorch==2.4.0, pandas, numpy, networkx, scikit-learn, rdkit>=2024.03, psutil, tqdm

Setup Instructions

  1. Create a Conda environment:
    conda create -n bh_env python=3.9
    conda activate bh_env
  2. Install dependencies:
    conda install pytorch==2.4.0 pandas numpy networkx scikit-learn rdkit psutil -c pytorch -c conda-forge
    pip install tqdm
  3. Clone the repository:
    git clone https://github.com/MehrdadJalali-KIT/black-hole-strategy.git
    cd black-hole-strategy

Note: For RDKit >=2024.03, update data_utils.py to use MorganGenerator.GetFingerprintAsNumPy to avoid deprecation warnings. See Troubleshooting for details.

Usage

  1. Prepare data: Place MOFGalaxyNet.csv, MOFCSD.csv, BH.jpg, and Animated_BH_txt_shorter.gif in the project root.
  2. Run the pipeline:
    conda activate bh_env
    rm -rf __pycache__ *.pyc
    python main.py
  3. Monitor progress:
    tail -f bh_evaluation.log

Expected Output

  • Logs: Feature generation ([12561, 1031]), training progress (e.g., Epoch X, Loss: Y, Train Accuracy: Z), and test accuracy (0.6–0.8).
  • Results: Stored in evaluation/threshold_0.90/method_{blackhole,random}/run_0/model_results_with_error_bars.csv.
  • Run Time: ~10 minutes on a typical CPU (e.g., Apple Silicon).
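If you want the training curve as numbers rather than scrolling log text, the epoch lines can be pulled out with a regular expression. The sketch below assumes the log lines follow the pattern quoted above ("Epoch X, Loss: Y, Train Accuracy: Z") exactly; the sample lines are synthetic and are not real output from main.py, and parse_epochs is a hypothetical helper.

```python
import re

# Assumed line format: "... Epoch 3, Loss: 0.5123, Train Accuracy: 0.71"
EPOCH_RE = re.compile(
    r"Epoch\s+(?P<epoch>\d+),\s*"
    r"Loss:\s*(?P<loss>[0-9.]+(?:[eE][-+]?\d+)?),\s*"
    r"Train Accuracy:\s*(?P<acc>[0-9.]+)"
)


def parse_epochs(lines):
    """Return (epoch, loss, train_accuracy) tuples for matching lines."""
    rows = []
    for line in lines:
        match = EPOCH_RE.search(line)
        if match:
            rows.append(
                (int(match["epoch"]), float(match["loss"]), float(match["acc"]))
            )
    return rows


# Synthetic lines in the assumed format (not real output).
sample = [
    "INFO Epoch 1, Loss: 0.9312, Train Accuracy: 0.55",
    "INFO some unrelated message",
    "INFO Epoch 2, Loss: 0.8120, Train Accuracy: 0.61",
]
rows = parse_epochs(sample)
print(rows)
```

To apply it to a real run, pass an open file handle for bh_evaluation.log instead of the sample list, and adjust the pattern if the actual log format differs.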

Optional: Test Without Edge Weights

To match previous versions, edit main.py:

use_edge_weights = False

Then run:

python main.py

Troubleshooting

  1. Check Logs:
    tail -f bh_evaluation.log

    Verify: Feature shape [12561, 1031], test accuracy >0.5, no feature shape mismatches.

  2. Validate Data:
    import pandas as pd
    summary = pd.read_csv('MOFCSD.csv')
    print(f"Invalid SMILES: {(summary['linker SMILES'] == 'F[Si](F)(F)(F)(F)F').sum()}")
    print(f"Metals: {summary['metal'].value_counts()}")
    print(f"NaNs: {summary[['Pore Limiting Diameter', 'Largest Cavity Diameter', 'Largest Free Sphere']].isna().sum()}")
  3. Test Data Loading:
    import pandas as pd
    from data_utils import load_summary_data
    edges = pd.read_csv('MOFGalaxyNet.csv')
    nodes = pd.concat([edges['source'], edges['target']]).unique()
    features_df, summary_data = load_summary_data('MOFCSD.csv', nodes)
    print(features_df.shape)  # Should be (12561, 1031)
  4. RDKit Version:
    python -c "from rdkit import __version__; print(__version__)"

    If <2024.03, update RDKit and modify data_utils.py to use MorganGenerator.
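The MorganGenerator change referred to in step 4 could look roughly like the following. This is a sketch, not the contents of data_utils.py: the helper name smiles_to_fingerprint is hypothetical, and radius=2 and fpSize=1024 are illustrative assumptions rather than values confirmed by the repository.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Build the generator once and reuse it for every molecule.
# radius and fpSize are assumptions for this example.
_morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=1024)


def smiles_to_fingerprint(smiles):
    """Return a Morgan bit fingerprint as a NumPy array, or None if the
    SMILES string cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return _morgan.GetFingerprintAsNumPy(mol)


fp = smiles_to_fingerprint("c1ccccc1")  # benzene
print(fp.shape)
```

The generator replaces per-call use of AllChem.GetMorganFingerprintAsBitVect and returns the fingerprint directly as an array, so no separate bit-vector conversion step is needed.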

Known Issues

  • RDKit Deprecation: AllChem.GetMorganFingerprintAsBitVect is deprecated and triggers warnings in recent RDKit releases. Use RDKit >=2024.03 and switch to MorganGenerator.
  • Invalid SMILES: F[Si](F)(F)(F)(F)F in MOFCSD.csv is replaced with c1ccccc1 (benzene) during preprocessing.
  • Low Accuracy: Accuracy may be lower than in previous versions (>0.5). To compare, test with use_edge_weights = False.
  • Modularity: If the Black Hole partition is invalid, the code may fall back to the Louvain partition (modularity ~0.4075).
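The Louvain fallback described in the last item can be illustrated with NetworkX. The sketch assumes that "invalid" means the candidate communities do not form a partition of the graph's nodes; the function name partition_or_louvain is hypothetical and the logic is not taken from bh_sparsification.py.

```python
import networkx as nx


def partition_or_louvain(G, candidate, seed=0):
    """Return (communities, modularity), using Louvain if the candidate
    communities do not cover every node exactly once."""
    if candidate and nx.community.is_partition(G, candidate):
        communities = [set(c) for c in candidate]
    else:
        # Fallback: standard Louvain community detection.
        communities = nx.community.louvain_communities(
            G, weight="weight", seed=seed
        )
    q = nx.community.modularity(G, communities, weight="weight")
    return communities, q


# Toy graph: two 4-node cliques joined by a single edge.
G = nx.barbell_graph(4, 0)
bad_candidate = [{0, 1, 2}, {2, 3}]  # overlaps and misses nodes 4..7
communities, q = partition_or_louvain(G, bad_candidate)
print(communities, round(q, 4))
```

Checking validity before computing modularity matters because nx.community.modularity raises an error when given communities that are not a partition.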

Contributing

Contributions are welcome! Please submit issues or pull requests to:

  • Fix invalid SMILES in MOFCSD.csv.
  • Enhance Black Hole partition modularity in bh_sparsification.py.
  • Optimize GraphSAGE for better accuracy with edge weights.

License

This project is licensed under the MIT License. See LICENSE for details.

Contact

For questions or collaboration, reach out via GitHub or visit my website for more information: www.mehrdadjalali.de

About

Python-based Black Hole Strategy as Frugal Graph Learning for node removal in MOF graphs using gravity metrics and community detection
