LFN project, 2025.
Read the first proposal.
Read the midterm report.
Read the final report.
See the dedicated file and install the dependencies inside a virtual environment. Additionally, the torch-cluster package must be installed in order to use Node2Vec:
pip install torch-cluster
The program has been tested on GPU using these additional packages with specific versions:
- PyTorch 2.8.0, CUDA 12.8
- cupy-cuda12x
- torch-cluster, which needs to be reinstalled in its GPU-enabled build:
pip install torch_cluster -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
The program runs on a single NVIDIA GPU if one is detected; otherwise it runs on CPU, with good performance on small and medium-sized datasets.
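The GPU/CPU fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the helper names `pick_device` and `has_torch_cluster` are hypothetical, and the sketch relies only on `torch.cuda.is_available()` and the standard-library `importlib`.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch sees an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def has_torch_cluster() -> bool:
    """Report whether torch-cluster (needed for Node2Vec) can be found."""
    return importlib.util.find_spec("torch_cluster") is not None


if __name__ == "__main__":
    print(f"device: {pick_device()}")
    print(f"torch-cluster installed: {has_torch_cluster()}")
```

Running a check like this before the pipeline is a quick way to confirm that the environment matches the setup described above.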
For users with access to the "Blade" cluster, a .def file is provided to build the container from the existing image cv-ml-torch.sif, which is available inside the cluster. Before building, change the path of the localimage in the .def file.
-
Extract the archive ./datasets/original_dataset.zip containing the datasets we used.
-
Run the preprocessing of the datasets:
python src/dataset_preprocessing.py
-
Run the program:
python src/embeddings_pipeline.py
You can also run the pipeline on a single dataset, with a single embedding algorithm and a single downstream model. For example:
python src/embeddings_pipeline.py --data datasets/processed_datasets/Bio_grid_fission_yeast.csv
Or even:
python src/embeddings_pipeline.py --data Bio_grid_fission_yeast --embed DVNE --model MLP
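The two invocations above suggest that `--data` accepts either a CSV path or a bare dataset name. A minimal sketch of how such options could be parsed is shown below; it is an assumption about the behaviour, not the project's implementation. The names `resolve_dataset`, `build_parser`, and `PROCESSED_DIR` are hypothetical, and the processed-dataset directory is taken from the example command.

```python
import argparse
from pathlib import Path

# Assumed location of the preprocessed CSV files, taken from the example command.
PROCESSED_DIR = Path("datasets/processed_datasets")


def resolve_dataset(arg: str) -> Path:
    """Accept either a path to a CSV file or a bare dataset name."""
    path = Path(arg)
    if path.suffix == ".csv":
        return path
    return PROCESSED_DIR / f"{arg}.csv"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Embeddings pipeline (sketch)")
    # Leaving an option out means "run everything", as in the no-argument call.
    parser.add_argument("--data", type=resolve_dataset, default=None)
    parser.add_argument("--embed", default=None)
    parser.add_argument("--model", default=None)
    return parser


if __name__ == "__main__":
    demo = build_parser().parse_args(
        ["--data", "Bio_grid_fission_yeast", "--embed", "DVNE", "--model", "MLP"]
    )
    print(demo.data, demo.embed, demo.model)
```

With this design, both forms of `--data` resolve to the same file, and an omitted option stays `None` so the caller can iterate over all datasets, embeddings, or models.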
Nine datasets of different sizes are used, ranging from ~25k to ~3M edges. The references for each dataset are given in the midterm report.
The project structure is defined as:
LFN_Graph_Embeddings/
├── datasets/
│   ├── datasets_info.csv
│   └── original_datasets.zip
├── include/
│   ├── graphsage/
│   ├── line/
│   ├── node2vec/
│   └── svm/
├── reports/
│   ├── final_report/
│   │   ├── final_report.pdf
│   │   ├── final_report.tex
│   │   └── ...
│   ├── first_proposal/
│   │   ├── first_proposal.pdf
│   │   ├── first_proposal.tex
│   │   └── ...
│   └── midterm_report/
│       ├── midterm_report.pdf
│       ├── midterm_report.tex
│       └── ...
├── src/
│   ├── datasets_preprocessing.py
│   ├── dataset_utils.py
│   ├── embeddings.py
│   ├── models.py
│   ├── pipeline.py
│   ├── pipeline_utils.py
│   └── utils.py
└── README.md
The datasets_info.csv file lists the fields considered for each of the datasets used. If a new dataset is added, this file must be updated accordingly before running the program.
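A quick check that a dataset has an entry in datasets_info.csv could look like the sketch below. The function name `is_registered` and the column name `"name"` are placeholders chosen for illustration; the actual header of datasets_info.csv is not documented here and should be substituted.

```python
import csv
from pathlib import Path


def is_registered(info_csv: Path, dataset_name: str, column: str = "name") -> bool:
    """Return True if some row of info_csv has dataset_name in the given column.

    The default column "name" is a placeholder; pass the file's real header.
    """
    with info_csv.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {info_csv}")
        return any(row[column] == dataset_name for row in reader)
```

For instance, `is_registered(Path("datasets/datasets_info.csv"), "Bio_grid_fission_yeast", column=...)` could be called before launching the pipeline on a newly added dataset, with `column` set to whichever field identifies the dataset.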
See the final report.
We thank the creators of the following implementations (see the include folder inside the project):