LFN Graph Embeddings (project APEROL)

LFN project, 2025.

Introduction

Read the first proposal.
Read the midterm report.
Read the final report.

Instructions

Requirements

Dependencies

See the dedicated file and install the dependencies inside a virtual environment. Additionally, the torch-cluster package must be installed in order to use Node2Vec:

pip install torch-cluster

The program has been tested on GPU using these additional packages with specific versions:

PyTorch 2.8.0, CUDA 12.8
cupy-cuda12x

For GPU use, torch-cluster needs to be reinstalled with its CUDA-enabled build:

pip install torch_cluster -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
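The wheel index in the command above encodes both the PyTorch version and the CUDA tag. As a minimal sketch, assuming the URL pattern shown in that command also holds for other version pairs, a small helper can build the matching index URL. The name `pyg_wheel_index` is hypothetical and not part of the project:

```python
def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Build a wheel index URL following the pattern of the pip command above.

    torch_version: e.g. "2.8.0"
    cuda_tag:      e.g. "cu128"
    """
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


# Reproduce the command used for the tested configuration.
url = pyg_wheel_index("2.8.0", "cu128")
print(f"pip install torch_cluster -f {url}")
```

If the installed PyTorch version or CUDA tag differs from the tested one, the index URL most likely has to change accordingly.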

Hardware

The program runs on a single NVIDIA GPU if one is detected; otherwise it runs on the CPU, with good performance on small and medium-sized datasets.
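The GPU-or-CPU fallback described above can be sketched as follows. This is an illustrative check, not the project's actual detection code; `pick_device` is a hypothetical helper that only relies on `torch.cuda.is_available()`:

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment: fall back to CPU.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

Running this inside the virtual environment is a quick way to confirm which device would be used before launching a long job.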

For the few who have access to the "Blade" cluster, a .def file is provided so that you can build the container from the existing image cv-ml-torch.sif, which is available from inside the cluster. Before building, change the path of the localimage in the .def file.

Running the project

Extract the archive ./datasets/original_dataset.zip containing the datasets we used.
Run the preprocessing of the datasets:
```
python src/dataset_preprocessing.py
```

Run the program:

python src/embeddings_pipeline.py

You can also choose to run the pipeline with a single dataset, as well as a single embedding algorithm and a single downstream model. One example:

python src/embeddings_pipeline.py --data datasets/processed_datasets/Bio_grid_fission_yeast.csv

Or even:

python src/embeddings_pipeline.py --data Bio_grid_fission_yeast --embed DVNE --model MLP
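The two examples above pass `--data` either as a CSV path or as a bare dataset name. The sketch below shows one way such flags could be parsed with `argparse`. It is an assumption about the behaviour, not the project's actual parser; `resolve_dataset` and `build_parser` are hypothetical names, and the directory is taken from the first example:

```python
import argparse
from pathlib import Path

# Directory taken from the first example command above.
PROCESSED_DIR = Path("datasets/processed_datasets")


def resolve_dataset(value: str) -> Path:
    """Accept either a CSV path or a bare dataset name (assumed behaviour)."""
    path = Path(value)
    if path.suffix == ".csv":
        return path
    return PROCESSED_DIR / f"{value}.csv"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical CLI sketch")
    # Omitting a flag is assumed to mean "run all datasets / algorithms / models".
    parser.add_argument("--data", type=resolve_dataset, default=None)
    parser.add_argument("--embed", default=None)
    parser.add_argument("--model", default=None)
    return parser


args = build_parser().parse_args(
    ["--data", "Bio_grid_fission_yeast", "--embed", "DVNE", "--model", "MLP"]
)
print(args.data, args.embed, args.model)
```

Under these assumptions, both forms of `--data` resolve to the same file inside the processed datasets directory.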

Datasets and project structure

Nine datasets of different sizes are used, ranging from ~25k to ~3M edges. The references for each dataset are in the midterm report.

Datasets full details

| Network | Nodes | Edges | Type | Link to the dataset page |
| --- | --- | --- | --- | --- |
| Pennsylvania | 1,088,092 | 3,083,796 | Directed | http://snap.stanford.edu/data/roadNet-PA.html |
| Padua (province) | 122,680 | 304,184 | Directed | https://github.com/Remdox/Padua_Network_dataset_2025 |
| Hong Kong (city) | 43,620 | 91,542 | Directed | https://github.com/yzengal/RoadNetwork-China-City/blob/main/Hongkong.road-d.tar.gz |
| Italian Covid-19 Retweet Network | 221,574 | 800,000 | Directed | https://zenodo.org/records/13909011 |
| Deezer | 143,884 | 846,915 | Undirected | https://snap.stanford.edu/data/gemsec-Deezer.html |
| GitHub Developers | 37,700 | 289,003 | Undirected | http://snap.stanford.edu/data/github-social.html |
| Mus Musculus Protein Interactions (confidence score >0.7, only AB) | 20,969 | 800,000 | Undirected | https://string-db.org/cgi/download?sessionId=b9zuGHnAZu39&species_text=Mus+musculus&settings_expanded=1&min_download_score=400&filter_redundant_pairs=1&delimiter_type=csv |
| Saccharomyces cerevisiae Protein Interactions (confidence score >0.4, only AB) | 5,786 | 100,000 | Undirected | https://string-db.org/cgi/download?sessionId=b9zuGHnAZu39&species_text=Saccharomyces+cerevisiae&settings_expanded=1&min_download_score=700&filter_redundant_pairs=1&delimiter_type=csv |
| Bio-grid-fission-yeast | 2,000 | 25,300 | Undirected | https://networkrepository.com/bio-grid-fission-yeast.php |
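To illustrate how the Nodes, Edges, and Type columns relate, here is a minimal sketch that counts distinct nodes and edges in an edge list. It assumes a two-column CSV of source,target pairs without a header, which may not match the project's processed files, and the counts in the table may have been obtained differently. `graph_size` is a hypothetical helper:

```python
import csv
import io


def graph_size(edge_rows, directed: bool):
    """Return (number of distinct nodes, number of distinct edges)."""
    nodes = set()
    edges = set()
    for source, target in edge_rows:
        nodes.update((source, target))
        # In an undirected graph, (a, b) and (b, a) are the same edge.
        edge = (source, target) if directed else tuple(sorted((source, target)))
        edges.add(edge)
    return len(nodes), len(edges)


# Tiny in-memory edge list standing in for a processed dataset file.
sample = io.StringIO("a,b\nb,a\nb,c\n")
rows = [tuple(row) for row in csv.reader(sample)]

print(graph_size(rows, directed=True))   # (3, 3)
print(graph_size(rows, directed=False))  # (3, 2)
```

The same edge list yields a different edge count depending on whether direction is taken into account, which is why the Type column matters when comparing dataset sizes.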

Project Structure

The project structure is defined as:

LFN_Graph_Embeddings/
├── datasets/
│   ├── datasets_info.csv
│   └── original_datasets.zip
├── include/
│   ├── graphsage/
│   ├── line/
│   ├── node2vec/
│   └── svm/
├── reports/
│   ├── final_report/
│   │   ├── final_report.pdf
│   │   ├── final_report.tex
│   │   └── ...
│   ├── first_proposal/
│   │   ├── first_proposal.pdf
│   │   ├── first_proposal.tex
│   │   └── ...
│   └── midterm_report/
│       ├── midterm_report.pdf
│       ├── midterm_report.tex
│       └── ...
├── src/
│   ├── datasets_preprocessing.py
│   ├── dataset_utils.py
│   ├── embeddings.py
│   ├── models.py
│   ├── pipeline.py
│   ├── pipeline_utils.py
│   └── utils.py
└── README.md

The datasets_info.csv file lists the fields considered for each of the datasets used. If a new dataset is added, this file must be updated accordingly before running the program.
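A quick consistency check can catch a dataset that was added without a matching entry in datasets_info.csv. The sketch below is only illustrative: the column name `name` and the helper `missing_from_info` are hypothetical, since the actual fields of the file are not described here:

```python
import csv
import io


def missing_from_info(info_file, dataset_names, name_column="name"):
    """Return the dataset names that have no row in the info file."""
    listed = {row[name_column] for row in csv.DictReader(info_file)}
    return [name for name in dataset_names if name not in listed]


# In-memory stand-in for datasets/datasets_info.csv (columns are invented).
info = io.StringIO("name,directed\nBio_grid_fission_yeast,False\n")
missing = missing_from_info(info, ["Bio_grid_fission_yeast", "my_new_dataset"])
print(missing)  # ['my_new_dataset']
```

With the real file, `info_file` would be the opened datasets_info.csv and `name_column` whichever column identifies a dataset.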

Results

See the final report.

Credits

We thank the creators of the implementations collected in the include folder of the project (graphsage, line, node2vec, svm).
