This repository contains a dataset of multilayer networks and the spreading potentials of their
actors. It also includes a Python package to facilitate the data loading process. The dataset is one
of the artefacts described in the paper *Identifying Super Spreaders in Multilayer Networks*.
- Authors: Michał Czuba, Mateusz Stolarski, Adam Piróg, Piotr Bielak, Piotr Bródka
- Affiliation: Wrocław University of Science and Technology, Wrocław, Lower Silesia, Poland
The dataset comprises over 200 multilayer networks, including both synthetic and real-world examples. Each actor is labelled according to their spreading capability, assessed through simulation. Specifically, for every actor, a diffusion under the Multilayer Independent Cascade Model is initiated with that actor as the sole seed. From each simulation, a feature vector is extracted, containing:
- the total number of activated nodes,
- the duration of the diffusion process (i.e. number of time steps),
- the maximum number of activations in a single step (the peak),
- the time step at which this peak occurs.
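The extraction of these four features from a single simulation can be sketched in Python. This is a minimal illustration, not the code used to build the dataset. It assumes the simulation has already produced a list `new_activations` in which element `t` is the number of nodes newly activated at time step `t`, with step 0 holding the seed's own activations. It also assumes that ties for the peak are resolved by taking the earliest step. The names `SpreadingFeatures` and `spreading_features` are hypothetical and are not part of `tsds_utils`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpreadingFeatures:
    """Hypothetical container for the four features of one diffusion run."""
    total_activated: int  # total number of activated nodes
    duration: int         # number of time steps of the diffusion
    peak: int             # maximum number of activations in a single step
    peak_step: int        # time step at which the peak occurs


def spreading_features(new_activations: List[int]) -> SpreadingFeatures:
    """Summarise one single-seed diffusion run.

    Assumption: new_activations[t] is the number of nodes newly activated
    at time step t, with step 0 holding the seed's own activations.
    """
    if not new_activations:
        raise ValueError("a diffusion run must contain at least the seeding step")
    peak = max(new_activations)
    return SpreadingFeatures(
        total_activated=sum(new_activations),
        duration=len(new_activations),
        peak=peak,
        # list.index returns the first step that reaches the peak
        peak_step=new_activations.index(peak),
    )


# Example: 3 nodes active at step 0, then 5, 2 and 1 newly activated nodes.
print(spreading_features([3, 5, 2, 1]))
# → SpreadingFeatures(total_activated=11, duration=4, peak=5, peak_step=1)
```

Whether the seed's nodes count towards the total and the duration is a convention. Shifting it changes `total_activated`, `duration` and `peak_step` by a constant, so the convention should match the one used in the dataset before any comparison.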
A summary of the networks included in the dataset, along with their key statistics, is provided below. For synthetic networks, mean values are reported across all instances.
Network type | Layers | Actors | Nodes | Edges | Degree |
---|---|---|---|---|---|
artificial-er | 3.52 | 558.19 | 1741.70 | 6684.00 | 24.13 |
artificial-pa | 3.52 | 574.51 | 1976.07 | 42636.53 | 122.10 |
artificial-small | 2.75 | 1000.00 | 2750.00 | 6609.12 | 13.22 |
arxiv | 13 | 14065 | 26796 | 59026 | 8.39 |
aucs | 5 | 61 | 224 | 620 | 20.33 |
ckmp | 3 | 241 | 674 | 1370 | 11.37 |
eu-trans | 37 | 417 | 2034 | 3588 | 17.21 |
l2-course | 2 | 41 | 82 | 297 | 14.49 |
lazega | 3 | 71 | 212 | 1659 | 46.73 |
timik | 3 | 61702 | 102247 | 875191 | 28.37 |
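For the real-world networks, the Degree column appears to match the mean actor degree. This is twice the number of edges, summed over all layers, divided by the number of actors. For example, for aucs this gives 2 · 620 / 61 ≈ 20.33. This reading is inferred from the figures rather than stated explicitly. The short Python check below recomputes it from the Edges and Actors columns. The synthetic rows are means over instances, so the formula need not hold exactly for their averaged Edges and Actors.

```python
# (edges, actors, reported Degree) for the real-world networks in the table above
REAL_NETWORKS = {
    "arxiv": (59026, 14065, 8.39),
    "aucs": (620, 61, 20.33),
    "ckmp": (1370, 241, 11.37),
    "eu-trans": (3588, 417, 17.21),
    "l2-course": (297, 41, 14.49),
    "lazega": (1659, 71, 46.73),
    "timik": (875191, 61702, 28.37),
}


def mean_actor_degree(edges: int, actors: int) -> float:
    """Mean actor degree: every edge (in any layer) contributes two endpoints."""
    return 2 * edges / actors


for name, (edges, actors, reported) in REAL_NETWORKS.items():
    computed = mean_actor_degree(edges, actors)
    print(f"{name:<10} reported={reported:6.2f} computed={computed:6.2f}")
```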
```
.
├── .dvc         -> DVC configuration files
├── env          -> Env. requirements for the Python package
├── tsds_utils   -> Python package for handling the dataset
├── tsds_sources -> Directory with source data files
└── README.md
```

- Version 2.0.0: the dataset in the exact form that was used in the experiments
- Version 2.0.1: a variant with improved manual

The dataset is managed using DVC. To use it, first install the required
dependencies listed in `requirements.txt`, including the appropriate version of DVC.
To download the dataset, you must authenticate with a Google account that has access to the shared
Google Drive storage: https://drive.google.com/drive/folders/0ACmD69K7LbU3Uk9PVA. If you need
access, please contact one of the contributors. Then, to fetch the data, run `dvc pull`.
A public DVC configuration for the dataset version used in the paper is available at
https://drive.google.com/file/d/1OnNLhjKotOlV0c2_1FcJgCCzQdTGRtoQ. To use it, unpack the archive
and move its contents into the `.dvc` directory of this project. Then, run `dvc checkout`.
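The unpacking step can also be scripted. The following is a minimal Python sketch with some assumptions. The downloaded archive must be in a format recognised by `shutil.unpack_archive` (for example `.zip` or `.tar.gz`), and its top-level entries must be exactly the files that belong in `.dvc`. If the archive wraps everything in a single folder, copy that folder's contents instead. The helper name `install_dvc_config` is hypothetical and is not part of `tsds_utils`.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def install_dvc_config(archive: str | Path, project_root: str | Path = ".") -> Path:
    """Unpack `archive` and copy its top-level entries into `<project_root>/.dvc`.

    Hypothetical helper. Files already present in .dvc with the same
    names are overwritten; other existing files are left untouched.
    """
    target = Path(project_root) / ".dvc"
    target.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        # The archive format is inferred from the file extension.
        shutil.unpack_archive(str(archive), tmp)
        for item in Path(tmp).iterdir():
            dest = target / item.name
            if item.is_dir():
                # Merge directories (e.g. a cache) into any existing ones.
                shutil.copytree(item, dest, dirs_exist_ok=True)
            else:
                shutil.copy2(item, dest)
    return target
```

After the helper returns, run `dvc checkout` from the project root as before. `dirs_exist_ok` requires Python 3.8 or newer.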
To install the package in editable mode, run:

```bash
pip install -e .
```

To contribute to the repository, create and activate the development environment using Conda:

```bash
conda env create -f env/conda.yaml
conda activate tsds-utils
```

This work was supported by the National Science Centre, Poland [grant no. 2022/45/B/ST6/04145] (www.multispread.pwr.edu.pl); the Polish Ministry of Science and Higher Education programme “International Projects Co-Funded”; and the EU under the Horizon Europe [grant no. 101086321]. Views and opinions expressed are those of the authors and do not necessarily reflect those of the funding agencies.