README.md: 30 additions & 20 deletions
@@ -9,15 +9,14 @@ Pang is free software: you can redistribute it and/or modify it under the terms
9
9
# Description
10
10
Pang is an algorithm which represents and classifies a collection of graphs according to their frequent patterns (subgraphs).
11
11
12
-
The data are available in the FOPPA repository; they are extracted from FOPPA, a database of French public procurement notices [[P'22](#references)].
13
-
14
12
15
13
# Organization
16
14
This repository is composed of the following elements:
17
15
* `requirements.txt` : list of Python packages used in `PANG.py`.
18
16
* `PANG.py` : Python script to run the algorithm.
17
+
* `ECML.py` : Python script to compute the results of the experiments of the ECML paper.
19
18
* `ProcessingPattern.py` : Python script to compute the number of occurrences and the set of induced patterns.
20
-
* `data` : folder with the input files needed.
19
+
* `data` : folder with the input data files. There is one subfolder per dataset; the datasets are described in the [Datasets](#datasets) section.
21
20
22
21
23
22
# Installation
@@ -27,19 +26,17 @@ You first need to install `python` and the required packages:
27
26
2. Download this project from GitHub and unzip.
28
27
3. Execute `pip install -r requirements.txt` to install the required packages (see also the *Dependencies* Section).
29
28
30
-
The source code of SPMF in order to use gSpan and cgSpan is available [here](https://www.philippe-fournier-viger.com/spmf/index.php?link=download.php)
31
-
32
-
The VF2 and ISMAGS algorithms are included in the [`NetworkX` library](https://networkx.org/).
33
-
34
-
For the baselines:
35
-
* The WL and WLOA algorithms are included in the Grakel library, available [here](https://ysig.github.io/GraKeL/0.1a8/benchmarks.html)
36
-
* Graph2Vec is included in the karateclub library, available [here](https://karateclub.readthedocs.io/en/latest/)
37
-
* DGCNN is included in the stellargraph library, available [here](https://stellargraph.readthedocs.io/en/stable/).
38
-
* We use the implementation of CORK from Marisa Thoma. This implementation is available in the `CORKcpp.zip` archive.
29
+
The source code of SPMF in order to use gSpan and cgSpan is available [here](https://www.philippe-fournier-viger.com/spmf/index.php?link=download.php).
30
+
SPMF is available in two versions:
31
+
* a jar file that can be run from the command line. Currently, this version can be used with gSpan, but not with cgSpan.
32
+
* the source code. Installing this version is more complicated, but it makes it possible to use cgSpan. You can find the instructions [here](https://www.philippe-fournier-viger.com/spmf/how_to_install.php).
39
33
34
+
Before using Pang, you need to unzip the datasets in each subfolder of the `data` folder.
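The unzipping step can be scripted. The following is a minimal sketch, assuming the datasets are shipped as `.zip` archives placed directly inside each subfolder of `data`; the function name `unzip_datasets` is hypothetical and not part of this repository.

```python
import zipfile
from pathlib import Path


def unzip_datasets(data_dir="data"):
    """Extract every .zip archive found in the subfolders of data_dir.

    Assumption: each dataset is stored as data_dir/<dataset>/<archive>.zip.
    Files are extracted next to their archive.
    """
    extracted = []
    for archive in sorted(Path(data_dir).glob("*/*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted


if __name__ == "__main__":
    for archive in unzip_datasets():
        print(f"Extracted {archive}")
```

Run from the root of the project, this lists each archive it has extracted; archives stored at a different depth would need a different glob pattern.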
40
35
41
36
# Use
42
-
37
+
We provide two scripts to use Pang:
38
+
* `ECML.py` : a Python script to compute the results of the ECML paper.
39
+
* `PANG.py` : a Python script to use Pang with your own data.
43
40
44
41
## To Replicate the Paper Experiments
45
42
In order to use Pang:
@@ -52,19 +49,16 @@ The script will compute the results of the experiments and save the results asso
52
49
## To Apply PANG to Other Data
53
50
If you want to use Pang with your own data, you need to create an `XXX` folder in the `data` folder and put your data in it. This folder must contain the following files:
54
51
* `XXX_graph.txt` : a file containing the graphs.
55
-
* `XXX_pattern.txt` : a file containing the patterns.
56
52
* `XXX_label.txt` : a file containing the labels of the graphs.
57
53
58
54
Then you need to run a script to produce the data files that will be used by Pang:
59
55
1. Open the Python console.
60
-
2. Run `ProcessingPattern.py` with the option `-d XXX` in order to create the files `XXX_mono.txt` and `XXX_iso.txt`.
61
-
3. Run `PANG.py` with the option `-d XXX` in order to run Pang on the data `XXX`.
56
+
2. Run the script `Patterns.sh` in order to create the file `XXX_patterns.txt`.
57
+
3. Run `ProcessingPattern.py` with the option `-d XXX` in order to create the files `XXX_mono.txt` and `XXX_iso.txt`.
58
+
4. Run `PANG.py` with the option `-d XXX` in order to run Pang on the data `XXX`.
62
59
63
60
For each value of the parameter `k`, Pang will create a file `KResults.txt` containing the results of the classification and a file `KPatterns.txt` containing the patterns.
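Before running the steps above, it can help to confirm that the `XXX` folder holds the expected input files. The following is a minimal sketch, assuming only `XXX_graph.txt` and `XXX_label.txt` must be present beforehand (the pattern file being produced by the steps above); the helper `missing_input_files` is hypothetical and not part of this repository.

```python
from pathlib import Path


def missing_input_files(name, data_dir="data"):
    """Return the required input files absent from data_dir/name.

    Assumption: the required inputs are <name>_graph.txt and <name>_label.txt.
    """
    folder = Path(data_dir) / name
    required = [f"{name}_graph.txt", f"{name}_label.txt"]
    return [f for f in required if not (folder / f).is_file()]


if __name__ == "__main__":
    missing = missing_input_files("XXX")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```

An empty list means both files are in place and the processing scripts can be run with `-d XXX`.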
64
61
65
-
## To Reconstruct all patterns for a dataset
66
-
If you want to reconstruct all patterns using SPMF for a dataset, you need to run the associated script in the `scripts` folder. This process involves pattern mining and post-processing for each pattern and is therefore time-consuming. Each file will be saved in the associated `data` folder.
67
-
68
62
## Data Format
69
63
We use the same format as SPMF for the graph input files. Each graph is defined as follows:
70
64
@@ -76,7 +70,13 @@ For the patterns output files, each pattern contains one more line than the grap
76
70
77
71
4. `x A B C A,B,C` : graphs containing the pattern
78
72
79
-
73
+
## Datasets
74
+
The datasets used in the paper are available in the `data` folder. The following datasets are available:
75
+
* `MUTAG` : MUTAG dataset, available [here](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets).
76
+
* `NCI1` : NCI1 dataset, available [here](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets).
77
+
* `PTC` : PTC dataset, available [here](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets).
78
+
* `DD` : DD dataset, available [here](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets).
79
+
* `FOPPA` : dataset extracted from FOPPA, a database of French public procurement notices [[P'22](#references)].
80
80
# Dependencies
81
81
Tested with `SPMF` version 2.54, and `python` version 3.8.0 with the following packages:
82
82
* [`pandas`](https://pypi.org/project/pandas/): version 1.3.5
@@ -86,5 +86,15 @@ Tested with `SPMF` version 2.54, and `python` version 3.8.0 with the following p
86
86
* [`matplotlib`](https://pypi.org/project/matplotlib/): version 3.6.0
87
87
88
88
89
+
90
+
The VF2 and ISMAGS algorithms are included in the [`NetworkX` library](https://networkx.org/).
91
+
92
+
For the baselines:
93
+
* The WL and WLOA algorithms are included in the Grakel library, available [here](https://ysig.github.io/GraKeL/0.1a8/benchmarks.html)
94
+
* Graph2Vec is included in the karateclub library, available [here](https://karateclub.readthedocs.io/en/latest/)
95
+
* DGCNN is included in the stellargraph library, available [here](https://stellargraph.readthedocs.io/en/stable/).
96
+
* We use the implementation of CORK from Marisa Thoma. This implementation is available in the `CORKcpp.zip` archive.
97
+
98
+
89
99
# References
90
100
* **[P'22]** L. Potin, V. Labatut, R. Figueiredo, C. Largeron, P.-H. Morand. *FOPPA: A database of French Open Public Procurement Award notices*, Technical Report, Avignon University, 2022. [⟨hal-03796734⟩](https://hal.archives-ouvertes.fr/hal-03796734)