Commit c9997d3

UpdateReadME
1 parent 3f9c354 commit c9997d3

1 file changed, +30 -20 lines changed

README.md

@@ -9,15 +9,14 @@ Pang is free software: you can redistribute it and/or modify it under the terms
 # Description
 Pang is an algorithm which represents and classifies a collection of graphs according to their frequent patterns (subgraphs).
 
-The data available are in the FOPPA repository, they are extracted from FOPPA, a database of French public procurement notices [[P'22](#references)].
-
 
 # Organization
 This repository is composed of the following elements:
 * `requirements.txt` : List of Python packages used in pang.py.
 * `PANG.py` : Python script in order to use the algorithm.
+* `EMCL.py` : Python script in order to compute the results of the experiments of the ECML paper.
 * `ProcessingPattern.py` : Python script in order to compute the number of occurences and the set of induced patterns
-* `data` : folder with the input files needed.
+* `data` : folder with the input data files. There is one folder for each dataset, which are described in the [Datasets](#datasets) section.
 
 
 # Installation
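The `ProcessingPattern.py` entry in the hunk above mentions induced patterns, and the own-data workflow elsewhere in this diff produces separate `XXX_mono.txt` and `XXX_iso.txt` files. Assuming those suffixes stand for pattern monomorphism and induced subgraph isomorphism (the README does not define them), the brute-force Python sketch below shows the difference on small unlabeled graphs. The function `embeds` is hypothetical, and exhaustive search is only practical for tiny inputs; it is not the VF2 or ISMAGS matching that the README attributes to NetworkX.

```python
"""Brute-force contrast between a pattern monomorphism and an induced
subgraph isomorphism, on small unlabeled undirected graphs.

Assumption: the _mono / _iso suffixes produced by ProcessingPattern.py refer
to these two notions. Exhaustive search, for illustration only.
"""
from itertools import permutations
from typing import FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[int]


def edge_set(edges: Iterable[Tuple[int, int]]) -> Set[Edge]:
    """Undirected edges as unordered node pairs."""
    return {frozenset(e) for e in edges}


def embeds(p_nodes: Iterable[int], p_edges: Iterable[Tuple[int, int]],
           g_nodes: Iterable[int], g_edges: Iterable[Tuple[int, int]],
           induced: bool) -> bool:
    """True if some injective node map sends the pattern into the graph."""
    pe, ge = edge_set(p_edges), edge_set(g_edges)
    p_list = list(p_nodes)
    for image in permutations(list(g_nodes), len(p_list)):
        f = dict(zip(p_list, image))
        mapped = {frozenset(f[u] for u in e) for e in pe}
        # Monomorphism: every pattern edge lands on a graph edge.
        if not mapped <= ge:
            continue
        if induced:
            # Induced: no extra graph edge may join two image nodes.
            img = set(image)
            if any(e <= img and e not in mapped for e in ge):
                continue
        return True
    return False
```

For example, a three-node path embeds in a triangle as a monomorphism but not as an induced subgraph, because the triangle's third edge has no counterpart in the pattern.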
@@ -27,19 +26,17 @@ You first need to install `python` and the required packages:
 2. Download this project from GitHub and unzip.
 3. Execute `pip install -r requirements.txt` to install the required packages (see also the *Dependencies* Section).
 
-The source code of SPMF in order to use gSpan and cgSpan is available [here](https://www.philippe-fournier-viger.com/spmf/index.php?link=download.php)
-
-The VF2 and ISMAGS algortihms are included in the [`Networkx` library](https://networkx.org/)
-
-For the baselines:
-* The WL and WLOA algorithms are included in the Grakel library, available [here](https://ysig.github.io/GraKeL/0.1a8/benchmarks.html)
-* Graph2Vec is included in the karateclub library, available [here](https://karateclub.readthedocs.io/en/latest/)
-* DGCNN is included in the stellargraph library, available [here](https://stellargraph.readthedocs.io/en/stable/).
-* We use the implementation of CORK from Marisa Thoma. This implementation is available in the `CORKcpp.zip` archive.
+The source code of SPMF in order to use gSpan and cgSpan is available [here](https://www.philippe-fournier-viger.com/spmf/index.php?link=download.php).
+SPMF is available in two versions:
+* a jar file that can be run from the command line. Actually, this version can be use with gSpan, but not with cgSpan.
+* a source code. The installation of this version is more complicated, but it allows to use cgSpan. You can find the instructions [here](https://www.philippe-fournier-viger.com/spmf/how_to_install.php).
 
+In order to use Pang, you need to unzip the datasets in each folder of the `data` folder.
 
 # Use
-
+We provide two scripts to use Pang:
+* `ECML.py` : a python script in order to compute the results of the ECML paper.
+* `PANG.py` : a python script in order to use Pang with your own data.
 
 ## To Replicate the Paper Experiments
 In order to use Pang:
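The hunk above adds the instruction to unzip the datasets in each folder of `data`. A minimal Python sketch of that step follows, assuming the archives are `.zip` files stored directly in each dataset folder and meant to be extracted in place; the README names neither the archive format nor the archive names, and `unzip_datasets` is a hypothetical helper.

```python
"""Extract every dataset archive found in the subfolders of data/.

Assumption (not stated in the README): archives are .zip files stored
directly inside each dataset folder, e.g. data/MUTAG/<archive>.zip,
and are extracted next to themselves.
"""
import zipfile
from pathlib import Path
from typing import List


def unzip_datasets(data_dir: str = "data") -> List[Path]:
    """Extract each data/<dataset>/*.zip in place; return the archives handled."""
    root = Path(data_dir)
    extracted: List[Path] = []
    if not root.is_dir():
        return extracted  # nothing to do, e.g. when run outside the repository root
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        for archive in sorted(folder.glob("*.zip")):
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(folder)  # files land beside the archive
            extracted.append(archive)
    return extracted
```

Calling `unzip_datasets()` from the repository root would process every dataset folder; files sitting directly in `data` itself are skipped.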
@@ -52,19 +49,16 @@ The script will compute the results of the experiments and save the results asso
 ## To Apply PANG to Other Data
 If you want to use Pang with your own data, you need to create an `XXX` folder in the `data` folder and put your data in it. This folder must contain the following files:
 * `XXX_graph.txt` : a file containing the graphs.
-* `XXX_pattern.txt` : a file containing the patterns.
 * `XXX_label.txt` : a file containing the labels of the graphs.
 
 Then you need to run a script to produce the data files that will be used by Pang:
 1. Open the Python console.
-2. Run `ProcessingPattern.py`with the option `-d XXX` in order to create the files `XXX_mono.txt` and `XXX_iso.txt`.
-3. Run `PANG.py` with the option `-d XXX` in order to run Pang on the data `XXX`.
+2. Run the script `Patterns.sh` in order to create the files `XXX_patterns.txt`.
+3. Run `ProcessingPattern.py`with the option `-d XXX` in order to create the files `XXX_mono.txt` and `XXX_iso.txt`.
+4. Run `PANG.py` with the option `-d XXX` in order to run Pang on the data `XXX`.
 
 For each value of the parameter `k`, Pang will create a file `KResults.txt` containing the results of the classification and a file `KPatterns.txt` containing the patterns.
 
-## To Reconstruct all patterns for a dataset
-If you want to reconstruct all patterns using SPMF for a dataset, you need to run the associated script in the `scripts` folder. This process involves pattern mining and post processing for each pattern and is therefore time consuming. Each file will be saved in the associated `data` folder.
-
 ## Data Format
 We use the same format as SPMF for the graph input files. Each graph is defined as follows:
 
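The four-step workflow in the hunk above could be scripted roughly as follows. This is a sketch, not part of the repository: the file names and the `-d XXX` option are taken from the README, while `missing_inputs`, `pipeline` and `run` are hypothetical helpers, and invoking `Patterns.sh` through `bash` without arguments is an assumption because the README does not say how that script is called.

```python
"""Check the inputs and build the command sequence for a new dataset XXX.

Assumptions: file names and the -d option come from the README; running
Patterns.sh with bash and no arguments is a guess.
"""
import subprocess
import sys
from pathlib import Path
from typing import List


def missing_inputs(name: str, data_dir: str = "data") -> List[str]:
    """Return the required input files that are absent from data/<name>/."""
    folder = Path(data_dir) / name
    required = [f"{name}_graph.txt", f"{name}_label.txt"]
    return [f for f in required if not (folder / f).is_file()]


def pipeline(name: str) -> List[List[str]]:
    """Commands for steps 2 to 4 of the README, in order."""
    return [
        ["bash", "Patterns.sh"],  # hypothetical invocation; creates XXX_patterns.txt
        [sys.executable, "ProcessingPattern.py", "-d", name],  # XXX_mono.txt, XXX_iso.txt
        [sys.executable, "PANG.py", "-d", name],  # runs Pang on the data XXX
    ]


def run(name: str, data_dir: str = "data") -> None:
    """Refuse to start if inputs are missing, then run each step in turn."""
    missing = missing_inputs(name, data_dir)
    if missing:
        raise FileNotFoundError(f"missing in {data_dir}/{name}: {missing}")
    for cmd in pipeline(name):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling `run("XXX")` from the repository root stops at the first step that exits with an error, since `subprocess.run` is called with `check=True`.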
@@ -76,7 +70,13 @@ For the patterns output files, each pattern contains one more line than the grap
 
 4. `x A B C A,B,C` : graphs containing the pattern
 
-
+## Datasets
+The datasets used in the paper are available in the `data` folder. The following datasets are available:
+* `MUTAG` : MUTAG dataset, available [here](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets).
+* `NCI1` : NCI1 dataset, available [here](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets).
+* `PTC` : PTC dataset, available [here](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets).
+* `DD` : DD dataset, available [here](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets).
+* `FOPPA` : dataset extracted from FOPPA, a database of French public procurement notices [[P'22](#references)].
 # Dependencies
 Tested with `SPMF` version 2.54, and `python` version 3.8.0 with the following packages:
 * [`pandas`](https://pypi.org/project/pandas/): version 1.3.5
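The data-format context lines in the hunks above say the graph files follow SPMF's format, but the line definitions themselves fall outside the diff. The sketch below parses that format as SPMF describes it for gSpan (`t # N` starts graph N, `v M L` adds vertex M with label L, `e P Q L` adds an edge with label L, and an `x` line lists the graphs containing a pattern). Treat these line types, the handling of a negative graph id as an end marker, and the `Graph` and `parse_spmf` names as assumptions rather than a description of Pang's own reader. Tokens on `x` lines are kept as strings because the context line `x A B C A,B,C` uses letters.

```python
"""Read graphs written in an SPMF (gSpan-style) text format.

Assumed line types: "t # N" starts graph N (optionally "* support" follows
in pattern files), "v M L" adds vertex M with label L, "e P Q L" adds an
edge P-Q with label L, and "x ..." lists graphs containing a pattern.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Graph:
    gid: int
    vertices: Dict[int, str] = field(default_factory=dict)
    edges: List[Tuple[int, int, str]] = field(default_factory=list)
    found_in: List[str] = field(default_factory=list)  # raw tokens of the x line


def parse_spmf(text: str) -> List[Graph]:
    graphs: List[Graph] = []
    for line in text.splitlines():
        tok = line.split()
        if not tok:
            continue  # ignore blank lines
        if tok[0] == "t":
            gid = int(tok[2])
            if gid < 0:
                break  # assumed end-of-input marker "t # -1"
            graphs.append(Graph(gid=gid))
        elif tok[0] == "v":
            graphs[-1].vertices[int(tok[1])] = tok[2]
        elif tok[0] == "e":
            graphs[-1].edges.append((int(tok[1]), int(tok[2]), tok[3]))
        elif tok[0] == "x":
            graphs[-1].found_in.extend(tok[1:])
    return graphs
```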
@@ -86,5 +86,15 @@ Tested with `SPMF` version 2.54, and `python` version 3.8.0 with the following p
 * [`matplotlib`](https://pypi.org/project/numpy/): version 3.6.0
 
 
+
+The VF2 and ISMAGS algortihms are included in the [`Networkx` library](https://networkx.org/)
+
+For the baselines:
+* The WL and WLOA algorithms are included in the Grakel library, available [here](https://ysig.github.io/GraKeL/0.1a8/benchmarks.html)
+* Graph2Vec is included in the karateclub library, available [here](https://karateclub.readthedocs.io/en/latest/)
+* DGCNN is included in the stellargraph library, available [here](https://stellargraph.readthedocs.io/en/stable/).
+* We use the implementation of CORK from Marisa Thoma. This implementation is available in the `CORKcpp.zip` archive.
+
+
 # References
 * **[P'22]** L. Potin, V. Labatut, R. Figueiredo, C. Largeron, P.-H. Morand. *FOPPA: A database of French Open Public Procurement Award notices*, Technical Report, Avignon University, 2022. [⟨hal-03796734⟩](https://hal.archives-ouvertes.fr/hal-03796734)
