Commit b9e8594

Update README.md
1 parent 2671fbb commit b9e8594

1 file changed

+0
-109
lines changed

README.md

Lines changed: 0 additions & 109 deletions
@@ -5,112 +5,3 @@ Pang
Pang is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information see licence.txt

-----------------------------------------------------------------------

# Description
Pang is an algorithm which represents and classifies a collection of graphs according to their frequent patterns (subgraphs).

# Organization
This repository is composed of the following elements:
* `requirements.txt` : list of the Python packages used in `PANG.py`.
* `PANG.py` : Python script to run the algorithm.
* `ECML.py` : Python script to compute the results of the experiments of the ECML paper.
* `ProcessingPattern.py` : Python script to compute the number of occurrences and the set of induced patterns.
* `data` : folder with the input data files. There is one folder for each dataset; the datasets are described in the [Datasets](#datasets) section.

# Installation
You first need to install `python` and the required packages:

1. Install the [`python` language](https://www.python.org).
2. Download this project from GitHub and unzip it.
3. Execute `pip install -r requirements.txt` to install the required packages (see also the [Dependencies](#dependencies) section).

SPMF, which is needed to use gSpan and cgSpan, is available [here](https://www.philippe-fournier-viger.com/spmf/index.php?link=download.php).
SPMF is available in two versions:
* a jar file that can be run from the command line. Currently, this version can be used with gSpan, but not with cgSpan.
* the source code. Installing this version is more complicated, but it allows the use of cgSpan. You can find the instructions [here](https://www.philippe-fournier-viger.com/spmf/how_to_install.php).

In order to use Pang, you need to unzip each dataset in its own folder in the `data` folder.

# Use
We provide two scripts to use Pang:
* `ECML.py` : a Python script to compute the results of the ECML paper.
* `PANG.py` : a Python script to use Pang with your own data.

## To Replicate the Paper Experiments
In order to replicate the experiments:
1. Open the Python console.
2. Run `ECML.py`.

The script will compute the results of the experiments and save the results associated with Tables 2, 5 and 6 in the `results` folder.

## To Apply PANG to Other Data
If you want to use Pang with your own data, you need to create an `XXX` folder in the `data` folder and put your data in it. This folder must contain the following files:
* `XXX_graph.txt` : a file containing the graphs.
* `XXX_label.txt` : a file containing the labels of the graphs.

Then you need to run a script to produce the data files that will be used by Pang:
1. Open the Python console.
2. Run the script `Patterns.sh` in order to create the file `XXX_patterns.txt`.
3. Run `ProcessingPattern.py` with the option `-d XXX` in order to create the files `XXX_mono.txt` and `XXX_iso.txt`.
4. Run `PANG.py` with the option `-d XXX` in order to run Pang on the data `XXX`.

For each value of the parameter `k`, Pang will create a file `KResults.txt` containing the results of the classification and a file `KPatterns.txt` containing the patterns.

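Steps 3 and 4 above can also be chained from a small Python driver. The following is only a sketch: it assumes both scripts sit in the current working directory and accept `-d` exactly as described above. The helper names `pang_commands` and `run_pipeline` are hypothetical, and `Patterns.sh` is left out because its arguments are not documented here.

```python
import subprocess
import sys


def pang_commands(dataset, python=sys.executable):
    """Return the commands for steps 3 and 4, in the order they must run."""
    return [
        [python, "ProcessingPattern.py", "-d", dataset],
        [python, "PANG.py", "-d", dataset],
    ]


def run_pipeline(dataset):
    """Run both steps and stop at the first failing one (hypothetical helper)."""
    for cmd in pang_commands(dataset):
        # check=True raises CalledProcessError if the script exits with an error
        subprocess.run(cmd, check=True)


# Show what would be executed for the placeholder dataset name "XXX".
for cmd in pang_commands("XXX", python="python"):
    print(" ".join(cmd))
```

Calling `run_pipeline("XXX")` would then execute both scripts once `data/XXX` contains the graph, label and pattern files.
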
## Data Format
We use the same format as SPMF for the graph input files. Each graph is defined as follows:

1. `t # N` : N: graph id
2. `v M L` : M: node id, L: node label
3. `e P Q L` : P: source node id, Q: destination node id, L: edge label

For the patterns output files, each pattern contains one more line than the graphs:

4. `x A B C` : A, B, C: ids of the graphs containing the pattern

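To make the format concrete, here is a minimal reader for such files. It is a sketch rather than the parser used by `PANG.py`: the function names `parse_spmf` and `occurrence_matrix` are hypothetical, labels are kept as strings, and graph ids are assumed to be the integers 0 to n-1. The second function turns the `x` lines into a binary graph-by-pattern matrix, which is one simple way to represent graphs by the patterns they contain.

```python
def parse_spmf(lines):
    """Parse SPMF-style graph or pattern records into a list of dicts."""
    records = []
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank separator lines
        tag = parts[0]
        if tag == "t":
            # "t # N": start a new graph (or pattern) with id N
            records.append({"id": int(parts[2]), "nodes": {}, "edges": [], "support": []})
        elif tag == "v":
            # "v M L": node M carries label L
            records[-1]["nodes"][int(parts[1])] = parts[2]
        elif tag == "e":
            # "e P Q L": edge from P to Q with label L
            records[-1]["edges"].append((int(parts[1]), int(parts[2]), parts[3]))
        elif tag == "x":
            # "x A B C": ids of the graphs containing this pattern
            records[-1]["support"] = [int(g) for g in parts[1:]]
    return records


def occurrence_matrix(patterns, n_graphs):
    """Binary matrix: one row per graph, one column per pattern."""
    matrix = [[0] * len(patterns) for _ in range(n_graphs)]
    for col, pattern in enumerate(patterns):
        for graph_id in pattern["support"]:
            matrix[graph_id][col] = 1
    return matrix


# Two made-up patterns over three graphs (illustrative data only).
example = """t # 0
v 0 C
v 1 O
e 0 1 1
x 0 2
t # 1
v 0 N
x 1
""".splitlines()

patterns = parse_spmf(example)
matrix = occurrence_matrix(patterns, n_graphs=3)
print(matrix)
```

The same `parse_spmf` function reads graph input files as well, since they simply lack the `x` lines.
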
## Datasets
The datasets used in the paper are available in the `data` folder. The following datasets are available:
* `MUTAG` : MUTAG dataset, representing chemical compounds and their mutagenic properties [[D'91](#references)].
* `NCI1` : NCI1 dataset, representing molecules classified according to carcinogenicity [[W'06](#references)].
* `PTC` : PTC dataset, representing molecules classified according to carcinogenicity [[T'03](#references)].
* `DD` : DD dataset, representing amino acids and their interactions [[D'03](#references)].

These four datasets can be found [here](https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php).
* `FOPPA` : dataset extracted from FOPPA, a database of French public procurement notices [[P'22](#references)].

# Dependencies
Tested with `SPMF` version 2.54, and `python` version 3.6.13 with the following packages:
* [`pandas`](https://pypi.org/project/pandas/): version 1.1.5
* [`numpy`](https://pypi.org/project/numpy/): version 1.19.5
* [`networkx`](https://pypi.org/project/networkx/): version 2.5.1
* [`sklearn`](https://pypi.org/project/scikit-learn/) (published on PyPI as `scikit-learn`): version 0.24.2
* [`matplotlib`](https://pypi.org/project/matplotlib/): version 3.3.4
* [`grakel`](https://pypi.org/project/grakel/): version 0.1.8
* [`karateclub`](https://pypi.org/project/karateclub/): version 1.3.3
* [`stellargraph`](https://pypi.org/project/stellargraph/): version 1.2.1

The VF2 and ISMAGS algorithms are included in the [`NetworkX` library](https://networkx.org/).

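As an illustration of what VF2 provides, the sketch below uses NetworkX's `GraphMatcher` to test whether a pattern occurs in a graph, either as a monomorphism (extra edges between the matched nodes are allowed) or as an induced subgraph. The file names `XXX_mono.txt` and `XXX_iso.txt` suggest that Pang distinguishes these two notions, but that mapping is an assumption, and the toy graphs here are unlabeled for brevity.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Host graph: a triangle. Pattern: a path on three nodes.
graph = nx.cycle_graph(3)
pattern = nx.path_graph(3)

# Monomorphism: every pattern edge must exist in the host graph,
# but the matched host nodes may have additional edges between them.
is_mono = isomorphism.GraphMatcher(graph, pattern).subgraph_is_monomorphic()

# Induced occurrence: the matched host nodes must have exactly the
# pattern's edges, so the closing edge of the triangle rules it out.
is_induced = isomorphism.GraphMatcher(graph, pattern).subgraph_is_isomorphic()

print(is_mono, is_induced)
```

For labeled graphs, `GraphMatcher` also accepts `node_match` and `edge_match` callables that compare attribute dictionaries.
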
For the baselines:
* The WL and WLOA algorithms are included in the Grakel library, documentation available [here](https://ysig.github.io/GraKeL/0.1a8/benchmarks.html).
* Graph2Vec is included in the karateclub library, documentation available [here](https://karateclub.readthedocs.io/en/latest/).
* DGCNN is included in the stellargraph library, documentation available [here](https://stellargraph.readthedocs.io/en/stable/).
* We use the implementation of CORK from Marisa Thoma. This implementation is available in the `CORKcpp.zip` archive.

# References
* **[P'22]** L. Potin, V. Labatut, R. Figueiredo, C. Largeron, P.-H. Morand. *FOPPA: A database of French Open Public Procurement Award notices*, Technical Report, Avignon University, 2022. [⟨hal-03796734⟩](https://hal.archives-ouvertes.fr/hal-03796734)
* **[D'91]** A.S. Debnath, R.L. Lopez, G. Debnath, A. Shusterman, C. Hansch. *Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity*, Journal of Medicinal Chemistry 34(2), 786–797, 1991.
* **[W'06]** N. Wale, G. Karypis. *Comparison of descriptor spaces for chemical compound retrieval and classification*, 6th International Conference on Data Mining, pp. 678–689, 2006.
* **[T'03]** H. Toivonen, A. Srinivasan, R.D. King, S. Kramer, C. Helma. *Statistical evaluation of the predictive toxicology challenge 2000-2001*, Bioinformatics 19(10), 1183–1193, 2003.
* **[D'03]** P.D. Dobson, A.J. Doig. *Distinguishing enzyme structures from non-enzymes without alignments*, Journal of Molecular Biology 330(4), 771–783, 2003.
