README.md
Lines changed: 8 additions & 9 deletions
@@ -18,7 +18,7 @@ This repository is composed of the following elements:
* `EMCL.py`: script that reproduces the experiments of our paper submitted to ECML PKDD.
* `PANG.py`: script that implements the Pang method.
* `ProcessingPattern.py`: script that computes the number of occurrences and the set of induced patterns.
-* `Pattern.sh`: **TODO (identifies the patterns with SPMF and counts them with `ProcessingPattern.py` ?).**
+* `Pattern.sh`: script that computes the patterns of a dataset.
* `CORKcpp.zip`: archive containing the CORK source code (used in `EMCL.py`), cf. Section [Installation](#installation).
* `data`: folder containing the input data. Each subfolder corresponds to a distinct dataset, cf. Section [Datasets](#datasets).
* `results`: folder containing the files produced by the processing.
@@ -41,12 +41,11 @@ Second, one of the dependencies, SPMF, is not a Python package, but rather a Jav

Note that SPMF is available both as a JAR and as a source code archive. However, the JAR does not contain all the features required by Pang, so only the source code archive should be used.

-**TODO In order to run the script that reproduces our ECML PKDD experiments, you also need to install CORK.**
+In order to run the script that reproduces our ECML PKDD experiments, you also need to install CORK. This is done by unzipping the archive `CORKcpp.zip` into the `src` folder.
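The unzipping step can be sketched in Python with the standard `zipfile` module. This is a minimal illustration, not a script shipped with the repository: the function name `install_cork` is hypothetical, and it assumes the call is made from the repository root, where `CORKcpp.zip` sits, and that the archive contents should end up under `src`.

```python
import zipfile
from pathlib import Path
from typing import List


def install_cork(archive: str = "CORKcpp.zip", target_dir: str = "src") -> List[str]:
    """Extract the CORK archive into target_dir and return the names of the extracted entries.

    Assumption: this is called from the repository root, where CORKcpp.zip is located.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create src/ if it does not exist yet
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)  # keeps the directory layout stored in the archive
        return zf.namelist()
```

Calling `install_cork()` without arguments uses the default names above; the returned list makes it easy to check which files were extracted.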

## Data
Third, you need to set up the data to which you want to apply Pang. This can be either the datasets from our paper, in which case you will need to unzip several archives, or your own data, in which case it must respect the appropriate format. In both cases, see Section [Use](#use).

-
# Use
We provide two scripts to use Pang:
@@ -100,9 +99,7 @@ For information, the files produced by our scripts to list the identified patter

4. `x A B C A,B,C`: graphs containing the pattern

-The format of the file containing the graph labels is as follows:
-
-**TODO**
+The format of the file containing the graph labels is as follows: each line contains a single integer, and the i-th line gives the label of the i-th graph in the graph file.
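For illustration, a minimal Python sketch that loads such a label file, assuming exactly one integer per line and that line order matches graph order. The function name `read_graph_labels` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List


def read_graph_labels(path: str) -> List[int]:
    """Return the labels in file order: element i is the label of the i-th graph."""
    labels = []
    # splitlines() does not produce an empty entry for the final newline
    for line_no, line in enumerate(Path(path).read_text().splitlines(), start=1):
        try:
            labels.append(int(line.strip()))
        except ValueError:
            raise ValueError(
                f"{path}, line {line_no}: expected a single integer, got {line!r}"
            ) from None
    return labels
```

Because the mapping is purely positional, the length of the returned list should equal the number of graphs in the graph file; a mismatch usually means the two files are out of sync.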

### Processing

@@ -111,9 +108,11 @@ Once the data are ready, you need to run a script to identify the patterns, and
1. Open the `Python` console.
2. Run the script `Patterns.sh` in order to create the files `XXX_patterns.txt`.
3. Run `ProcessingPattern.py` with the option `-d XXX` in order to create the files `XXX_mono.txt` and `XXX_iso.txt`.
-4. Run `PANG.py` with the option `-d XXX` in order to run Pang on the data `XXX`.
+4. Run `PANG.py`. Two parameters are required:
+   * `-d XXX`: the name of the dataset;
+   * `-k k`: the number of patterns to consider. It can be a single value or a list of values separated by commas.

-For each value of the parameter `k` **TODO what is this k?**, Pang will create a file `KResults.txt` containing the results of the classification and a file `KPatterns.txt` containing the patterns.
+For each value of the parameter `k`, Pang will create a file `KResults.txt` containing the results of the classification and a file `KPatterns.txt` containing the patterns.
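The `-k` option, which accepts either one value or a comma-separated list, can be sketched with `argparse`. This is a hypothetical illustration, not the argument parsing actually used in `PANG.py`. It assumes that `k` values are positive integers and that the `K` in `KResults.txt` and `KPatterns.txt` stands for the value of `k`.

```python
import argparse
from typing import List, Tuple


def parse_k_values(text: str) -> List[int]:
    """Parse '-k' given as a single value ('5') or a comma-separated list ('1,5,10')."""
    values = []
    for part in text.split(","):
        try:
            k = int(part.strip())
        except ValueError:
            raise argparse.ArgumentTypeError(f"invalid value {part!r} in {text!r}") from None
        if k < 1:  # assumption: a number of patterns must be positive
            raise argparse.ArgumentTypeError(f"k must be a positive integer, got {k}")
        values.append(k)
    return values


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command line mirroring the two required parameters described above.
    parser = argparse.ArgumentParser(description="Sketch of the PANG.py command line")
    parser.add_argument("-d", required=True, help="name of the dataset")
    parser.add_argument("-k", required=True, type=parse_k_values,
                        help="number of patterns: one value or a comma-separated list")
    return parser


def output_files(k_values: List[int]) -> List[Tuple[str, str]]:
    # Assumption: the 'K' in KResults.txt / KPatterns.txt is replaced by the value of k.
    return [(f"{k}Results.txt", f"{k}Patterns.txt") for k in k_values]
```

With this sketch, `-k 1,5,10` is parsed as `[1, 5, 10]`, i.e. one run per value and, under the naming assumption above, one pair of result and pattern files per value. Parsing inside the `type=` callable means malformed input is rejected by `argparse` before any processing starts.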
# Dependencies
@@ -136,7 +135,7 @@ For the ECML PKDD assessment, we use the following algorithms for the sake of co
* The `WL` and `WLOA` algorithms are included in the `Grakel` library, documentation available [here](https://ysig.github.io/GraKeL/0.1a8/benchmarks.html).
* `Graph2Vec` is included in the `karateclub` library, documentation available [here](https://karateclub.readthedocs.io/en/latest/).
* `DGCNN` is included in the `stellargraph` library, documentation available [here](https://stellargraph.readthedocs.io/en/stable/).
-* We use the implementation of `CORK` from Marisa Thoma. This implementation is available in the `CORKcpp.zip` archive.
+* We use the implementation of `CORK` by Marisa Thoma. This implementation is available in the `CORKcpp.zip` archive, which was obtained from [here](http://www.dbs.ifi.lmu.de/~thoma/pub/sam2010/sam2010.zip).