* Copyright 2021-2023 Lucas Potin & Rosa Figueiredo & Vincent Labatut & Christine Largeron
+
Pang is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information see licence.txt
- Pang is an algorithm which represents and classifies a collection of graphs according to their frequent patterns (subgraphs).
+ Pang is an algorithm which represents and classifies a collection of graphs according to their frequent patterns (subgraphs). The details of this algorithm are described in an article [[P'23](#references)].
+ This work was conducted in the framework of the DeCoMaP ANR project (Detection of corruption in public procurement markets -- ANR-19-CE38-0004).

**Content**
* [Organization](#organization)
@@ -57,7 +66,7 @@ We provide two scripts to use Pang:
* `PANG.py`: applies Pang in the general case, possibly to your own data.

## To Replicate the Paper Experiments
- To replicate our ECML PKDD experiments, first unzip the provided datasets, and run Pang on them.
+ To replicate the experiments in our paper [[P'23](#references)], first unzip the provided datasets, then run Pang on them.

### Data Preparation
To unzip the datasets used in our experiments:
@@ -72,7 +81,7 @@ We retrieved the benchmark datasets from the [SPMF website](https://www.philippe
* `DD` : DD dataset, representing amino acids and their interactions [[D'03](#references)]

The public procurement dataset contains graphs extracted from the FOPPA database:
- * `FOPPA` : dataset extracted from FOPPA, a database of French public procurement notices [[P'22](#references)].
+ * `FOPPA` : dataset extracted from FOPPA, a database of French public procurement notices [[P'23b](#references)].


### Processing
@@ -143,13 +152,14 @@ For the ECML PKDD assessment, we use the following algorithms for the sake of co


# References
+ * **[P'23]** L. Potin, V. Labatut, R. Figueiredo, C. Largeron. *Pattern Mining for Anomaly Detection in Graphs: Application to Fraud in Public Procurement*, ECML PKDD 2023.
* **[C'04]** L. P. Cordella, P. Foggia, C. Sansone, M. Vento. *A (sub)graph isomorphism algorithm for matching large graphs*, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(10):1367-1372, 2004. DOI: [10.1109/tpami.2004.75](https://doi.org/10.1109/tpami.2004.75)
* **[D'91]** A. S. Debnath, R. L. Lopez, G. Debnath, A. Shusterman, C. Hansch. *Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity*, Journal of Medicinal Chemistry 34(2):786–797, 1991. DOI: [10.1021/jm00106a046](https://doi.org/10.1021/jm00106a046)
* **[D'03]** P. D. Dobson, A. J. Doig. *Distinguishing enzyme structures from non-enzymes without alignments*, Journal of Molecular Biology 330(4):771–783, 2003. DOI: [10.1016/S0022-2836(03)00628-4](https://doi.org/10.1016/S0022-2836(03)00628-4)
* **[H'14]** M. Houbraken, S. Demeyer, T. Michoel, P. Audenaert, D. Colle, M. Pickavet. *The Index-Based Subgraph Matching Algorithm with General Symmetries (ISMAGS): Exploiting Symmetry for Faster Subgraph Enumeration*, PLoS ONE 9(5):e97896, 2014. DOI: [10.1371/journal.pone.0097896](https://doi.org/10.1371/journal.pone.0097896)
* **[K'16]** N. M. Kriege, P. L. Giscard, R. Wilson. *On Valid Optimal Assignment Kernels and Applications to Graph Classification*, 30th International Conference on Neural Information Processing Systems, pp. 1623–1631, 2016. URL: [here](https://proceedings.neurips.cc/paper_files/paper/2016/hash/0efe32849d230d7f53049ddc4a4b0c60-Abstract.html)
* **[N'17]** A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, S. Jaiswal. *graph2vec: Learning Distributed Representations of Graphs*, 13th International Workshop on Mining and Learning with Graphs, p. 21, 2017. URL: [here](https://arxiv.org/abs/1707.05005)
- * **[P'22]** L. Potin, V. Labatut, R. Figueiredo, C. Largeron, P.-H. Morand. *FOPPA: A database of French Open Public Procurement Award notices*, Technical Report, Avignon University, 2022. [⟨hal-03796734⟩](https://hal.archives-ouvertes.fr/hal-03796734)
+ * **[P'23b]** L. Potin, V. Labatut, P.-H. Morand, C. Largeron. *FOPPA: An Open Database of French Public Procurement Award Notices From 2010–2020*, Scientific Data 10:303, 2023. DOI: [10.1038/s41597-023-02213-z](https://dx.doi.org/10.1038/s41597-023-02213-z) [⟨hal-04101350⟩](https://hal.archives-ouvertes.fr/hal-04101350)
* **[S'11]** N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, K. M. Borgwardt. *Weisfeiler-Lehman Graph Kernels*, Journal of Machine Learning Research 12:2539–2561, 2011. URL: [here](https://dl.acm.org/citation.cfm?id=2078187)
* **[S'21]** Z. Shaul, S. Naaz. *cgSpan: Closed Graph-Based Substructure Pattern Mining*, IEEE International Conference on Big Data, pp. 4989-4998, 2021. DOI: [10.1109/bigdata52589.2021.9671995](https://doi.org/10.1109/bigdata52589.2021.9671995)
* **[T'03]** H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, C. Helma. *Statistical evaluation of the predictive toxicology challenge 2000-2001*, Bioinformatics 19(10):1183–1193, 2003. DOI: [10.1093/bioinformatics/btg130](https://doi.org/10.1093/bioinformatics/btg130)