Commit f9e3d31

Merge branch 'joss/review1' into develop
2 parents 44379ee + 982be89 commit f9e3d31

10 files changed: +163 -128 lines changed

publication/jats/paper.jats

Lines changed: 113 additions & 61 deletions
@@ -115,37 +115,45 @@ a Creative Commons Attribution 4.0 International License (CC BY
 </sec>
 <sec id="statement-of-need">
 <title>Statement of need</title>
-<p>Network analysis plays a central role in fields such as social
-sciences, biology, and fraud detection, where understanding
-relationships between entities is critical. Probabilistic generative
-models
+<p>Network analysis is central to social sciences, biology, and fraud
+detection, where understanding relationships is essential.
+Probabilistic generative models
 (<xref alt="Contisciani et al., 2020" rid="ref-contisciani2020community" ref-type="bibr">Contisciani
 et al., 2020</xref>,
 <xref alt="2022" rid="ref-contisciani2022community" ref-type="bibr">2022</xref>;
 <xref alt="Safdari et al., 2021" rid="ref-safdari2021generative" ref-type="bibr">Safdari
 et al., 2021</xref>,
 <xref alt="2022" rid="ref-safdari2022reciprocity" ref-type="bibr">2022</xref>;
 <xref alt="Safdari &amp; De Bacco, 2022" rid="ref-safdari2022anomaly" ref-type="bibr">Safdari
-&amp; De Bacco, 2022</xref>) have emerged as powerful tools for
-discovering hidden patterns in networks, detecting communities,
-identifying anomalies, and generating realistic synthetic data.
-However, their use is hindered by fragmented implementations, making
-comparison and reproduction difficult. ProbINet addresses this
-critical gap by consolidating recent approaches into a single, unified
-framework, allowing users to explore advanced techniques without the
-overhead of navigating multiple repositories or inconsistent
-documentation, boosting reproducibility and usability across
-disciplines.</p>
+&amp; De Bacco, 2022</xref>) reveal hidden patterns, detect
+communities, identify anomalies, and generate synthetic data. Their
+broader use is limited by fragmented implementations that hinder
+comparisons and reproducibility. ProbINet addresses this gap by
+unifying recent approaches in a single framework, improving
+accessibility and usability across disciplines.</p>
+<p>ProbINet stands out among network analysis tools. Graph-tool
+(<xref alt="Peixoto, 2014" rid="ref-peixoto_graph-tool_2014" ref-type="bibr">Peixoto,
+2014</xref>) provides community detection and general graph analysis
+tools, but it uses a different model family than our mixed-membership
+framework and does not account for reciprocity. CDlib
+(<xref alt="Rossetti et al., 2019" rid="ref-rossetti_cdlib_2019" ref-type="bibr">Rossetti
+et al., 2019</xref>) offers detection algorithms and evaluation
+routines, but ProbINet extends this with probabilistic MLE models,
+optional node attributes, and anomaly detection. pgmpy
+(<xref alt="Ankan &amp; Textor, 2024" rid="ref-ankan_pgmpy_2024" ref-type="bibr">Ankan
+&amp; Textor, 2024</xref>) focuses on Bayesian network structure
+learning, while ProbINet uncovers latent patterns like communities and
+reciprocity.</p>
 </sec>
 <sec id="main-features">
 <title>Main features</title>
-<p>ProbINet offers a versatile and feature-rich framework to perform
-inference on networks using probabilistic generative models. Key
-features include:</p>
+<p>ProbINet offers a feature-rich framework to perform inference on
+networks using probabilistic generative models. Key features
+include:</p>
 <list list-type="bullet">
 <list-item>
-<p><bold>Diverse Network Models</bold>: The package integrates
-generative models for various network types and goals:</p>
+<p><bold>Diverse Network Models</bold>: Integration of generative
+models for various network types and goals:</p>
 </list-item>
 </list>
 <table-wrap>
@@ -226,47 +234,38 @@ a Creative Commons Attribution 4.0 International License (CC BY
 </table-wrap>
 <list list-type="bullet">
 <list-item>
-<p><bold>Synthetic Network Generation</bold>: ProbINet enables
-users to generate synthetic networks that closely resemble the
-real ones for further analyses, such as testing hypotheses.</p>
+<p><bold>Synthetic Network Generation</bold>: Ability to generate
+synthetic networks that closely resemble real ones for further
+analyses (e.g., testing hypotheses).</p>
 </list-item>
 <list-item>
-<p><bold>Simplified Parameter Selection</bold>: ProbINet includes
-a cross-validation module to optimize key parameters, providing
-performance results in a clear dataframe.</p>
+<p><bold>Simplified Parameter Selection</bold>: A cross-validation
+module to optimize key parameters, providing performance results
+in a clear dataframe.</p>
 </list-item>
 <list-item>
-<p><bold>Rich Set of Metrics for Analysis</bold>: ProbINet
-includes metrics like F1 scores, Jaccard index, and advanced
-metrics for link and covariate prediction performance.</p>
+<p><bold>Rich Set of Metrics for Analysis</bold>: Advanced metrics
+(e.g., F1 scores, Jaccard index) for link and covariate prediction
+performance.</p>
 </list-item>
 <list-item>
-<p><bold>Powerful Visualization Tools</bold>: ProbINet includes
-functions to plot community memberships, and performance
-metrics.</p>
+<p><bold>Powerful Visualization Tools</bold>: Functions for
+plotting community memberships and performance metrics.</p>
 </list-item>
 <list-item>
-<p><bold>User-Friendly Command-Line Interface</bold>: ProbINet
-offers an intuitive command-line interface, making it accessible
-to users with minimal Python experience.</p>
+<p><bold>User-Friendly Command-Line Interface</bold>: An intuitive
+interface for easy access.</p>
 </list-item>
 <list-item>
-<p><bold>Extensible Codebase</bold>: The package is modular,
-allowing easy integration of new models that follow similar
-principles.</p>
+<p><bold>Extensible and Modular Codebase</bold>: Future
+integration of additional models possible.</p>
 </list-item>
 </list>
 <p>The <bold>Usage</bold> section below illustrates these features
-with a practical example on real-world data.</p>
+with a real-world example.</p>
 </sec>
 <sec id="usage">
 <title>Usage</title>
-<sec id="installation">
-<title>Installation</title>
-<p>You can install the package using <monospace>pip</monospace> or
-from the source repository. Detailed instructions are in the
-<ext-link ext-link-type="uri" xlink:href="https://mpi-is.github.io/probinet/">documentation</ext-link>.</p>
-</sec>
 <sec id="example-analyzing-a-social-network-with-probinet">
 <title>Example: Analyzing a Social Network with ProbINet</title>
 <p>This section shows how to use ProbINet to analyze a social
@@ -279,16 +278,16 @@ a Creative Commons Attribution 4.0 International License (CC BY
 relationships.</p>
 <sec id="steps-to-analyze-the-network-with-probinet">
 <title>Steps to Analyze the Network with ProbINet</title>
-<p>With ProbINet, you can load network data as an edge list,
+<p>With ProbINet, you can load network data as an edge list and
 select an algorithm (e.g., JointCRep), fit the model to extract
 latent variables, and analyze results like soft community
 memberships, which show how nodes interact across communities.
 This is exemplified in Figure 1. On the left, a network
 representation of the input data is displayed alongside the lines
-of code required for its analysis using ProbINet. The resulting
-output is shown on the right, where nodes are colored according to
-their inferred soft community memberships, while edge thickness
-and color intensity represent the inferred probability of edge
+of code required for its analysis using ProbINet. The result is
+shown on the right, where nodes are colored according to their
+inferred soft community memberships, while edge thickness and
+color intensity represent the inferred probability of edge
 existence.</p>
 <fig>
 <caption><p>Usage of ProbINet on a social network. (Top-left) A
@@ -305,13 +304,11 @@ a Creative Commons Attribution 4.0 International License (CC BY
 </sec>
 <sec id="running-times-of-algorithms">
 <title>Running Times of Algorithms</title>
-<p>The table below summarizes the running times for ProbINet
-algorithms when the package is run using the CLI
-<monospace>run_probinet</monospace>. <bold>N</bold> and <bold>E</bold>
-represent the number of nodes and edges, respectively. Edge ranges
-indicate variation across layers or time steps. <bold>L/T</bold>
-indicates the number of layers or time steps, and <bold>K</bold>
-represents the number of communities.</p>
+<p>The table below summarizes algorithm runtimes on the tutorial data.
+<bold>N</bold> and <bold>E</bold> represent the number of nodes and
+edges, respectively. Edge ranges indicate variation across layers or
+time steps. <bold>L/T</bold> indicates the number of layers or time
+steps, and <bold>K</bold> represents the number of communities.</p>
 <table-wrap>
 <table>
 <colgroup>
@@ -377,11 +374,10 @@ a Creative Commons Attribution 4.0 International License (CC BY
 </table>
 </table-wrap>
 <p>These benchmarks were performed on a 12th Gen Intel Core i9-12900
-CPU with 16 cores and 24 threads, using
-<monospace>hyperfine</monospace> and 10 runs. Runs required small
-amount of RAM (less than 1GB). This table provides a general overview
-of running times for the algorithms on the default networks. A
-detailed analysis should be performed on the user’s specific data.</p>
+CPU, using <monospace>hyperfine</monospace>
+(<xref alt="Peter, 2023" rid="ref-Peter_hyperfine_2023" ref-type="bibr">Peter,
+2023</xref>) and 10 runs. Runs required small amounts of RAM (less
+than 1 GB).</p>
 </sec>
 <sec id="acknowledgements">
 <title>Acknowledgements</title>
@@ -491,6 +487,62 @@ a Creative Commons Attribution 4.0 International License (CC BY
 <year iso-8601-date="1964">1964</year>
 </element-citation>
 </ref>
+<ref id="ref-Peter_hyperfine_2023">
+<element-citation publication-type="software">
+<person-group person-group-type="author">
+<name><surname>Peter</surname><given-names>David</given-names></name>
+</person-group>
+<article-title>hyperfine</article-title>
+<year iso-8601-date="2023-03">2023</year><month>03</month>
+<uri>https://github.com/sharkdp/hyperfine</uri>
+</element-citation>
+</ref>
+<ref id="ref-peixoto_graph-tool_2014">
+<element-citation publication-type="article-journal">
+<person-group person-group-type="author">
+<name><surname>Peixoto</surname><given-names>Tiago P.</given-names></name>
+</person-group>
+<article-title>The graph-tool python library</article-title>
+<source>figshare</source>
+<year iso-8601-date="2014">2014</year>
+<uri>http://figshare.com/articles/graph_tool/1164194</uri>
+<pub-id pub-id-type="doi">10.6084/m9.figshare.1164194</pub-id>
+</element-citation>
+</ref>
+<ref id="ref-rossetti_cdlib_2019">
+<element-citation publication-type="article-journal">
+<person-group person-group-type="author">
+<name><surname>Rossetti</surname><given-names>Giulio</given-names></name>
+<name><surname>Milli</surname><given-names>Letizia</given-names></name>
+<name><surname>Cazabet</surname><given-names>Rémy</given-names></name>
+</person-group>
+<article-title>CDlib: A python library to extract, compare and evaluate communities from complex networks</article-title>
+<source>Applied Network Science</source>
+<year iso-8601-date="2019">2019</year>
+<volume>4</volume>
+<issue>1</issue>
+<uri>https://doi.org/10.1007/s41109-019-0165-9</uri>
+<pub-id pub-id-type="doi">10.1007/s41109-019-0165-9</pub-id>
+<fpage>52</fpage>
+<lpage></lpage>
+</element-citation>
+</ref>
+<ref id="ref-ankan_pgmpy_2024">
+<element-citation publication-type="article-journal">
+<person-group person-group-type="author">
+<name><surname>Ankan</surname><given-names>Ankur</given-names></name>
+<name><surname>Textor</surname><given-names>Johannes</given-names></name>
+</person-group>
+<article-title>Pgmpy: A python toolkit for bayesian networks</article-title>
+<source>Journal of Machine Learning Research</source>
+<year iso-8601-date="2024">2024</year>
+<volume>25</volume>
+<issue>265</issue>
+<uri>http://jmlr.org/papers/v25/23-0487.html</uri>
+<fpage>1</fpage>
+<lpage>8</lpage>
+</element-citation>
+</ref>
 </ref-list>
 </back>
 </article>
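
The usage text in this file describes analyzing "soft community memberships" after fitting a model. As a rough illustration of what that means numerically, the sketch below row-normalizes a non-negative membership matrix so that each node's weights sum to 1, then reads off each node's strongest community. It relies on NumPy only; the function name `normalize_memberships` and the example values are hypothetical and are not taken from ProbINet's API or from the paper's data.

```python
import numpy as np


def normalize_memberships(u: np.ndarray) -> np.ndarray:
    """Scale each row of a non-negative (N, K) matrix so that it sums to 1."""
    u = np.asarray(u, dtype=float)
    if np.any(u < 0):
        raise ValueError("membership weights must be non-negative")
    row_sums = u.sum(axis=1, keepdims=True)
    # Rows that are entirely zero stay zero instead of triggering a division by zero.
    return np.divide(u, row_sums, out=np.zeros_like(u), where=row_sums > 0)


# Hypothetical raw memberships for 4 nodes and K = 2 communities.
u = np.array([[2.0, 0.0], [1.0, 1.0], [0.5, 1.5], [0.0, 0.0]])

soft = normalize_memberships(u)  # mixed (soft) memberships, one row per node
hard = soft.argmax(axis=1)       # index of the strongest community per node

print(soft)
print(hard)
```

A node with weight spread over several columns, such as the second and third rows here, belongs to more than one community, which is what a mixed-membership plot conveys through node coloring.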

publication/paper.bib

Lines changed: 33 additions & 0 deletions
@@ -74,3 +74,36 @@ @software{Peter_hyperfine_2023
 version = {1.16.1},
 year = {2023}
 }
+
+@article{peixoto_graph-tool_2014,
+title = {The graph-tool python library},
+author = {Peixoto, Tiago P.},
+journal = {figshare},
+year = {2014},
+doi = {10.6084/m9.figshare.1164194},
+url = {http://figshare.com/articles/graph_tool/1164194},
+keywords = {graph, network, tools}
+}
+
+@article{rossetti_cdlib_2019,
+title = {CDlib: a Python Library to Extract, Compare and Evaluate Communities from Complex Networks},
+author = {Rossetti, Giulio and Milli, Letizia and Cazabet, Rémy},
+journal = {Applied Network Science},
+year = {2019},
+volume = {4},
+number = {1},
+pages = {52},
+doi = {10.1007/s41109-019-0165-9},
+url = {https://doi.org/10.1007/s41109-019-0165-9}
+}
+
+@article{ankan_pgmpy_2024,
+author = {Ankur Ankan and Johannes Textor},
+title = {pgmpy: A Python Toolkit for Bayesian Networks},
+journal = {Journal of Machine Learning Research},
+year = {2024},
+volume = {25},
+number = {265},
+pages = {1--8},
+url = {http://jmlr.org/papers/v25/23-0487.html}
+}

publication/paper.md

Lines changed: 12 additions & 16 deletions
@@ -41,20 +41,16 @@ to analyze and model complex network data. The package integrates code implement
 
 # Statement of need
 
-Network analysis plays a central role in fields such as social sciences, biology, and fraud
-detection, where understanding relationships between entities is critical. Probabilistic
-generative models [@contisciani2020community; @safdari2021generative; @contisciani2022community;
-@safdari2022anomaly; @safdari2022reciprocity] have emerged as powerful tools for discovering
-hidden patterns in networks, detecting communities, identifying anomalies, and generating
-realistic synthetic data. However, their use is hindered by fragmented implementations, making
-comparisons difficult. ProbINet addresses this critical gap by consolidating
-recent approaches into a single, unified framework, allowing users to explore advanced techniques
-without the overhead of navigating multiple repositories or inconsistent documentation,
-boosting reproducibility and usability across disciplines.
+Network analysis is central to social sciences, biology, and fraud detection, where
+understanding relationships is essential. Probabilistic generative models [@contisciani2020community; @safdari2021generative; @contisciani2022community; @safdari2022anomaly; @safdari2022reciprocity] reveal hidden patterns, detect communities, identify anomalies, and generate synthetic data. Their broader use is limited by fragmented implementations that hinder comparisons and reproducibility.
+ProbINet addresses this gap by unifying recent approaches in a single framework, improving accessibility and usability across disciplines.
+
+ProbINet stands out among network analysis tools. Graph-tool [@peixoto_graph-tool_2014] provides community detection and general graph analysis tools, but it uses a different model family than our mixed-membership framework and does not account for reciprocity. CDlib [@rossetti_cdlib_2019] offers detection algorithms and evaluation routines, but ProbINet extends this with probabilistic MLE models, optional node attributes, and anomaly detection. pgmpy [@ankan_pgmpy_2024] focuses on Bayesian network structure learning, while ProbINet uncovers latent patterns like communities and reciprocity.
 
 # Main features
 
-ProbINet offers a versatile and feature-rich framework to perform inference on networks using probabilistic generative models. Key features include:
+ProbINet offers a feature-rich framework to perform inference on networks using probabilistic
+generative models. Key features include:
 
 - **Diverse Network Models**: Integration of generative models for various network types
 and goals:
@@ -83,7 +79,7 @@ ProbINet offers a versatile and feature-rich framework to perform inference on n
 
 - **Extensible and Modular Codebase**: Future integration of additional models possible.
 
-The **Usage** section below illustrates these features with a practical example on real-world data.
+The **Usage** section below illustrates these features with a real-world example.
 
 # Usage
 
@@ -94,19 +90,19 @@ directed edges representing friendships in a small Illinois high school [@konect
 
 ### Steps to Analyze the Network with ProbINet
 
-With ProbINet, you can load network data as an edge list, select an algorithm (e.g., JointCRep),
+With ProbINet, you can load network data as an edge list and select an algorithm (e.g., JointCRep),
 fit the model to extract latent variables, and analyze results like soft community memberships,
 which show how nodes interact across communities. This is exemplified in Figure 1. On the left, a
-network representation of the input data is displayed alongside the lines of code required for its analysis using ProbINet. The resulting output is shown on the right, where nodes are colored according to their inferred soft community memberships, while edge thickness and color intensity represent the inferred probability of edge existence.
+network representation of the input data is displayed alongside the lines of code required for
+its analysis using ProbINet. The result is shown on the right, where nodes are colored according to their inferred soft community memberships, while edge thickness and color intensity represent the inferred probability of edge existence.
 
 ![Usage of ProbINet on a social network. (Top-left) A network representation of the input data. (Bottom-left) A snapshot of the code used. (Right) The resulting output.](figures/example.png)
 
 For more tutorials and use cases, see the [package documentation](https://mpi-is.github.io/probinet/).
 
 # Running Times of Algorithms
 
-The table below provides a general overview of the algorithms running times
-on the data used in the tutorials.
+The table below summarizes algorithm runtimes on the tutorial data.
 **N** and **E** represent the number of nodes and edges, respectively.
 Edge ranges indicate variation across layers or time steps.
 **L/T** indicates the number of layers or time steps,

publication/paper.pdf

1.56 KB
Binary file not shown.

tests/test_ModelClass.py

Lines changed: 0 additions & 16 deletions
@@ -88,19 +88,3 @@ def test_initialize_w_from_file(self):
         self.model_class.K = dfW.shape[1]
         self.model_class._initialize() # pylint: disable=protected-access
         self.assertTrue(np.all(0 <= self.model_class.w))
-
-    @unittest.skip("Deciding whether initialization 2 is useful or not.")
-    def test_initialize_uv_from_file(self):
-        self.model_class.initialization = 2
-        self.model_class._initialize() # pylint: disable=protected-access # Set by hand
-        self.assertTrue(np.all(0 <= self.model_class.u))
-        self.assertTrue(np.all(0 <= self.model_class.v))
-
-    @unittest.skip("Deciding whether initialization 3 is useful or not.")
-    def test_initialize_uvw_from_file(self):
-        self.model_class.initialization = 3
-        self.model_class.L, self.model_class.K = self.w_a.shape
-        self.model_class._initialize() # in case it is: nodes=range(600) # pylint: disable=protected-access
-        self.assertTrue(np.all(0 <= self.model_class.u))
-        self.assertTrue(np.all(0 <= self.model_class.v))
-        self.assertTrue(np.all(0 <= self.model_class.w))
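
Both tests removed in this hunk were already disabled with `@unittest.skip`, so deleting them should not change which assertions actually run. For context, here is a small self-contained sketch of how the standard-library decorator behaves: a skipped test body is never executed and is recorded under `skipped` rather than as a failure. The class and method names are hypothetical and are not taken from the ProbINet test suite.

```python
import unittest


class InitializationChecks(unittest.TestCase):
    def test_weights_non_negative(self):
        # Stand-in for an assertion such as np.all(0 <= w).
        self.assertTrue(all(x >= 0 for x in [0.0, 0.5, 1.0]))

    @unittest.skip("Deciding whether this initialization is useful or not.")
    def test_disabled_initialization(self):
        # Never executed: the decorator short-circuits the test body.
        raise AssertionError("a skipped test body should not run")


def run_checks() -> unittest.TestResult:
    """Run the test case programmatically and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InitializationChecks)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_checks()
for test, reason in result.skipped:
    print(f"skipped {test.id()}: {reason}")
print("successful:", result.wasSuccessful())
```

Because skipped tests contribute no coverage, removing them (as this commit does) or re-enabling them is usually clearer than leaving them skipped indefinitely.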
