Skip to content

Commit fd81547

Browse files
authored
Update on the SBM prior: calculating the number of cluster and the allocation of the nodes (#29)
* Update on the SBM prior: (1) now we save the sampled block allocation parameters and block matrices (2) we sample the number of clusters by first calculating the probability of a sampled cluster configuration using Equation 3.7 pf Miller & Harrison (2018) * update the restriction on Kmax * changes: (1) take out the option to save out_cluster_prob and (2) changes K_max = no_variables + 10 * take out saving K within the Gibbs sampler * Include the option to calculate the number of clusters and cluster allocations * update the references
1 parent d124987 commit fd81547

File tree

7 files changed

+188
-37
lines changed

7 files changed

+188
-37
lines changed

R/bgm.R

Lines changed: 51 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -111,12 +111,14 @@
111111
#' model \code{edge_prior = "Stochastic-Block"} assumes that nodes can be
112112
#' organized into blocks or clusters. In principle, the assignment of nodes to
113113
#' such clusters is unknown, and the model as implemented here considers all
114-
#' possible options \insertCite{@i.e., specifies a Dirichlet process on the node to block
115-
#' allocation as described by @GengEtAl_2019}{bgms}. This model is advantageous
114+
#' possible options \insertCite{@i.e., specifies a Dirichlet prior on the probability
115+
#' of allocations as described by @GengEtAl_2019}{bgms}. This model is advantageous
116116
#' when nodes are expected to fall into distinct clusters. The inclusion
117117
#' probabilities for the edges are defined at the level of the clusters, with a
118118
#' beta prior for the unknown inclusion probability with shape parameters
119-
#' \code{beta_bernoulli_alpha} and \code{beta_bernoulli_beta}. The default is
119+
#' \code{beta_bernoulli_alpha} and \code{beta_bernoulli_beta}, and a Dirichlet
120+
#' prior on the cluster assignment probabilities with a common concentration
121+
#' parameter \code{dirichlet_alpha}. The default is
120122
#' \code{edge_prior = "Bernoulli"}.
121123
#' @param inclusion_probability The prior edge inclusion probability for the
122124
#' Bernoulli model. Can be a single probability, or a matrix of \code{p} rows
@@ -127,7 +129,7 @@
127129
#' positive numbers. Defaults to \code{beta_bernoulli_alpha = 1} and
128130
#' \code{beta_bernoulli_beta = 1}.
129131
#' @param dirichlet_alpha The shape of the Dirichlet prior on the node-to-block
130-
#' allocation parameters for the Stochastic Block model.
132+
#' allocation probabilities for the Stochastic Block model.
131133
#' @param na.action How do you want the function to handle missing data? If
132134
#' \code{na.action = "listwise"}, listwise deletion is used. If
133135
#' \code{na.action = "impute"}, missing data are imputed iteratively during the
@@ -156,8 +158,14 @@
156158
#' ``blume-capel'' variables, the first entry is the parameter for the linear
157159
#' effect and the second entry is the parameter for the quadratic effect, which
158160
#' models the offset to the reference category.
161+
#' \item In the case of
162+
#' \code{edge_prior = "Stochastic-Block"} two additional elements are returned:
163+
#' a vector \code{allocations} with the estimated cluster assignments of the nodes and an
164+
#' table \code{clusters} with the estimated posterior probability of the number
165+
#' of clusters in the network. The vector of node allocations is calculated using
166+
#' a method proposed by \insertCite{Dahl2009}{bgms} and also used by
167+
#' \insertCite{GengEtAl_2019}{bgms}.
159168
#' }
160-
#'
161169
#' If \code{save = TRUE}, the result is a list of class ``bgms'' containing:
162170
#' \itemize{
163171
#' \item \code{indicator}: A matrix with \code{iter} rows and
@@ -169,8 +177,15 @@
169177
#' \item \code{thresholds}: A matrix with \code{iter} rows and
170178
#' \code{sum(m)} columns, containing parameter states from every iteration of
171179
#' the Gibbs sampler for the category thresholds.
180+
#' \item In the case of
181+
#' \code{edge_prior = "Stochastic-Block"} a matrix \code{allocations} with the
182+
#' cluster assignments of the nodes from each iteration is returned. This matrix
183+
#' can be used to calculate the posterior probability of the number of clusters
184+
#' by utilizing the \code{summary_SBM(bgm_output[["allocations"]])} function.
172185
#' }
173186
#' Column averages of these matrices provide the model-averaged posterior means.
187+
#' Except for the \code{allocations} matrix, for which the \code{summary_SBM}
188+
#' needs to be utilized.
174189
#'
175190
#' In addition to the analysis results, the bgm output lists some of the
176191
#' arguments of its call. This is useful for post-processing the results.
@@ -504,10 +519,24 @@ bgm = function(x,
504519
arguments$data_columnnames = data_columnnames
505520

506521
if(edge_selection == TRUE) {
507-
output = list(indicator = indicator,
508-
interactions = interactions,
509-
thresholds = thresholds,
510-
arguments = arguments)
522+
if(edge_prior == "Stochastic-Block"){
523+
524+
summarySbm <- summary_SBM(cluster_allocations = out$allocations)
525+
526+
output = list(indicator = indicator,
527+
interactions = interactions,
528+
thresholds = thresholds,
529+
allocations = summarySbm$allocations,
530+
clusters = summarySbm$no_clusters,
531+
arguments = arguments)
532+
} else {
533+
534+
output = list(indicator = indicator,
535+
interactions = interactions,
536+
thresholds = thresholds,
537+
arguments = arguments)
538+
}
539+
511540
} else {
512541
output = list(interactions = interactions,
513542
thresholds = thresholds,
@@ -557,10 +586,19 @@ bgm = function(x,
557586
arguments$data_columnnames = data_columnnames
558587

559588
if(edge_selection == TRUE) {
560-
output = list(indicator = indicator,
561-
interactions = interactions,
562-
thresholds = thresholds,
563-
arguments = arguments)
589+
if(edge_prior == "Stochastic-Block"){
590+
output = list(indicator = indicator,
591+
interactions = interactions,
592+
thresholds = thresholds,
593+
allocations = out$allocations,
594+
arguments = arguments)
595+
} else {
596+
597+
output = list(indicator = indicator,
598+
interactions = interactions,
599+
thresholds = thresholds,
600+
arguments = arguments)
601+
}
564602
} else {
565603
output = list(interactions = interactions,
566604
thresholds = thresholds,

R/utility_functions.R

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1007,3 +1007,59 @@ compare_reformat_data = function(x,
10071007
missing_index_gr2 = missing_index_gr2,
10081008
na_impute = na_impute))
10091009
}
1010+
1011+
# Dahl's method to summarize the samples of the cluster_allocations
1012+
# This function was adapted from the R code accompanying the paper:
1013+
# Geng, J., Bhattacharya, A., & Pati, D. (2019). Probabilistic Community
1014+
# Detection With Unknown Number of Communities, Journal of the American
1015+
# Statistical Association, 114:526, 893-905, DOI:10.1080/01621459.2018.1458618
1016+
1017+
getDahl <- function(cluster_allocations) {
1018+
# Dimensions of the input matrix
1019+
niters <- nrow(cluster_allocations) # Number of iterations
1020+
n <- ncol(cluster_allocations) # Number of nodes
1021+
1022+
# Compute membership matrices for each iteration
1023+
membershipMatrices <- apply(cluster_allocations, 1, function(clusterAssign) {
1024+
outer(clusterAssign, clusterAssign, FUN = "==")
1025+
})
1026+
1027+
# Reshape membershipMatrices into a list of matrices
1028+
membershipMatrices <- lapply(seq_len(niters), function(i) {
1029+
matrix(membershipMatrices[, i], n, n)
1030+
})
1031+
1032+
# Compute the average membership matrix
1033+
membershipAverage <- Reduce("+", membershipMatrices) / niters
1034+
1035+
# Compute squared error for each iteration
1036+
SqError <- sapply(membershipMatrices, function(x, av) sum((x - av)^2),
1037+
av = membershipAverage)
1038+
1039+
# Find the iteration with the minimum squared error
1040+
DahlIndex <- which.min(SqError)
1041+
1042+
# Extract the cluster assignment corresponding to the best iteration
1043+
DahlAns <- cluster_allocations[DahlIndex, , drop = TRUE]
1044+
1045+
return(DahlAns)
1046+
}
1047+
1048+
1049+
# A function that computes the cluster posterior probabilities and allocations
1050+
1051+
summary_SBM <- function(cluster_allocations) {
1052+
# Compute the number of unique clusters for each iteration
1053+
clusters <- apply(cluster_allocations, 1, function(row) length(unique(row)))
1054+
1055+
# Compute the posterior probabilities of the actual unique clusters
1056+
no_clusters <- table(clusters) / length(clusters)
1057+
1058+
# Compute the allocations of the nodes based on Dahl's method
1059+
allocations <- getDahl(cluster_allocations)
1060+
1061+
# Return the results
1062+
return(list(no_clusters = no_clusters,
1063+
allocations = allocations))
1064+
}
1065+

inst/REFERENCES.bib

Lines changed: 21 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,16 @@
1+
@article{Dahl2009,
2+
author = {David B. Dahl},
3+
title = {Modal clustering in a class of product partition models},
4+
journal = {Bayesian Analysis},
5+
volume = {4},
6+
number = {2},
7+
pages = {243--264},
8+
year = {2009},
9+
month = {June},
10+
doi = {10.1214/09-BA409},
11+
url = {https://doi.org/10.1214/09-BA409}
12+
}
13+
114
@article{GengEtAl_2019,
215
title={Probabilistic community detection with unknown number of communities},
316
author={Geng, J. and Bhattacharya, A. and Pati, D.},
@@ -8,12 +21,14 @@ @article{GengEtAl_2019
821
year={2019}}
922

1023
@article{SekulovskiEtAl_2023,
11-
title={Testing conditional independence in psychometric networks: An analysis of three bayesian methods},
12-
author={Sekulovski, N. and Keetelaar, S. and Huth, K. and Wagenmakers, {E.-J.} and van Bork, R. and van den Bergh, D. and Marsman, M.},
13-
journal={Multivariate Behavioral Research},
14-
doi = {10.1080/00273171.2024.2345915},
15-
year={in press}
16-
}
24+
author = {Sekulovski, N. and Keetelaar, S. and Huth, K. B. S. and Wagenmakers, Eric-Jan and {van }Bork, R. and {van den }Bergh, D. and Marsman, M.},
25+
doi = {10.1080/00273171.2024.2345915},
26+
journal = {Multivariate Behavioral Research},
27+
pages = {913--933},
28+
title = {Testing conditional independence in psychometric networks: {A}n analysis of three {B}ayesian methods},
29+
volume = {59},
30+
year = {2024},
31+
bdsk-url-1 = {https://doi.org/10.1080/00273171.2024.2345915}}
1732

1833
@article{Dienes_2014,
1934
author = {Dienes, Z.},

man/bgm.Rd

Lines changed: 20 additions & 5 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

src/gibbs_functions.cpp

Lines changed: 29 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -812,6 +812,8 @@ List gibbs_sampler(IntegerMatrix observations,
812812
int no_interactions = Index.nrow();
813813
int no_thresholds = sum(no_categories);
814814
int max_no_categories = max(no_categories);
815+
IntegerVector K_values; // To store sampled K values
816+
815817

816818
IntegerVector v = seq(0, no_interactions - 1);
817819
IntegerVector order(no_interactions);
@@ -864,7 +866,10 @@ List gibbs_sampler(IntegerMatrix observations,
864866
NumericMatrix cluster_prob(1, 1);
865867
NumericVector log_Vn(1);
866868

867-
if(edge_prior == "Stochastic-Block") {
869+
// store the allocation indices for each iteration
870+
NumericMatrix out_allocations(iter, no_variables);
871+
872+
if(edge_prior == "Stochastic-Block") { // Initial Configuration of the cluster allocations
868873
cluster_allocations[0] = 0;
869874
cluster_allocations[1] = 1;
870875
for(int i = 2; i < no_variables; i++) {
@@ -1038,11 +1043,17 @@ List gibbs_sampler(IntegerMatrix observations,
10381043
//The post burn-in iterations ------------------------------------------------
10391044
for(int iteration = 0; iteration < iter; iteration++) {
10401045
if (Progress::check_abort()) {
1041-
if(edge_selection == true) {
1046+
if(edge_selection == true && edge_prior == "Stochastic-Block") {
1047+
return List::create(Named("indicator") = out_indicator,
1048+
Named("interactions") = out_interactions,
1049+
Named("thresholds") = out_thresholds,
1050+
Named("allocations") = out_allocations);
1051+
} if(edge_selection == true && edge_prior != "Stochastic-Block") {
10421052
return List::create(Named("indicator") = out_indicator,
10431053
Named("interactions") = out_interactions,
10441054
Named("thresholds") = out_thresholds);
1045-
} else {
1055+
}
1056+
else {
10461057
return List::create(Named("interactions") = out_interactions,
10471058
Named("thresholds") = out_thresholds);
10481059
}
@@ -1150,6 +1161,7 @@ List gibbs_sampler(IntegerMatrix observations,
11501161
beta_bernoulli_alpha,
11511162
beta_bernoulli_beta);
11521163

1164+
11531165
for(int i = 0; i < no_variables - 1; i++) {
11541166
for(int j = i + 1; j < no_variables; j++) {
11551167
theta(i, j) = cluster_prob(cluster_allocations[i], cluster_allocations[j]);
@@ -1234,15 +1246,26 @@ List gibbs_sampler(IntegerMatrix observations,
12341246
out_thresholds(no_variables - 1, 1) /= iteration + 1;
12351247
}
12361248
}
1249+
1250+
for(int i = 0; i < no_variables; i++) {
1251+
out_allocations(iteration, i) = cluster_allocations[i] + 1;
1252+
}
12371253
}
12381254

12391255

12401256
if(edge_selection == true) {
1257+
if(edge_prior == "Stochastic-Block"){
12411258
return List::create(Named("indicator") = out_indicator,
12421259
Named("interactions") = out_interactions,
1243-
Named("thresholds") = out_thresholds);
1260+
Named("thresholds") = out_thresholds,
1261+
Named("allocations") = out_allocations);
1262+
} else {
1263+
return List::create(Named("indicator") = out_indicator,
1264+
Named("interactions") = out_interactions,
1265+
Named("thresholds") = out_thresholds);
1266+
}
12441267
} else {
1245-
return List::create(Named("interactions") = out_interactions,
1246-
Named("thresholds") = out_thresholds);
1268+
return List::create(Named("interactions") = out_interactions,
1269+
Named("thresholds") = out_thresholds);
12471270
}
12481271
}

src/gibbs_functions_edge_prior.cpp

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -337,6 +337,7 @@ IntegerVector block_allocations_mfm_sbm(IntegerVector cluster_assign,
337337
}
338338
}
339339
return cluster_assign;
340+
340341
}
341342

342343
// ----------------------------------------------------------------------------|
@@ -376,4 +377,4 @@ NumericMatrix block_probs_mfm_sbm(IntegerVector cluster_assign,
376377
}
377378

378379
return block_probs;
379-
}
380+
}

vignettes/refs.bib

Lines changed: 9 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,13 @@
1+
12
@article{SekulovskiEtAl_2023,
2-
title={Testing conditional independence in psychometric networks: An analysis of three bayesian methods},
3-
author={Sekulovski, N. and Keetelaar, S. and Huth, K. and Wagenmakers, {E.-J.} and van Bork, R. and van den Bergh, D. and Marsman, M.},
4-
journal={Multivariate Behavioral Research},
5-
doi = {10.1080/00273171.2024.2345915},
6-
year={in press}
7-
}
3+
author = {Sekulovski, N. and Keetelaar, S. and Huth, K. B. S. and Wagenmakers, Eric-Jan and {van }Bork, R. and {van den }Bergh, D. and Marsman, M.},
4+
doi = {10.1080/00273171.2024.2345915},
5+
journal = {Multivariate Behavioral Research},
6+
pages = {913--933},
7+
title = {Testing conditional independence in psychometric networks: {A}n analysis of three {B}ayesian methods},
8+
volume = {59},
9+
year = {2024},
10+
bdsk-url-1 = {https://doi.org/10.1080/00273171.2024.2345915}}
811

912
@article{GengEtAl_2019,
1013
title={Probabilistic community detection with unknown number of communities},

0 commit comments

Comments
 (0)