This repository contains code to select optimized sentinel node sets which can closely approximate the state of all network nodes, as well as simulation and analysis code which produced the findings in our manuscript. See below for an example analysis.
The optNS package is a small R package which implements our optimization algorithm; the name of the package is simply short for "optimize node sets." The ./optNS/ directory contains the code, which you can build with
R CMD build optNSThe built package is also included as optNS_0.1-1.tar.gz. To install, use
R CMD INSTALL optNS_0.1-1.tar.gzPlease contact me with any questions.
Our method should run in reasonable time for small networks on a standard laptop (I used a laptop with four Intel i3-5010U CPUs at 2.10GHz). In order to support larger networks we used a high-performance computing cluster to simulate node states across ranges of control parameters (our "ground truth" simulations) and to optimize node sets. The shell scripts we used to do this are in ./shell/ and will need to be modified for your parallel computing environment.
In addition to the optNS package, we use an in-house differential equation simulation package called sdn. We use the following standard R repository packages:
What follows is a simple demonstration of our method using the functions in optNS. The code below is also included in the repository as demo.R.
As tested, we rely on the following packages (setting the number of cores, color palette, and random seed for convenience):
library(parallel)
ncores <- detectCores()-1
library(igraph)
library(optNS)
library(sdn)
palette("R4")
set.seed(1234)First, we choose a dynamics (our standard coupled double-well model) and generate a small network.
dynamics <- "doublewell"
N <- 50
g <- sample_pa(N, m = 2, directed = FALSE, start.graph = make_full_graph(3))Then, we simulate the equilibrium states of all nodes. We will treat these states as the ground truth in the remaining analysis. We used our sdn package to do the ground truth simulations.
AL <- as_adj_list(g, "all")
times <- 0:15
params <- c(
get(paste0(".", dynamics)),
list(AL = AL)
)
control <- list(times = times, ncores = ncores)
model <- get(dynamics)
system.time(
Y <- solve_in_range(
params$Ds, "D", model, rep(params$xinit.low, N), params, control, "ode", method = "adams", maxsteps = 20000
)
)Simulating the ground truth of large networks can be slow. For this network, simulating the ground truth took 15.840 seconds on my laptop.
To run our optimization algorithm, we need the number of nodes in the node set n, the network g, and the "ground truth" simulation results to use for optimization. The matrix Y is the "ground truth" simulation results for the coupled double-well dynamics on the dolphin network: the rows correspond to control parameter values (here,
y <- rowMeans(Y) # actual average
n <- floor(log(N)) # a small number in comparison to the size of the network
system.time(
soln <- select_optimized(n, g, y, Y)
)To produce one optimized nodeset we call optNS::select_optimized(), which took 2.623 seconds on my laptop. Note that larger networks require far more time and/or computing resources. We run the simulated annealing algorithm for ./optNS/R/make-nodesets.R and the function select_optimized().
The output of select_optimized() is a list of vertex indices, approximation error, and degree sequence; optimized node weights are null unless you ask select_optimized() to use the quadratic optimization procedure.
str(soln)To compute the approximation we select the columns from Y that correspond to the nodes our optimization algorithm returned:
Z <- Y[, soln$vs]
z <- rowMeans(Z)We can visualize the results as follows:
par(mar = c(5, 5, 1, 1), pty = "s")
matplot(
params$Ds, Y,
type = "l", lty = 1, lwd = 0.25, col = "gray50",
xlab = "D", ylab = expression(italic(x[i])), font.lab = 3,
cex.lab = 2, cex.axis = 1.5
)
lines(params$Ds, y, lty = 1, lwd = 8, col = 1)
matlines(params$Ds, Z, lty = 1, lwd = 4, col = 2)
lines(params$Ds, z, lty = 1, lwd = 8, col = 3)To make several node sets, we need a bit more time and a standardized function optNS::make_dataset() which can produce optimized or random node sets, among other options.
ntrials <- 10
system.time(
opts <- make_dataset(
ntrials = ntrials, ns.type = "opt", ncores = ncores,
n = n, g = g, y = y, Y = Y
)
)That took 12.754 seconds on my laptop. Making random node sets takes very little time.
rands <- make_dataset(
ntrials = 10*ntrials, ns.type = "rand", ncores = ncores,
n = n, g = g, y = y, Y = Y
)optNS also includes some convenience functions to retrieve information from lists of node sets. We can use those functions to make a plot that summarizes our simulation results:
error <- list(
opt = get_error(opts),
rand = get_error(rands)
)
xlim <- range(unlist(error))
dev.new(height = 5)
par(mar = c(5, 9, 1, 1))
plot(
NULL, xlim = xlim, ylim = c(0.5, 2.5), xlab = "Approximation error", ylab = "", log = "x", yaxt = "n",
cex.lab = 2, cex.axis = 1.5
)
points(error$opt, jitter(rep(2, length(error$opt)), amount = 0.1), pch = 1, col = 2)
points(error$rand, jitter(rep(1, length(error$rand)), amount = 0.1), pch = 0, col = 1)
axis(2, at = c(2, 1), tick = FALSE, labels = c("Optimized", "Random"), las = 2, cex.axis = 2)We used functional magnetic resonance imaging (fMRI) data from the Human Connectome Project to investigate the performance of our method in an empirical setting. We used the HCP Young Adult data set, described here. Registration is required to access this data, so we have not included it in our project repositories. We are including the files we used to analyze the data. The file collect-all-fmri.sh and its associated .sh and ../sims/*.R files forms networks, generates simulations, and selects node sets for each participant. Analysis and plotting can be found in ./analysis/analyze-fmri.R and ./analysis/plot-fmri.R, respectively. The data (.rds) files are available on Zenodo.