Commit c9a3177

Merge branch 'develop'
2 parents 16d4bb1 + 38ab56d commit c9a3177

147 files changed: 8288 additions & 3781 deletions


README.md

Lines changed: 39 additions & 25 deletions
````diff
@@ -9,47 +9,61 @@ Correlation
 - Spearman
 
 Clustering
-- K-means
 - Gaussian mixture models
 
 Thresholding
+- Power-law
 - Random matrix theory
 
-# Installation
+KINC is built with [ACE](https://github.com/SystemsGenetics/ACE), a framework which provides mechanisms for large-scale heterogeneous computing and data management. As such, KINC can be run in a variety of compute configurations, including single-core / single-GPU and multi-core / multi-GPU, and KINC uses its own binary file formats to represent the data objects that it produces. Each of these binary formats can be exported to a plain-text format for use in other applications.
 
-This software uses GSL, OpenCL, and [ACE](https://github.com/SystemsGenetics/ACE). For instructions on installing ACE, see the project repository. For all other dependencies, consult your package manager. For example, to install dependencies on Ubuntu:
-```
-sudo apt install libgsl2 ocl-icd-opencl-dev libopenmpi-dev
-```
+## Installation
+
+Refer to the files under `docs` for installation instructions. KINC is currently supported on most flavors of Linux.
 
-To build & install KINC:
+### Palmetto
+
+To use KINC on Palmetto, you must add the following modules in lieu of installing dependencies through a package manager:
+```bash
+module add cuda-toolkit/9.2
+module add gcc/5.4.0
+module add git
+module add gsl/2.3
+module add openmpi/1.10.7
+module add Qt/5.9.2
 ```
-cd build
-qmake ../src/KINC.pro
-make qmake_all
-make
-make qmake_all
-make install
+
+## Usage
+
+KINC provides two executables: `kinc`, the command-line version, and `qkinc`, the GUI version. The command-line version can use MPI while the GUI version can display data object files that are produced by KINC. KINC produces a gene-coexpression network in several steps:
+1. `import-emx`: Import expression matrix text file into binary format
+2. `similarity`: Compute a cluster matrix and correlation matrix from expression matrix
+3. `threshold`: Determine an appropriate correlation threshold for correlation matrix
+4. `extract`: Extract an edge list from a correlation matrix given a threshold
+
+Below is an example usage of `kinc` on the Yeast dataset:
 ```
+# import expression matrix into binary format
+kinc run import-emx --input Yeast-GEM.txt --output Yeast.emx --nan NA
 
-## Using the KINC GUI or Console
+# compute similarity matrix (with GMM clustering)
+mpirun -np 8 kinc run similarity --input Yeast.emx --ccm Yeast.ccm --cmx Yeast.cmx --clusmethod gmm --corrmethod spearman --minclus 1 --maxclus 5
 
-ACE provides two different libraries for GUI and console applications. The `kinc` executable is the console or command line version and the `qkinc` executable is the GUI version.
+# determine correlation threshold
+kinc run rmt --input Yeast.cmx --log Yeast.log
 
-# Usage
+# read threshold from log file
+THRESHOLD=$(tail -n 1 Yeast.log)
 
-To build a GCN involves several steps:
+# extract network file from thresholded similarity matrix
+kinc run extract --emx Yeast.emx --ccm Yeast.ccm --cmx Yeast.cmx --output Yeast-net.txt --mincorr $THRESHOLD
+```
 
-1. Import expression matrix
-2. Compute cluster composition matrix
-3. Compute correlation matrix
-4. Compute thresholded correlation matrix
+A more thorough example usage is provided in `scripts/run-all.sh`.
 
-# Troubleshooting
-## An error occurred in MPI_Init
-KINC requires MPI as a dependency, but on most systems you can execute the command-line KINC as a stand-alone tool without using 'mpirun'. This is because KINC checks during runtime if MPI is appropriate for execution. However, on a SLURM cluster where MPI jobs must be run using the srun command and where PMI2 is compiled into MPI, then KINC cannot be executed stand-alone. It must be executed using srun with the --mpi argument set to pmi2. For example:
+### Running KINC on SLURM
 
+Although KINC is an MPI application, generally you can run `kinc` as a stand-alone application without `mpirun` and achieve normal serial behavior. However, on a SLURM cluster where MPI jobs must be run with the `srun` command and where PMI2 is compiled into MPI, `kinc` cannot be executed stand-alone. It must be executed using `srun` with the additional argument `--mpi=pmi2`. For example:
 ```
 srun --mpi=pmi2 kinc run import_emx --input Yeast-ematrix.txt --output Yeast.emx --nan NA
 ```
-
````
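The new README example hands the last line of the `kinc run rmt` log to `kinc run extract` through the `THRESHOLD` shell variable. Below is a minimal Python sketch of that hand-off. It assumes, as the `tail -n 1` call implies, that the log ends with the threshold as a plain number. `read_threshold` and `extract_argv` are hypothetical helper names, and the command is only built here, not run.

```python
import tempfile
from pathlib import Path


def read_threshold(log_path):
    """Parse the last non-empty line of an rmt log as a float.

    Assumption: the log ends with the threshold on its own line, which is
    what `THRESHOLD=$(tail -n 1 Yeast.log)` in the README relies on.
    """
    lines = [ln.strip() for ln in Path(log_path).read_text().splitlines()]
    lines = [ln for ln in lines if ln]
    if not lines:
        raise ValueError(f"{log_path} contains no threshold")
    return float(lines[-1])


def extract_argv(prefix, threshold):
    """Argument list for the README's `kinc run extract` step (not executed)."""
    return [
        "kinc", "run", "extract",
        "--emx", f"{prefix}.emx",
        "--ccm", f"{prefix}.ccm",
        "--cmx", f"{prefix}.cmx",
        "--output", f"{prefix}-net.txt",
        "--mincorr", str(threshold),
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "Yeast.log"
        # Made-up log contents; a real file comes from `kinc run rmt`.
        log.write_text("placeholder line\n0.85\n")
        argv = extract_argv("Yeast", read_threshold(log))
    print(" ".join(argv))
    # Pass argv to subprocess.run(argv, check=True) to execute it.
```

Parsing the value with `float` makes a malformed log fail immediately. The shell version instead forwards whatever text the last line holds to `--mincorr`.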

build-tests/.gitignore

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/Ubuntu_16_04_Setup.md

Lines changed: 9 additions & 6 deletions
````diff
@@ -7,7 +7,7 @@ Use the following steps to setup KINC for development on Ubuntu 16.04:
 
 Most of the dependencies are available as packages:
 ```bash
-sudo apt install g++ libgsl-dev libopenblas-dev libopenmpi-dev ocl-icd-opencl-dev
+sudo apt install build-essential libgsl-dev libopenblas-dev libopenmpi-dev ocl-icd-opencl-dev
 ```
 
 For device drivers (AMD, Intel, NVIDIA, etc), refer to the manufacturer's website.
@@ -25,7 +25,7 @@ If you install Qt locally then you must add Qt to the executable path:
 
 ```bash
 # append to ~/.bashrc
-export QTDIR="$HOME/Qt/5.10.1/gcc_64"
+export QTDIR="$HOME/Qt/5.7.1/gcc_64"
 export PATH="$QTDIR/bin:$PATH"
 ```
 
@@ -34,8 +34,8 @@ export PATH="$QTDIR/bin:$PATH"
 Clone the ACE and KINC repositories from Github.
 
 ```bash
-git clone git@github.com:SystemsGenetics/ACE.git
-git clone git@github.com:SystemsGenetics/KINC.git
+git clone https://github.com/SystemsGenetics/ACE.git
+git clone https://github.com/SystemsGenetics/KINC.git
 ```
 
 ## Step 3: Build ACE and KINC
@@ -45,14 +45,17 @@ Follow the ACE instructions to build ACE. If you install ACE locally then you mu
 ```bash
 # append to ~/.bashrc
 export INSTALL_PREFIX="$HOME/software"
+export PATH="$INSTALL_PREFIX/bin:$PATH"
+export CPLUS_INCLUDE_PATH="$INSTALL_PREFIX/include:$CPLUS_INCLUDE_PATH"
+export LIBRARY_PATH="$INSTALL_PREFIX/lib:$LIBRARY_PATH"
 export LD_LIBRARY_PATH="$INSTALL_PREFIX/lib:$LD_LIBRARY_PATH"
 ```
 
 Build & install KINC:
 
 ```bash
 cd build
-qmake ../src/KINC.pro
+qmake ../src/KINC.pro PREFIX=$INSTALL_PREFIX
 make qmake_all
 make
 make qmake_all
@@ -63,4 +66,4 @@ You should now be able to run KINC.
 
 ## (Optional) Use QtCreator
 
-Select **File** > **Open File or Project** and then navigate in the file browser to the ACE directory and select the ACE.pro file. Navigate through configure setup. Repeat for KINC.
+Select __File__ > __Open File or Project__ and then navigate in the file browser to the ACE directory and select the ACE.pro file. Navigate through configure setup. Repeat for KINC.
````
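Each `export` line added to the setup guide puts a directory under `$INSTALL_PREFIX` at the front of a search-path variable. If that variable was previously unset, the shell form leaves a trailing empty entry (for example `$HOME/software/include:`). GCC documents that an empty element in `CPLUS_INCLUDE_PATH` means the current working directory. Below is a minimal Python sketch of the same prepend that omits the empty entry. `prepend_paths` is a hypothetical helper and is not part of ACE or KINC.

```python
import os

# Subdirectory of the install prefix that each variable should search,
# mirroring the `export` lines added to ~/.bashrc in the setup guide.
PREFIX_DIRS = {
    "PATH": "bin",
    "CPLUS_INCLUDE_PATH": "include",
    "LIBRARY_PATH": "lib",
    "LD_LIBRARY_PATH": "lib",
}


def prepend_paths(env, prefix):
    """Return a copy of `env` with prefix subdirectories prepended.

    Unlike `export VAR="$PREFIX/x:$VAR"`, an unset or empty VAR does not
    produce a trailing separator (an empty search-path entry).
    """
    new_env = dict(env)
    for var, subdir in PREFIX_DIRS.items():
        entry = os.path.join(prefix, subdir)
        old = env.get(var, "")
        new_env[var] = f"{entry}{os.pathsep}{old}" if old else entry
    return new_env


if __name__ == "__main__":
    env = prepend_paths(os.environ, os.path.expanduser("~/software"))
    for var in PREFIX_DIRS:
        print(f"{var}={env[var]}")
```

The resulting dictionary can be passed as `env=` to `subprocess.run` so that a build step picks up the locally installed ACE.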

scripts/extract.py

Lines changed: 58 additions & 0 deletions
```python
import argparse
import pandas as pd



if __name__ == "__main__":
    # parse command-line arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--emx", required=True, help="expression matrix file", dest="EMX")
    parser.add_argument("--cmx", required=True, help="correlation matrix file", dest="CMX")
    parser.add_argument("-o", "--output", required=True, help="output net file", dest="OUTPUT")
    parser.add_argument("--mincorr", type=float, default=0, help="minimum absolute correlation threshold", dest="MINCORR")
    parser.add_argument("--maxcorr", type=float, default=1, help="maximum absolute correlation threshold", dest="MAXCORR")

    args = parser.parse_args()

    # load data
    emx = pd.read_table(args.EMX)
    cmx = pd.read_table(args.CMX, header=None, names=[
        "x",
        "y",
        "Cluster",
        "Num_Clusters",
        "Cluster_Samples",
        "Missing_Samples",
        "Cluster_Outliers",
        "Pair_Outliers",
        "Too_Low",
        "sc",
        "Samples"
    ])

    # extract correlations within thresholds
    cmx = cmx[(args.MINCORR <= abs(cmx["sc"])) & (abs(cmx["sc"]) <= args.MAXCORR)]

    # insert additional columns used in netlist format
    cmx.insert(len(cmx.columns), "Source", [emx.index[x] for x in cmx["x"]])
    cmx.insert(len(cmx.columns), "Target", [emx.index[y] for y in cmx["y"]])
    cmx.insert(len(cmx.columns), "Interaction", ["co" for idx in cmx.index])

    # reorder columns to netlist format
    cmx = cmx[[
        "Source",
        "Target",
        "sc",
        "Interaction",
        "Cluster",
        "Num_Clusters",
        "Cluster_Samples",
        "Missing_Samples",
        "Cluster_Outliers",
        "Pair_Outliers",
        "Too_Low",
        "Samples"
    ]]

    # save output data
    cmx.to_csv(args.OUTPUT, sep="\t", index=False)
```
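`extract.py` keeps only pairs whose absolute correlation `sc` lies in the closed interval [`--mincorr`, `--maxcorr`]. It then replaces the row positions `x` and `y` with labels from `emx.index`; pandas treats an integer key there as a zero-based position. The sketch below reproduces that filtering and relabeling with the standard library on made-up rows. It illustrates the logic and is not a drop-in replacement. The gene names and the three-column row layout are hypothetical; the real cmx text has the eleven columns listed above.

```python
import csv
import io

# Hypothetical gene names, in the row order of the expression matrix.
GENES = ["gene0", "gene1", "gene2"]

# Made-up cmx rows reduced to (x, y, sc), tab-separated like the real file.
CMX_TEXT = "1\t0\t0.91\n2\t0\t-0.62\n2\t1\t0.12\n"


def extract_edges(cmx_text, genes, mincorr=0.0, maxcorr=1.0):
    """Yield (source, target, sc, interaction) for pairs within the thresholds."""
    for x, y, sc in csv.reader(io.StringIO(cmx_text), delimiter="\t"):
        sc = float(sc)
        # Same inclusive test as extract.py: mincorr <= |sc| <= maxcorr.
        if mincorr <= abs(sc) <= maxcorr:
            # x and y are zero-based positions into the gene list.
            yield genes[int(x)], genes[int(y)], sc, "co"


if __name__ == "__main__":
    for edge in extract_edges(CMX_TEXT, GENES, mincorr=0.5):
        print("\t".join(map(str, edge)))
```

Negative correlations survive the filter because the test uses the absolute value, so the sign of `sc` is preserved in the output.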

scripts/run-all-py.sh

Lines changed: 73 additions & 0 deletions
```bash
#!/bin/bash

# parse command-line arguments
if [[ $# != 1 ]]; then
    echo "usage: $0 <infile>"
    exit -1
fi

# define analytic flags
DO_SIMILARITY=1
DO_THRESHOLD=1
DO_EXTRACT=1

# define input/output files
DATA="data"
EMX_FILE="$1"
CMX_FILE="$DATA/$(basename $EMX_FILE .txt)-cmx-py.txt"
NET_FILE="$DATA/$(basename $EMX_FILE .txt)-net-py.txt"

# similarity
if [[ $DO_SIMILARITY = 1 ]]; then
    CLUSMETHOD="gmm"
    CORRMETHOD="pearson"
    MINEXPR="-inf"
    MINCLUS=1
    MAXCLUS=5
    CRITERION="bic"
    PREOUT="--preout"
    POSTOUT="--postout"
    MINCORR=0
    MAXCORR=1

    python scripts/similarity.py \
        -i $EMX_FILE \
        -o $CMX_FILE \
        --clusmethod $CLUSMETHOD \
        --corrmethod $CORRMETHOD \
        --minexpr=$MINEXPR \
        --minclus $MINCLUS --maxclus $MAXCLUS \
        --crit $CRITERION \
        $PREOUT $POSTOUT \
        --mincorr $MINCORR --maxcorr $MAXCORR
fi

# threshold
if [[ $DO_THRESHOLD = 1 ]]; then
    NUM_GENES=$(expr $(cat $EMX_FILE | wc -l) - 1)
    METHOD="rmt"
    TSTART=0.99
    TSTEP=0.001
    TSTOP=0.50

    python scripts/threshold.py \
        -i $CMX_FILE \
        --genes $NUM_GENES \
        --method $METHOD \
        --tstart $TSTART \
        --tstep $TSTEP \
        --tstop $TSTOP
fi

# extract
if [[ $DO_EXTRACT = 1 ]]; then
    MINCORR=0
    MAXCORR=1

    python scripts/extract.py \
        --emx $EMX_FILE \
        --cmx $CMX_FILE \
        --output $NET_FILE \
        --mincorr $MINCORR \
        --maxcorr $MAXCORR
fi
```
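In `run-all-py.sh`, `NUM_GENES` is the line count of the expression matrix minus one. That assumes a single header line followed by one row per gene. The sketch below computes the same count in Python. It also lists the candidate thresholds implied by `--tstart 0.99 --tstep 0.001 --tstop 0.50`, under the assumption (not confirmed by the script) that `threshold.py` steps downward from `tstart` to `tstop` inclusive. `count_genes` and `threshold_sweep` are hypothetical helpers.

```python
import tempfile
from pathlib import Path


def count_genes(emx_path):
    """Data rows in an expression matrix that has exactly one header line.

    Mirrors NUM_GENES=$(expr $(cat $EMX_FILE | wc -l) - 1), except that a
    last line lacking a trailing newline is still counted; `wc -l` counts
    newline characters and would skip it.
    """
    with open(emx_path) as f:
        return sum(1 for _ in f) - 1


def threshold_sweep(tstart=0.99, tstep=0.001, tstop=0.50):
    """Candidate thresholds from tstart down to tstop, inclusive.

    Assumption: this is one plausible reading of threshold.py's
    --tstart/--tstep/--tstop flags. Computing each value from an integer
    step index avoids accumulating floating-point error.
    """
    n_steps = round((tstart - tstop) / tstep)
    return [round(tstart - i * tstep, 10) for i in range(n_steps + 1)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        emx = Path(tmp) / "toy.txt"
        # Toy matrix: header of sample names, then one row per gene.
        emx.write_text("s1\ts2\ngeneA\t1.0\t2.0\ngeneB\t3.0\tNA")
        print("genes:", count_genes(emx))
    sweep = threshold_sweep()
    print(len(sweep), "thresholds from", sweep[0], "to", sweep[-1])
```

Repeatedly subtracting `0.001` from a running total would drift. Deriving every value from its step index keeps the endpoints at exactly `0.99` and `0.5`.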

scripts/run-all.sh

Lines changed: 108 additions & 0 deletions
```bash
#!/bin/bash

# parse command-line arguments
if [[ $# != 1 ]]; then
    echo "usage: $0 <infile>"
    exit -1
fi

GPU=1

# define analytic flags
DO_IMPORT_EMX=1
DO_SIMILARITY=1
DO_EXPORT_CMX=1
DO_THRESHOLD=1
DO_EXTRACT=1

# define input/output files
INFILE="$1"
DATA="data"
EMX_FILE="$DATA/$(basename $INFILE .txt).emx"
CCM_FILE="$DATA/$(basename $EMX_FILE .emx).ccm"
CMX_FILE="$DATA/$(basename $EMX_FILE .emx).cmx"
LOGS="logs"
RMT_FILE="$LOGS/$(basename $CMX_FILE .cmx).txt"

# apply settings
if [[ $GPU == 1 ]]; then
    kinc settings set opencl 0:0
    kinc settings set threads 4
    kinc settings set logging off

    NP=1
else
    kinc settings set opencl none
    kinc settings set logging off

    NP=$(nproc)
fi

# import emx
if [[ $DO_IMPORT_EMX = 1 ]]; then
    kinc run import-emx \
        --input $INFILE \
        --output $EMX_FILE \
        --nan NA
fi

# similarity
if [[ $DO_SIMILARITY = 1 ]]; then
    CLUSMETHOD="gmm"
    CORRMETHOD="pearson"
    MINEXPR="-inf"
    MINCLUS=1
    MAXCLUS=5
    CRITERION="BIC"
    PREOUT="--preout"
    POSTOUT="--postout"
    MINCORR=0.5
    MAXCORR=1

    mpirun -np $NP kinc run similarity \
        --input $EMX_FILE \
        --ccm $CCM_FILE \
        --cmx $CMX_FILE \
        --clusmethod $CLUSMETHOD \
        --corrmethod $CORRMETHOD \
        --minexpr $MINEXPR \
        --minclus $MINCLUS --maxclus $MAXCLUS \
        --crit $CRITERION \
        $PREOUT $POSTOUT \
        --mincorr $MINCORR --maxcorr $MAXCORR
fi

# export cmx
if [[ $DO_EXPORT_CMX = 1 ]]; then
    OUTFILE="$DATA/$(basename $CMX_FILE .cmx)-cmx.txt"

    kinc run export-cmx \
        --emx $EMX_FILE \
        --ccm $CCM_FILE \
        --cmx $CMX_FILE \
        --output $OUTFILE
fi

# threshold
if [[ $DO_THRESHOLD = 1 ]]; then
    mkdir -p $LOGS

    kinc run rmt \
        --input $CMX_FILE \
        --log $RMT_FILE
fi

# extract
if [[ $DO_EXTRACT = 1 ]]; then
    NET_FILE="$DATA/$(basename $EMX_FILE .emx)-net.txt"
    MINCORR=0
    MAXCORR=1

    kinc run extract \
        --emx $EMX_FILE \
        --ccm $CCM_FILE \
        --cmx $CMX_FILE \
        --output $NET_FILE \
        --mincorr $MINCORR \
        --maxcorr $MAXCORR
fi
```
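`run-all.sh` derives every output name from the input file with `basename`. It also sets the MPI process count from the `GPU` flag: one process when OpenCL device `0:0` is used, otherwise `$(nproc)` processes. The sketch below mirrors that bookkeeping in Python and builds, without running, the file arguments of the `mpirun ... similarity` command. `plan` and `strip_suffix` are hypothetical helpers. `os.sched_getaffinity` stands in for `nproc` where available, with `os.cpu_count()` as a fallback; this only approximates `nproc`.

```python
import os
from pathlib import PurePosixPath


def available_cpus():
    """Approximate `nproc`: CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def strip_suffix(name, suffix):
    """Like `basename NAME SUFFIX`: drop SUFFIX unless it is the whole name."""
    if name.endswith(suffix) and name != suffix:
        return name[: -len(suffix)]
    return name


def plan(infile, gpu=True, data="data", logs="logs"):
    """Return run-all.sh's derived file names and its similarity command."""
    stem = strip_suffix(PurePosixPath(infile).name, ".txt")
    files = {
        "emx": f"{data}/{stem}.emx",
        "ccm": f"{data}/{stem}.ccm",
        "cmx": f"{data}/{stem}.cmx",
        "cmx_txt": f"{data}/{stem}-cmx.txt",
        "rmt_log": f"{logs}/{stem}.txt",
        "net": f"{data}/{stem}-net.txt",
    }
    # GPU runs use a single MPI process; CPU runs use one per available core.
    nprocs = 1 if gpu else available_cpus()
    # Only the file arguments are shown; the clustering and correlation
    # flags from run-all.sh would follow.
    similarity = [
        "mpirun", "-np", str(nprocs), "kinc", "run", "similarity",
        "--input", files["emx"], "--ccm", files["ccm"], "--cmx", files["cmx"],
    ]
    return files, similarity


if __name__ == "__main__":
    files, cmd = plan("Yeast-GEM.txt", gpu=False)
    print(files)
    print(" ".join(cmd))
```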
