Skip to content

Commit e42d140

Browse files
authored
Merge pull request #24 from jagruti8/add-detectors-metrics
Added a new detector, modified README.md file and added an algorithm.txt file
2 parents 4c24f78 + 2838e54 commit e42d140

File tree

4 files changed

+209
-11
lines changed

4 files changed

+209
-11
lines changed

README.md

Lines changed: 81 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,24 @@ You should put the code in this `findoutlie` directory on your Python PATH.
88

99
This README file has instructions on how to get, validate and process the data.
1010

11+
## clone the repository
12+
13+
```
14+
git clone [email protected]:nipraxis-fall-2022/diagnostics-NME.git
15+
```
16+
17+
## open the repository
18+
19+
```
20+
cd diagnostics-NME
21+
```
22+
23+
## Install the dependencies
24+
25+
Make sure to install everything listed in 'requirements.txt' using 'pip':
26+
```
27+
pip3 install --user scipy matplotlib pandas scikit-image sympy nibabel jupyter ipython jupytext nipraxis okpy
28+
1129
## Get the data
1230
1331
```
@@ -16,7 +34,13 @@ curl -L https://figshare.com/ndownloader/files/34951650 -o group_data.tar
1634
tar xvf group_data.tar
1735
```
1836
19-
Add the hash_list file to Git:
37+
First check if the hash_list.txt is added or not
38+
```
39+
git status
40+
```
41+
42+
if there is no modification, it means the hash_list.txt is already added to git
43+
Else, add the hash_list file to Git:
2044
2145
```
2246
git add data/group-*/hash_list.txt
@@ -35,6 +59,42 @@ cd ..
3559
python3 scripts/validate_data.py data
3660
```
3761
62+
## Install the new directory module 'findoutlie'
63+
64+
To do this, first install the Flit Python package manager:
65+
Flit is a system for configuring and installing modules.
66+
You may be able to moit the --user below
67+
```
68+
python3 -m pip install --user flit
69+
```
70+
71+
Next install the module using Flit. Here the command differs on Windows compared to Linux or macOS.
72+
73+
For macOS and Linux:
74+
75+
(See below for Windows command)
76+
Use Flit to install the module.
77+
78+
```
79+
python3 -m flit install --user -s
80+
```
81+
82+
For Windows:
83+
(See above for macOS and Linux)
84+
Use Flit to install the module.
85+
86+
```
87+
python3 -m flit install --user --pth-file
88+
```
89+
90+
Now test that you can import the 'findoutlie' module by running the command. The -c flag tells Python to run the code that follows the -c flag.
91+
92+
```
93+
python3 -c 'import findoutlie'
94+
```
95+
96+
This should give no error, because the previous step installed the 'findoutlie' directory module to somewhere on Python's search path.
97+
3898
## Find outliers
3999
40100
```
@@ -54,9 +114,24 @@ identified as an outlier. 0 refers to the first volume. For example (these
54114
outlier IDs are completely random, for illustration):
55115
56116
```
57-
data/sub-01/func/sub-01_task-taskzero_run-01_bold.nii.gz, 3, 21, 22, 104
58-
data/sub-01/func/sub-01_task-taskzero_run-02_bold.nii.gz, 11, 33, 91
59-
data/sub-03/func/sub-03_task-taskzero_run-02_bold.nii.gz, 101, 102, 132
60-
data/sub-08/func/sub-08_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 166, 167
61-
data/sub-09/func/sub-08_task-taskzero_run-01_bold.nii.gz, 3
117+
data/group-01/sub-08/func/sub-08_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19, 133, 134, 135, 136, 154, 155, 157
118+
data/group-01/sub-08/func/sub-08_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 9, 17, 53, 54, 63, 78, 79, 151, 152, 153
119+
data/group-01/sub-01/func/sub-01_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 157, 158
120+
data/group-01/sub-01/func/sub-01_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 17, 19, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160
121+
data/group-01/sub-06/func/sub-06_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 153, 154, 155, 156, 157, 158, 159, 160, 161
122+
data/group-01/sub-06/func/sub-06_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 19, 24, 25, 26, 27, 28, 29, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159
123+
data/group-01/sub-07/func/sub-07_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
124+
data/group-01/sub-07/func/sub-07_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 132, 136, 137, 138, 139, 140, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161
125+
data/group-01/sub-09/func/sub-09_task-taskzero_run-01_bold.nii.gz, 0, 134, 135, 136, 143, 144, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160
126+
data/group-01/sub-09/func/sub-09_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 36, 79, 80, 150, 151, 152, 153, 154, 155, 156, 157
127+
data/group-01/sub-10/func/sub-10_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 26, 104
128+
data/group-01/sub-10/func/sub-10_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159
129+
data/group-01/sub-05/func/sub-05_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 26, 27, 48, 49, 52, 76, 77, 150
130+
data/group-01/sub-05/func/sub-05_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 50, 51, 52, 54, 157, 158, 159, 160, 161
131+
data/group-01/sub-02/func/sub-02_task-taskzero_run-02_bold.nii.gz, 34, 65, 105, 106, 107, 135, 140, 148
132+
data/group-01/sub-02/func/sub-02_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
133+
data/group-01/sub-03/func/sub-03_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 101, 102, 103, 160, 161
134+
data/group-01/sub-03/func/sub-03_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 136, 137, 138, 139, 140, 142, 156, 157, 158, 159
135+
data/group-01/sub-04/func/sub-04_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 59, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161
136+
data/group-01/sub-04/func/sub-04_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 49, 50, 52, 53, 54, 55, 57, 58, 148, 149, 157
62137
```

algorithm.txt

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
Algorithm:
2+
3+
1. The 4D image is first segmented (using otsu threshold) to segment the brain voxels from the background. (x)
4+
2. The median(med_voxel(x)) and median absolute deviation(mad_voxel(x)) is calculated for each of the brain voxels.
5+
3. The brain voxels lying outside the interval [med_voxel(x)-a*mad_voxel(x), med_voxel(x)+a*mad_voxel(x)] are considered as outliers. a = 3.5
6+
4. For each time t, the number of outlying voxels n(t) is counted.
7+
5. The median (n_med) and MAD (n_mad) of n(t) are calculated. Any time t with n(t)>n_med+3.5*n_mad are considered as outliers.
8+
9+
References:
10+
1. Cox, R.W. Outlier Detection in FMRl Time Series. ISMRM(2002).

findoutlie/detectors.py

Lines changed: 61 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,65 @@
1414
# +++your code here+++
1515
import numpy as np
1616

17+
from scipy.stats import norm
18+
19+
def mad_voxel_detector(img,threshold=3.5):
20+
""" Detect outliers in 'img' using mediaan absolute deviation.
21+
Returns 2D vector of same shape as 'img', where True means the corresponding
22+
value in 'img' is an outlier.
23+
24+
Call med as median per voxel and mad as median absolute deviation per voxel of the 'img'
25+
26+
Parameters
27+
----------
28+
img : 2D array
29+
Values for which we will detect outliers
30+
p : float, optional
31+
Scalar to multiply the median absolute deviation
32+
to form the upper and lower threshold. Default is 3.5.
33+
34+
Returns
35+
-------
36+
outlier_tf : 2D boolean array
37+
2D boolean array of same shape as 'img', where True means the corresponsding value in 'img'
38+
is an outlier.
39+
"""
40+
# Calculate median per voxel
41+
med = np.expand_dims(np.nanmedian(img,axis=-1),axis=1)
42+
# Calculate mean absolute deviation per voxel
43+
mad = np.expand_dims(np.nanmedian(np.abs(img-med),axis=-1),axis=1)
44+
# calculate the outliers
45+
outlier_tf = np.abs(img-med)>(threshold*mad)
46+
return outlier_tf
47+
48+
def mad_time_detector(measures, threshold=3.5):
49+
""" Detect outliers in 'measures' using median absolute deviation.
50+
Returns 1D vector of same length as 'measures', where True means the corresponsding
51+
value in 'measures' is an outlier.
52+
53+
Call med as median and mad as median absolute deviation of the 'measures'
54+
55+
Parameters
56+
----------
57+
measures : 1D array
58+
Values for which we will detect outliers
59+
threshold : float, optional
60+
Scalar to multiply the median aboslute deviation to form the upper threshold.
61+
Default is 3.5.
62+
63+
Returns
64+
-------
65+
outlier_tf : 1D boolean array
66+
1D boolean array of same length as 'measures', where True means the
67+
corresponding value in 'measures' is an outlier.
68+
"""
69+
# Calculate median of measures
70+
med = np.median(measures)
71+
# Calculate median absoulte deviation of measures
72+
mad = np.median(np.abs(measures-med))
73+
# Calculate the outliers
74+
outlier_tf = measures>med+threshold*mad
75+
return outlier_tf
1776

1877
def iqr_detector(measures, iqr_proportion=1.5):
1978
""" Detect outliers in `measures` using interquartile range.
@@ -59,6 +118,6 @@ def iqr_detector(measures, iqr_proportion=1.5):
59118
# Calculate the interquartile range
60119
IQR = Q3 - Q1
61120
# Calculate the outliers
62-
outliers = np.logical_or(measures > (Q3 + IQR * iqr_proportion), measures < (Q1 - IQR * iqr_proportion))
63-
return outliers
121+
outlier_tf = np.logical_or(measures > (Q3 + IQR * iqr_proportion), measures < (Q1 - IQR * iqr_proportion))
122+
return outlier_tf
64123

findoutlie/outfind.py

Lines changed: 57 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -7,8 +7,61 @@
77

88
import nibabel as nib
99

10+
from skimage.filters import threshold_otsu
11+
1012
from .metrics import dvars
11-
from .detectors import iqr_detector
13+
from .detectors import iqr_detector,mad_voxel_detector,mad_time_detector
14+
15+
def segment_brain(img):
16+
""" Segments brain region from background and returns only brain voxels
17+
Parameters
18+
----------
19+
img : array
20+
2D array with voxels in rows and timepoints in columns
21+
Returns
22+
-------
23+
thresholded_img : array
24+
2D array containing only brain voxels in rows and timepoints in columns
25+
"""
26+
# calculate the mean of each voxel over time
27+
mean_img = np.mean(img, axis=-1)
28+
# calculate the threshold for segmenting brain from background
29+
threshold = threshold_otsu(mean_img)
30+
mask = np.expand_dims(mean_img > threshold, axis=1)
31+
mask_2D = np.tile(mask, (1, img.shape[-1]))
32+
thresholded_img = np.where(mask_2D, img, np.nan)
33+
# filter only brain voxels
34+
brain_voxels = thresholded_img[~np.isnan(thresholded_img).all(axis=1)]
35+
return brain_voxels
36+
37+
def detect_outliers_mean_absolute_deviation_mask(fname):
38+
""" Detect outliers given image file path 'filename'
39+
40+
Parameters
41+
----------
42+
fname : str or Path
43+
Filename of 4D image, as string or Path object
44+
45+
Returns
46+
-------
47+
outliers : array
48+
Indices of outlier volumes.
49+
"""
50+
# A mask is used to first segment the brain regions from the background, then mean absolute deviation is used to detect outliers
51+
img = nib.load(fname)
52+
img_data = img.get_fdata()
53+
# reshape from 4D to 2D
54+
img_data_2D = np.reshape(img_data, (-1,img_data.shape[-1]))
55+
# segment brain from background
56+
brain_voxels = segment_brain(img_data_2D)
57+
# find the outlying voxels
58+
outliers_voxel = mad_voxel_detector(brain_voxels)
59+
# calculate the number of outlying voxels for each time point
60+
voxel_outliers_per_time = np.nansum(outliers_voxel,axis=0)
61+
# find the outliers in the time-series
62+
outliers_time = mad_time_detector(voxel_outliers_per_time)
63+
# Return indices of True values from Boolean array.
64+
return np.nonzero(outliers_time)[0]
1265

1366

1467
def detect_outliers(fname):
@@ -29,7 +82,7 @@ def detect_outliers(fname):
2982
dvs = dvars(img)
3083
is_outlier = iqr_detector(dvs, iqr_proportion=2)
3184
# Return indices of True values from Boolean array.
32-
return np.nonzero(is_outlier)
85+
return np.nonzero(is_outlier)[0]
3386

3487

3588
def find_outliers(data_directory):
@@ -49,6 +102,7 @@ def find_outliers(data_directory):
49102
image_fnames = Path(data_directory).glob("**/sub-*.nii.gz")
50103
outlier_dict = {}
51104
for fname in image_fnames:
52-
outliers = detect_outliers(fname)
105+
outliers = detect_outliers_mean_absolute_deviation_mask(fname)
106+
#outliers = detect_outliers(fname)
53107
outlier_dict[fname] = outliers
54108
return outlier_dict

0 commit comments

Comments
 (0)