Commit fdb5ea4

Merge pull request #9 from PytorchConnectomics/add_subfolder
add ves_analysis repo as subfolder
2 parents 1e3b295 + f0aac75 commit fdb5ea4

28 files changed: 4441 additions, 0 deletions

ves_analysis/LICENSE.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Micaela Aeyoung Roth

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

ves_analysis/README.md

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
# ves_analysis

## Instructions for running sample data

#### Given files in `sample/sample_data`:
- `7-13_mask.h5`: mask of two adjacent neuron pieces (a small chunk, not full neurons)
  - Note: because the sample uses a small chunk of data from a neuron adjacency region instead of storing each neuron as an individual file, there is no stitching step for the sample.
- `7-13_pred_filtered.h5`: large vesicles within the target neuron (segid 62) in this sample region
- `7-13_lv_label.txt`: vesicle type classifications

#### Generate metadata
Run `python -i metadata_and_kdtree/kdTreeMeta.py --which_neurons "sample"`

→ Metadata will be saved to `sample/sample_outputs/sample_com_mapping.txt`

#### Export statistics
Run `python -i updated_export_stats.py --which_neurons "sample"`

→ Volume/diameter stats and list exports will be saved to `sample/sample_outputs/` in multiple files

#### Generate thresholds
Run `python -i lv_thresholds.py --which_neurons "sample"`

→ Thresholds spreadsheet will be saved to `sample/sample_outputs/lv_thresholds.xlsx`

#### Generate pointcloud counts (note: the segid of the target neuron is 62)
Run `python -i vesicle_counts/pointcloud_near_counts.py --which_neurons "sample" --target_segid 62 --lv_threshold [lv threshold] --cv_threshold [cv threshold] --dv_threshold [dv threshold] --dvh_threshold [dvh threshold]`, replacing each bracketed threshold with the value from the spreadsheet exported in the previous step.

→ Near neuron counts spreadsheet will be saved to `sample/sample_outputs/sample_near_counts.xlsx`

## Analysis scripts

### metadata_and_kdtree
- `kdTreeMeta.py`: Converts each neuron's dataset from a binary mask to a point cloud, stored as a list of local coordinates of vesicle centers of mass (COMs), and calculates statistics that are stored as corresponding attributes. It then constructs a kdtree from the point cloud for the density map and exports the point cloud and metadata to a txt format that is easy to read back into a dictionary.
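
The mask-to-point-cloud idea can be sketched on a small in-memory volume. This is only an illustration under stated assumptions: the helper name `vesicle_point_cloud` and the toy volume are hypothetical, the real script reads HDF5 data in chunks, and the voxel size of (30, 8, 8) nm and 500 nm radius are simply the defaults that appear in `kdTreeMeta.py`.

```python
import numpy as np
from scipy import spatial


def vesicle_point_cloud(labels, voxel_dims=(30, 8, 8), kd_radius=500):
    """Hypothetical helper: return {label: (com_in_voxels, neighbor_count)}."""
    ids = [int(i) for i in np.unique(labels) if i != 0]
    # Center of mass of each vesicle, in voxel indices (z, y, x).
    coms = np.array([np.argwhere(labels == i).mean(axis=0) for i in ids])
    # Convert to physical units (nm) before measuring distances.
    tree = spatial.KDTree(coms * np.array(voxel_dims))
    # Neighbors within kd_radius nm; each point also counts itself.
    counts = [len(n) for n in tree.query_ball_tree(tree, kd_radius)]
    return {i: (coms[k], counts[k]) for k, i in enumerate(ids)}


vol = np.zeros((4, 100, 100), dtype=np.uint32)
vol[1, 10:12, 10:12] = 1  # vesicle 1
vol[1, 20:22, 20:22] = 2  # vesicle 2, roughly 113 nm from vesicle 1
vol[2, 90:92, 90:92] = 7  # vesicle 7, more than 500 nm from both
cloud = vesicle_point_cloud(vol)
```

In this toy volume, vesicles 1 and 2 see each other within 500 nm while vesicle 7 only counts itself.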

### neuron_stitching
- `stitching_new.py`: Stitches all pieces of adjacent neurons into the bounding box of a target neuron using global coordinate offsets, two at a time to reduce memory consumption.
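
The offset arithmetic behind stitching can be sketched as follows. This is not the code in `stitching_new.py`; the function `paste_piece`, its arguments, and the toy arrays are assumptions made for illustration, and the sketch pastes a single piece rather than reproducing the two-at-a-time strategy.

```python
import numpy as np


def paste_piece(canvas, canvas_origin, piece, piece_origin, label):
    """Hypothetical helper: write a binary `piece` into `canvas`.

    Both origins are global (z, y, x) coordinates of each array's corner.
    Parts of the piece that fall outside the canvas are clipped.
    """
    lo = np.asarray(piece_origin) - np.asarray(canvas_origin)
    hi = lo + np.asarray(piece.shape)
    # Overlap of the piece with the canvas, in canvas coordinates.
    c_lo = np.maximum(lo, 0)
    c_hi = np.minimum(hi, canvas.shape)
    if np.any(c_lo >= c_hi):
        return canvas  # no overlap, nothing to write
    canvas_sl = tuple(slice(int(a), int(b)) for a, b in zip(c_lo, c_hi))
    piece_sl = tuple(slice(int(a), int(b)) for a, b in zip(c_lo - lo, c_hi - lo))
    view = canvas[canvas_sl]  # basic slicing returns a view into canvas
    view[piece[piece_sl] > 0] = label
    return canvas


canvas = np.zeros((4, 4, 4), dtype=np.uint16)
piece = np.ones((2, 2, 2), dtype=np.uint8)
paste_piece(canvas, (10, 10, 10), piece, (13, 9, 10), label=5)
```

Only the part of the piece that overlaps the target bounding box is written, so pieces that extend past the box do not require a larger canvas.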

### vesicle_counts
- `pointcloud_soma_counts.py`: Easily generalizable method that returns the number of vesicles within a neuronal region given as a binary mask (in this case, the somas of the neurons). It reads the value of the region mask at each vesicle COM coordinate to determine which vesicles are within the region.
- `pointcloud_near_counts.py`: Extracts “near neuron” regions of interest via Euclidean distance transform for adjacent neuron pieces from stitched files (see `neuron_stitching/stitching_new.py`). Uses a different threshold for each vesicle type (see `vesicle_stats/lv_diameters.py`), then counts the number of vesicles of each type within their corresponding regions of interest, using the method from `pointcloud_soma_counts.py`.
- `LV_type_counts.py`, `SV_type_counts.py`: Given exported lists (from `pointcloud_near_counts.py`) of segmentation IDs of vesicles within regions of interest and segID-to-type mappings, return vesicle counts separated by type. This keeps recounting cheap when type classifications change.
- `slow_counts.py`: Slow, inefficient method; ignore it if using the point cloud metadata pipeline for vesicles. Counts the number of overlapping vesicles in a given region using vectorized binary mask operations (for the case where both neurons and vesicles are stored as binary masks).
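
The two point-cloud counting ideas above can be sketched together: build a "near" region with a Euclidean distance transform, then look up the region mask at each COM. The function names, the 20 nm threshold, and the toy arrays are hypothetical; only the (30, 8, 8) nm voxel size is taken from `kdTreeMeta.py`.

```python
import numpy as np
from scipy import ndimage


def near_region(neighbor_mask, threshold_nm, voxel_dims=(30, 8, 8)):
    """Voxels within threshold_nm of the neighboring neuron (hypothetical helper)."""
    neighbor = np.asarray(neighbor_mask, dtype=bool)
    # The EDT gives, for every non-neighbor voxel, the distance in nm to the
    # closest neighbor voxel; neighbor voxels themselves get distance 0.
    dist = ndimage.distance_transform_edt(~neighbor, sampling=voxel_dims)
    return dist <= threshold_nm


def count_vesicles_in_region(region_mask, coms):
    """Count COMs (z, y, x voxel coordinates) that fall inside a binary mask."""
    idx = np.rint(np.asarray(coms)).astype(int)
    # Discard COMs that lie outside the mask array altogether.
    in_box = np.all((idx >= 0) & (idx < region_mask.shape), axis=1)
    idx = idx[in_box]
    # Read the mask at each COM to test membership.
    return int(np.count_nonzero(region_mask[idx[:, 0], idx[:, 1], idx[:, 2]]))


neighbor = np.zeros((1, 1, 10), dtype=bool)
neighbor[0, 0, 0] = True
region = near_region(neighbor, threshold_nm=20)   # x = 0, 1, 2 (0, 8, 16 nm)
n_near = count_vesicles_in_region(region, [(0, 0, 1), (0, 0, 5), (0, 0, 42)])
```

Because only one mask lookup per vesicle is needed, the cost scales with the number of vesicles rather than with the number of voxels they occupy.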

### neuron_stats
- `surface_area.py`: Computes the surface area for a given binary mask of a neuron by generating a 1-voxel border using a Euclidean distance transform and calculating the volume of the border region.
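
One way to read the border construction is sketched below, assuming the border is the set of mask voxels whose distance to the background is at most one voxel. The helper `border_voxel_count` is hypothetical, and converting the voxel count into a physical area depends on the voxel size, which this sketch leaves out.

```python
import numpy as np
from scipy import ndimage


def border_voxel_count(mask):
    """Number of mask voxels within one voxel of the background (hypothetical helper)."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with background so voxels on the array edge count as border voxels.
    padded = np.pad(mask, 1)
    # Distance (in voxels) from each mask voxel to the nearest background voxel.
    dist = ndimage.distance_transform_edt(padded)
    border = padded & (dist <= 1)
    return int(border.sum())


cube = np.zeros((5, 5, 5), dtype=bool)
cube[1:4, 1:4, 1:4] = True  # 3 x 3 x 3 cube: every voxel but the center is on the border
n_border = border_voxel_count(cube)
```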

### vesicle_stats
- `updated_extract_stats.py`: Extracts statistics from the point cloud metadata txt format (`metadata_and_kdtree/kdTreeMeta.py`), saves them as a pandas dataframe, and exports them to a spreadsheet. Also exports lists of volumes and diameters as txt files.
- `lv_diameters.py`: Finds diameter-based thresholds for near neuron counts, to be used in `vesicle_counts/pointcloud_near_counts.py`.
- `vesicle_volume_stats.py`: Calculates and exports vesicle volumes only (not useful if using the point cloud metadata format).
- `LUX2_density.py`: Calculates more specific stats for a particular region of interest within the LUX2 neuron.
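
Reading the metadata txt back into dictionaries might look like the sketch below. It assumes each line has the shape written by `kdTreeMeta.py`, that is `[z y x]: ('type', label, volume, radius, density, normalized_density)`; the regular expression, the function name `parse_mapping_line`, and the sample line are illustrative and are not taken from `updated_extract_stats.py`.

```python
import re

# Assumed line shape: "[z y x]: ('lv', lv_1, 7680, 2.5, 8e-06, 1.0)"
LINE_RE = re.compile(
    r"^\[(?P<com>[^\]]*)\]:\s*\('(?P<type>\w+)',\s*(?P<label>[^,]+),\s*"
    r"(?P<volume>[^,]+),\s*(?P<radius>[^,]+),\s*(?P<density>[^,]+),\s*"
    r"(?P<norm>[^)]+)\)\s*$"
)


def parse_mapping_line(line):
    """Parse one metadata line into a dict (hypothetical helper)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized metadata line: {line!r}")
    radius = float(m["radius"])
    return {
        "com": tuple(float(v) for v in m["com"].split()),
        "type": m["type"],
        "label": m["label"].strip(),
        "volume_nm": float(m["volume"]),
        "radius_nm": radius,
        "diameter_nm": 2 * radius,  # diameter derived from the stored radius
        "density": float(m["density"]),
        "normalized_density": float(m["norm"]),
    }


row = parse_mapping_line("[ 1.  10.5 10.5]: ('lv', lv_1, 7680, 2.5, 8e-06, 1.0)")
```

A list of such dictionaries can be passed directly to `pandas.DataFrame` if a table is needed.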

## Alternate visualization scripts
Currently unused but functional alternate visualization methods; see `vesicleEM/ves_vis` for the final visualization methods that are in use.

### neuroglancer_heatmap
- `heatmap.py`: Uses a Gaussian filter to calculate a heatmap image for vesicles within a given neuron, for visualization in Neuroglancer.
- `ng_heatmap_vis.py`: Neuroglancer rendering script that displays a full dataset heatmap, assembled from the individual neuron heatmap images created by `heatmap.py` by projecting them into 2D and applying coordinate offsets. It normalizes values and uses a fourth color channel, along with the necessary rotations, to align files for later color visualization.
- `ng_shader_script.glsl`: Shader script to plug into the Neuroglancer visualization in order to render a gradient of colors based on values from 0.0 to 1.0.
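
A Gaussian-filter heatmap normalized to the 0.0 to 1.0 range can be sketched as follows. The function `com_heatmap`, the `sigma` value, and the toy coordinates are assumptions for illustration and do not reproduce the parameters used in `heatmap.py`.

```python
import numpy as np
from scipy import ndimage


def com_heatmap(shape, coms, sigma=1.0):
    """Smooth vesicle COMs into a density image scaled to [0, 1] (hypothetical helper)."""
    counts = np.zeros(shape, dtype=np.float32)
    idx = np.rint(np.asarray(coms)).astype(int)
    # Accumulate one count per vesicle; np.add.at handles repeated voxels.
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    heat = ndimage.gaussian_filter(counts, sigma=sigma)
    peak = heat.max()
    # Scale so the densest location maps to 1.0.
    return heat / peak if peak > 0 else heat


heat = com_heatmap((9, 9, 9), [(4, 4, 4)])
```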

### neuroglancer_types_map
- `types_visualization.py`: Generates “color coded” vesicle mask files, given segmentation ID to type mappings, by relabeling each vesicle to indicate its type.
- `types_ng.py`: Neuroglancer rendering script that displays the full dataset of vesicles color coded by type, using the previously generated mask files and the offset feature to align neurons accurately.
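
Relabeling a segmentation by type can be done with a lookup table, sketched below. The helper `relabel_by_type` and the integer type codes are hypothetical; a dense lookup table like this one is only practical when the largest segmentation ID is reasonably small.

```python
import numpy as np


def relabel_by_type(seg, id_to_type):
    """Replace each segmentation ID with its type code (hypothetical helper).

    IDs missing from the mapping, and the background ID 0, map to 0.
    """
    lut = np.zeros(int(seg.max()) + 1, dtype=np.uint8)
    for seg_id, type_code in id_to_type.items():
        if 0 < seg_id < lut.size:
            lut[seg_id] = type_code
    # Indexing the table with the volume relabels every voxel at once.
    return lut[seg]


seg = np.array([[0, 1, 2], [3, 3, 4]], dtype=np.uint32)
typed = relabel_by_type(seg, {1: 1, 2: 2, 3: 1})
```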

### threshold_density_map
- `color_new.py`: Generates a heatmap using a Gaussian filter, then classifies and colors vesicles according to their density value using three thresholds, as a simpler method when a continuous heatmap is not necessary.
- `color_new_ng.py`: Neuroglancer rendering script for the thresholded heatmaps generated by `color_new.py`.
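
Binning density values with three thresholds yields four classes, as in this sketch. The threshold values and the function name `classify_density` are placeholders, not the ones used in `color_new.py`.

```python
import numpy as np


def classify_density(values, thresholds=(0.25, 0.5, 0.75)):
    """Map each density value to a class index 0..3 (hypothetical helper)."""
    # np.digitize returns i when thresholds[i-1] <= value < thresholds[i].
    return np.digitize(values, thresholds)


classes = classify_density(np.array([0.1, 0.25, 0.6, 0.9]))
```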
Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
#author https://github.com/akgohain

import os
import gc
import math
import h5py
import numpy as np
from tqdm import tqdm
import argparse
from scipy import spatial
from concurrent.futures import ProcessPoolExecutor, as_completed

chunk_size = 25 #25

ds_factor = 1

voxel_dims = (30, 8 * ds_factor, 8 * ds_factor)

sample_dir = '/home/rothmr/hydra/sample/'


#input into args to compute for all neurons
names_20 = ["KR4", "KR5", "KR6", "SHL55", "PN3", "LUX2", "SHL20", "KR11", "KR10",
            "RGC2", "KM4", "NET12", "NET10", "NET11", "PN7", "SHL18",
            "SHL24", "SHL28", "RGC7", "SHL17"]


def process_chunk_range(params):
    start, end, file_path, ds_factor, name = params
    chunk_stats = {}
    with h5py.File(file_path, 'r', swmr=True) as f:
        dset = f["main"]

        chunk = dset[start:end, ::ds_factor, ::ds_factor]
        unique_labels = np.unique(chunk)
        for label in unique_labels:
            if label == 0:
                continue
            mask = (chunk == label)
            coords = np.argwhere(mask)
            if coords.size == 0:
                continue
            # Shift the first axis from chunk-local to volume coordinates
            coords[:, 0] += start
            sum_coords = coords.sum(axis=0)
            count = coords.shape[0]
            if label in chunk_stats:
                chunk_stats[label]['sum'] += sum_coords
                chunk_stats[label]['count'] += count
            else:
                chunk_stats[label] = {'sum': sum_coords, 'count': count}
        del chunk
        gc.collect()
    return chunk_stats

def process_chunks(file_path, ds_factor, chunk_size, name, num_workers=None):
    chunk_stats = {}
    with h5py.File(file_path, 'r', swmr=True) as f:
        full_shape = f["main"].shape

    chunk_ranges = []
    for start in range(0, full_shape[0], chunk_size):
        end = min(start + chunk_size, full_shape[0])
        chunk_ranges.append((start, end, file_path, ds_factor, name))

    with ProcessPoolExecutor(max_workers=num_workers) as executor:
        futures = [executor.submit(process_chunk_range, params) for params in chunk_ranges]
        for future in tqdm(as_completed(futures), total=len(futures), desc="Processing chunks", ncols=100):
            result = future.result()
            # Merge per-chunk coordinate sums and voxel counts for each label
            for label, data in result.items():
                if label in chunk_stats:
                    chunk_stats[label]['sum'] += data['sum']
                    chunk_stats[label]['count'] += data['count']
                else:
                    chunk_stats[label] = data
    return chunk_stats


def compute_metadata(chunk_stats, voxel_dims):
    """
    Computes the center-of-mass (COM), volume, and radius for each vesicle.
    The volume is computed as: voxel_count * (voxel_dims[0] * voxel_dims[1] * voxel_dims[2])
    The radius (in nm) is computed using Micaela/Shulin's method:
        sqrt((voxel_count * (voxel_dims[1] * voxel_dims[2])) / pi)
    With the module-level voxel_dims of (30, 8, 8), these factors are 30 * 8 * 8 and 8 * 8.
    """
    metadata = {}
    for label, stats in chunk_stats.items():
        com = stats['sum'] / stats['count']
        volume_nm = stats['count'] * (voxel_dims[0] * voxel_dims[1] * voxel_dims[2])
        radius_nm = math.sqrt((stats['count'] * (voxel_dims[1] * voxel_dims[2])) / math.pi)
        metadata[label] = {
            'com': com,
            'volume_nm': volume_nm,
            'radius_nm': radius_nm
        }
    return metadata

def compute_density(metadata, voxel_dims, kd_radius=500):
    """
    Computes local vesicle density using a KDTree.
    COM coordinates are converted from voxel indices to physical units (nm) and
    then the density is calculated as: frequency / (kd_radius^2)
    The density values are also normalized.
    """
    com_list = []
    labels = []
    for label, data in metadata.items():
        com = data['com']
        # Convert voxel indices to physical coordinates (nm)
        physical_com = np.array([
            com[0] * voxel_dims[0],
            com[1] * voxel_dims[1],
            com[2] * voxel_dims[2]
        ])
        com_list.append(physical_com)
        labels.append(label)

    print("total num of ves: ", len(com_list))
    com_array = np.array(com_list)


    # Build the KDTree and query neighbors within kd_radius (nm)
    tree = spatial.KDTree(com_array)
    neighbors = tree.query_ball_tree(tree, kd_radius)
    # Each neighbor list includes the query point itself
    frequency = np.array([len(n) for n in neighbors])
    density = frequency / (kd_radius ** 2)

    # Normalize the density values
    min_density = np.min(density)
    max_density = np.max(density)
    if max_density > min_density:
        normalized_density = (density - min_density) / (max_density - min_density)
    else:
        normalized_density = density

    # Add density info to metadata
    for i, label in enumerate(labels):
        metadata[label]['density'] = density[i]
        metadata[label]['normalized_density'] = normalized_density[i]
    return metadata

def process_vesicle_data(name, vesicle_type="lv"):
    """
    Processes vesicle data (either 'lv' or 'sv') in chunks along the first axis;
    the chunks are dispatched to a process pool by process_chunks.
    Computes COM, volume, radius, and density via KDTree.
    The metadata is written out to a text file.

    Note: Re-enumerates labels to ensure uniqueness across LV and SV datasets.
    """
    file_prefix = "vesicle_big_" if vesicle_type == "lv" else "vesicle_small_"
    file_path = f"/data/projects/weilab/dataset/hydra/results/{file_prefix}{name}_30-8-8.h5"

    #if sample
    if name == "sample":
        if vesicle_type == "lv": #should only be lv
            file_path = f"{sample_dir}sample_data/7-13_pred_filtered.h5"

    print(f"Starting chunked processing for {vesicle_type} data of {name}...")
    chunk_stats = process_chunks(file_path, ds_factor, chunk_size, name)
    metadata = compute_metadata(chunk_stats, voxel_dims)
    metadata = compute_density(metadata, voxel_dims, kd_radius=500)

    # Re-enumerate labels to ensure uniqueness by prefixing with vesicle type
    unique_metadata = {}
    for label, data in metadata.items():
        new_label = f"{vesicle_type}_{label}"
        unique_metadata[new_label] = data
    metadata = unique_metadata

    # Ensure output directory exists
    output_dir = f"metadata/{name}/"
    os.makedirs(output_dir, exist_ok=True)

    output_file = f"metadata/{name}/{name}_{vesicle_type}_com_mapping.txt"

    if name == "sample":
        output_file = f"{sample_dir}sample_outputs/sample_com_mapping.txt"

    # One line per vesicle: "COM: (type, label, volume, radius, density, normalized density)"
    with open(output_file, "w") as f:
        for label, data in metadata.items():
            f.write(f"{data['com']}: ('{vesicle_type}', {label}, {data['volume_nm']}, {data['radius_nm']}, {data['density']}, {data['normalized_density']})\n")
    print(f"Chunked processing complete for {vesicle_type} data of {name}!")
    return metadata

if __name__ == "__main__":

    parser = argparse.ArgumentParser()
    parser.add_argument("--which_neurons", type=str, help="all or sample?") #enter as "all" or "sample"
    args = parser.parse_args()
    which_neurons = args.which_neurons

    if which_neurons == "sample":
        lv_metadata = process_vesicle_data("sample", vesicle_type="lv")
        #only need LV for the near counts example pipeline

    elif which_neurons == "all":
        for name in names_20:
            lv_metadata = process_vesicle_data(name, vesicle_type="lv")
            sv_metadata = process_vesicle_data(name, vesicle_type="sv")
