When running the GRN step in pySCENIC, I observed substantial differences in the output adjacencies.csv after removing just one cell from the expression matrix. Specifically:
The full expression matrix (on the order of thousands of cells) and the same matrix with one cell removed yield only 56.21% overlap in TF-target pairs.
This level of variability seems unexpectedly high for a dataset of this scale.
I wonder whether something is wrong with my code.
Code:
# Run GRN inference once; grn.SUCCESS marks a completed run.
if [ ! -f grn.SUCCESS ]; then
    arboreto_with_multiprocessing.py \
        "$count_loom" \
        "$tf_list" \
        --num_workers 16 \
        --output adjacencies.csv \
        --method grnboost2 \
        --sparse \
        --seed 1 \
        && touch grn.SUCCESS
fi
# Stop here if the GRN step did not finish successfully.
if [ ! -f grn.SUCCESS ]; then echo "grn error"; exit 1; fi
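
For reference, here is a minimal sketch of how an overlap between two runs can be measured. It rests on two assumptions: both files are comma-separated with a header line and the TF and target in the first two columns, and "overlap" means shared pairs divided by the union of pairs (Jaccard). The function name pair_overlap and the file names in the usage comment are placeholders, and this is not necessarily how the 56.21% figure above was computed.

```shell
# Percentage of TF-target pairs shared between two adjacency files,
# measured as |intersection| / |union| of the unique (TF, target) pairs.
# Assumption: CSV with one header line, TF in column 1, target in column 2.
pair_overlap() {
    awk -F',' '
        FNR == 1  { next }                       # skip the header of each file
        NR == FNR { a[$1 FS $2] = 1; next }      # pairs from the first file
                  { b[$1 FS $2] = 1 }            # pairs from the second file
        END {
            for (k in a) { total++; if (k in b) shared++ }
            for (k in b) if (!(k in a)) total++
            if (total == 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * shared / total
        }
    ' "$1" "$2"
}

# Hypothetical usage (file names are placeholders):
#   pair_overlap adjacencies_full.csv adjacencies_minus_one_cell.csv
```

Using the union as the denominator penalizes pairs that appear in only one of the two files; dividing by the size of one file instead gives a different, asymmetric number, so it is worth stating which definition a reported percentage uses.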