
Significant variation in GRNBoost2 results with minor cell subsampling (removing one cell) in pySCENIC #623

@jklupup


When running the GRN step in pySCENIC, I observed substantial differences in the output adjacencies.csv after removing just one cell from the expression matrix. Specifically:

Using the full expression matrix (e.g., thousands of cells) vs. a matrix missing one cell yields only 56.21% overlap in TF-target pairs.
This level of variability seems unexpectedly high for a dataset of this scale.

I wonder whether something is wrong with my code.

Code:

if [ ! -f grn.SUCCESS ]; then
    arboreto_with_multiprocessing.py \
      "$count_loom" \
      "$tf_list" \
      --num_workers 16 \
      --output adjacencies.csv \
      --method grnboost2 \
      --sparse \
      --seed 1 \
    && touch grn.SUCCESS
fi

if [ ! -f grn.SUCCESS ]; then echo "grn error"; exit 1; fi
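
For reference, here is a minimal sketch of one way to measure the overlap between two adjacencies files. It assumes each file has the TF, target, importance header that pySCENIC writes. The helper names load_pairs and pair_overlap and the file names in the usage note are hypothetical, and the choice of metrics (shared fraction and Jaccard index) is an assumption, not necessarily how the 56.21% figure was obtained.

```python
import csv


def load_pairs(path):
    """Return the set of (TF, target) pairs found in an adjacencies CSV.

    Assumes a header row containing the columns "TF" and "target".
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        return {(row["TF"], row["target"]) for row in reader}


def pair_overlap(pairs_a, pairs_b):
    """Summarise how many TF-target pairs two runs have in common."""
    shared = pairs_a & pairs_b
    union = pairs_a | pairs_b
    return {
        "shared": len(shared),
        # Fraction of each run's pairs that also appear in the other run.
        "fraction_of_a": len(shared) / len(pairs_a) if pairs_a else 0.0,
        "fraction_of_b": len(shared) / len(pairs_b) if pairs_b else 0.0,
        # Jaccard index: shared pairs divided by all distinct pairs.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

A call such as pair_overlap(load_pairs("adjacencies_full.csv"), load_pairs("adjacencies_minus_one.csv")) would report the shared fraction in both directions. Sets ignore row order and importance values, so this compares only which pairs are present in each file.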
