Annotation fails on sparse h5ad file if model trained with with_mean = False #159

@danielsf

If you train a model with the with_mean kwarg set to False and then try to annotate an h5ad file stored as a sparse matrix, annotation fails with the following error:

Traceback (most recent call last):
  File "/Users/scott.daniel/KnowledgeEngineering/garage/celltypist_error/show_error.py", line 72, in <module>
    celltypist.annotate(
  File "/Users/scott.daniel/miniconda3/envs/celltypist/lib/python3.12/site-packages/celltypist/annotate.py", line 85, in annotate
    predictions = clf.celltype(mode = mode, p_thres = p_thres)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/scott.daniel/miniconda3/envs/celltypist/lib/python3.12/site-packages/celltypist/classifier.py", line 374, in celltype
    self.indata[self.indata > 10] = 10
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
TypeError: 'coo_matrix' object does not support item assignment

The failure does not occur if you try to annotate an h5ad file saved as a dense array.
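The traceback points at an in-place boolean-mask assignment, which scipy's `coo_matrix` does not support. Below is a minimal sketch, independent of celltypist, that reproduces the same `TypeError` on a toy matrix and shows one possible way to cap values that works for both dense and sparse input. The helper name `clip_upper` is hypothetical and is not part of celltypist; it assumes a positive cap, so implicit zeros in a sparse matrix never need clipping.

```python
import numpy as np
from scipy import sparse


def clip_upper(matrix, upper=10):
    """Return a copy of `matrix` with values capped at `upper`.

    Hypothetical helper (not part of celltypist). Assumes `upper` > 0,
    so only explicitly stored sparse entries can exceed the cap.
    """
    if sparse.issparse(matrix):
        clipped = matrix.copy()
        # Cap the stored values only; the sparsity pattern is unchanged.
        clipped.data = np.minimum(clipped.data, upper)
        return clipped
    return np.minimum(matrix, upper)


dense = np.array([[0.0, 12.0], [3.0, 0.0]])
coo = sparse.coo_matrix(dense)

# Same pattern as the failing line in classifier.py, tried on a copy.
probe = coo.copy()
try:
    probe[probe > 10] = 10
except TypeError as err:
    print(f"TypeError: {err}")

clipped_sparse = clip_upper(coo)
clipped_dense = clip_upper(dense)
print(clipped_sparse.toarray())
```

This is only an illustration of the failure mode, not a proposed patch. As a possible user-side workaround (untested here), loading the file with `anndata.read_h5ad`, converting `adata.X` to a dense array with `.toarray()`, and passing the resulting AnnData object to `celltypist.annotate` may avoid the error, since dense input annotates successfully.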

The code below should recreate the bug. I am running version 1.7.1 of celltypist and version 0.12.2 of anndata.

import celltypist
import subprocess
import pathlib

# download a model (otherwise, celltypist will try to download
# ALL available models, even if you are using your own model)
celltypist.models.download_models(
    model='Immune_All_Low.pkl'
)


training_data = "demo_2000_cells.h5ad"
test_data = "demo_400_cells.h5ad"


# download example data
if not pathlib.Path(training_data).exists():
    p = subprocess.Popen(
        ["wget",
         "https://celltypist.cog.sanger.ac.uk/Notebook_demo_data/demo_2000_cells.h5ad"]
    )
    p.wait()


if not pathlib.Path(test_data).exists():
    p = subprocess.Popen(
        ["wget",
         "https://celltypist.cog.sanger.ac.uk/Notebook_demo_data/demo_400_cells.h5ad"]
    )
    p.wait()

assert pathlib.Path(training_data).is_file()
assert pathlib.Path(test_data).is_file()

print("=======TRAINING with with_mean=True; SHOULD WORK")
model_path = "with_mean_true.pkl"
model = celltypist.train(
    training_data,
    labels='cell_type',
    n_jobs=4,
    feature_selection=True,
    use_SGD=True,
    mini_batch=True,
    with_mean=True
)
model.write(model_path)

print("=======ANNOTATING")

celltypist.annotate(
    test_data,
    model=model_path
)

print("=======SUCCESS")

print("=======TRAINING with with_mean=False; WILL FAIL")
model_path = "with_mean_false.pkl"
model = celltypist.train(
    training_data,
    labels='cell_type',
    n_jobs=4,
    feature_selection=True,
    use_SGD=True,
    mini_batch=True,
    with_mean=False
)
model.write(model_path)

print("=======ANNOTATING")

celltypist.annotate(
    test_data,
    model=model_path
)

print("=======SUCCESS")
