KeyError in cell_topic_heatmap [BUG] #209

@daniel-nebdal

Describe the bug
When running cell_topic_heatmap with the settings from the tutorial, changed only to use our variable name and to save to PDF, I get a KeyError:

cell_topic_heatmap(
    cistopic_obj,
    variables = ['cell_type'],
    scale = False,
    legend_loc_x = 1.0,
    legend_loc_y = -1.2,
    legend_dist_y = -1,
    figsize = (10, 10), 
    save = f"{outdir}/heatmap_topics.pdf"
)

# Most keys removed for readability; the full message is under "Error output"
KeyError: "None of [Index(['TCACCTCAGACTAAGG-1-sample2___sample2',\n       'CCTTTAGTCTGTTCAT-1-sample2___sample2',\n    (...),     dtype='object', length=15811)] are in the [columns]"

Looking deeper, I don't see how this could ever have worked - which probably means that I don't quite understand the code.

In cell_topic_heatmap, we have

model = cistopic_obj.selected_model
cell_topic = model.cell_topic_harmony if harmony else model.cell_topic
cell_data = cistopic_obj.cell_data

#At this point, cell_topic has barcodes in the column names, and cell_data has barcodes in the rownames.
# selected_topics, selected_cells and scale are False, so the next thing that happens is:
cell_topic = cell_topic.transpose()

# Now both cell_data and cell_topic have barcodes in the rownames
# variables is not None, remove_nan is True, and there are some NaNs, so:

cell_data = cell_data[variables].dropna()
cell_topic = cell_topic.loc[:, cell_data.index.tolist()]

# This fails because cell_data.index holds barcodes, and we're trying to use them
# to select _columns_ of cell_topic - but we have transposed cell_topic, so the
# barcodes are in the rows now (the columns are the topics).
# Should this be cell_topic.loc[cell_data.index.tolist(), :] ?
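
The reasoning above can be checked without pycisTopic. This is a minimal sketch using toy pandas frames: the barcodes `cellA`/`cellB`/`cellC` and the values are made up, and the frames only mimic the orientation of `model.cell_topic` and `cistopic_obj.cell_data` as described in this issue. It is not the library's code, only the same three steps applied to stand-in data.

```python
import pandas as pd

barcodes = ["cellA", "cellB", "cellC"]  # made-up barcodes for illustration

# Toy stand-in for model.cell_topic: topics in rows, barcodes in columns.
cell_topic = pd.DataFrame(
    [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]],
    index=["Topic1", "Topic2"],
    columns=barcodes,
)

# Toy stand-in for cistopic_obj.cell_data: barcodes in rows, one missing value.
cell_data = pd.DataFrame({"cell_type": ["T", None, "B"]}, index=barcodes)

# Same steps as in the excerpt above.
cell_topic = cell_topic.transpose()            # barcodes are now row labels
cell_data = cell_data[["cell_type"]].dropna()  # drops cellB
keep = cell_data.index.tolist()

# Column-wise lookup: the columns are Topic1/Topic2, so no barcode matches.
try:
    cell_topic.loc[:, keep]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# Row-wise lookup, as proposed in the question above.
fixed = cell_topic.loc[keep, :]
```

With the toy data, the column-wise lookup raises a KeyError of the same kind as the one reported, while the row-wise lookup returns the two annotated cells with both topic columns.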

To Reproduce
As above.

Error output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mlangm/.conda/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/clust_vis.py", line 940, in cell_topic_heatmap
    cell_topic = cell_topic.loc[:, cell_data.index.tolist()]
                 ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlangm/.conda/envs/scenicplus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1068, in __getitem__
    return self._getitem_tuple(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlangm/.conda/envs/scenicplus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1257, in _getitem_tuple
    return self._getitem_tuple_same_dim(tup)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlangm/.conda/envs/scenicplus/lib/python3.11/site-packages/pandas/core/indexing.py", line 925, in _getitem_tuple_same_dim
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlangm/.conda/envs/scenicplus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1302, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlangm/.conda/envs/scenicplus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1240, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlangm/.conda/envs/scenicplus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1433, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlangm/.conda/envs/scenicplus/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6108, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/mlangm/.conda/envs/scenicplus/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6168, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['TCACCTCAGACTAAGG-1-sample2___sample2',\n       'CCTTTAGTCTGTTCAT-1-sample2___sample2',\n       'ACTTAGTCACCCACAG-1-sample2___sample2',\n       'GTTCATTTCGTTAAGC-1-sample2___sample2',\n       'GATGGACAGGAGTCTT-1-sample2___sample2',\n       'CGTGGTTCATCATGTG-1-sample2___sample2',\n       'CTTACCTCAGTAGGTG-1-sample2___sample2',\n       'CCTGAATAGTAACGGA-1-sample2___sample2',\n       'AGTATAGCAAGGAATC-1-sample2___sample2',\n       'GTAGGTGCACTGACCG-1-sample2___sample2',\n       ...\n       'GTGCTCCGTGGTTCTT-1-NLE71_base___NLE71_base',\n       'CAAGTAACACCTATAG-1-NLE71_base___NLE71_base',\n       'ACCGGCTAGTGAACGA-1-NLE71_base___NLE71_base',\n       'GAGGCAAGTTCCTGAT-1-NLE71_base___NLE71_base',\n       'CGTTGCGCATCAGCAC-1-NLE71_base___NLE71_base',\n       'CATAATGTCCTAACGG-1-NLE71_base___NLE71_base',\n       'ACCAGGCTCGGCCAGT-1-NLE71_base___NLE71_base',\n       'GTTAAGTGTCAAACTG-1-NLE71_base___NLE71_base',\n       'GGGTTTCCAAGCGATG-1-NLE71_base___NLE71_base',\n       'TCCTTCAAGCGCCTTT-1-NLE71_base___NLE71_base'],\n      dtype='object', length=15811)] are in the [columns]"

Expected behavior
Well, I'd like a heatmap instead of an error?

Screenshots

>>> cistopic_obj.selected_model.cell_topic.iloc[:2, :2]
        TCACCTCAGACTAAGG-1-sample2___sample2  CCTTTAGTCTGTTCAT-1-sample2___sample2
Topic1                              0.027718                              0.015568
Topic2                              0.464732                              0.236424
>>> cistopic_obj.cell_data.iloc[:2, :2]
                                      nucleosome_signal  barcode_rank
TCACCTCAGACTAAGG-1-sample2___sample2           0.448315           934
CCTTTAGTCTGTTCAT-1-sample2___sample2           0.556827           511
>>> cistopic_obj.cell_data.columns
Index(['nucleosome_signal', 'barcode_rank', 'pycisTopic_leiden_10_1.2',
       'unique_fragments_in_peaks_count', 'cisTopic_nr_frag',
       'pdf_values_for_fraction_of_fragments_in_peaks',
       'pycisTopic_leiden_10_0.6', 'tss_enrichment',
       'total_fragments_in_peaks_count', 'log10_total_fragments_count',
       'pdf_values_for_duplication_ratio', 'cell_type', 'barcode',
       'duplication_ratio', 'log10_unique_fragments_in_peaks_count',
       'duplication_count', 'pdf_values_for_tss_enrichment',
       'cisTopic_log_nr_frag', 'log10_unique_fragments_count',
       'unique_fragments_count', 'cisTopic_nr_acc',
       'log10_total_fragments_in_peaks_count',
       'fraction_of_fragments_in_peaks', 'sample_id', 'total_fragments_count',
       'cisTopic_log_nr_acc', 'sampleid', 'pycisTopic_leiden_10_3'],
      dtype='object')
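
For anyone who wants to confirm the orientation on their own object, here is a small sketch of a check. `axis_with_labels` is a hypothetical helper written for this issue, not part of pycisTopic or pandas; the toy frames reuse the two barcodes and values printed above.

```python
import pandas as pd


def axis_with_labels(df: pd.DataFrame, labels) -> str:
    """Report which axis of df contains all of the given labels.

    Hypothetical helper for debugging; returns 'index', 'columns' or 'neither'.
    """
    labels = pd.Index(labels)
    if labels.isin(df.index).all():
        return "index"
    if labels.isin(df.columns).all():
        return "columns"
    return "neither"


barcodes = [
    "TCACCTCAGACTAAGG-1-sample2___sample2",
    "CCTTTAGTCTGTTCAT-1-sample2___sample2",
]

# Same layout as the cell_topic excerpt printed above.
cell_topic = pd.DataFrame(
    [[0.027718, 0.015568], [0.464732, 0.236424]],
    index=["Topic1", "Topic2"],
    columns=barcodes,
)

# Same layout as the cell_data excerpt printed above.
cell_data = pd.DataFrame(
    {"nucleosome_signal": [0.448315, 0.556827], "barcode_rank": [934, 511]},
    index=barcodes,
)

before = axis_with_labels(cell_topic, cell_data.index)              # as stored
after = axis_with_labels(cell_topic.transpose(), cell_data.index)   # after transpose
```

On a real object, the same call with `cistopic_obj.selected_model.cell_topic` and `cistopic_obj.cell_data.index` should show whether the barcodes sit in the rows or the columns at the point where `.loc` is applied.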

Version (please complete the following information):

  • python 3.11.11
  • pycisTopic.version = 2.0a0

This is a conda environment with scenicplus; I changed the requires-python line in the scenicplus pyproject.toml to build with 3.11.11.

