
Commit 0befa91

Author: Benedikt Obermayer
convert from loom (fixes #15); expanded filtering; new PBMC dataset + demo movie; better tests (fixes #21)
1 parent 1d03aa3 commit 0befa91

12 files changed: +304 -121 lines changed


HISTORY.rst

Lines changed: 9 additions & 0 deletions
@@ -2,6 +2,15 @@
 History
 =======

+------
+v0.7.0
+------
+
+- added conversion from .loom files
+- cell filtering also supports downsampling
+- added PBMC dataset hosted on figshare
+- added demo movie
+
 ------
 v0.6.0
 ------

README.rst

Lines changed: 25 additions & 7 deletions
@@ -23,7 +23,11 @@ SCelVis: Easy Single-Cell Visualization
 .. image:: https://zenodo.org/badge/185944510.svg
     :target: https://zenodo.org/badge/latestdoi/185944510

-You can find the URL for the demo linked to on the top right of the Github repository page.
+|
+
+.. image:: scelvis/assets/movie.gif
+    :height: 400px
+    :align: center

 ------------
 Installation
@@ -52,12 +56,13 @@ A Docker container is also available via `Quay.io/Biocontainers <https://quay.io
 Tutorial
 --------

-explore a simulated dummy dataset or 1000 cells from a 1:1 Mixture of Fresh Frozen Human (HEK293T) and Mouse (NIH3T3) Cells (10X v3 chemistry)
+explore 1000 cells from a 1:1 Mixture of Fresh Frozen Human (HEK293T) and Mouse (NIH3T3) Cells (10X v3 chemistry) or a published dataset of ~14000 IFN-beta treated and control PBMCs from 8 donors (`GSE96583 <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96583>`_; see `Kang et al. <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96583>`_)

 .. code-block:: shell

-    $ scelvis run --data-source /path/to/scelvis/examples/dummy.h5ad
     $ scelvis run --data-source /path/to/scelvis/examples/hgmm_1k.h5ad
+    $ scelvis run --data-source https://files.figshare.com/18037739/pbmc.h5ad
+

 and then point your browser to http://0.0.0.0:8050/.

@@ -70,12 +75,14 @@ Data sets are provided as HDF5 files (`anndata <https://anndata.readthedocs.io/e

 For the input you can either specify one HDF5 file or a directory containing multiple such files.

-You can use ``scanpy`` to create this HDF5 file directly or use the ``scelvis convert`` command for converting your single-cell pipeline output.
+You can use `scanpy <http://scanpy.rtfd.io>`_ to create this HDF5 file directly or use the ``scelvis convert`` command for converting your single-cell pipeline output.

 HDF5 Input
 ----------

-for HDF5 input, you can do your analysis with `scanpy <http://scanpy.rtfd.io>`_ to create an anndata object ``ad``. SCelVis will use embedding coordinates from ``ad.obsm``, cell annotation from ``ad.obs`` and expression data directly from ``ad.X`` (this should contain normalized and log-transformed expression values for all genes). Information about the dataset will be extracted from strings stored in ``ad.uns['about_title']``, ``ad.uns['about_short_title']`` and ``ad.uns['about_readme']`` (assumed to be Markdown). Information about marker genes will be taken from entries starting with ``marker_`` in ``ad.uns``: entries called ``marker_gene`` (required!), ``marker_cluster``, ``marker_padj``, ``marker_LFC`` will create a table with the columns ``gene``, ``cluster``, ``padj``, and ``LFC``.
+for HDF5 input, you can do your analysis with `scanpy <http://scanpy.rtfd.io>`_ to create an anndata object ``ad``. SCelVis will use embedding coordinates from ``ad.obsm``, cell annotation from ``ad.obs`` and expression data directly from ``ad.X`` (this should contain normalized and log-transformed expression values for all genes). If present, information about the dataset will be extracted from strings stored in ``ad.uns['about_title']``, ``ad.uns['about_short_title']`` and ``ad.uns['about_readme']`` (assumed to be Markdown). Information about marker genes will be taken either from the ``rank_genes_groups`` slot in ``ad.uns`` or from entries starting with ``marker_`` in ``ad.uns``: entries called ``marker_gene`` (required!), ``marker_cluster``, ``marker_padj``, ``marker_LFC`` will create a table with the columns ``gene``, ``cluster``, ``padj``, and ``LFC``.
+
+If you prepared your data with ``Seurat`` (v2), you can use ``Convert(from = sobj, to = "anndata", filename = "data.h5ad")`` to get an HDF5 file.

 Text Input
 ----------
@@ -122,7 +129,18 @@ For "raw" text input, you need to prepare at least three files in the input dire

     $ scelvis convert --input-dir text_input --output data/text_input.h5ad --about-md text_input.md

-in ``examples/dummy_raw.zip`` and ``examples/dummy_about.md`` we provide raw data for the dummy dataset.
+in ``examples/dummy_raw.zip`` and ``examples/dummy_about.md`` we provide raw data for a simulated dummy dataset.
+
+Loom Input
+----------
+
+for `loompy <http://loompy.org>`_ or `loomR <https://github.com/mojaveazure/loomR>`_ input, you can convert your data like this:
+
+.. code-block:: shell
+
+    $ scelvis convert --i input.loom -m markers.tsv -a about.md -o loom_input.h5ad
+
+if you prepared your data with ``Seurat`` (v3), you can use ``as.loom(sobj, filename="output.loom")`` to get a ``.loom`` file and then convert to ``.h5ad`` with the above command.

 CellRanger Input
 ----------------
@@ -142,7 +160,7 @@ Alternatively, the output directory of ``CellRanger`` can be used. This is the d
     EOF
     $ scelvis convert --input-dir cellranger-out --output data/cellranger_input.h5ad --about-md cellranger.md

-In ``examples/hgmm_1k_raw.zip`` we provide ``CellRanger`` output for the 1k 1:1 human mouse mix. Specifically, from the `outs` folder we selected
+In ``examples/hgmm_1k_raw`` we provide ``CellRanger`` output for the 1k 1:1 human mouse mix. Specifically, from the ``outs`` folder we selected

 - ``filtered_feature_bc_matrix.h5``
 - tSNE and PCA projections from ``analysis/tsne`` and ``analysis/pca``
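
Note on the ``marker_`` convention from the HDF5 Input paragraph in this README diff: the sketch below shows one way such entries could be turned into table rows. It is only an illustration under stated assumptions. A plain dict stands in for ``ad.uns``, all ``marker_*`` entries are assumed to be equal-length lists, and ``marker_table`` is a hypothetical helper, not a function from SCelVis.

```python
def marker_table(uns):
    """Turn marker_* entries of a dict into a list of row dicts.

    Column names are the part of each key after the 'marker_' prefix.
    `uns` is a plain dict standing in for ad.uns (assumption).
    """
    prefix = "marker_"
    columns = {
        key[len(prefix):]: list(values)
        for key, values in uns.items()
        if key.startswith(prefix)
    }
    if not columns:
        return []  # no marker information stored at all
    if "gene" not in columns:
        raise KeyError("marker_gene is required when marker_* entries are present")
    n_rows = len(columns["gene"])
    if any(len(col) != n_rows for col in columns.values()):
        raise ValueError("all marker_* entries must have the same length")
    return [{name: col[i] for name, col in columns.items()} for i in range(n_rows)]


# Made-up example values, for illustration only.
uns = {
    "about_title": "demo",
    "marker_gene": ["CD3E", "MS4A1"],
    "marker_cluster": ["T", "B"],
    "marker_padj": [1e-10, 1e-8],
    "marker_LFC": [2.5, 3.1],
}
rows = marker_table(uns)
```

Entries that do not start with ``marker_`` (such as ``about_title``) are ignored by this helper, which mirrors the README's split between dataset information and marker information.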

requirements/base.txt

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ numpy
 pandas
 anndata
 scanpy
+loompy

 # Caching functionality for Flask.
 flask-caching

scelvis/app.py

Lines changed: 12 additions & 5 deletions
@@ -145,10 +145,16 @@ def find(name, path):
     logger.info("Looking for %s file", cellranger_needle)
     needle_path = find(cellranger_needle, tmpdir)
     if needle_path is None:
-        raw_needle = "coords.tsv"
-        logger.info("Looking for %s file", raw_needle)
-        needle_path = find(raw_needle, tmpdir)
-        format_ = "text"
+        text_needle = "coords.tsv"
+        logger.info("Looking for %s file", text_needle)
+        needle_path = find(text_needle, tmpdir)
+        if needle_path is None:
+            loom_needle = "data.loom"
+            logger.info("Looking for %s file", loom_needle)
+            needle_path = find(loom_needle, tmpdir)
+            format_ = "loom"
+        else:
+            format_ = "text"
     else:
         format_ = "cell-ranger"
     input_dir = os.path.dirname(needle_path)
@@ -183,7 +189,8 @@ def find(name, path):
     return """
     <!doctype html>
     <title>Convert File</title>
-    <h1>Upload ZIP or TAR.GZ of CellRanger Output</h1>
+    <h1>Upload ZIP or TAR.GZ of your data</h1>
+    <p>either containing CellRanger output, raw text files or a data.loom file<p>
     <p>
     The server will return a <tt>.h5a</tt> file that you can upload into the SCelVis visualization.
     </p>
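
Note on the format detection in the first ``scelvis/app.py`` hunk: the nested ``if``/``else`` chain tries one "needle" file after another. The sketch below expresses the same lookup order as a loop over a table. It is a simplified illustration, not the code from this commit. The body of ``find`` is an assumption (only its signature appears in the hunk header), the CellRanger needle name is assumed from the README file list, and ``detect_format`` is a hypothetical name. Unlike the diff, it returns ``(None, None)`` when no needle is found.

```python
import os
import tempfile

# "coords.tsv" and "data.loom" are taken from the diff; the CellRanger
# needle name is an assumption based on the README file list.
NEEDLES = [
    ("cell-ranger", "filtered_feature_bc_matrix.h5"),
    ("text", "coords.tsv"),
    ("loom", "data.loom"),
]


def find(name, path):
    """Return the first file called `name` below `path`, or None (assumed behavior)."""
    for root, _dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)
    return None


def detect_format(tmpdir):
    """Try each needle in order; return (format, input_dir) or (None, None)."""
    for format_, needle in NEEDLES:
        needle_path = find(needle, tmpdir)
        if needle_path is not None:
            return format_, os.path.dirname(needle_path)
    return None, None


with tempfile.TemporaryDirectory() as tmpdir:
    empty_result = detect_format(tmpdir)  # nothing uploaded yet
    sample_dir = os.path.join(tmpdir, "upload", "sample")
    os.makedirs(sample_dir)
    with open(os.path.join(sample_dir, "data.loom"), "w"):
        pass  # an empty placeholder file is enough for name-based detection
    loom_result = detect_format(tmpdir)
    loom_dir_matches = loom_result[1] == sample_dir
```

Keeping the needles in a list makes the precedence explicit (CellRanger, then text, then loom) and gives a single place to handle the case where none of them is present.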

scelvis/assets/cells.png

88.8 KB

scelvis/assets/movie.gif

7.28 MB

scelvis/callbacks.py

Lines changed: 115 additions & 67 deletions
@@ -477,6 +477,12 @@ def toggle_filter_cells_controls(n, is_open):
 def register_update_filter_cells_controls(app, token):
     @app.callback(
         [
+            Output("%s_filter_cells_ncells_div" % token, "style"),
+            Output("%s_filter_cells_ncells" % token, "marks"),
+            Output("%s_filter_cells_ncells" % token, "min"),
+            Output("%s_filter_cells_ncells" % token, "max"),
+            Output("%s_filter_cells_ncells" % token, "value"),
+            Output("%s_filter_cells_ncells" % token, "step"),
             Output("%s_filter_cells_choice_div" % token, "style"),
             Output("%s_filter_cells_choice" % token, "options"),
             Output("%s_filter_cells_choice" % token, "value"),
@@ -493,56 +499,73 @@ def register_update_filter_cells_controls(app, token):
     def update_filter_cells_controls(pathname, attribute, filters_json):
         _, kwargs = get_route(pathname)
         data = store.load_data(kwargs.get("dataset"))
+        hidden_slider = ({"display": "none"}, {0: "0", 1: "1"}, 0, 1, 1, 0)
+        hidden_checklist = ({"display": "none"}, [], None)
+        hidden_rangeslider = ({"display": "none"}, {0: "0", 1: "1"}, 0, 1, [0, 1], 0)
+
         if attribute is None or attribute == "None":
-            return (
-                {"display": "none"},
-                [],
-                None,
-                {"display": "none"},
-                {0: "0", 1: "1"},
-                0,
-                1,
-                [0, 1],
-                0,
-            )
+            return hidden_slider + hidden_checklist + hidden_rangeslider
         filters = json.loads(filters_json)
-        values = data.ad.obs_vector(attribute)
-        if not pd.api.types.is_numeric_dtype(values):
-            categories = list(data.ad.obs[attribute].cat.categories)
-            return (
-                {"display": "block"},
-                [{"label": v, "value": v} for v in categories],
-                filters[attribute] if attribute in filters else categories,
-                {"display": "none"},
-                {0: "0", 1: "1"},
-                0,
-                1,
-                [0, 1],
-                0,
-            )
-        else:
-            range_min = values.min()
-            range_max = values.max()
+        if attribute == "ncells":
+            ncells_tot = data.ad.obs.shape[0]
             if attribute in filters:
-                val_min = filters[attribute][0]
-                val_max = filters[attribute][1]
+                ncells_selected = filters[attribute]
             else:
-                val_min = range_min
-                val_max = range_max
+                ncells_selected = ncells_tot
             return (
-                {"display": "none"},
-                [],
-                None,
-                {"display": "block"},
-                dict(
-                    (int(t) if t % 1 == 0 else t, "{0:g}".format(t))
-                    for t in ui.common.auto_tick([range_min, range_max], max_tick=4, tf_inside=True)
-                ),
-                range_min,
-                range_max,
-                [val_min, val_max],
-                (range_max - range_min) / 1000,
+                (
+                    {"display": "block"},
+                    dict(
+                        (int(t) if t % 1 == 0 else t, "{0:g}".format(t))
+                        for t in ui.common.auto_tick([0, ncells_tot], max_tick=4, tf_inside=True)
+                    ),
+                    0,
+                    ncells_tot,
+                    ncells_selected,
+                    ncells_tot / 1000,
+                )
+                + hidden_checklist
+                + hidden_rangeslider
             )
+        else:
+            values = data.ad.obs_vector(attribute)
+            if not pd.api.types.is_numeric_dtype(values):
+                categories = list(data.ad.obs[attribute].cat.categories)
+                return (
+                    hidden_slider
+                    + (
+                        {"display": "block"},
+                        [{"label": v, "value": v} for v in categories],
+                        filters[attribute] if attribute in filters else categories,
+                    )
+                    + hidden_rangeslider
+                )
+            else:
+                range_min = values.min()
+                range_max = values.max()
+                if attribute in filters:
+                    val_min = filters[attribute][0]
+                    val_max = filters[attribute][1]
+                else:
+                    val_min = range_min
+                    val_max = range_max
+                return (
+                    hidden_slider
+                    + hidden_checklist
+                    + (
+                        {"display": "block"},
+                        dict(
+                            (int(t) if t % 1 == 0 else t, "{0:g}".format(t))
+                            for t in ui.common.auto_tick(
+                                [range_min, range_max], max_tick=4, tf_inside=True
+                            )
+                        ),
+                        range_min,
+                        range_max,
+                        [val_min, val_max],
+                        (range_max - range_min) / 1000,
+                    )
+                )


 def register_update_filter_cells_filters(app):
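
Note on the refactoring in ``update_filter_cells_controls``: the callback now returns 15 values (6 for the new ncells slider, 3 for the checklist, 6 for the range slider), and each branch shows exactly one control while hiding the other two. The sketch below isolates that tuple-concatenation pattern so the output order can be checked without Dash. The three default tuples are copied from the diff. The ``controls`` and ``show_checklist`` helpers are hypothetical and only illustrate the idea.

```python
HIDDEN = {"display": "none"}
SHOWN = {"display": "block"}

# Default "hidden" outputs per control, in callback-output order (from the diff).
hidden_slider = (HIDDEN, {0: "0", 1: "1"}, 0, 1, 1, 0)
hidden_checklist = (HIDDEN, [], None)
hidden_rangeslider = (HIDDEN, {0: "0", 1: "1"}, 0, 1, [0, 1], 0)


def controls(slider=hidden_slider, checklist=hidden_checklist, rangeslider=hidden_rangeslider):
    """Concatenate the per-control tuples in the fixed output order."""
    return tuple(slider) + tuple(checklist) + tuple(rangeslider)


def show_checklist(categories, selected=None):
    """Show only the checklist; default to all categories being selected."""
    chosen = list(categories) if selected is None else selected
    options = [{"label": c, "value": c} for c in categories]
    return controls(checklist=(SHOWN, options, chosen))


all_hidden = controls()
result = show_checklist(["ctrl", "stim"])
```

Because every branch goes through the same concatenation, the number and order of returned values cannot drift apart between branches, which is the main risk with hand-written long tuples.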
@@ -554,9 +577,11 @@ def register_update_filter_cells_filters(app):
         ],
         [
             Input("url", "pathname"),
+            Input("meta_filter_cells_ncells", "value"),
             Input("meta_filter_cells_choice", "value"),
             Input("meta_filter_cells_range", "value"),
             Input("meta_filter_cells_reset", "n_clicks"),
+            Input("expression_filter_cells_ncells", "value"),
             Input("expression_filter_cells_choice", "value"),
             Input("expression_filter_cells_range", "value"),
             Input("expression_filter_cells_reset", "n_clicks"),
@@ -569,9 +594,11 @@ def register_update_filter_cells_filters(app):
     )
     def update_filter_cells_filters(
         pathname,
+        meta_ncells_value,
         meta_cat_value,
         meta_range_value,
         meta_reset_n,
+        expression_ncells_value,
         expression_cat_value,
         expression_range_value,
         expression_reset_n,
@@ -584,26 +611,43 @@ def update_filter_cells_filters(
         ctx = dash.callback_context

         filters = json.loads(filters_json)
+        active_filters = set()
+        # if reset button was hit, remove entries in filters_json
+        attributes = list(filters.keys())
         status = "active filters: "
-        # if reset button was hit, check all boxes using stored values in filters_json
-        attributes = filters.keys()
         if ctx.triggered and "reset" in ctx.triggered[0]["prop_id"]:
-            for attribute in list(attributes):
+            for attribute in attributes:
                 del filters[attribute]
             return (json.dumps(filters), status, status)

-        for cat_value, range_value, attribute in [
-            (meta_cat_value, meta_range_value, meta_attribute),
-            (expression_cat_value, expression_range_value, expression_attribute),
+        # else update filters_json depending on inputs
+        for ncells_value, cat_value, range_value, attribute in [
+            (meta_ncells_value, meta_cat_value, meta_range_value, meta_attribute),
+            (
+                expression_ncells_value,
+                expression_cat_value,
+                expression_range_value,
+                expression_attribute,
+            ),
         ]:
             if attribute is not None and attribute != "None":
-                values = data.ad.obs_vector(attribute)
-                if not pd.api.types.is_numeric_dtype(values):
-                    filters[attribute] = sorted(cat_value)
+                if attribute == "ncells":
+                    filters[attribute] = ncells_value
+                    ncells_tot = data.ad.obs.shape[0]
+                    if ncells_value < ncells_tot:
+                        active_filters.add(attribute)
                 else:
-                    filters[attribute] = range_value
-
-        status += ", ".join(attributes)
+                    values = data.ad.obs_vector(attribute)
+                    if not pd.api.types.is_numeric_dtype(values):
+                        filters[attribute] = cat_value
+                        if cat_value is not None and set(cat_value) != set(values):
+                            active_filters.add(attribute)
+                    else:
+                        filters[attribute] = range_value
+                        if range_value[0] > values.min() or range_value[1] < values.max():
+                            active_filters.add(attribute)
+
+        status += ", ".join(active_filters)
         return (json.dumps(filters), status, status)


@@ -623,19 +667,23 @@ def activate_filter_cells_reset(pathname, filters_json):
         else:
             filters = {}
         disabled = True
-        attributes = filters.keys()
-        for attribute in attributes:
-            values = data.ad.obs_vector(attribute)
-            if not pd.api.types.is_numeric_dtype(values):
-                if filters[attribute] != list(data.ad.obs[attribute].cat.categories):
+        for attribute, selected in filters.items():
+            if attribute == "ncells":
+                ncells_tot = data.ad.obs.shape[0]
+                if selected < ncells_tot:
                     disabled = False
             else:
-                range_min = values.min()
-                range_max = values.max()
-                val_min = filters[attribute][0]
-                val_max = filters[attribute][1]
-                if val_min > range_min or val_max < range_max:
-                    disabled = False
+                values = data.ad.obs_vector(attribute)
+                if not pd.api.types.is_numeric_dtype(values):
+                    if sorted(selected) != sorted(data.ad.obs[attribute].cat.categories):
+                        disabled = False
+                else:
+                    range_min = values.min()
+                    range_max = values.max()
+                    val_min = selected[0]
+                    val_max = selected[1]
+                    if val_min > range_min or val_max < range_max:
+                        disabled = False

         return (disabled, disabled)
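
Note on the "is this filter active?" logic used in ``update_filter_cells_filters`` and ``activate_filter_cells_reset``: both callbacks apply the same three rules (fewer cells than the total, a category selection that differs from the full set, or a range narrower than the data range). The sketch below pulls these rules into one pure function so they can be tested without an AnnData object. ``is_active`` and its parameters are hypothetical, and the example values are made up.

```python
def is_active(attribute, selected, ncells_tot, categories=None, value_range=None):
    """Return True if `selected` actually restricts the data.

    `categories` is given for categorical attributes and `value_range`
    (min, max) for numeric ones; both are assumptions of this sketch.
    """
    if attribute == "ncells":
        # downsampling is active only if fewer than all cells are kept
        return selected < ncells_tot
    if categories is not None:
        # order of the selection should not matter
        return sorted(selected) != sorted(categories)
    range_min, range_max = value_range
    return selected[0] > range_min or selected[1] < range_max


# Made-up filter state for illustration.
filters = {"ncells": 500, "cluster": ["B", "A"], "n_genes": [200, 4000]}
categories = {"cluster": ["A", "B"]}
ranges = {"n_genes": (200, 5000)}

active = sorted(
    name
    for name, selected in filters.items()
    if is_active(
        name,
        selected,
        ncells_tot=1000,
        categories=categories.get(name),
        value_range=ranges.get(name),
    )
)
status = "active filters: " + ", ".join(active)
reset_disabled = not active
```

The names are sorted before joining because iterating over a Python set does not guarantee a stable order, so sorting keeps the status string reproducible between calls.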