annsel


Annsel is a user-friendly library that brings familiar dataframe-style operations to AnnData objects.

It's built on the narwhals compatibility layer for dataframes.

Take a look at the GitHub Projects board for features and future plans: Annsel Features

Getting started

Please refer to the documentation, in particular, the API documentation.

There's also a brief tutorial on how to use all the features of annsel: All of Annsel.

Installation

You need to have Python 3.10 or newer installed on your system. If you don't have Python installed, we recommend installing uv. There are several ways to install annsel:

  1. Install the most recent release:

    With uv:

    uv add annsel

    With pip:

    pip install annsel

  2. Install the latest development version:

    With uv:

    uv add git+https://github.com/srivarra/annsel

    With pip:

    pip install git+https://github.com/srivarra/annsel.git@main

Examples

annsel comes with a small dataset from CELLxGENE to help you get familiar with the API.

import annsel as an

adata = an.datasets.leukemic_bone_marrow_dataset()

The dataset looks like this:

AnnData object with n_obs × n_vars = 31586 × 458
    obs: 'Cluster_ID', 'donor_id', 'Sample_Tag', 'Cell_label', 'is_primary_data', 'organism_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'assay_ontology_term_id', 'tissue_ontology_term_id', 'Genotype', 'development_stage_ontology_term_id', 'sex_ontology_term_id', 'disease_ontology_term_id', 'cell_type_ontology_term_id', 'suspension_type', 'tissue_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'feature_is_filtered', 'Unnamed: 0', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type'
    uns: 'cell_type_ontology_term_id_colors', 'citation', 'default_embedding', 'schema_reference', 'schema_version', 'title'
    obsm: 'X_bothumap', 'X_pca', 'X_projected', 'X_projectedmean', 'X_tsneni', 'X_umapni'

Filter

You can filter on obs, var, var_names, obs_names, X and its layers, as well as on obsm and varm matrices. Filters on obsm and varm are passed as a key-value pair of the matrix's key and the predicate to filter on. Currently, the columns of obsm and varm matrices are addressed by numerical index.

adata.an.filter(
    obs=(
        an.col(["Cell_label"]).is_in(["Classical Monocytes", "CD8+CD103+ tissue resident memory T cells"]),
        an.col(["sex"]) == "male",
    ),
    var=an.col(["vst.mean"]) >= 3,
    obsm={"X_pca": an.col([0]) > 0}, # PC1 values greater than 0
    copy=False, # Whether to return a copy of the AnnData object or just a view of it.
)
View of AnnData object with n_obs × n_vars = 736 × 67
    obs: 'Cluster_ID', 'donor_id', 'Sample_Tag', 'Cell_label', 'is_primary_data', 'organism_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'assay_ontology_term_id', 'tissue_ontology_term_id', 'Genotype', 'development_stage_ontology_term_id', 'sex_ontology_term_id', 'disease_ontology_term_id', 'cell_type_ontology_term_id', 'suspension_type', 'tissue_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'feature_is_filtered', 'Unnamed: 0', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type'
    uns: 'cell_type_ontology_term_id_colors', 'citation', 'default_embedding', 'schema_reference', 'schema_version', 'title'
    obsm: 'X_bothumap', 'X_pca', 'X_projected', 'X_projectedmean', 'X_tsneni', 'X_umapni'
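
To make the semantics of the call above concrete, here is a minimal sketch using plain Python containers rather than AnnData. It assumes that several predicates on the same axis are combined with a logical AND and that obs-axis and var-axis filters are applied independently; the helper `rows_matching` and the toy data are hypothetical and are not part of annsel.

```python
# Illustrative stand-in only: plain Python containers, not AnnData or annsel.
obs = [
    {"Cell_label": "Classical Monocytes", "sex": "male"},
    {"Cell_label": "Classical Monocytes", "sex": "female"},
    {"Cell_label": "Lymphomyeloid prog", "sex": "male"},
    {"Cell_label": "Classical Monocytes", "sex": "male"},
]
var = [{"vst.mean": 4.2}, {"vst.mean": 0.7}, {"vst.mean": 3.0}]

# obsm-style matrix: one row per observation, columns addressed by integer index.
x_pca = [[1.5, -0.2], [0.3, 0.9], [2.1, 0.4], [-0.8, 1.1]]


def rows_matching(n_rows, *predicates):
    """Return the indices for which every predicate holds (assumed AND)."""
    return [i for i in range(n_rows) if all(p(i) for p in predicates)]


# Observation axis: two obs predicates plus one predicate on column 0 of x_pca.
kept_obs = rows_matching(
    len(obs),
    lambda i: obs[i]["Cell_label"] in {"Classical Monocytes"},
    lambda i: obs[i]["sex"] == "male",
    lambda i: x_pca[i][0] > 0,
)

# Variable axis: a single predicate on the var table.
kept_var = rows_matching(len(var), lambda j: var[j]["vst.mean"] >= 3)

print(kept_obs, kept_var)
```

Under these assumptions, the filtered object keeps only the observations in `kept_obs` and the variables in `kept_var`, which is why both dimensions shrink in the output shown above.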

Select

You can select on obs, var, var_names, obs_names, X and its layers. Selecting returns a new AnnData object. This is useful when you don't need all the columns in obs or var and want to work with just a few.

adata.an.select(
    obs=an.col(["Cell_label"]),
    var=an.col(["vst.mean", "vst.variance"]),  # both columns exist in var (see the dataset summary above)
)
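
As a rough illustration of what column selection means, the sketch below narrows a toy table to the named columns and returns a new object, leaving the original untouched. It uses plain lists of dicts; `select_columns` is a hypothetical helper, not an annsel function, and the values are made up.

```python
# Illustrative stand-in only: a list of dicts plays the role of the var table.
var = [
    {"vst.mean": 4.2, "vst.variance": 1.1, "feature_type": "gene"},
    {"vst.mean": 0.7, "vst.variance": 0.3, "feature_type": "gene"},
]


def select_columns(records, names):
    """Return new records that contain only the requested columns."""
    return [{name: record[name] for name in names} for record in records]


selected = select_columns(var, ["vst.mean", "vst.variance"])

print(list(selected[0]))  # column names kept in the new table
print(list(var[0]))       # the original table still has every column
```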

Group By

You can group over obs and var columns. This returns a generator of objects, each containing the grouped data and the grouping parameters that produced it.

gb_adata_result = adata.an.group_by(
    obs=an.col(["Cell_label"]),
    var=an.col(["feature_type"]),
    copy=False,
)

Here's what the first group looks like:

next(adata.an.group_by(
    obs=an.col(["Cell_label"]),
    copy=False,
))
GroupByAnnData:
  ├── Observations:
  │   └── Cell_label: Lymphomyeloid prog
  ├── Variables:
  │   └── (all variables)
  └── AnnData:
      View of AnnData object with n_obs × n_vars = 913 × 458
          obs: 'Cluster_ID', 'donor_id', 'Sample_Tag', 'Cell_label', 'is_primary_data', 'organism_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'assay_ontology_term_id', 'tissue_ontology_term_id', 'Genotype', 'development_stage_ontology_term_id', 'sex_ontology_term_id', 'disease_ontology_term_id', 'cell_type_ontology_term_id', 'suspension_type', 'tissue_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'
          var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'feature_is_filtered', 'Unnamed: 0', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type'
          uns: 'cell_type_ontology_term_id_colors', 'citation', 'default_embedding', 'schema_reference', 'schema_version', 'title'
          obsm: 'X_bothumap', 'X_pca', 'X_projected', 'X_projectedmean', 'X_tsneni', 'X_umapni'
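
Because the result is a generator, groups are produced one at a time and can only be iterated once. The sketch below shows that behavior with plain Python; the `group_by` function here is a hypothetical stand-in written for illustration, not annsel's implementation, and the toy records are made up.

```python
from collections import defaultdict
from typing import Iterator


def group_by(records: list[dict], key: str) -> Iterator[tuple[str, list[dict]]]:
    """Yield (group value, matching records) pairs in first-seen order."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        buckets[record[key]].append(record)
    for value, members in buckets.items():
        yield value, members


obs = [
    {"Cell_label": "Lymphomyeloid prog", "donor_id": "a"},
    {"Cell_label": "Classical Monocytes", "donor_id": "a"},
    {"Cell_label": "Lymphomyeloid prog", "donor_id": "b"},
]

groups = group_by(obs, "Cell_label")

# next() pulls only the first group, as in the example above.
first_label, first_members = next(groups)

# Iterating afterwards yields only the groups not yet consumed.
remaining_labels = [label for label, _ in groups]

print(first_label, len(first_members), remaining_labels)
```

If you need to visit the groups more than once, materialize them first, for example with `list(...)`, or call the grouping function again.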

Pipe

There's also a small utility method, pipe, which lets you chain operations together, as in Xarray and pandas.

import scanpy as sc
adata.an.pipe(sc.pl.embedding, basis="X_tsneni", color="Cell_label")
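
The general pattern behind a pipe method can be sketched in a few lines: the object passes itself as the first argument to the given function and forwards any remaining arguments. The `Table` class and helper functions below are hypothetical and only illustrate that convention; annsel's own implementation is not shown here.

```python
class Table:
    """Toy container used to illustrate the pipe pattern."""

    def __init__(self, rows):
        self.rows = rows

    def pipe(self, func, *args, **kwargs):
        # Call func with this object first, then any extra arguments.
        return func(self, *args, **kwargs)


def keep_positive(table, column):
    """Return a new Table with only the rows where `column` is positive."""
    return Table([row for row in table.rows if row[column] > 0])


def count_rows(table):
    """Return the number of rows in the table."""
    return len(table.rows)


table = Table([{"score": 1.5}, {"score": -0.2}, {"score": 3.0}])

# Each step receives the result of the previous one, read left to right.
n_positive = table.pipe(keep_positive, column="score").pipe(count_rows)

print(n_positive)
```

Chaining works as long as each intermediate function returns an object that itself has a pipe method; the final step can return anything, such as a count or a plot.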

Release notes

See the changelog.

Contact

For questions and help requests, you can reach out on the scverse Discourse or in the discussions tab. If you find a bug, please use the issue tracker.

Citation

Varra, S. R. annsel [Computer software]. https://github.com/srivarra/annsel