Commit 136ae6b

tony-kuo and javh authored
Training refactor (#37)
* release candidate initial commit

* fix docstrings and formatting

---------

Co-authored-by: Jason Vander Heiden <[email protected]>
1 parent 85efa9a commit 136ae6b

30 files changed, 2534 lines added, 757 lines removed

.github/workflows/docker.yaml

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ jobs:
 - name: Build and push Docker image
 uses: docker/build-push-action@v4
 with:
-context: ./docker
+context: .
 push: true
 tags: ${{ steps.meta.outputs.tags }}
 labels: ${{ steps.meta.outputs.labels }}

.github/workflows/pages.yaml

Lines changed: 1 addition & 1 deletion
@@ -48,4 +48,4 @@ jobs:
 steps:
 - name: Deploy to GitHub Pages
 id: deployment
-uses: actions/deploy-pages@v4
+uses: actions/deploy-pages@v4

MANIFEST.in

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 exclude .gitignore
 recursive-exclude .github *
 recursive-exclude docker *
+recursive-exclude scripts *

NEWS.rst

Lines changed: 23 additions & 0 deletions
@@ -1,6 +1,29 @@
 Release Notes
 ================================================================================
 
+Version 0.4.0: May 05, 2025
+--------------------------------------------------------------------------------
+
+General:
+
++ A new training tutorial has been added which describes the new training
+workflow. This includes data preparation, training, and post-training data
+structures using the new scripts.
+
+Training:
+
++ A new training workflow has been added to use CellArr, a TileDB based
+framework, as the data store to streamline the end-to-end process. This
+replaces the old Zarr based workflows.
++ New data loaders and samplers for CellArr data have been added in the
+``tiledb_data_models`` module.
++ A example training script has been added to show how to train models as
+``scripts/train.py``.
++ New scripts for creating all post-training data structures have been added
+in the folder ``scripts``.
++ New utility methods that make use of the CellArr store:
+``utils.query_tiledb_df`` to query a tiledb dataframe,
+``utils.adata_from_tiledb`` to extract cells from the tiledb stores based on
+index, including raw counts.
+
 Version 0.3.0: November 19, 2024
 --------------------------------------------------------------------------------
 

docker/Dockerfile

Lines changed: 7 additions & 4 deletions
@@ -1,4 +1,4 @@
-FROM fedora:38
+FROM fedora:42
 LABEL maintainer="Jason Anthony Vander Heiden [[email protected]]" \
 org.opencontainers.image.description="SCimilarity" \
 org.opencontainers.image.source="https://github.com/genentech/scimilarity"
@@ -10,7 +10,7 @@ VOLUME /workspace
 VOLUME /scratch
 
 # Tools
-COPY start-notebook.sh /usr/local/bin/start-notebook
+COPY docker/start-notebook.sh /usr/local/bin/start-notebook
 
 # Environment
 ENV SCDATA_HOME=/data
@@ -28,13 +28,13 @@ RUN dnf -y update && dnf install -y \
 python3 \
 python3-aiohttp \
 python3-asciitree \
-python3-bash-kernel \
 python3-biopython \
 python3-cloudpickle \
 python3-Cython \
 python3-numcodecs \
 python3-dask \
 python3-dask+array \
+python3-devel \
 python3-fasteners \
 python3-GitPython \
 python3-h5py \
@@ -74,6 +74,8 @@ RUN pip3 install \
 scikit-misc \
 numba \
 tiledb \
+tiledb-cloud \
+tiledb-vector-search \
 leidenalg \
 louvain \
 umap-learn \
@@ -83,7 +85,8 @@ RUN pip3 install \
 captum \
 torch \
 pytorch-lightning \
-scanpy
+scanpy \
+scrublet
 
 # Install SCimilarity API
 RUN git clone https://github.com/Genentech/scimilarity.git /tmp/scimilarity \
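The `context: .` change in `.github/workflows/docker.yaml` and the `COPY docker/start-notebook.sh` change in the Dockerfile belong together: Docker resolves `COPY` source paths relative to the build context, so moving the context from `./docker` to the repository root requires prefixing the source path with `docker/`. The Python sketch below models that resolution under simplifying assumptions (no globs, no `.dockerignore`, no multi-stage builds). `resolve_copy_source` is a hypothetical helper written for illustration, not part of Docker or SCimilarity, and the repository layout (script at `docker/start-notebook.sh`) is inferred from the old context/COPY pairing.

```python
import os
import tempfile


def resolve_copy_source(context_dir: str, src: str) -> str:
    """Resolve a Dockerfile COPY source against a build context.

    Simplified model (assumption): sources are resolved relative to the
    context root, and paths escaping the context are rejected.
    """
    root = os.path.realpath(context_dir)
    path = os.path.realpath(os.path.join(root, src))
    # Docker refuses to copy files from outside the build context.
    if os.path.commonpath([root, path]) != root:
        raise ValueError(f"{src!r} is outside the build context")
    if not os.path.exists(path):
        raise FileNotFoundError(f"{src!r} not found in context {context_dir!r}")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        # Assumed layout, inferred from the diff: the script lives under docker/.
        os.makedirs(os.path.join(repo, "docker"))
        open(os.path.join(repo, "docker", "start-notebook.sh"), "w").close()

        # Old pairing: context ./docker with COPY start-notebook.sh
        old = resolve_copy_source(os.path.join(repo, "docker"), "start-notebook.sh")
        # New pairing: context . with COPY docker/start-notebook.sh
        new = resolve_copy_source(repo, "docker/start-notebook.sh")
        print(old == new)  # both pairings copy the same file

        # Mixing the new context with the old COPY path fails.
        try:
            resolve_copy_source(repo, "start-notebook.sh")
        except FileNotFoundError as exc:
            print("error:", exc)
```

Changing only one of the two lines would break the image build, which is why the commit updates the workflow and the Dockerfile together.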

docs/api.rst

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,7 @@ API Reference
 
 modules/anndata_data_models
 modules/nn_models
+modules/tiledb_data_models
 modules/training_models
 modules/triplet_selector
 modules/zarr_data_models
@@ -69,6 +70,7 @@ support for these training tasks.
 
 * :mod:`scimilarity.anndata_data_models`
 * :mod:`scimilarity.nn_models`
+* :mod:`scimilarity.tiledb_data_models`
 * :mod:`scimilarity.training_models`
 * :mod:`scimilarity.triplet_selector`
 * :mod:`scimilarity.zarr_data_models`

docs/install.rst

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ You can download the following pretrained models for use with SCimilarity from
 Zenodo:
 https://zenodo.org/records/10685499
 
+
 Conda environment setup
 --------------------------------------------------------------------------------
 


Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
scimilarity.tiledb_data_models
2+
--------------------------------------------------------------------------------
3+
4+
.. automodule:: scimilarity.tiledb_data_models
5+
:members:
6+
:show-inheritance:
