Loading data into MetricLearningDataModule

Hello, 

I am trying to train scimilarity on my own data and have run into an issue loading the data into `MetricLearningDataModule`. I have set up the paths to my data and gene order file like this:

```python
train_path = "/os10/epithelial_reference_atlas_objects/epi_adata_train.zarr"
val_path = "/os10/epithelial_reference_atlas_objects/epi_adata_val.zarr"
gene_order = "/os10/epithelial_reference_atlas_objects/gene_order.tsv"
``` 

When I run the following in a Jupyter notebook, the only output is the two progress-bar lines at the bottom:

```python
datamodule = MetricLearningDataModule(
    train_path=train_path,
    val_path=val_path,
    gene_order=gene_order,
    obs_field="level_3_annot",
    batch_size=1000,
    num_workers=4,
)

# Output:
# 0it [00:00, ?it/s]
# 0it [00:00, ?it/s]
```

`datamodule.int2label` returns an empty dictionary, and `datamodule.n_genes` returns 36390, which is the number of genes in the dataset I am training on (from `gene_order`).

I have confirmed that the .zarr folders are intact and retain the proper zarr file structure.
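For anyone who wants to repeat this kind of check, here is a minimal sketch using only the standard library. It assumes local directory-based Zarr stores, where a Zarr v2 group is marked by a `.zgroup` file and a Zarr v3 group by `zarr.json`. The helper name `summarize_zarr_store` is hypothetical and is not part of scimilarity, and the check says nothing about the directory layout that `MetricLearningDataModule` itself expects.

```python
import os


def summarize_zarr_store(path):
    """Report basic facts about a directory-based Zarr store.

    Assumption: the store is a plain local directory. Zarr v2 marks a
    group with a `.zgroup` file, Zarr v3 with `zarr.json`.
    """
    report = {"exists": os.path.isdir(path), "is_group": False, "children": []}
    if not report["exists"]:
        return report

    entries = sorted(os.listdir(path))
    # A group marker at the top level means the directory is a Zarr group.
    report["is_group"] = ".zgroup" in entries or "zarr.json" in entries
    # Sub-directories are the arrays/groups stored inside (e.g. X, obs, var).
    report["children"] = [
        name for name in entries if os.path.isdir(os.path.join(path, name))
    ]
    return report


# Example usage with the paths from this issue:
# for p in (train_path, val_path):
#     print(p, summarize_zarr_store(p))
```

For a store written from an AnnData object, the `children` list would be expected to include entries such as `X`, `obs`, and `var`.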

Training the model then fails with:

```python
model = MetricLearning(n_genes=datamodule.n_genes)

trainer.fit(model, datamodule)

You are using a CUDA device ('NVIDIA A100-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name            | Type        | Params | Mode 
--------------------------------------------------------
0 | encoder         | Encoder     | 38.4 M | train
1 | decoder         | Decoder     | 38.5 M | train
2 | triplet_loss_fn | TripletLoss | 0      | train
3 | mse_loss_fn     | MSELoss     | 0      | train
--------------------------------------------------------
76.9 M    Trainable params
0         Non-trainable params
76.9 M    Total params
307.739   Total estimated model params size (MB)
27        Modules in train mode
0         Modules in eval mode

ValueError: num_samples should be a positive integer value, but got num_samples=0
``` 

So model training is initiated, but there is no data to train on.
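The problem could be caught before `trainer.fit` with a small pre-flight check. The sketch below relies only on the `int2label` attribute mentioned above. The helper name `check_datamodule_not_empty` is hypothetical, and treating an empty `int2label` as "no cells were loaded" is an assumption based on the behaviour described in this issue.

```python
def check_datamodule_not_empty(datamodule):
    """Raise early if the data module has no labels.

    Assumption: an empty `int2label` mapping means no cells were loaded,
    which is what the `num_samples=0` error suggests.
    """
    labels = getattr(datamodule, "int2label", None)
    if not labels:
        raise RuntimeError(
            "Data module has no labels; no data appears to have been "
            "loaded from train_path / val_path."
        )
    return len(labels)


# Example usage before training:
# check_datamodule_not_empty(datamodule)
# trainer.fit(model, datamodule)
```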

Could you please advise on how to load .zarr files so that `MetricLearningDataModule` does not end up with an empty label dictionary?

Thank you so much for your help!

Loading data into MetricLearningDataModule #31
