Hello,
I am trying to train scimilarity on my own data and have run into an issue loading the data into MetricLearningDataModule. I have set up the paths to my data and gene_file like this:
train_path="/os10/epithelial_reference_atlas_objects/epi_adata_train.zarr"
val_path ="/os10/epithelial_reference_atlas_objects/epi_adata_val.zarr"
gene_order="/os10/epithelial_reference_atlas_objects/gene_order.tsv"
When I run the following in a Jupyter notebook, it prints the two progress lines shown at the bottom:
datamodule = MetricLearningDataModule(
    train_path=train_path,
    val_path=val_path,
    gene_order=gene_order,
    obs_field="level_3_annot",
    batch_size=1000,
    num_workers=4,
)
0it [00:00, ?it/s]
0it [00:00, ?it/s]
datamodule.int2label returns an empty dictionary,
and datamodule.n_genes returns 36390, which is the number of genes in the dataset I am training on (from gene_order).
I have confirmed that the .zarr folders are intact and retain the proper zarr file structure.
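A check of the store layout along these lines can be reproduced with the standard library alone. This is only a sketch: the helper name `describe_store` is hypothetical and not part of scimilarity, and it only looks for the marker files that zarr itself writes (`.zgroup` or `.zarray` for format v2, `zarr.json` for v3). It says nothing about what layout MetricLearningDataModule expects.

```python
import os

# Marker files written by zarr: v2 groups/arrays and v3 nodes.
ZARR_MARKERS = (".zgroup", ".zarray", "zarr.json")


def describe_store(path):
    """Summarize a directory that is expected to be a zarr store.

    Hypothetical helper for illustration; not a scimilarity function.
    """
    info = {"path": path, "exists": os.path.isdir(path), "is_zarr": False, "children": []}
    if not info["exists"]:
        return info
    entries = sorted(os.listdir(path))
    # A zarr node has at least one marker file directly inside it.
    info["is_zarr"] = any(name in ZARR_MARKERS for name in entries)
    # Sub-directories are the nested groups/arrays (for AnnData: X, obs, var, ...).
    info["children"] = [n for n in entries if os.path.isdir(os.path.join(path, n))]
    return info


if __name__ == "__main__":
    for p in (
        "/os10/epithelial_reference_atlas_objects/epi_adata_train.zarr",
        "/os10/epithelial_reference_atlas_objects/epi_adata_val.zarr",
    ):
        print(describe_store(p))
```

For a store written from an AnnData object, the children would normally include entries such as X, obs and var.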
Training the model then fails with the following output:
model = MetricLearning(n_genes=datamodule.n_genes)
trainer.fit(model, datamodule)
You are using a CUDA device ('NVIDIA A100-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
  | Name            | Type        | Params | Mode
--------------------------------------------------------
0 | encoder         | Encoder     | 38.4 M | train
1 | decoder         | Decoder     | 38.5 M | train
2 | triplet_loss_fn | TripletLoss | 0      | train
3 | mse_loss_fn     | MSELoss     | 0      | train
--------------------------------------------------------
76.9 M Trainable params
0 Non-trainable params
76.9 M Total params
307.739 Total estimated model params size (MB)
27 Modules in train mode
0 Modules in eval mode
ValueError: num_samples should be a positive integer value, but got num_samples=0
This means that training is initiated, but there is no data to train on.
Can you please provide some advice on how to load .zarr files so that MetricLearningDataModule does not end up with an empty label dictionary and zero samples?
Thank you so much for your help!