Description
Hi cellxgene team,
I downloaded several single-cell RNA-Seq datasets to play around with (I am trying to develop an interactive tool).
Here's the download link in case it's relevant:
wget https://datasets.cellxgene.cziscience.com/14f66456-7e6f-4f81-8a5e-1bce2e33c78e.h5ad -O frontal_cortex_c9als.h5ad
I found that the count matrices do not contain the original integer counts; they have been transformed in ways I don't fully understand. For some purposes I would like to reconstruct the original integer count matrices, so any help with that would be appreciated.
Typically I do:
import anndata
import numpy as np
adata = anndata.read_h5ad("path/to/file.h5ad")
unique_vals = np.unique(adata.X.data)  # sorted unique stored (nonzero) values; adata.X is sparse
At the lower end, I then get:
unique_vals[0:100]
array([0.16644052, 0.18068525, 0.18491335, 0.1860282 , 0.1920335 ,
0.19255719, 0.1983962 , 0.20134304, 0.20789851, 0.20895565,
0.2116718 , 0.21252038, 0.22149941, 0.22176214, 0.22515076,
0.22701937, 0.2311127 , 0.23293549, 0.23637326, 0.2369829 ,...
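A quick way to confirm whether a matrix still holds integer counts is to test its stored values for whole numbers. This is a minimal sketch; `is_integer_valued` is a hypothetical helper, not part of anndata or NumPy. It accepts a dense array or the `.data` attribute of a SciPy sparse matrix.

```python
import numpy as np


def is_integer_valued(values, atol=1e-6):
    """Return True if every value is a whole number (within atol).

    `values` may be a dense NumPy array or the `.data` attribute of a
    SciPy sparse matrix (which holds only the explicitly stored entries).
    """
    values = np.asarray(values, dtype=np.float64)
    return bool(np.all(np.abs(values - np.round(values)) <= atol))


# Synthetic examples, not taken from the dataset above.
print(is_integer_valued(np.array([1.0, 2.0, 5.0])))            # True
print(is_integer_valued(np.array([0.16644052, 0.18068525])))  # False
```

With the real file one could call `is_integer_valued(adata.X.data)` and, if `adata.raw is not None`, also `is_integer_valued(adata.raw.X.data)`. My understanding (worth verifying against the CELLxGENE schema) is that when `X` is normalized, raw counts are expected in `adata.raw.X`, in which case no reconstruction would be needed.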
Assuming ln(1+x) is involved somehow, I tried to fit the values to:
y = f1 * np.log(c + (f2 * x))
where x represents the original integer counts [1, 2, 3, ...].
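If `X` was produced by the common per-cell normalization (scale each cell $i$ to a fixed total $S$, then apply $\ln(1+x)$), the fit above takes a specific form. This is an assumption, not documented for this dataset; $x_{ij}$ is the raw count of gene $j$ in cell $i$ and $N_i$ is that cell's total count:

$$
y_{ij} = \ln\!\left(1 + \frac{S}{N_i}\,x_{ij}\right), \qquad N_i = \sum_j x_{ij}
$$

Matching this to $y = f_1 \ln(c + f_2 x)$ gives $f_1 = 1$, $c = 1$, $f_2 = S/N_i$. Because $f_2$ differs between cells, unique values pooled across the whole matrix will not lie on one curve; the fit has to be done per cell. If cell $i$ contains at least one gene with a raw count of exactly 1, its smallest nonzero value of $e^{y}-1$ equals $S/N_i$, so

$$
x_{ij} = \frac{e^{y_{ij}} - 1}{\min_{k:\,y_{ik}>0}\left(e^{y_{ik}} - 1\right)}
$$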
This might work, but some clarifying information about the transformation would be very helpful.
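A minimal sketch of that per-cell inversion, assuming the log1p(counts / total * S) normalization described above and a dense cells-by-genes array. `reconstruct_counts` is a hypothetical helper. It relies on each cell having at least one gene with a raw count of 1; if a cell's smallest count is 2, its recovered values come out halved, which the `ok` flag catches only when some count in that cell is odd.

```python
import numpy as np


def reconstruct_counts(log_norm, atol=0.05):
    """Hypothetical helper: undo a per-cell log1p(counts / total * S) transform.

    Assumes each row (cell) of `log_norm` equals ln(1 + s_i * counts) and that
    every nonempty cell has at least one gene with a raw count of exactly 1.
    Returns (rounded integer counts, whether all values were near-integers).
    """
    scaled = np.expm1(np.asarray(log_norm, dtype=np.float64))  # s_i * counts
    counts = np.zeros_like(scaled)
    for i, row in enumerate(scaled):
        nonzero = row[row > 0]
        if nonzero.size == 0:
            continue  # empty cell: leave its counts at zero
        counts[i] = row / nonzero.min()  # smallest nonzero value ~ count 1
    rounded = np.round(counts)
    ok = bool(np.all(np.abs(counts - rounded) <= atol))
    return rounded.astype(np.int64), ok


# Synthetic check: log-normalize known counts, then invert them.
raw = np.array([[1, 0, 3, 7], [2, 1, 0, 12]])
target_sum = 1e4
log_norm = np.log1p(raw / raw.sum(axis=1, keepdims=True) * target_sum)
recovered, ok = reconstruct_counts(log_norm)
print(np.array_equal(recovered, raw), ok)  # True True
```

For a sparse `adata.X`, one would convert a manageable subset of cells to dense first (for example with `.toarray()`), since looping over a full dense matrix may not fit in memory.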
Thanks and regards,
Paul Meraner
paul.meraner@gmail.com
www.omicsq.com