-We now merge the datasets, and ensure that we only include transcripts that are measured in all samples with counts greater than zero. Further we scale the measurements so that every gene expression value is scaled using scikit-learn's StandardScaler.
+We now merge the datasets and keep only the transcripts that have counts greater than zero in every sample. Subsequently, we log-transform the data and reduce the set to the 1,000 transcripts with the highest variance. Finally, we standardize the measurements with scikit-learn's StandardScaler, so that each gene's expression values have zero mean and unit variance.
 labels = torch.tensor([1.0 for _ in lusc.columns] + [0.0 for _ in luad.columns], dtype=torch.float32)
 
 # Use TensorDataset to create a dataset
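The preprocessing described above can be sketched as follows. This is a minimal illustration, not the notebook's own code: it assumes `lusc` and `luad` are pandas DataFrames with transcripts as rows and samples as columns (consistent with `input_dim = combined.shape[0]` and with the labels being built from `.columns`), it assumes the natural logarithm, and `preprocess` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(lusc: pd.DataFrame, luad: pd.DataFrame, n_top: int = 1000):
    """Hypothetical helper. Assumes rows = transcripts, columns = samples."""
    # Inner join on the index keeps only transcripts present in both cohorts;
    # LUSC columns come first, LUAD columns second.
    combined = pd.concat([lusc, luad], axis=1, join="inner")
    # Keep transcripts with a count > 0 in every sample (this also makes the log safe).
    combined = combined[(combined > 0).all(axis=1)]
    # Log-transform (natural log assumed here).
    combined = np.log(combined)
    # Keep the n_top transcripts with the highest variance across samples.
    top = combined.var(axis=1).nlargest(n_top).index
    combined = combined.loc[top]
    # StandardScaler standardizes columns, so transpose to samples x transcripts:
    # every transcript then has zero mean and unit variance across samples.
    scaled = StandardScaler().fit_transform(combined.T.values).astype(np.float32)
    # 1.0 for LUSC samples, 0.0 for LUAD samples, in the same column order.
    labels = np.array([1.0] * lusc.shape[1] + [0.0] * luad.shape[1], dtype=np.float32)
    return combined, scaled, labels
```

The returned arrays can be wrapped with `torch.tensor(...)` and passed to `TensorDataset`, as the context line above does for the labels.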
@@ -112,7 +123,7 @@ class VAE(nn.Module):
 
     def decode(self, z):
         h3 = torch.relu(self.fc3(z))
-        out = torch.sigmoid(self.fc4(h3))
+        out = self.fc4(h3)
         return out
 
     def forward(self, x):
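The hunk above drops the sigmoid from `decode`. After standardization the inputs are unbounded real numbers (negative values and values above 1 occur), which a sigmoid, with range (0, 1), cannot reproduce; a linear output layer can. The sketch below shows a complete VAE with that linear output. Only `fc3`, `fc4`, `decode` and `forward` appear in the diff; the encoder layers `fc1`, `fc21`, `fc22` and the `encode`/`reparameterize` methods are assumptions modelled on the VAE in the official PyTorch examples.

```python
import torch
from torch import nn


class VAE(nn.Module):
    # Encoder layer names (fc1, fc21, fc22) are assumed, not taken from the diff.
    def __init__(self, input_dim, hidden_dim, latent_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc21 = nn.Linear(hidden_dim, latent_dim)  # latent mean
        self.fc22 = nn.Linear(hidden_dim, latent_dim)  # latent log-variance
        self.fc3 = nn.Linear(latent_dim, hidden_dim)
        self.fc4 = nn.Linear(hidden_dim, input_dim)

    def encode(self, x):
        h1 = torch.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, I)
        std = torch.exp(0.5 * logvar)
        return mu + torch.randn_like(std) * std

    def decode(self, z):
        h3 = torch.relu(self.fc3(z))
        # Linear output: standardized targets may be negative or exceed 1.
        return self.fc4(h3)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
```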
@@ -124,51 +135,45 @@ class VAE(nn.Module):
 Next, we select a gradient-based optimizer (Adam) and the loss function to minimize: the reconstruction loss plus the Kullback-Leibler divergence (KLD). The training and test procedures are defined below.
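A minimal sketch of such a loss and one training pass follows. The notebook's actual loss is not shown in this hunk, so the reconstruction term here is an assumption: a summed squared error, which suits a linear decoder output on standardized data (binary cross-entropy would require targets in [0, 1]). The KLD term is the closed-form divergence between the diagonal Gaussian posterior and a standard normal prior. It also assumes `model(x)` returns `(recon_x, mu, logvar)` and that the loader yields `(features, label)` pairs from a `TensorDataset`; `train_epoch` is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def loss_function(recon_x, x, mu, logvar):
    # Reconstruction: squared error summed over all elements and the batch (assumed).
    recon = F.mse_loss(recon_x, x, reduction="sum")
    # KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions and the batch.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kld


def train_epoch(model, loader, optimizer, device="cpu"):
    """Hypothetical training pass; returns the average loss per sample."""
    model.train()
    total = 0.0
    for x, _ in loader:  # labels are not used by the VAE objective
        x = x.to(device)
        optimizer.zero_grad()
        recon_x, mu, logvar = model(x)
        loss = loss_function(recon_x, x, mu, logvar)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader.dataset)
```

Both terms are sums rather than means, matching the "summed over all elements and batch" comment in the cell; dividing by the dataset size afterwards only rescales the reported number.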
 
 ```{code-cell} ipython3
-input_dim = combined.shape[0]
 model = VAE(input_dim, hidden_dim, latent_dim).to(device)
 optimizer = optim.Adam(model.parameters(), lr=lr)
 
 
 # Reconstruction + KL divergence losses summed over all elements and batch
@@ -312,20 +313,8 @@ The genes that the decoder finds most different between the set means can now be
 predicted["diff"].idxmin(axis=0)
 ```
 
-Which is a [cancer-related](https://www.proteinatlas.org/ENSG00000172731-LRRC20/cancer) protein.
-
-+++
-
 and then in the negative direction (larger in LUAD than LUSC).
 
 ```{code-cell} ipython3
 predicted["diff"].idxmax(axis=0)
 ```
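The `idxmin`/`idxmax` calls above return the index label (the gene identifier) of the smallest and largest entry of `predicted["diff"]`, not its position. The sketch below reproduces that step on toy numbers. It assumes `predicted` holds one decoded profile per class, obtained by decoding each class's mean latent vector (the "set means"), and that `diff` is defined as LUAD minus LUSC, which matches `idxmax` returning the gene that is larger in LUAD. The helper `rank_class_differences` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def rank_class_differences(recon_lusc, recon_luad, genes):
    """Hypothetical helper: compare decoded class-mean profiles gene by gene."""
    predicted = pd.DataFrame({"LUSC": recon_lusc, "LUAD": recon_luad}, index=genes)
    # Assumed sign convention: negative diff = larger in LUSC, positive = larger in LUAD.
    predicted["diff"] = predicted["LUAD"] - predicted["LUSC"]
    return predicted


predicted = rank_class_differences(
    np.array([1.0, 0.0, 2.0]), np.array([0.5, 1.0, 2.0]), ["A", "B", "C"]
)
print(predicted["diff"].idxmin(axis=0))  # A (largest LUSC excess)
print(predicted["diff"].idxmax(axis=0))  # B (largest LUAD excess)
```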
-
-Which is a [prognostic marker](https://www.proteinatlas.org/ENSG00000146054-TRIM7/cancer) for survival in LUAD.
-
-Here these two genes seem to be the largest differentiators between LUSC and LUAD. We can also note that, as with PCA, the gene KRT17 appears quite different between the cancer types: