We currently assume that the number of samples used to estimate mutation probabilities is equal to the number of samples upon which predictions are made, e.g.,
training:
`'number_tumors': args.number_tumors,`
prediction:
This assumption is unnecessary: the number of samples used to estimate (train) mutation probabilities may differ from the number used when computing expectations. The code should therefore be generalized to use two independent variables: the number of training samples and the number of prediction samples. Each value may be inferred from the data files or supplied by the user.
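A minimal sketch of one way to split the single `args.number_tumors` value into two independent counts. The flag names, the helper `resolve_sample_count`, and the idea of inferring a count from a list of sample IDs are all assumptions for illustration, not the existing interface:

```python
import argparse


def resolve_sample_count(user_value, sample_ids):
    """Return the user-supplied count if given, else infer it from the data.

    `sample_ids` is a hypothetical list of sample identifiers read from a
    data file; duplicates are collapsed before counting.
    """
    if user_value is not None:
        return user_value
    return len(set(sample_ids))


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical flags replacing the single number_tumors argument.
    parser.add_argument('--number-training-samples', type=int, default=None)
    parser.add_argument('--number-prediction-samples', type=int, default=None)
    return parser


# Example: the training count is supplied by the user, while the prediction
# count is inferred from the sample IDs found in the prediction data.
args = build_parser().parse_args(['--number-training-samples', '100'])
prediction_sample_ids = ['t1', 't2', 't3', 't2']  # placeholder data

config = {
    'number_training_samples': resolve_sample_count(
        args.number_training_samples, sample_ids=[]
    ),
    'number_prediction_samples': resolve_sample_count(
        args.number_prediction_samples, prediction_sample_ids
    ),
}
print(config)
```

Leaving both arguments as `None` by default lets each count fall back to the data independently, so existing invocations would only need to pass a value when they want to override what the files imply.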
cc: @jkunisak