Generalize to two independent variables: number of training samples and number of prediction samples #16

We currently assume that the number of samples used to estimate mutation probabilities equals the number of samples on which predictions are made; see, e.g.:

training: 
https://github.com/quinlan-lab/constraint-tools/blob/73ce304c427e8cf5a8439bcd04cb8e592b8493b5/train-model/estimate_mutation_probabilities#L185

prediction: 
https://github.com/quinlan-lab/constraint-tools/blob/a4b6022df7b4e77cab145d28aabbcf5c56c453a6/predict-constraint/compute_mutation_counts.py#L40

This assumption is unnecessary: the number of samples used to estimate (train) mutation probabilities may differ from the number used when computing expectations (predictions). The code should therefore be generalized to have two independent variables, one for the number of training samples and one for the number of prediction samples. The values of these variables may be inferred from data files or supplied by the user.
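A minimal sketch of what the two variables could look like, under stated assumptions. It assumes (hypothetically; the issue does not say) that a trained probability is the chance that a site is mutated in at least one of `n_train` samples, and that samples mutate independently with a common per-sample rate. Under that model, `1 - p = (1 - q) ** n`, so a probability estimated with `n_train` samples converts to `n_pred` samples via `1 - (1 - p_train) ** (n_pred / n_train)`. The helpers `count_vcf_samples`, `resolve_sample_count`, and `rescale_probability` are illustrative names, not functions from constraint-tools; inferring the count from a VCF header is one assumed way to "infer from data files".

```python
import gzip
from typing import Optional


def count_vcf_samples(vcf_path: str) -> int:
    """Hypothetical helper: count the sample columns in a (possibly gzipped) VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns (CHROM ... FORMAT) are fixed; the rest are samples.
                return max(len(line.rstrip("\n").split("\t")) - 9, 0)
    raise ValueError(f"no #CHROM header line in {vcf_path}")


def resolve_sample_count(user_value: Optional[int], vcf_path: Optional[str]) -> int:
    """Hypothetical helper: prefer a user-supplied count, else infer it from a VCF header."""
    if user_value is not None:
        if user_value <= 0:
            raise ValueError("sample count must be positive")
        return user_value
    if vcf_path is not None:
        return count_vcf_samples(vcf_path)
    raise ValueError("supply a sample count or a VCF to infer it from")


def rescale_probability(p_train: float, n_train: int, n_pred: int) -> float:
    """
    Hypothetical conversion of a probability estimated with n_train samples
    to the corresponding probability for n_pred samples.

    Assumes p = P(site mutated in at least one of n samples) and independent
    samples with a shared per-sample rate q, so 1 - p = (1 - q) ** n, which gives
    1 - p_pred = (1 - p_train) ** (n_pred / n_train).
    """
    if n_train <= 0 or n_pred <= 0:
        raise ValueError("sample counts must be positive")
    if not 0.0 <= p_train <= 1.0:
        raise ValueError("p_train must lie in [0, 1]")
    return 1.0 - (1.0 - p_train) ** (n_pred / n_train)


if __name__ == "__main__":
    # Illustrative values only: both counts supplied directly by the user.
    n_train = resolve_sample_count(user_value=1000, vcf_path=None)
    n_pred = resolve_sample_count(user_value=250, vcf_path=None)
    print(rescale_probability(0.02, n_train, n_pred))
```

Keeping the two counts as separate arguments means the rescaling collapses to the identity when `n_pred == n_train`, which reproduces the current behaviour. If the trained probabilities are instead per-sample rates, the conversion step would differ, so the model behind `rescale_probability` should be checked against how `estimate_mutation_probabilities` actually defines them.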

cc: @jkunisak 
