Skip to content

Conversation

@dcdehaas
Copy link

@dcdehaas dcdehaas commented Sep 23, 2024

There were some conflicting argument checks that made it challenging (impossible?) to use recombination map support. Also there was an early return in singer_master that prevented the recombination map code from being reachable.

This makes it so that you have to pass both -recomb_map and -mut_map togther, or neither of them in which case -m is required. This seemed to be the expected existing behavior. Updated parallel_singer to also support rate map arguments.

There is still a bug that I have not fixed, which is the RateMap read will silently fail if you pass in a incorrectly formatted rate map file.

singer_master already supports it.
@dcdehaas dcdehaas force-pushed the dcd_recmap_parallel branch from f50ba81 to 31a0f2a Compare October 24, 2024 16:29
@dcdehaas dcdehaas changed the title Add -recomb_map option for parallel_singer Generalize recombination map support to all scripts Oct 24, 2024
arg = ARG(Ne, sequence_length);
arg.discretize(bin_size);
arg.build_singleton_arg(n);
if (mut_rate > 0 and recomb_rate > 0) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was always true, because of lines 20 and 21 above.

SINGER takes -start which allows you to run on a segment of the
genome other than starting at basepair=0. Previously the Rate_map
ignored this, meaning it was emitting recombination/mutation rates
for the wrong region. It could also end up producing queries for
coordinates that were prior to the start of the rate map, due to
mutations being offset incorrectly.
double p = (actualCoordinate - coordinates[index])/(coordinates[index+1] - coordinates[index]);
double dist = (1-p)*prev_dist + p*next_dist;
return dist;
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likely Rate_map::mean_rate() also needs to be adjusted based on the region between start_pos and end_pos instead of taking the mean of the entire map. hopefully it is not a big deal, given that parallel_singer uses regions of 1Mbp that hopefully have means close to the full mean.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix makes sense (to me) because the alternative is that user has to adjust the recombination map many times, which could be cumbersome.

Copy link

@AprilWei001 AprilWei001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix makes sense (to me) because the alternative is that a user has to adjust the recombination map many times, which could be cumbersome.

double p = (actualCoordinate - coordinates[index])/(coordinates[index+1] - coordinates[index]);
double dist = (1-p)*prev_dist + p*next_dist;
return dist;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix makes sense (to me) because the alternative is that user has to adjust the recombination map many times, which could be cumbersome.

Having recombination rate of 0 causes NaNs in downstream
probabilities, and forbidding SINGER from allowing recombination
is likely difficult to do. Instead, just impose a very small
recombination rate in any region that has too low of a rate.
Copy link

@AprilWei001 AprilWei001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me that this needs to be fixed.
A design choice for Yun to decide is perhaps whether it's best to fix it here or at the place where recombination events are being sampled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants