-
Notifications
You must be signed in to change notification settings - Fork 36
Add coverage #273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add coverage #273
Conversation
agalitsyna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments after a brief checkout, mostly my misunderstanding plus generally mysterious parts of the code highlighted
pairtools/cli/coverage.py
Outdated
| type=int, | ||
| default=0, | ||
| show_default=True, | ||
| help="Shift value for strand-specific adjustment", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description feels a bit incomplete. Maybe add a note on what is the zero point relative to which the shift wil be performed here? Also, what will be the direction of the shift?
| type=click.Choice(["5", "3", "0"]), | ||
| default="0", | ||
| show_default=True, | ||
| help="5', 3' end, or 0 (default) - whatever is reported as pos1 and pos2 in pairs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- What is default "0" here?
- How does this parameter interact with the "side" parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Default is whatever is reported in pos1 or pos2 (so depends on how the pairs where generated).
- Well... It calculates coverage of 5', 3' or default fragment ends in the specified side (i.e. --side 1 --end 5 will use pos51, --side 2 --end 3 will use pos32)
| type=str, | ||
| required=False, | ||
| ) | ||
| @click.option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having two separate parameters "side" and "end", maybe it's better to allow the user to submit the list of columns to use for the coverage calculation? It's just a suggestion, might be not the smartest one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea. How would you specify it in case you want to have the coverage by midpoint or whole length, as you suggest below and I also wanted to implement anyway?
| default="1", | ||
| show_default=True, | ||
| help="0: both sides, 1: first side, 2: second side", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a good idea to allow using (1) the midpoints of the reads for calculation and (2) whole length of the read (each nucleotide of the read adds +1 to the coverage)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will add it! At least the midpoint I was planning anyway for sure.
| "first column lists scaffold names. If not provided, will be extracted " | ||
| "from the header of the input pairs file.", | ||
| ) | ||
| @click.option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to disable the output bgzip-/lz4c-file and have bigwig only? If bgzip-/lz4c- output is required, it shall have an enormous burden on the IO if you need bigwig only...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in pairtools we always have a text output, no? How bioframe.to_bigwig works is it saves to text and then converts anyway, so the burden is only double... But I guess we can at least default to saving to stdout and then if you don't want it you dump it to /dev/null?
pairtools/lib/coverage.py
Outdated
| return coverage_df | ||
|
|
||
|
|
||
| def save_coverage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty function?
Add simple calculation of coverage from pairs. Allows to use read1, read2 or both. Can shift each position a certain distance based on strand (e.g. useful for nucleosome dyads from micro-C). If file contains pos51/pos31/... can use that instead of pos1/2. Can do base pair resolution, or some binning.
Still needs tests.
Could be nice to be able to stream output and not generate a giant dataframe for the whole genome, but actually it's only possible for pos1 (since it comes first in sorting). Also atm multiprocessing doesn't seem to do much or anything, perhaps can be optimized somehow.