Skip to content

Conversation

@Phlya
Copy link
Member

@Phlya Phlya commented Jul 15, 2025

Add simple calculation of coverage from pairs. Allows to use read1, read2 or both. Can shift each position a certain distance based on strand (e.g. useful for nucleosome dyads from micro-C). If file contains pos51/pos31/... can use that instead of pos1/2. Can do base pair resolution, or some binning.

Still needs tests.

Could be nice to be able to stream output and not generate a giant dataframe for the whole genome, but actually it's only possible for pos1 (since it comes first in sorting). Also atm multiprocessing doesn't seem to do much or anything, perhaps can be optimized somehow.

Copy link
Member

@agalitsyna agalitsyna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments after a brief checkout, mostly my misunderstanding plus generally mysterious parts of the code highlighted

type=int,
default=0,
show_default=True,
help="Shift value for strand-specific adjustment",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description feels a bit incomplete. Maybe add a note on what is the zero point relative to which the shift wil be performed here? Also, what will be the direction of the shift?

type=click.Choice(["5", "3", "0"]),
default="0",
show_default=True,
help="5', 3' end, or 0 (default) - whatever is reported as pos1 and pos2 in pairs",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What is default "0" here?
  2. How does this parameter interact with the "side" parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Default is whatever is reported in pos1 or pos2 (so depends on how the pairs where generated).
  2. Well... It calculates coverage of 5', 3' or default fragment ends in the specified side (i.e. --side 1 --end 5 will use pos51, --side 2 --end 3 will use pos32)

type=str,
required=False,
)
@click.option(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having two separate parameters "side" and "end", maybe it's better to allow the user to submit the list of columns to use for the coverage calculation? It's just a suggestion, might be not the smartest one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. How would you specify it in case you want to have the coverage by midpoint or whole length, as you suggest below and I also wanted to implement anyway?

default="1",
show_default=True,
help="0: both sides, 1: first side, 2: second side",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea to allow using (1) the midpoints of the reads for calculation and (2) whole length of the read (each nucleotide of the read adds +1 to the coverage)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will add it! At least the midpoint I was planning anyway for sure.

"first column lists scaffold names. If not provided, will be extracted "
"from the header of the input pairs file.",
)
@click.option(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to disable the output bgzip-/lz4c-file and have bigwig only? If bgzip-/lz4c- output is required, it shall have an enormous burden on the IO if you need bigwig only...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in pairtools we always have a text output, no? How bioframe.to_bigwig works is it saves to text and then converts anyway, so the burden is only double... But I guess we can at least default to saving to stdout and then if you don't want it you dump it to /dev/null?

return coverage_df


def save_coverage(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants