doc/protocols_pipelines.rst
16 additions & 10 deletions
Original file line number
Diff line number
Diff line change
@@ -58,22 +58,25 @@ To adapt the standard workflow for common variations of the Hi-C protocol, consi
58
58
59
59
1. ``pairtools parse --walks-policy``:
60
60
61
-
This parameter determines how pairtools parse handles reads with multiple alignments (walks). We recommend specifying the value explicitly, as the default has changed between versions of ``pairtools parse``.
61
+
This parameter determines how pairtools parse handles reads with multiple alignments (walks). We recommend specifying the value explicitly, as the default has changed between versions of ``pairtools parse``.
62
62
63
-
Our current recommendation is to use ``--walks-policy 5unique``, which is the default setting in the latest version of pairtools. With this option, pairtools parse reports the two 5'-most unique alignments on each side of a paired read as a pair.
63
+
Our current recommendation is to use ``--walks-policy 5unique``, which is the default setting in the latest version of pairtools. With this option, pairtools parse reports the two 5'-most unique alignments on each side of a paired read as a pair.
64
64
65
-
This option increases the number of reported pairs compared to the most conservative ``--walks-policy mask``. However, it's important to note that ``5unique`` can potentially report pairs of non-directly ligated fragments (i.e., two fragments separated by one or more other DNA fragments). Such non-direct (also known as "higher-order" or "nonadjacent") ligations have slightly different statistical properties than direct ligations, as illustrated in several Pore-C papers [`1 <https://www.biorxiv.org/content/10.1101/833590v1.full>`_ , `2 <https://www.nature.com/articles/s41467-023-36899-x>`_].
65
+
This option increases the number of reported pairs compared to the most conservative ``--walks-policy mask``. However, it's important to note that ``5unique`` can potentially report pairs of non-directly ligated fragments (i.e., two fragments separated by one or more other DNA fragments). Such non-direct (also known as "higher-order" or "nonadjacent") ligations have slightly different statistical properties than direct ligations, as illustrated in several Pore-C papers [`1 <https://www.biorxiv.org/content/10.1101/833590v1.full>`_ , `2 <https://www.nature.com/articles/s41467-023-36899-x>`_].
66
66
67
-
An alternative is the ``--walks-policy 3unique`` policy, which reports the two 3'-most unique alignments on each side of
68
-
a paired read as a pair, thus decreasing the chance of reporting non-direct ligations.
69
-
However, ``3unique`` may not work well in situations where the combined length of a read pair is longer than the length of a DNA fragment (e.g. long read experiments).
70
-
In this case, the 3' sides of the two reads will cover the same locations in the DNA molecule, and the 3' alignments may end up identical.
67
+
An alternative is the ``--walks-policy 3unique`` policy, which reports the two 3'-most unique alignments on each side of
68
+
a paired read as a pair, thus decreasing the chance of reporting non-direct ligations.
69
+
However, ``3unique`` may not work well when the combined length of a read pair exceeds the length of a DNA fragment (e.g., in long-read experiments).
70
+
In this case, the 3' sides of the two reads will cover the same locations in the DNA molecule, and the 3' alignments may end up identical.
71
71
72
-
Finally, the experimental ``--walks-policy all`` option reports all alignments of a read pair as separate pairs. This option maximizes the number of reported pairs. The downside is that it breaks the assumption that there is only one pair per read, which is not compatible with retrieval of .sam records from .pairsam output and may also complicate the interpretation of pair statistics.
72
+
Finally, the experimental ``--walks-policy all`` option reports all alignments of a read pair as separate pairs.
73
+
This option maximizes the number of reported pairs.
74
+
The downside is that it breaks the assumption that there is only one pair per read,
75
+
which makes this option incompatible with retrieving .sam records from .pairsam output and may also complicate the interpretation of pair statistics.
73
76
74
77
2. ``pairtools select "(mapq1>=30) and (mapq2>=30)"``:
75
78
76
-
This filtering command selects only pairs with high-quality alignments,
79
+
This filtering command selects only pairs with high-quality alignments,
77
80
where both reads in a pair have a mapping quality (MAPQ) score of 30 or higher.
78
81
Applying this filter helps remove false alignments between partially homologous sequences, which often cause artificial high-frequency interactions in Hi-C maps.
79
82
This step is essential for generating maps for high-quality dot calls.
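To make the two settings above concrete, here is a minimal Python sketch of how ``5unique`` and ``3unique`` differ in which alignment they pick on each side, and of what the ``(mapq1>=30) and (mapq2>=30)`` expression tests. It is a toy model, not pairtools code: the ``Alignment`` record, the ``is_unique`` flag, and both helper functions are hypothetical names introduced only for illustration, and real walk handling in ``pairtools parse`` involves more cases than shown here.

```python
from typing import List, NamedTuple, Optional


class Alignment(NamedTuple):
    """Toy alignment record (hypothetical; not a pairtools data structure)."""
    chrom: str
    pos: int
    mapq: int
    is_unique: bool  # assumption: uniqueness is given as a precomputed flag


def pick_alignment(side: List[Alignment], policy: str) -> Optional[Alignment]:
    """Pick one alignment from one side of a read pair.

    `side` lists the alignments of one read in 5'->3' order.
    Only the two policies sketched here are supported.
    """
    unique = [aln for aln in side if aln.is_unique]
    if not unique:
        return None
    if policy == "5unique":
        return unique[0]   # 5'-most unique alignment
    if policy == "3unique":
        return unique[-1]  # 3'-most unique alignment
    raise ValueError(f"policy not covered by this sketch: {policy}")


def passes_mapq_filter(aln1: Alignment, aln2: Alignment, min_mapq: int = 30) -> bool:
    """Mirror the logic of the expression (mapq1>=30) and (mapq2>=30)."""
    return aln1.mapq >= min_mapq and aln2.mapq >= min_mapq


# A made-up read pair: side 1 contains a two-fragment walk, side 2 one alignment.
side1 = [Alignment("chr1", 1000, 60, True), Alignment("chr2", 5000, 12, True)]
side2 = [Alignment("chr1", 90000, 42, True)]

pair_5 = (pick_alignment(side1, "5unique"), pick_alignment(side2, "5unique"))
pair_3 = (pick_alignment(side1, "3unique"), pick_alignment(side2, "3unique"))

kept_5 = passes_mapq_filter(*pair_5)  # both MAPQ values are >= 30
kept_3 = passes_mapq_filter(*pair_3)  # the chr2 alignment has MAPQ 12
```

In this made-up example, the two policies report different pairs from the same read pair, and only the ``5unique`` pair survives the MAPQ filter, because the 3'-most alignment on side 1 has a low mapping quality.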
@@ -114,5 +117,8 @@ Technical tips
114
117
to input decompression and output compression. Additionally, `pairtools sort` parallelizes sorting with `--nproc`.
115
118
116
119
Example Workflows
117
-
-----------------
120
+
------------------
121
+
For more advanced workflows, please check the following projects:
118
122
123
+
- `Distiller-nf <https://github.com/open2c/distiller-nf>`_ is a feature-rich Open2C Hi-C processing pipeline for the Nextflow workflow manager.
124
+
- `Distiller-sm <https://github.com/open2c/distiller-sm>`_ is a similarly feature-rich and optimized pipeline implemented in Snakemake.