Commit 6c9ca65

edit phase walkthrough
1 parent 802f69c commit 6c9ca65

1 file changed: +12 -31 lines changed

doc/examples/pairtools_phase_walkthrough.ipynb

Lines changed: 12 additions & 31 deletions
@@ -34,50 +34,31 @@
 "source": [
 "Several approaches have been developed to process Hi-C data from haplotype-resolved experiments. In `pairtools`, we implement the approach that was used in Erceg et al. Here is its brief outline:\n",
 "\n",
-"1. [Create the reference genome](#Create-the-reference-genome): create a \"concatenated\" reference genome that contains sequences of both homologs of each chromosome. \n",
+"1. Create the haplotype-resolved genome. First, we will create a \"concatenated\" reference genome that contains sequences of both homologs of each chromosome. \n",
 "\n",
 " - Incorporate known SNVs (usually in .vcf format) into the reference genome using [bcftools](https://samtools.github.io/bcftools/bcftools.html) to create FASTA files with the sequences of both homologs.\n",
 " - Add suffixes to the name of each homolog that identify the type (`_hap1` or `_hap2`).\n",
 "\n",
 "2. Map the Hi-C data to the concatenated reference and parse resulting alignment into Hi-C pairs. Compared to the standard Hi-C pipeline, this step would contain a couple of modifications:\n",
-" - parse allowing multimappers (mapq 0). \n",
-" - make the aligner report two suboptimal alignments (aka the second and the third hit).\n",
+" - Make the aligner report two suboptimal alignments (aka the second and the third hit).\n",
+" - Parse allowing multimappers (mapq 0). \n",
 " \n",
 " Note that, upon mapping to the homolog-resolved genome, Hi-C reads will report the identity of their homologue as the suffix of the chromosome name.\n",
 " \n",
-" See sections:\n",
-" \n",
-" (i) [Download data](#Download-data)\n",
-" \n",
-" (ii) [Map data with bwa mem to diploid genome](#Map-data-with-bwa-mem-to-diploid-genome)\n",
-" \n",
-" (iii) [pairtools parse](#pairtools-parse)\n",
-" \n",
+"3. Phase the resulting pairs based on the reported suboptimal alignments. \n",
 "\n",
-"3. [pairtools phase](#pairtools-phase): phase the pairs based on the reported suboptimal alignments. \n",
+" By checking the scores of two suboptimal alignments, we will distinguish the true multi-mappers from unresolved pairs (i.e. cases when the read aligns to the location with no distinguishing SNV). Phasing will remove the haplotype suffixes from chromosome names and add extra fields to the .pairs file with:\n",
 "\n",
-" By checking the scores of two suboptimal alignments, we will distinguish the true multi-mappers from unresolved pairs (i.e. cases when the read aligns to the location with no distinguishing SNV).\n",
-" Phasing procedure will remove the haplotype suffixes from chromosome names and add extra fields to the .pairs file with:\n",
-" \n",
-" '.' (non-resolved)\n",
-" \n",
-" '0' (first haplotype) or \n",
-" \n",
-" '1' (second haplotype). \n",
-" \n",
+" - '.' (non-resolved)\n",
+" - '0' (first haplotype) \n",
+" - '1' (second haplotype)\n",
 " \n",
 " Phasing schema: \n",
 " \n",
-"![image.png](attachment:62e74fba-c1c1-44b5-a3e2-3699c3cac7ce.png)\n",
-"\n",
+" ![image.png](attachment:62e74fba-c1c1-44b5-a3e2-3699c3cac7ce.png)\n",
 "\n",
-"4. Post-procesing. Sort and dedup Hi-C pairs and calculate stats, similarly to the standard Hi-C pipeline. \n",
 "\n",
-" See sections:\n",
-" \n",
-" (i) [pairtools dedup](#pairtools-dedup)\n",
-" \n",
-" (ii) [Stats](#Stats)"
+"4. Post-procesing. Sort and [dedup](#pairtools-dedup) Hi-C pairs and calculate [stats](#Stats), similarly to the standard Hi-C pipeline. "
 ]
 },
 {
@@ -361,15 +342,15 @@
 "id": "dfd7c4cb-31dd-43df-8510-95fd0ff9f78f",
 "metadata": {},
 "source": [
-"#### Create the index of concatenated haplotypes"
+"#### Create the bwa index of homolog-resolved genome"
 ]
 },
 {
 "cell_type": "markdown",
 "id": "99d28f6f-b754-4a95-95d5-9e5e51d14571",
 "metadata": {},
 "source": [
-"Concatenate the genomes of two homologs and index them together. "
+"Concatenate the genomes of two homologs and index them together. "
 ]
 },
 {
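
Note on the phasing step described in the edited walkthrough: the rule can be illustrated with a minimal Python sketch. It is a simplified reading of the text, not the `pairtools phase` implementation. It assumes the `_hap1`/`_hap2` suffix convention from the notebook and compares only the best and second-best alignment scores (the walkthrough also uses the third hit). The helper names `split_haplotype` and `phase_side`, and the use of `None` for a true multi-mapper, are hypothetical.

```python
SUFFIXES = ("_hap1", "_hap2")  # suffix convention stated in the walkthrough


def split_haplotype(chrom):
    """Split 'chr1_hap2' into ('chr1', '1'); names without a suffix get '.'."""
    for phase, suffix in enumerate(SUFFIXES):
        if chrom.endswith(suffix):
            return chrom[: -len(suffix)], str(phase)
    return chrom, "."


def phase_side(best_chrom, best_score, second_chrom, second_score):
    """Simplified phasing of one side of a pair (hypothetical helper)."""
    base, hap = split_haplotype(best_chrom)
    if best_score > second_score:
        # The best hit wins outright: a SNV distinguishes the homologs.
        return base, hap
    second_base, second_hap = split_haplotype(second_chrom)
    if second_base == base and second_hap != hap:
        # Equal scores on both homologs of one chromosome: unresolved.
        return base, "."
    # Assumption of this sketch: anything else is a true multi-mapper.
    return base, None


print(phase_side("chr1_hap1", 60, "chr1_hap2", 55))
print(phase_side("chr2_hap2", 60, "chr2_hap1", 60))
```

The first call resolves to the first haplotype because the best score is strictly higher; the second stays unresolved because both homologs score equally.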