README.md (57 additions, 58 deletions)
@@ -4,83 +4,82 @@ A flexible framework for nucleosome profiling of cell-free DNA
 ## Description
 To run Griffin, use the snakemakes in the 'snakemakes' directory
-See the Wiki for further instructions
+See the Griffin wiki (https://github.com/adoebley/Griffin/wiki) for further instructions and a demo.
+
+The methodology is described in:
+Doebley, et al. Griffin: Framework for clinical cancer subtyping from nucleosome profiling of cell-free DNA. (2021) MedRxiv. [doi: https://doi.org/10.1101/2021.08.31.21262867](https://doi.org/10.1101/2021.08.31.21262867)
+
+The analysis workflow consists of 4 tasks:

 1. griffin_genome_GC_frequency
-Calculate the frequency of fragments with each GC content across the mappable regions of the reference genome
-For hg38, this step is already complete and results are in Ref/genome_GC_frequency
-Griffin has not been tested on genome builds other than hg38, but this snakemake is provided in case you would like to try a different genome build or different filter for mappable regions
+- Calculate the frequency of fragments with each GC content across the mappable regions of the reference genome
+- For hg38, this step is already complete and results are in Ref/genome_GC_frequency
+- Griffin has not been tested on genome builds other than hg38, but this snakemake is provided in case you would like to try a different genome build or a different filter for mappable regions (e.g. for shorter or longer reads)

 2. griffin_GC_correction
-Calculate the GC bias for a given set of bam files
-To run this step:
-create a samples.yaml with your list of bam files and place it in config (see config/example_samples.yaml for format)
-edit config.yaml to provide the path to the reference genome (hg38)
-follow the directions at the top of griffin_GC_correction.snakemake to run the snakemake
+- Calculate the GC bias for a given set of bam files
+- To run this step:
+1. Create a samples.yaml with your list of bam files and place it in config (see config/example_samples.yaml for format)
+2. Edit config.yaml to provide the path to the reference genome (hg38)
+3. Follow the directions at the top of griffin_GC_correction.snakemake to run the snakemake
+- Intermediate file with the number of fragments with each length and GC content
+3. repeat_masker.mapable.k50.Umap.hg38/GC_plots/
+- Assorted plots of the GC bias for each sample
+4. samples.GC.yaml
+- A config file for use in the nucleosome profiling step
+- Copy this file into griffin_nucleosome_profiling/config/ to run the nucleosome profiling analysis on these samples

 3. griffin_filter_sites
-If using a new set of sites (not previously filtered) you will need to filter them to remove low mappability sites.
-If you have your own strategy for removing low mappability sites, you can skip this step but will need to add a column with the header 'position' to your sites file for subsequent steps.
-To run this step:
-Create a sites.yaml with paths to your lists of sites and place it in config (see config/example_sites.yaml for format)
-Site lists must be tab separated with a header at the top. At a minimum they must contain columns with the chromosome and position
-Edit config.yaml to specify the location of your mappability track (k50.Umap.MultiTrackMappability.hg38.bw can be downloaded from: https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k50.Umap.MultiTrackMappability.bw)
-Edit config.yaml to specify the name of the column with the chromosome and position or beginning and end of an interval containing the site.
-If 'position' is a column in your input:
-chrom_column: Chrom
-start_column: position
-end_column: position
-If you only have an interval start and end:
-chrom_column: Chrom
-start_column: Start
-end_column: End
-Follow the directions at the top of griffin_filter_sites.snakefile to run the snakemake
+- If using a new set of sites (not previously filtered), you will need to filter them to remove low mappability sites.
+- If you have your own strategy for removing low mappability sites, you can skip this step, but you will need to add a column with the header 'position' to your site lists for subsequent steps.
+- To run this step:
+1. Create a sites.yaml with paths to your site lists and place it in config (see config/example_sites.yaml for format)
+2. Site lists must be tab separated with a header at the top. At a minimum they must contain a column with the chromosome and a column with the position
+3. Edit config.yaml to specify the location of your mappability track (k50.Umap.MultiTrackMappability.hg38.bw can be downloaded from: https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k50.Umap.MultiTrackMappability.bw)
+4. Edit config.yaml to specify the names of the columns containing the chromosome and either the position or the start and end of an interval containing the site.
+5. Follow the directions at the top of griffin_filter_sites.snakefile to run the snakemake

-Outputs:
-sites/<site_list_name>.counts.txt
-Summary of the number of low and high mappability sites
-sites/<site_list_name>.high_mapability.txt
-high mappability sites to be used in subsequent steps
-sites/<site_list_name>.low_mapability.txt
-low mappability sites
+- Outputs:
+1. sites/<site_list_name>.counts.txt
+- Summary of the number of low and high mappability sites
+2. sites/<site_list_name>.high_mapability.txt
+- High mappability sites to be used in subsequent steps
+3. sites/<site_list_name>.low_mapability.txt
+- Low mappability sites
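For readers who skip griffin_filter_sites and filter sites themselves, the README asks for a tab-separated site list with a header and a column named 'position'. The following is a minimal sketch of adding that column, not part of Griffin itself. It assumes the input has interval columns named Start and End (the names used in the config example in this diff) and, as a further assumption, takes the integer midpoint of the interval as the site position. The function name `add_position_column` is hypothetical.

```python
import csv
import io


def add_position_column(src, dst, start_column="Start", end_column="End"):
    """Copy a tab-separated site list from src to dst, appending a 'position' column.

    Assumption (not confirmed by the README): the site position is the
    integer midpoint of the interval given by start_column and end_column.
    """
    reader = csv.DictReader(src, delimiter="\t")
    fieldnames = list(reader.fieldnames) + ["position"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["position"] = (int(row[start_column]) + int(row[end_column])) // 2
        writer.writerow(row)


# Hypothetical two-site list held in memory instead of a file on disk.
example = "Chrom\tStart\tEnd\nchr1\t100\t200\nchr2\t51\t60\n"
out = io.StringIO()
add_position_column(io.StringIO(example), out)
result = out.getvalue()
print(result)
```

With real files, the same function can be called with two handles opened via `open(path, newline="")`. If your site lists already define a single coordinate, copying it into 'position' is simpler than computing a midpoint.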
 4. griffin_nucleosome_profiling
-Run nucleosome profiling for a given set of site lists and a given set of bam files
-To run this step:
-Copy the samples.GC.yaml from the griffin_GC_correction step into the config directory
-Make a sites.yaml containing paths to the high mappability output files from griffin_filter_sites (see config/example_sites.yaml for format)
-Edit config.yaml to provide the path to the reference genome (hg38)
-Edit other config settings as needed
-Follow the directions at the top of griffin_nucleosome_profiling.snakefile to run the snakemake
+- Run nucleosome profiling for a given set of site lists and a given set of bam files
+- To run this step:
+1. Copy the samples.GC.yaml from the griffin_GC_correction step into the config directory
+2. Make a sites.yaml containing paths to the high mappability output files from griffin_filter_sites (see config/example_sites.yaml for format)
+3. Edit config.yaml to provide the path to the reference genome (hg38)
+4. Edit other config settings as needed
+5. Follow the directions at the top of griffin_nucleosome_profiling.snakefile to run the snakemake
-nucleosome profiles and metadata for each site list.
-Both GC corrected and non-GC corrected profiles are in this file and must be separated for downstream analysis (GC_correction column). Coverage profile data is labeled with the start coordinate of the bin. For instance, the column labeled -15 contains the coverage information for -15bp to 0bp relative to the site location.
-These folders contain intermediate files with the coverage profiles for individual site lists. These have been concatenated into results/coverage/all_site/<sample_name>.all_sites.coverage.txt
+- nucleosome profiles and metadata for each site list.
+- Both GC corrected and non-GC corrected profiles are in this file and must be separated for downstream analysis (GC_correction column). Coverage profile data is labeled with the start coordinate of the bin. For instance, the column labeled -15 contains the coverage information for -15bp to 0bp relative to the site location.
+- These folders contain intermediate files with the coverage profiles for individual site lists. These have been concatenated into results/coverage/all_site/<sample_name>.all_sites.coverage.txt.
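The output description above says the GC corrected and non-GC corrected profiles share one file and must be separated using the GC_correction column, with coverage columns named by the start coordinate of each bin. A minimal sketch of that separation follows. It assumes the file is tab separated; the metadata column `site_name` and the GC_correction values `none` and `GC_corrected` are hypothetical placeholders, so check the real file for the actual names.

```python
import csv
import io
from collections import defaultdict


def split_by_gc_correction(handle):
    """Group the rows of a coverage table by the value in the GC_correction column."""
    reader = csv.DictReader(handle, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row["GC_correction"]].append(row)
    return dict(groups)


def coverage_bins(row):
    """Return {bin_start: coverage} for the columns whose header is an integer.

    Per the README, a header such as -15 labels the bin that starts at -15 bp
    relative to the site. Non-integer headers are treated as metadata and skipped.
    """
    bins = {}
    for name, value in row.items():
        try:
            start = int(name)
        except (TypeError, ValueError):
            continue
        bins[start] = float(value)
    return bins


# Hypothetical miniature coverage table; real files have many more columns.
example = (
    "site_name\tGC_correction\t-15\t0\t15\n"
    "CTCF\tnone\t0.9\t0.8\t0.95\n"
    "CTCF\tGC_corrected\t1.0\t0.7\t1.1\n"
)
groups = split_by_gc_correction(io.StringIO(example))
corrected = coverage_bins(groups["GC_corrected"][0])
print(sorted(groups), corrected)
```

Grouping on whatever values appear in GC_correction avoids hard-coding the labels, which is why the sketch only names them in the example data.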
 ## Versions of packages used for testing
-argparse 1.1
+argparse 1.1
 pysam 0.15.4
 pyBigWig 0.3.17
 pandas 1.2.4
 numpy 1.21.2
 scipy 1.7.1
 pyyaml 5.3.1
-matplotlib 3.4.1
-snakemake 5.5.4
-python 3.7.4
+matplotlib 3.4.1
+snakemake 5.5.4
+python 3.7.4
+
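To compare an environment against the tested versions listed above, something like the following sketch could be used. It is not part of Griffin. It assumes Python 3.8 or newer for `importlib.metadata` (the tested Python 3.7.4 would need the `importlib_metadata` backport), and it leaves out argparse, which ships with Python. The helper names `installed_version` and `version_report` are hypothetical.

```python
from importlib import metadata  # Python 3.8+; assumption, see the note above

# Versions copied from the list in the README.
TESTED = {
    "pysam": "0.15.4",
    "pyBigWig": "0.3.17",
    "pandas": "1.2.4",
    "numpy": "1.21.2",
    "scipy": "1.7.1",
    "pyyaml": "5.3.1",
    "matplotlib": "3.4.1",
    "snakemake": "5.5.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_report(tested, lookup=installed_version):
    """Return {package: (tested_version, found_version)} for every mismatch."""
    report = {}
    for name, wanted in tested.items():
        found = lookup(name)
        if found != wanted:
            report[name] = (wanted, found)
    return report


mismatches = version_report(TESTED)
for name, (wanted, found) in mismatches.items():
    print(f"{name}: tested with {wanted}, found {found}")
```

A mismatch does not mean Griffin will fail; the list only records what was used for testing.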
 ## Software License
 Griffin Copyright (c) 2021 Fred Hutchinson Cancer Research Center