Commit 8394218

committed
Merge branch 'main' of github.com:adoebley/Griffin into main
2 parents 50673a6 + 0eb9c87 commit 8394218

File tree

1 file changed

+57
-58
lines changed

README.md

Lines changed: 57 additions & 58 deletions
@@ -4,83 +4,82 @@ A flexible framework for nucleosome profiling of cell-free DNA
44

55
## Description
66
To run Griffin, use the snakemakes in the 'snakemakes' directory
7-
See the Wiki for further instructions
7+
See the Griffin wiki (https://github.com/adoebley/Griffin/wiki) for further instructions and a demo.
8+
9+
The methodology is described in:
10+
Doebley, et al. Griffin: Framework for clinical cancer subtyping from nucleosome profiling of cell-free DNA. (2021) MedRxiv. [doi: https://doi.org/10.1101/2021.08.31.21262867](https://doi.org/10.1101/2021.08.31.21262867)
11+
12+
The analysis workflow consists of 4 tasks:
813

914
1. griffin_genome_GC_frequency
10-
Calculate the frequency of fragments with each GC content across the mappable regions of the reference genome
11-
For hg38, this step is already complete and results are in Ref/genome_GC_frequency
12-
Griffin has not been tested on genome builds other than hg38, but this snakemake is provided in case you would like to try a different genome build or different filter for mappable regions
15+
- Calculate the frequency of fragments with each GC content across the mappable regions of the reference genome
16+
- For hg38, this step is already complete and results are in Ref/genome_GC_frequency
17+
- Griffin has not been tested on genome builds other than hg38, but this snakemake is provided in case you would like to try a different genome build or a different filter for mappable regions (e.g., for shorter or longer reads)
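
To make "frequency of fragments with each GC content" concrete, here is a minimal sketch that tallies the GC count of every window of a fixed length in a sequence. It is an illustration only, not Griffin's implementation: the function name `gc_frequency` is hypothetical, the toy sequence stands in for a reference genome, and no mappability filter is applied (windows containing N are simply skipped).

```python
from collections import Counter


def gc_frequency(reference, length):
    """Return {GC count: fraction of windows} over all windows of `length` bases.

    Hypothetical helper for illustration; windows containing 'N' are skipped.
    """
    counts = Counter()
    reference = reference.upper()
    for start in range(len(reference) - length + 1):
        window = reference[start:start + length]
        if "N" in window:
            continue
        # GC content of this window, expressed as a count of G and C bases
        counts[window.count("G") + window.count("C")] += 1
    total = sum(counts.values())
    return {gc: n / total for gc, n in sorted(counts.items())}


# Toy sequence standing in for a reference genome
print(gc_frequency("ACGTAC", 3))
```

In practice this tally would be repeated for each fragment length of interest and restricted to the mappable regions described above.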
1318

1419
2. griffin_GC_correction
15-
Calculate the GC bias for a given set of bam files
16-
To run this step:
17-
create a samples.yaml with your list of bam files and place it in config (see config/example_samples.yaml for format)
18-
edit config.yaml to provide the path to the reference genome (hg38)
19-
follow the directions at the top of griffin_GC_correction.snakemake to run the snakemake
20+
- Calculate the GC bias for a given set of bam files
21+
- To run this step:
22+
1. Create a samples.yaml with your list of bam files and place it in config (see config/example_samples.yaml for format)
23+
2. Edit config.yaml to provide the path to the reference genome (hg38)
24+
3. Follow the directions at the top of griffin_GC_correction.snakemake to run the snakemake
2025

21-
Outputs:
22-
repeat_masker.mapable.k50.Umap.hg38/GC_bias/<sample_name>.GC_bias.txt
23-
The GC bias of fragments with each length and GC content, this is used for GC correction
24-
repeat_masker.mapable.k50.Umap.hg38/GC_counts/<sample_name>.GC_counts.txt
25-
Intermediate file with the number of fragments with each length and GC content
26-
repeat_masker.mapable.k50.Umap.hg38/GC_plots/
27-
Assorted plots of the GC bias for each sample
28-
samples.GC.yaml
29-
A config file for use in the nucleosome profiling step
26+
- Outputs:
27+
1. repeat_masker.mapable.k50.Umap.hg38/GC_bias/<sample_name>.GC_bias.txt
28+
- The GC bias of fragments with each length and GC content; this is used for GC correction
29+
2. repeat_masker.mapable.k50.Umap.hg38/GC_counts/<sample_name>.GC_counts.txt
30+
- Intermediate file with the number of fragments with each length and GC content
31+
3. repeat_masker.mapable.k50.Umap.hg38/GC_plots/
32+
- Assorted plots of the GC bias for each sample
33+
4. samples.GC.yaml
34+
- A config file for use in the nucleosome profiling step
35+
- Copy this file into griffin_nucleosome_profiling/config/ to run the nucleosome profiling analysis on these samples
3036

3137
3. griffin_filter_sites
32-
If using a new set of sites (not previously filtered) you will need to filter them to remove low mappability sites.
33-
If you have your own strategy for removing low mappability sites, you can skip this step but will need to add a column with the header 'position' to your sites file for subsequent steps.
34-
To run this step:
35-
Create a sites.yaml with paths to your lists of sites and place it in config (see config/example_sites.yaml for format)
36-
Site lists must be tab separated with a header at the top. At a minimum they must contain columns with the chromosome and position
37-
Edit config.yaml to specify the location of your mappability track (k50.Umap.MultiTrackMappability.hg38.bw can be downloaded from: https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k50.Umap.MultiTrackMappability.bw)
38-
Edit config.yaml to specify the name of the column with the chromosome and position or beginning and end of an interval containing the site.
39-
If 'position' is a column in your input:
40-
chrom_column: Chrom
41-
start_column: position
42-
end_column: position
43-
If you only have an interval start and end:
44-
chrom_column: Chrom
45-
start_column: Start
46-
end_column: End
47-
Follow the directions at the top of griffin_filter_sites.snakefile to run the snakemake
38+
- If using a new set of sites (not previously filtered) you will need to filter them to remove low mappability sites.
39+
- If you have your own strategy for removing low mappability sites, you can skip this step but will need to add a column with the header 'position' to your site lists for subsequent steps.
40+
- To run this step:
41+
1. Create a sites.yaml with paths to your site lists and place it in config (see config/example_sites.yaml for format)
42+
2. Site lists must be tab separated with a header at the top. At a minimum they must contain a column with the chromosome and a column with the position
43+
3. Edit config.yaml to specify the location of your mappability track (k50.Umap.MultiTrackMappability.hg38.bw can be downloaded from: https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k50.Umap.MultiTrackMappability.bw)
44+
4. Edit config.yaml to specify the name of the columns with the chromosome and position or beginning and end of an interval containing the site.
45+
5. Follow the directions at the top of griffin_filter_sites.snakefile to run the snakemake
4846

49-
Outputs:
50-
sites/<site_list_name>.counts.txt
51-
Summary of the number of low and high mappability sites
52-
sites/<site_list_name>.high_mapability.txt
53-
high mappability sites to be used in subsequent steps
54-
sites/<site_list_name>.low_mapability.txt
55-
low mappability sites
47+
- Outputs:
48+
1. sites/<site_list_name>.counts.txt
49+
- Summary of the number of low and high mappability sites
50+
2. sites/<site_list_name>.high_mapability.txt
51+
- high mappability sites to be used in subsequent steps
52+
3. sites/<site_list_name>.low_mapability.txt
53+
- low mappability sites
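
Before running the site filter, it can help to confirm that a site list has the columns named in config.yaml. The sketch below is a hedged example, not part of Griffin: `missing_columns` is a hypothetical helper, the config is shown as a plain Python dict rather than loaded from config.yaml, and the column names 'Chrom' and 'position' are example values for the `chrom_column`, `start_column` and `end_column` settings.

```python
import csv
import io


def missing_columns(handle, required):
    """Return the required column names that are absent from a tab-separated header."""
    header = next(csv.reader(handle, delimiter="\t"))
    return [name for name in required if name not in header]


# Example settings; in practice these would come from config.yaml
config = {
    "chrom_column": "Chrom",
    "start_column": "position",
    "end_column": "position",
}

# Toy site list held in memory; a real one would be opened from disk
site_list = io.StringIO("Chrom\tposition\nchr1\t1000000\n")

required = sorted(set(config.values()))
print(missing_columns(site_list, required))
```

An empty list means every configured column is present in the header; any names returned are the ones to add or rename before filtering.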
5654

5755
4. griffin_nucleosome_profiling
58-
Run nucleosome profiling for a given set of site lists and a given set of bam files
59-
To run this step:
60-
Copy the samples.GC.yaml from the griffin_GC_correction step into the config directory
61-
Make a sites.yaml containing paths to the high mappability output files from griffin_filter_sites (see config/example_sites.yaml for format)
62-
Edit config.yaml to provide the path to the reference genome (hg38)
63-
Edit other config settings as needed
64-
Follow the directions at the top of griffin_nucleosome_profiling.snakefile to run the snakemake
56+
- Run nucleosome profiling for a given set of site lists and a given set of bam files
57+
- To run this step:
58+
1. Copy the samples.GC.yaml from the griffin_GC_correction step into the config directory
59+
2. Make a sites.yaml containing paths to the high mappability output files from griffin_filter_sites (see config/example_sites.yaml for format)
60+
3. Edit config.yaml to provide the path to the reference genome (hg38)
61+
4. Edit other config settings as needed
62+
5. Follow the directions at the top of griffin_nucleosome_profiling.snakefile to run the snakemake
6563

66-
Outputs:
67-
results/coverage/all_site/<sample_name>.all_sites.coverage.txt
68-
nucleosome profiles and metadata for each site list.
69-
Both GC corrected and non-GC corrected profiles are in this file and must be separated for downstream analysis (GC_correction column). Coverage profile data is labeled with the start coordinate of the bin. For instance, the column labeled -15 contains the coverage information for -15bp to 0bp relative to the site location.
70-
results/coverage/<site_name>/<sample_name>.<site_name>.coverage.txt
71-
These folders contain intermediate files with the coverage profiles for individual site lists. These have been concatenated into results/coverage/all_site/<sample_name>.all_sites.coverage.txt
64+
- Outputs:
65+
1. results/coverage/all_sites/<sample_name>.all_sites.coverage.txt
66+
- nucleosome profiles and metadata for each site list.
67+
- Both GC corrected and non-GC corrected profiles are in this file and must be separated for downstream analysis (GC_correction column). Coverage profile data is labeled with the start coordinate of the bin. For instance, the column labeled -15 contains the coverage information for -15bp to 0bp relative to the site location.
68+
2. results/coverage/<site_name>/<sample_name>.<site_name>.coverage.txt
69+
- These folders contain intermediate files with the coverage profiles for individual site lists. These have been concatenated into results/coverage/all_sites/<sample_name>.all_sites.coverage.txt.
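
The coverage output described above has to be split on the GC_correction column before downstream analysis. The following is a rough sketch of that split using only the standard library. Several things here are assumptions rather than facts about Griffin's output: the file is treated as tab-separated, the values 'none' and 'GC_corrected' and the 'site_name' column are placeholders, and the 15 bp bin width is inferred from the "-15 contains -15bp to 0bp" example.

```python
import csv
import io
from collections import defaultdict


def split_by_gc_correction(handle, column="GC_correction"):
    """Group rows of a tab-separated coverage table by the GC_correction column."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row[column]].append(row)
    return dict(groups)


def bin_interval(label, bin_size=15):
    """Return (start, end) for a bin labeled with its start coordinate.

    bin_size=15 is an assumption taken from the README's -15 example.
    """
    start = int(label)
    return start, start + bin_size


# Hypothetical two-row table; real files have more metadata columns and bins
example = io.StringIO(
    "site_name\tGC_correction\t-15\t0\n"
    "demo_sites\tnone\t0.98\t0.91\n"
    "demo_sites\tGC_corrected\t1.01\t0.87\n"
)

groups = split_by_gc_correction(example)
for value, rows in groups.items():
    print(value, len(rows))
print(bin_interval("-15"))
```

Check the actual values in the GC_correction column of your own output before filtering, since the labels used here are placeholders.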
7270

7371
## Versions of packages used for testing
74-
argparse 1.1
72+
argparse 1.1
7573
pysam 0.15.4
7674
pyBigWig 0.3.17
7775
pandas 1.2.4
7876
numpy 1.21.2
7977
scipy 1.7.1
8078
pyyaml 5.3.1
81-
matplotlib 3.4.1
82-
snakemake 5.5.4
83-
python 3.7.4
79+
matplotlib 3.4.1
80+
snakemake 5.5.4
81+
python 3.7.4
82+
8483

8584
## Software License
8685
Griffin Copyright (c) 2021 Fred Hutchinson Cancer Research Center
