Releases · jsh58/Genrich

17 Mar 12:13

jsh58

v0.6.1

8558e92

Version 0.6.1 Latest

Two bugs fixed:
- analyzing singleton alignments with no alignment scores (#30)
- removing PCR duplicates when one of multiple samples has no unpaired alignments (#72)

04 Aug 14:45

jsh58

v0.6

d896ab1

Version 0.6

New default peak-calling parameters: -p 0.01 -a 200.0. This also means that, by default, q-values are not calculated.
In ATAC-seq mode, the ends of fragments are adjusted by +5/-5 to account for Tn5 transposase occupancy, when determining the centers of cut site intervals. This adjustment can be avoided with -D.
The number of warning messages about reads extending off the ends of chromosomes is now limited to 128 (per input file).
A convenience option (-S) was added, which prevents Genrich from checking the sort order of the input SAM/BAM file. The input file is still parsed under the assumption that it is sorted by queryname, so this option should be used judiciously.
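
The ATAC-seq adjustment above can be illustrated with a minimal sketch. It assumes 0-based, half-open fragment coordinates, and it assumes that "+5/-5" means the left end moves inward by 5 bp and the right end moves inward by 5 bp before the cut site intervals are centered. The function name and its parameters are hypothetical and are not taken from the Genrich source.

```python
def cut_site_intervals(frag_start, frag_end, interval_len=100, shift=5):
    """Return two (start, end) intervals centered on the shifted fragment ends.

    Assumptions (not confirmed by the release notes): coordinates are
    0-based and half-open, and the shift moves both ends inward.
    Setting shift=0 mimics skipping the adjustment (as with -D).
    """
    left_center = frag_start + shift    # +5 on the left end
    right_center = frag_end - shift     # -5 on the right end
    half = interval_len // 2
    return [
        (max(0, left_center - half), left_center + half),
        (max(0, right_center - half), right_center + half),
    ]


if __name__ == "__main__":
    print(cut_site_intervals(1000, 1200))           # with the Tn5 adjustment
    print(cut_site_intervals(1000, 1200, shift=0))  # without it
```

The sketch only clips intervals at position 0; it does not handle the far end of a chromosome or overlapping intervals from short fragments.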

14 Jan 15:55

jsh58

v0.5

7e4ef84

Version 0.5

The null statistical model was changed to the log-normal distribution. This was determined to be a more appropriate choice than other distributions (including the previously used exponential) from examining ChIP-seq control samples.
The minimum length (-l) is now a regular peak-calling parameter, not an alternative peak-calling method. The default value is 0bp. Those who prefer peak-calling by minimum length and not AUC can run -a 0 -l <int>.
Two convenience options were added:
- -X: option to skip peak-calling, but perform all alignment parsing (including identifying PCR duplicates)
- -P: option to call peaks directly from a bedgraph-ish log file (-f) produced by a previous Genrich run
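
As a rough illustration of the log-normal null model mentioned above, the sketch below computes an upper-tail p-value for an observed value under a log-normal distribution. The parameters mu and sigma are placeholders: these notes do not say how Genrich derives them from the control pileup, so the values used here are purely hypothetical.

```python
import math


def lognormal_sf(x, mu, sigma):
    """Upper-tail probability P(X >= x) for X ~ LogNormal(mu, sigma).

    mu and sigma are the mean and standard deviation of ln(X).
    How Genrich chooses them is not covered here (assumption).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x <= 0:
        return 1.0  # a log-normal variable is always positive
    z = (math.log(x) - mu) / sigma
    # Survival function of the standard normal, via the complementary
    # error function for better accuracy in the tail.
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical example: observed pileup of 8.0 with mu=ln(2), sigma=1.
    print(lognormal_sf(8.0, math.log(2.0), 1.0))
```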

17 Dec 15:01

jsh58

v0.4

14cfb31

Version 0.4

Option to remove PCR duplicates (-r). The procedure is explained here. Incorporates ideas from findDups9.py, in particular the process of evaluating multimapping reads/fragments.
- A log file listing duplicates can be produced via -R <file>.
A reasonably complete README was added to the homepage.
Input SAM/BAM files are now required to be queryname-sorted (samtools sort -n).
A bug in alignment parsing (paired secondary alignments with matching positions) was fixed.

04 Oct 14:23

jsh58

v0.3

f1a39ce

Version 0.3

New default peak-calling method: area under the curve (AUC). For a peak to be called, the total significance of the region must exceed a minimum value (-a <float>, default 20.0).
- The total significance is calculated as the sum of the -log(q) values above the -q threshold over the length of the region (i.e. the area under the -log(q) "curve"). If a -p threshold is specified, the area under the -log(p) curve is calculated.
- The maximum gap parameter (-g <int>) still allows multiple regions to be linked.
- No minimum length is required for a peak to be called.
- Can be overridden by specifying -l <int>, in which case peak-calling reverts to the previous method, with the given minimum length for peaks.
Option to provide a BED file of genomic regions to exclude from analysis (-E <file>).
- The regions will affect peak calls, such that no peak may extend into or around an excluded region.
- The regions' lengths will be subtracted from the genome length calculated by the program.
- In the output log files, excluded regions will have treatment/control pileup values of 0.0 and p-/q-values of NA.
- Multiple BED files can be specified, comma-separated (or space-separated, in quotes).
Accessory script findNs.py to produce a BED file of 'N' homopolymers from a fasta file (e.g. a reference genome). The output can (and should) be given to Genrich via -E (above).
Option to keep unpaired alignments, with lengths changed to a given value, has been changed to -w <int> (formerly -a <int>).
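
The AUC peak-calling method introduced in this release can be sketched as follows. This is an illustration only, not the Genrich implementation. It assumes the input is a sorted list of non-overlapping (start, end, significance) intervals on one chromosome, where significance is -log10(q), and it takes one reading of "values above the threshold": the area is the excess over the threshold line, multiplied by interval length. The function name and tuple layout are hypothetical.

```python
def call_peaks_auc(intervals, threshold, min_auc=20.0, max_gap=100):
    """Link significant intervals and keep regions whose area exceeds min_auc.

    intervals: sorted, non-overlapping (start, end, sig) tuples, with
               sig = -log10(q) (or -log10(p)) for that interval.
    threshold: -log10 of the q- or p-value cutoff.
    Returns a list of (start, end, area) tuples.
    """
    peaks = []
    current = None  # [start, end, area] of the region being built
    for start, end, sig in intervals:
        if sig <= threshold:
            continue  # not significant; may still be bridged by max_gap
        area = (sig - threshold) * (end - start)
        if current is not None and start - current[1] <= max_gap:
            # Close enough to the previous significant interval: link them.
            current[1] = end
            current[2] += area
        else:
            if current is not None and current[2] > min_auc:
                peaks.append(tuple(current))
            current = [start, end, area]
    if current is not None and current[2] > min_auc:
        peaks.append(tuple(current))
    return peaks


if __name__ == "__main__":
    demo = [(0, 10, 3.0), (10, 20, 0.5), (20, 30, 4.0), (500, 505, 2.0)]
    # 1.3 is roughly -log10(0.05)
    print(call_peaks_auc(demo, threshold=1.3))
```

In the demo, the first and third intervals are linked across the non-significant gap and pass the area cutoff, while the short interval at 500 is too far away to be linked and too small to be called on its own.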

24 Sep 13:56

jsh58

v0.2

e19b55f

Version 0.2

Multiple replicates are now analyzed separately, with p-values calculated for each. At each position, the multiple p-values are then combined by Fisher's method, before conversion to q-values.
Three optional output log files:
- -f <file>: with one replicate, it lists treatment/control pileups, p- and q-values, and significance for each interval; with multiple replicates, it lists p-values of each replicate, combined p-value, q-value, and significance for each interval.
- -k <file>: for each replicate, sequentially, it lists a header line (# treatment file: <name>; control file: <name>), followed by treatment/control pileups and a p-value for each interval. This is the way to examine pileup values with multiple replicates, since the -f file will not supply them.
- -b <file>: an unsorted BED file of the reads/fragments analyzed. The 4th column gives the read name, number of alignments, 'T'reatment or 'C'ontrol, and sample number (0-based), e.g. SRR5427885.57_2_T_0.
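
The 4th-column name format of the -b file described above can be unpacked with a small helper. This is a sketch that assumes the last three underscore-separated fields are always the number of alignments, the 'T'/'C' flag, and the sample number, so that read names containing underscores still parse. The function name and the returned dictionary keys are hypothetical.

```python
def parse_bed_name(name):
    """Split a name such as 'SRR5427885.57_2_T_0' into its parts.

    Splitting from the right keeps any underscores inside the read name.
    """
    read_name, n_aln, group, sample = name.rsplit("_", 3)
    if group not in ("T", "C"):
        raise ValueError("expected 'T' or 'C', got %r" % group)
    return {
        "read": read_name,
        "alignments": int(n_aln),
        "treatment": group == "T",
        "sample": int(sample),  # 0-based
    }


if __name__ == "__main__":
    print(parse_bed_name("SRR5427885.57_2_T_0"))
```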
Option to analyze reads in ATAC-seq mode (-j). Instead of analyzing full fragments, the program uses "cut site intervals" centered on the ends of fragments. The interval lengths are determined by the -d parameter (def. 100bp).
Control sample pileups are now scaled to match the treatment, based on fragment/interval lengths.
Fixed bug related to sorting alignments by scores.
Fixed bugs related to reference sequences (chromosomes) missing from a SAM/BAM file. Reads aligning to chromosomes either on the "skipped" list (-e argument) or not appearing in the header of the treatment SAM/BAM file will not be analyzed.
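
For readers unfamiliar with Fisher's method, used above to combine per-replicate p-values, here is a minimal standard-library sketch. It relies on the textbook result that -2 * sum(ln p_i) follows a chi-squared distribution with 2k degrees of freedom under the null, and on the closed-form survival function for even degrees of freedom. It is not taken from the Genrich source, and the function name is hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) is chi-squared with 2k degrees of
    freedom; for even degrees of freedom the upper tail is
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(p < 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in pvalues):
        return 0.0  # avoids log(0); the combined p-value is 0
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    print(fisher_combine([0.01, 0.20]))  # two replicates at one position
```

With a single replicate the function returns the input p-value unchanged, which matches the expectation that combining one p-value is a no-op.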

31 Aug 19:54

jsh58

v0.1

61f502b

Version 0.1

Analyzes secondary alignments:
- Keeps alignments whose alignment scores (AS) are within a specified value (-s <float>) of the best alignment. The default value of -s 0 means that only alignments judged as equivalent by the aligner will be kept.
- Each of the n alignments for a read/fragment is counted as 1/n for the pileup.
- To avoid excessive memory usage and the imprecision inherent in floating-point values, a maximum of 10 alignments per read is analyzed. Reads with more than 10 alignments will be subsampled based on the best alignment scores; in the case of ties, alignments appearing first in the SAM/BAM are favored.
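
The secondary-alignment rules above can be sketched roughly as follows. The sketch assumes each read comes with a list of alignment scores in SAM/BAM order and returns which alignments are kept, each with a weight of 1/n. The function name and arguments are hypothetical and this is not the Genrich code.

```python
def select_alignments(scores, max_diff=0.0, max_aln=10):
    """Pick alignments within max_diff of the best score, capped at max_aln.

    scores:   alignment scores (AS) in the order they appear in the file.
    max_diff: analogous to -s; 0 keeps only alignments tied with the best.
    Returns a list of (index, weight) pairs in file order.
    """
    if not scores:
        return []
    best = max(scores)
    kept = [i for i, s in enumerate(scores) if best - s <= max_diff]
    # Python's sort is stable, so among equal scores the alignments that
    # appear first in the file stay first and survive the cap.
    kept.sort(key=lambda i: -scores[i])
    kept = kept[:max_aln]
    weight = 1.0 / len(kept)  # each of n alignments counts as 1/n
    return [(i, weight) for i in sorted(kept)]


if __name__ == "__main__":
    print(select_alignments([30, 30, 28]))              # default: ties only
    print(select_alignments([30, 30, 28], max_diff=2))  # also keep the 28
```
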
Input SAM/BAM files are required to be name-sorted (samtools sort -n).
A simple hashtable is implemented for efficiently compiling p-values to calculate q-values. The q-value calculation is still based on the Benjamini-Hochberg procedure.
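
For reference, the Benjamini-Hochberg procedure mentioned above can be sketched in a few lines. This version treats every p-value as a single test and sorts a plain list; it does not reproduce the hashtable used by Genrich or any details of how p-values are compiled across the genome. The function name is hypothetical.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input.

    q_(i) = min over j >= i of p_(j) * m / j, where p_(1) <= ... <= p_(m).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum enforces
    # monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


if __name__ == "__main__":
    print(bh_qvalues([0.01, 0.04, 0.03, 0.5]))
```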

13 Aug 14:08

jsh58

v0.0

e3c71c3

Version 0.0

Peak-calling for genomic enrichment assays of treatment sample(s) (-t [<file>]+), with or without control sample(s) (-c [<file>]+). Input files must be in SAM/BAM format. Incorporates ideas from SAMtoBED, removeChrom, and MACS2 v2.1.2_dev.

Control of analysis of alignments:
- Properly paired alignments only (default); fragments are inferred appropriately
- Also keeping unpaired alignments:
  - As they appear in the SAM/BAM file (-y)
  - Length increased to specified value (-a <int>)
  - Length increased to average value of paired alignments (-x)
Filtering options:
- Chromosomes (reference sequences) to ignore (-e <arg>)
- Minimum mapping quality (-m <int>)
Peak-calling options:
- Maximum q-value (-q <float>, def. 0.05)
- Maximum p-value (-p <float>)
- Minimum length of a peak (-l <int>, def. 100bp)
- Maximum distance between significant sites (-g <int>, def. 100bp)
Output options:
- Output peak file, in ENCODE narrowPeak format (-o <file>)
- Output bedgraph file listing treatment/control pileups and p/q values (-b <file>)
