Commit b7a0b60

Merge pull request #39 from YuWei-CH/Update-Readme
Update readme
2 parents eb8d274 + 0898ffe commit b7a0b60

README.md

Lines changed: 86 additions & 25 deletions
@@ -1,60 +1,113 @@
11
# <i>ref</i>orm
22

3-
[*ref*orm](https://gencore.bio.nyu.edu/reform/) is a python-based command line tool that allows for fast, easy and robust editing of reference genome sequence and annotation files.
3+
[*ref*orm](https://gencore.bio.nyu.edu/reform/) is a Python-based command-line tool for fast, robust, and flexible editing of reference genome sequence and annotation files.
44

5-
Execution of *ref*orm requires a reference sequence (fasta), reference annotation (GFF or GTF), the novel sequences to be added (fasta), and corresponding novel annotations (GFF or GTF). A user provides as arguments the name of the modified chromosome and either the position at which the novel sequence is inserted, or the upstream and downstream sequences flanking the novel sequences. This results in the addition and/or deletion of sequence from the reference in the modified fasta file. In addition to the novel annotations, any changes to the reference annotations that result from deleted or interrupted sequence are incorporated into the modified gff. Importantly, modified gff and fasta files include a record of the modifications.
5+
To perform an edit, *ref*orm requires a reference genome (FASTA), its annotation file (GFF or GTF), a novel sequence to be inserted (FASTA), and the corresponding annotation (GFF or GTF). The user specifies either:
6+
7+
- the chromosome and the position at which to insert the novel sequence, or
8+
- the chromosome along with the upstream and downstream flanking sequences.
9+
10+
The result is a modified reference genome (FASTA) and annotation file (GFF), incorporating the novel sequence and its annotations. Any reference annotations affected by the insertion or deletion are automatically updated. All modifications are documented within the output files.
11+
12+
In addition to modifying existing chromosomes, *ref*orm also supports appending entirely new chromosomes. In this mode, users provide the novel chromosome’s sequence and annotations, which are added to the reference genome and integrated into the annotation file.
613

714
Learn more at https://gencore.bio.nyu.edu/reform/
815

916
## Usage
1017

11-
*ref*orm requires Python3, pgzip and Biopython v1.78 or higher.
18+
*ref*orm requires Python 3 and Biopython v1.78 or higher.
19+
20+
Install Biopython if you don't already have it:
1221

13-
Install pgzip and biopython if you don't already have it:
22+
`pip install "biopython>=1.78"`
1423

15-
`pip install pgzip biopython>=1.78`
24+
*ref*orm supports reading and writing .gz files using gzip. To accelerate compression and decompression, it optionally supports pgzip, a parallel implementation of gzip. Users must install pgzip separately to enable this feature.
25+
26+
*Optional:* Install pgzip if you don't already have it:
27+
28+
`pip install pgzip`
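
The optional dependency described above can be handled with an import fallback. The following is a minimal sketch, not *ref*orm's actual code: the helper name `open_text` is hypothetical, and it assumes only that `pgzip.open` mirrors the standard library's `gzip.open` for text modes.

```python
import gzip

try:
    import pgzip  # optional: parallel gzip implementation
    _gz = pgzip
except ImportError:
    _gz = gzip  # fall back to the standard library


def open_text(path, mode="rt"):
    """Open a plain or .gz file in text mode (hypothetical helper).

    Uses pgzip when it is installed, otherwise the built-in gzip module.
    """
    if str(path).endswith(".gz"):
        return _gz.open(path, mode)
    return open(path, mode)
```

With this pattern, the same call works whether or not pgzip is installed; only the speed of compression and decompression differs.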
1629

1730
Invoke the python script:
1831

1932
```
20-
python3 reform.py
33+
# Minimal example (single edit)
34+
python3 reform.py \
2135
--chrom=<chrom> \
22-
--position=<pos> \
23-
--in_fasta=<in_fasta> \
24-
--in_gff=<in_gff> \
25-
--ref_fasta=<ref_fasta> \
26-
--ref_gff=<ref_gff>
36+
--position=<position> \
37+
--in_fasta=<input_fasta.fa> \
38+
--in_gff=<input_annotations.gff> \
39+
--ref_fasta=<reference_genome.fa> \
40+
--ref_gff=<reference_annotations.gff3>
2741
```
2842

2943
## Parameters
3044

31-
`chrom` ID of the chromsome to modify
45+
- `chrom`: ID of the chromosome to **modify**. **Required** unless `new_chrom` is specified. Cannot be used together with `new_chrom`.
46+
47+
- `new_chrom`: ID of the novel chromosome to **append**. **Required** if you're adding a new chromosome. Cannot be used together with `chrom`.
3248

33-
`position` Position in chromosome at which to insert <in_fasta>. Can use `-1` to add to end of chromosome. Note: Either position, or upstream AND downstream sequence must be provided. **Note: Position is 0-based**
49+
- `position`: 0-based insertion position(s) in the reference chromosome where `in_fasta` should be inserted. Use `-1` to insert at the end of the chromosome. For **multiple edits**, provide a comma-separated list (e.g., `0,5,-1`). **Note:** Either `position`, or both `upstream_fasta` and `downstream_fasta`, must be provided.
3450

35-
`upstream_fasta` Path to Fasta file with upstream sequence. Note: Either position, or upstream AND downstream sequence must be provided.
51+
- `upstream_fasta`: Path(s) to FASTA file(s) containing the upstream flanking sequence(s) for insertion. For **multiple edits**, provide a comma-separated list (e.g., `up1.fa,up2.fa,up3.fa`). Must be used with `downstream_fasta`. Cannot be used together with `position`.
3652

37-
`downstream_fasta` Path to Fasta file with downstream sequence. Note: Either position, or upstream AND downstream sequence must be provided.
53+
- `downstream_fasta`: Path(s) to FASTA file(s) containing the downstream flanking sequence(s) for insertion. For **multiple edits**, provide a comma-separated list (e.g., `down1.fa,down2.fa,down3.fa`). Must be used with `upstream_fasta`. Cannot be used together with `position`.
3854

39-
`in_fasta` Path to new sequence to be inserted into reference genome in fasta format.
55+
- `in_fasta`: Path(s) to FASTA file(s) containing the new sequence(s) to insert. For multiple edits, provide a comma-separated list. **The number of entries must match the number of `position` values or the number of upstream/downstream pairs.**
4056

41-
`in_gff` Path to GFF file describing new fasta sequence to be inserted.
57+
- `in_gff`: Path(s) to GFF3 file(s) describing the `in_fasta` sequence(s). For multiple edits, provide a comma-separated list. **The number of entries must match the number of `in_fasta` files.**
4258

43-
`ref_fasta` Path to reference fasta file.
59+
- `ref_fasta`: Path to the reference genome FASTA file.
4460

45-
`ref_gff` Path to reference gff file.
61+
- `ref_gff`: Path to the reference genome annotation (GFF3 or GTF) file.
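
The count-matching rules in the parameter list can be checked before any files are read. The sketch below is illustrative only: `split_arg` and `count_edits` are hypothetical names, and the checks simply restate the rules listed above rather than reproduce *ref*orm's own validation.

```python
def split_arg(value):
    """Split a comma-separated argument into a list of trimmed items."""
    if value is None:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def count_edits(in_fasta, in_gff, position=None,
                upstream_fasta=None, downstream_fasta=None):
    """Return the number of edits, or raise ValueError if the counts disagree."""
    fastas, gffs = split_arg(in_fasta), split_arg(in_gff)
    positions = split_arg(position)
    ups, downs = split_arg(upstream_fasta), split_arg(downstream_fasta)

    if positions and (ups or downs):
        raise ValueError("use either position or upstream/downstream, not both")
    if positions:
        expected = len(positions)
    elif ups and downs:
        if len(ups) != len(downs):
            raise ValueError("upstream and downstream lists differ in length")
        expected = len(ups)
    else:
        raise ValueError("provide position, or both upstream and downstream")

    if len(fastas) != expected:
        raise ValueError("in_fasta count must match the number of edit sites")
    if len(gffs) != len(fastas):
        raise ValueError("in_gff count must match in_fasta count")
    return expected
```

For instance, `count_edits("a.fa,b.fa,c.fa", "a.gff,b.gff,c.gff", position="0,5,-1")` passes the checks, while supplying two GFF files for a single FASTA file raises `ValueError`.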
62+
63+
## Examples
64+
65+
### Single Edit by Position
66+
67+
```
68+
python3 reform.py \
69+
--chrom="I" \
70+
--position=1500 \
71+
--in_fasta="data/edit.fa" \
72+
--in_gff="data/edit.gff" \
73+
--ref_fasta="data/ref.fa" \
74+
--ref_gff="data/ref.gff3"
75+
```
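
To make the 0-based `position` convention concrete, here is a small sketch on plain strings. It assumes that the novel sequence is placed immediately before the base at index `position`, and that `-1` means the end of the chromosome; `insert_at` is a hypothetical helper, not part of *ref*orm.

```python
def insert_at(reference, novel, position):
    """Insert `novel` into `reference` at a 0-based position (-1 = end)."""
    if position == -1:
        position = len(reference)
    if not 0 <= position <= len(reference):
        raise ValueError("position is outside the chromosome")
    return reference[:position] + novel + reference[position:]


print(insert_at("ACGT", "NN", 0))   # NNACGT
print(insert_at("ACGT", "NN", 2))   # ACNNGT
print(insert_at("ACGT", "NN", -1))  # ACGTNN
```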
4676

47-
## Example
77+
### Single Edit with Upstream/Downstream Flanks
4878

4979
```
50-
python3 reform.py
80+
python3 reform.py \
5181
--chrom="I" \
5282
--upstream_fasta="data/up.fa" \
5383
--downstream_fasta="data/down.fa" \
54-
--in_fasta="data/new.fa" \
55-
--in_gff="data/new.gff" \
56-
--ref_fasta="data/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa" \
57-
--ref_gff="data/Saccharomyces_cerevisiae.R64-1-1.34.gff3"
84+
--in_fasta="data/edit.fa" \
85+
--in_gff="data/edit.gff" \
86+
--ref_fasta="data/ref.fa" \
87+
--ref_gff="data/ref.gff3"
88+
```
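
The flank-based mode can be pictured as locating the two flanking sequences and placing the novel sequence between them. The sketch below is an assumption-laden illustration on plain strings: it assumes exact, first-occurrence matching and that any reference sequence lying between the flanks is replaced. The helper name `replace_between_flanks` is hypothetical and the matching strategy may differ from what *ref*orm does.

```python
def replace_between_flanks(reference, upstream, downstream, novel):
    """Place `novel` between the upstream and downstream flanks.

    Any reference sequence between the two flanks is dropped (assumption).
    """
    up_start = reference.find(upstream)
    if up_start == -1:
        raise ValueError("upstream flank not found")
    insert_start = up_start + len(upstream)

    down_start = reference.find(downstream, insert_start)
    if down_start == -1:
        raise ValueError("downstream flank not found after upstream flank")

    return reference[:insert_start] + novel + reference[down_start:]


# Flanks separated by "CCC": the intervening bases are replaced.
print(replace_between_flanks("AAACCCGGGTTT", "AAA", "GGG", "NN"))  # AAANNGGGTTT
# Adjacent flanks: a pure insertion, nothing is deleted.
print(replace_between_flanks("AAACCCGGGTTT", "AAA", "CCC", "NN"))  # AAANNCCCGGGTTT
```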
89+
90+
### Batch Edits (Multiple Positions)
91+
92+
```
93+
python3 reform.py \
94+
--chrom="I" \
95+
--position=1000,2500,3000 \
96+
--in_fasta="data/edit1.fa,data/edit2.fa,data/edit3.fa" \
97+
--in_gff="data/edit1.gff,data/edit2.gff,data/edit3.gff" \
98+
--ref_fasta="data/ref.fa" \
99+
--ref_gff="data/ref.gff3"
100+
```
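
One subtlety with batch edits is that each insertion shifts every coordinate downstream of it. A common way to handle this, sketched below on plain strings, is to apply edits from the highest position to the lowest so that earlier coordinates stay valid. This assumes all positions refer to the original reference; whether *ref*orm uses this strategy internally is not stated here, and `apply_batch` is a hypothetical helper.

```python
def apply_batch(reference, edits):
    """Apply (position, sequence) insertions given in original coordinates.

    Positions are 0-based; -1 means the end of the chromosome.
    """
    end = len(reference)
    resolved = [(end if pos == -1 else pos, seq) for pos, seq in edits]

    # Work right to left so earlier insertions do not shift later positions.
    for pos, seq in sorted(resolved, key=lambda edit: edit[0], reverse=True):
        reference = reference[:pos] + seq + reference[pos:]
    return reference


print(apply_batch("ACGT", [(0, "x"), (2, "yy"), (-1, "z")]))  # xACyyGTz
```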
101+
102+
### Append a Novel Chromosome
103+
104+
```
105+
python3 reform.py \
106+
--new_chrom="new_chr1" \
107+
--in_fasta="data/new1.fa" \
108+
--in_gff="data/new1.gff" \
109+
--ref_fasta="data/ref.fa" \
110+
--ref_gff="data/ref.gff3"
58111
```
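
Conceptually, appending a novel chromosome adds one more record to the end of the reference FASTA. The sketch below works on FASTA text held in memory and is illustrative only: `append_chromosome` is a hypothetical helper, the 60-column line width is an assumption, and the real tool also integrates the new annotations into the GFF, which is not shown.

```python
def append_chromosome(ref_fasta_text, new_chrom, sequence, width=60):
    """Return FASTA text with a new record appended (hypothetical helper)."""
    # Collect existing record IDs (first token after '>').
    existing = {
        line[1:].split()[0]
        for line in ref_fasta_text.splitlines()
        if line.startswith(">") and len(line) > 1
    }
    if new_chrom in existing:
        raise ValueError(f"chromosome {new_chrom!r} already exists")

    # Wrap the sequence to a fixed line width.
    wrapped = "\n".join(
        sequence[i:i + width] for i in range(0, len(sequence), width)
    )
    if ref_fasta_text and not ref_fasta_text.endswith("\n"):
        ref_fasta_text += "\n"
    return f"{ref_fasta_text}>{new_chrom}\n{wrapped}\n"


print(append_chromosome(">I\nACGT\n", "new_chr1", "GGCC"), end="")
```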
59112

60113
## Output
@@ -63,3 +116,11 @@ python3 reform.py
63116

64117
`reformed.gff3` Modified GFF file.
65118

119+
## Tests
120+
After local deployment or modification, you can run `test_reform.py` to verify the functionality of *ref*orm. This script contains an automated test suite built with Python’s `unittest` framework and validates *ref*orm across a range of genome editing scenarios.
121+
122+
To run all tests:
123+
124+
```bash
125+
python3 test_reform.py
126+
```
