-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathModeler and Masked
More file actions
166 lines (135 loc) · 9.08 KB
/
Modeler and Masked
File metadata and controls
166 lines (135 loc) · 9.08 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
### RepeatModeler usefull tool to indetificante and define Transposons (TEs),
##########################################################################################################################################################################################################################################
### AS ALWAYS YOU HAVE 2 WAYS TO DOWNLOAD, FIRST IN CONDA AND SECOND IN SERVE
### 1 OPTION - see ANACONDA, MINICONDA & CONDA file
### As the others programms, make a envoirment in conda and install all these other programms
(base) $ conda creat -n Modeler # or mamba install -n Modeler
(base) $ conda activate Modeler # or mamba activate Modeler
(Modeler) $ conda install -c anaconda perl; conda install -c anaconda biopython; conda install -c bioconda perl-app-cpanminus; conda install -c bioconda perl-file-spec;
conda install -c bioconda perl-hash-merge; conda install -c bioconda perl-list-util; conda install -c bioconda perl-module-load-conditional; conda install -c bioconda perl-posix;
conda install -c bioconda perl-file-homedir; conda install -c bioconda perl-parallel-forkmanager; conda install -c bioconda perl-scalar-util-numeric; conda install -c bioconda perl-yaml;
conda install -c bioconda perl-class-data-inheritable; conda install -c bioconda perl-exception-class; conda install -c bioconda perl-test-pod; conda install -c bioconda perl-file-which;
conda install -c bioconda perl-mce; conda install -c bioconda perl-threaded; conda install -c bioconda perl-list-util; conda install -c bioconda perl-math-utils;
conda install -c bioconda cdbtools; conda install -c eumetsat perl-yaml-xs; conda install -c bioconda perl-data-dumper
### The perl-file-which need only to comparing to reference annotation
(Modeler) $ conda install bioconda::repeatmodeler # or mamba install -c bioconda -c conda-forge repeatmodeler
conda install bioconda/label/cf201901::repeatmodeler
### This is other option
(base) $ export MAMBA_ROOT_PREFIX=/opt/Micromamba/
(base) $ eval "$(/opt/bin/Micromamba/1.5.8-0/bin/micromamba shell hook -s posix)"
(base) $ micromamba activate RepeatModeler-2.0.5
### Go to directory of
(RepeatModeler-2.0) $ BuildDatabase -name RMod_Database <file>.fna >& database.out
(RepeatModeler-2.0) $ RepeatModeler -database database.out -threads <numbercores> -LRTStruct >& RM.out
### 2 OPTION
### Install in default terms
### Initiate with cloning the GitHub repository
(base) $ git clone https://github.com/Dfam-consortium/RepeatModeler
### As you can see in the https://www.repeatmasker.org/RepeatModeler/, you need more programms (RECON, RepeatScout, LtrHarvest) and more databases (Tandem Repeater Finder)
(base) $ wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz;
wget http://www.repeatmasker.org/RepeatScout-1.0.6.tar.gz;
wget https://github.com/Benson-Genomics-Lab/TRF/archive/refs/tags/v4.09.1.tar.gz;
wget https://github.com/Benson-Genomics-Lab/TRF/releases/download/v4.09.1/trf409.dos64.exe;
wget https://www.repeatmasker.org/rmblast/rmblast-2.14.0+-x64-linux.tar.gz;
wget https://platform.activestate.com/create-account?nextRoute=%2Fcreate-project%3Flanguage%3Dperl%26newUser%3Dtrue%26autoCreate%3Dtrue%26version%3D5.36.3%26os%3Dlinux
wget https://github.com/oushujun/LTR_retriever/archive/refs/tags/v2.9.8.tar.gz
(base) $ gunzip *.gz
(base) $ tar -xvf *.tar; ls -las
### Move the trf*.exe inside the TRF* directory, its better, and create a symbolic link called "trf" only to this file
(base) $ mv trf*.exe TRF-4.09.1; cd TRF* ; ln -s -T trf*.exe trf # (-T) target and (-s) symbolic link
### You need RepeatMasker also, see below. Total of documents = 2 files, 5 directorys, 4 files .tar
### RUN THE PROGRAMM:
### Creat a Database using the FASTA datum
(base) $ ~/<PATH>/RepeatModeler/BuildDatabase -name RMod_Database *.fna ### just 1 FASTA
(base) $ ~/<PATH>/RepeatModeler/BuildDatabase - ### a lot of FASTAS
### After create the database, use the database to run the programm
(base) $ ~/<PATH>/RepeatModeler/RepeatModeler -database RMod_Database -threads 32 -LTRStruct >& file.out
##########################################################################################################################################################################################################################################
### RepeatMasker use
### AS ALWAYS YOU HAVE 2 WAYS TO DOWNLOAD, FIRST IN CONDA AND SECOND IN SERVE
### 1 OPTION - see ANACONDA, MINICONDA & CONDA file
### As the others programms, make a envoirment in conda and install all these other programms
(base) $ conda creat -n Masker
(base) $ conda activate Masker
(Masker) $ conda install bioconda::repeatmasker
### 2 OPTION
### Install in default terms
### Initiate acessing this page https://www.repeatmasker.org/RepeatMasker/ and take the link
(base) $ wget https://www.repeatmasker.org/RepeatMasker/RepeatMasker-4.1.6.tar.gz
### You should have other programms, see RepeatModeler
(base) $ gunzip *.gz
(base) $ tar -xvf *.tar; ls -las
### START USE THE PROGRAMM, LETS CONFIGURE ###
(base) $ ~/<PATH>/RepeatMasker/configure
### Actually, the command configure can only work at RepeatMasker directory
(base) $ cd ~/<PATH>/RepeatMasker ; ./configure
### See whats say. If need more programms. I needed famDB data bases and the TRF (see RepeatModeler, line 31). So go to famdb directory and download the dataset (see in https://www.dfam.org/releases/Dfam_3.8/families/FamDB/README.txt)
(base) $ cd ~/<PATH>/RepeatMasker/Liberies/famdb/
(base) $ nohup wget https://www.dfam.org/releases/Dfam_3.8/families/FamDB/dfam38_full.0.h5.gz ### (Mammals and others Eukaryas)
(base) $ nohup wget https://www.dfam.org/releases/Dfam_3.8/families/FamDB/dfam38_full.3.h5.gz ### (Sarcopterygii)
(base) $ nohup wget https://www.dfam.org/releases/Dfam_3.8/families/FamDB/dfam38_full.6.h5.gz ### (Deuterostomes, Chondricties and Basal Actinopterygiis)
### Take a feel hours
(base) $ gunzip *.gz
### Insert in configuration the PATH of the libery dfam* and again use the configure command
(base) $ cd ~/<PATH>/RepeatMasker/; ./configure
RepeatMasker Configuration Program
The full path of FamDB in Liberies
FAMDB: /home/<PATH>/RepeatMasker/Liberies/famdb
### continue configuration pressing ENTER
Checking for libraries...
- Found a FamDB root partition
The full path including the name for the TRF program.
TRF_PRGM: home/<PATH>/TRF*/trf ### remember this "trf" is a hyperlink (see in RepeatModeler, line 40 and 41)
Add a Search Engine:
1. CrossMatch: [ Un-configured ]
2. RMBlast - NCBI Blast with RepeatMasker extensions: [ Un-configured ]
3. WUBlast/ABBlast (required by DupMasker): [ Un-configured ]
4. DeCypher (TimeLogic): [ Un-configured ]
5. Done
Enter Selection: 2
### Press 2 and ENTER. Now take the information of rmblastn and makeblastdb in RMBlast directory
The path to the installation of the RMBLAST sequence alignment program.
RMBLAST_DIR: home/<PATH>/rmblast-2.14.0/bin/
### Press ENTER
Add a Search Engine:
1. Crossmatch: [ Un-configured ]
2. RMBlast: [ Configured, Default ]
3. HMMER3.1 & DFAM: [ Un-configured ]
4. ABBlast: [ Un-configured ]
5. Done
Enter Selection: 3
### Press 3 and ENTER
The path to the installation of the RMBLAST sequence alignment program
hmmer: /home/<PATH>/hmmer-3.3.2/src/
### Press ENTER
Add a Search Engine:
1. Crossmatch: [ Un-configured ]
2. RMBlast: [ Configured, Default ]
3. HMMER3.1 & DFAM: [ Configured, Default ]
4. ABBlast: [ Un-configured ]
5. Done
Enter Selection: 5
### Press 5 and ENTER, know you have your programm configured
Building RMBlast frozen libraries..
FamDB Directory : /home/fssalles/MandM/RepeatMasker/Libraries/famdb
FamDB Generator : famdb.py v1.0
FamDB Format Version: 1.0
FamDB Creation Date : 2023-11-15 11:30:15.311827
Database: Dfam
Version : 3.8
Date : 2023-11-14
Dfam - A database of transposable element (TE) sequence alignments and HMMs.
### As you can see in Partition Details only the Partion 0, 3 and 6 (Eukaryas unicelular, bacteria and virus, Actinpterygiis and Sarcopterygiis) are available, but you can add the partion you need (0 - 8)
### Lests start use, move to the directory you have your fasta files:
(base) $ cd ~/<PATH>/<files.fasta>/
### Start searching which is the real name of those species you have fasta files using the famdb.pl command
(base) $ famdb.py -h; fambd.py -names -h
(base) $ famdb families <familie_name> --format fasta_name --include-class-in-name -ad ### (--format) output format
### Now you can run the programm with the name species, or the libaries you constructed. MOVE ALL FASTA FILES TO RepeatMasker DIRECTORY
(base) $ move *.fna ~/<PATH>/RepeatMasker
(base) $ ./RepeatMasker -s -a -species 'name_of_specie' *.fna ### (-s) slow process, more sensitive (-q) Quick and (-qq) Rush job (-a) alignment
### OR, USE THE LIBARY YOU CONSTRUCT WITH RepeatModeler
(base) $ ./RepeatMasker -e rmblast -s -lib ~/<PATH>/consensi.fa -xsmall -pa 15 -a -html -gff -dir <file.out> <file.fna>
(base) $
(base) $
(base) $ https://www.dfam.org/releases/Dfam_3.8/families/FamDB/dfam38_full.6.h5.gz.md5sum