The `homolog_identify.sh` script performs the following tasks:
- Uses BLAST to search a query file against a subject file.
- Identifies putative homologs based on:
  - Percent identity greater than 30%.
  - Length of the aligned region being at least 90% of the query length.
- Processes the BLAST results and compares them to a BED file to determine whether each putative homolog lies within a gene's boundaries (inclusive).
- Saves the names of genes with homologs into a `.txt` file.
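The filtering and BED comparison steps above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `filter_hits` and `genes_with_hits` are hypothetical, and it assumes the BLAST results are tab-separated with the columns `qseqid sseqid pident length qlen sstart send` (for example, from `-outfmt "6 qseqid sseqid pident length qlen sstart send"`) and that the BED file holds chromosome, start, end, and gene name in its first four columns. Coordinates are compared as-is with both boundaries inclusive; BED files are conventionally 0-based and half-open while BLAST coordinates are 1-based, so an offset may be needed in practice.

```shell
# Keep hits with percent identity > 30 and alignment length >= 90% of the
# query length. Assumed columns: qseqid sseqid pident length qlen sstart send
filter_hits() {
    # 10 * length >= 9 * qlen is length >= 0.9 * qlen without float rounding
    awk -F'\t' '$3 + 0 > 30 && 10 * $4 >= 9 * $5' "$1"
}

# Print the unique names of genes that fully contain at least one hit.
# $1 = BED file (chrom, start, end, name), $2 = filtered hits
genes_with_hits() {
    awk -F'\t' '
        # First file: load the gene intervals from the BED file
        NR == FNR {
            chrom[FNR] = $1; gstart[FNR] = $2 + 0; gend[FNR] = $3 + 0
            gname[FNR] = $4; n = FNR
            next
        }
        # Second file: test each hit against every gene
        {
            # sstart can exceed send for minus-strand hits, so order them
            lo = ($6 + 0 < $7 + 0) ? $6 + 0 : $7 + 0
            hi = ($6 + 0 < $7 + 0) ? $7 + 0 : $6 + 0
            for (i = 1; i <= n; i++)
                if ($2 == chrom[i] && lo >= gstart[i] && hi <= gend[i])
                    print gname[i]
        }
    ' "$1" "$2" | sort -u
}
```

With these helpers, `filter_hits hits.tsv > filtered.tsv` followed by `genes_with_hits genes.bed filtered.tsv > genes_with_homologs.txt` would produce the gene list; the file names here are placeholders.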
The `run_homolog_identify.sh` script automates the homolog identification process by:
- Iterating through a directory containing `.faa` and `.bed` files.
- Running the `homolog_identify.sh` script for each corresponding pair of `.faa` and `.bed` files.
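The pairing step can be sketched as below. This is an illustration under stated assumptions rather than the script's real logic: the function name `list_pairs` is hypothetical, and it assumes that corresponding files share a basename (for example, `sample1.faa` and `sample1.bed`).

```shell
# Print "faa<TAB>bed" for every .faa file that has a same-named .bed file.
# The body runs in a subshell so its variables do not leak to the caller.
list_pairs() (
    dir=$1
    for faa in "$dir"/*.faa; do
        [ -e "$faa" ] || continue      # no .faa files: the glob stayed literal
        bed="${faa%.faa}.bed"          # swap the extension to find the partner
        if [ -f "$bed" ]; then
            printf '%s\t%s\n' "$faa" "$bed"
        else
            echo "skipping $faa: no matching .bed file" >&2
        fi
    done
)
```

Each printed pair can then be handed to `homolog_identify.sh`, using whatever argument order that script expects.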
- pident: Percent identical matches.
- length: Length of the aligned region between the query and subject.
- query: The query sequence being searched against the database.
- sseqid: The FASTA header in the subject file.
- Use the flag `-task blastn-short` if the query sequence is shorter than 50 bases.
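One way to apply this rule automatically is to measure the query before calling BLAST. The sketch below is an assumption-laden illustration: the function names `query_length` and `blast_task_args` are hypothetical, and only the first sequence in the FASTA file is measured.

```shell
# Print the length of the first sequence in a FASTA file.
query_length() {
    awk '
        /^>/ { if (seen) exit; seen = 1; next }   # stop at the second header
        { gsub(/[ \t\r]/, ""); n += length($0) }  # count residues per line
        END { print n + 0 }
    ' "$1"
}

# Print the extra BLAST arguments for short queries, or nothing otherwise.
blast_task_args() {
    if [ "$(query_length "$1")" -lt 50 ]; then
        echo "-task blastn-short"
    fi
}
```

The output is meant to be expanded unquoted so that it splits into two arguments, for example `blastn $(blast_task_args query.fna) -query query.fna -subject subject.fna`, where the file names are placeholders.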