-
Notifications
You must be signed in to change notification settings - Fork 8
Tutorial
In the following examples we will start with a file of DNA sequences for a set of genes of interest and we will end up with a mapping of GO terms for each sequence.
Starting with a file of DNA sequences, we first want to get the longest open reading frame (ORF) for each gene and translate those sequences.
hmmer2go getorf -i genes.fasta -o genes_orfs.faa
Next, we search our ORFs for coding domains.
hmmer2go search -i genes_orfs.faa -d Pfam-A.hmm
The above command will create three files: genes_orfs_hmmscan-pfamA.out, genes_orfs_hmmscan-pfamA.domtblout, and genes_orfs_hmmscan-pfamA.tblout
We will now use the table of domain matches to map GO terms. To do this we first need to download the Pfam->Gene Ontology mappings. This can be done with a single command:
hmmer2go fetch
The above command creates the file: pfam2go.
Now we can map the protein domain matches to GO terms.
hmmer2go mapterms -i genes_orfs_hmmscan-pfamA.tblout -p pfam2go -o genes_orfs_hmmscan-pfamA_GO.tsv --map
This last command will create two output files: genes_orfs_hmmscan-pfamA_GO.tsv, and genes_orfs_hmmscan-pfamA_GO_GOterm_mapping.tblout
The first output file is a tab-delimited table with a description of each domain, including the GO terms and the associated functions. The last file is a two column table with the sequence name in the first column and the GO terms associated with that sequence in the second column.