Tutorial

In this tutorial we start with a file of DNA sequences for a set of genes of interest and end with a mapping of GO terms to each sequence.

First, we find the longest open reading frame (ORF) for each gene and translate it to a protein sequence.

hmmer2go getorf -i genes.fasta -o genes_orfs.faa
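A quick sanity check at this point is to compare the number of input sequences with the number of translated ORFs, since any gene without a reported ORF would be absent from the later steps. The sketch below is only an illustration: it builds two tiny, made-up FASTA files in a temporary directory as stand-ins for genes.fasta and genes_orfs.faa, so in practice you would point the two grep commands at your own files.

```shell
# Hypothetical stand-ins for genes.fasta and genes_orfs.faa (made-up records)
workdir=$(mktemp -d)
printf '>gene1\nATGGCC\n>gene2\nATGAAA\n>gene3\nTTTT\n' > "$workdir/genes.fasta"
printf '>gene1\nMA\n>gene2\nMK\n' > "$workdir/genes_orfs.faa"

# Every FASTA record starts with '>', so counting header lines counts records
n_genes=$(grep -c '^>' "$workdir/genes.fasta")
n_orfs=$(grep -c '^>' "$workdir/genes_orfs.faa")
echo "input sequences: $n_genes, translated ORFs: $n_orfs"

rm -r "$workdir"
```

If the two counts differ, comparing the header lines of both files shows which genes were left out.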

Next, we search the translated ORFs for protein domains in the Pfam-A database.

hmmer2go search -i genes_orfs.faa -d Pfam-A.hmm

The above command will create three files: genes_orfs_hmmscan-pfamA.out, genes_orfs_hmmscan-pfamA.domtblout, and genes_orfs_hmmscan-pfamA.tblout.
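Before mapping terms, it can help to see how many sequences had at least one domain match. The sketch below assumes the .tblout file follows HMMER's standard per-target table layout for hmmscan (comment lines start with '#', and the query sequence name is the third whitespace-separated column); the file names suggest this, but it is an assumption here. The sample rows are made up and truncated to the first five columns.

```shell
# Hypothetical, truncated stand-in for genes_orfs_hmmscan-pfamA.tblout
tblout=$(mktemp)
cat > "$tblout" <<'EOF'
# target name accession query name accession E-value
Pkinase PF00069.28 gene1 - 1.2e-50
SH2 PF00017.27 gene1 - 3.4e-20
zf-C2H2 PF00096.29 gene2 - 5.6e-08
EOF

# Skip comment lines, then count each query name (column 3) only once
n_queries=$(awk '!/^#/ && !seen[$3]++ { n++ } END { print n + 0 }' "$tblout")
echo "sequences with at least one domain match: $n_queries"

rm "$tblout"
```

Here gene1 has two matches but is counted once, so the sketch reports two sequences.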

We will now use the table of domain matches to map GO terms. To do this we first need to download the Pfam->Gene Ontology mappings. This can be done with a single command:

hmmer2go fetch

The above command creates a single file named pfam2go.
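To get a feel for the mapping file, you can look up the GO terms attached to one Pfam family. The sketch below assumes the file uses the Gene Ontology external2go layout (comment lines start with '!', and each mapping line ends with ' ; ' followed by the GO identifier); that layout is not stated in this tutorial, so treat it as an assumption and check your own copy. The sample rows are for illustration only.

```shell
# Hypothetical stand-in for the pfam2go file, with illustrative rows
pfam2go=$(mktemp)
cat > "$pfam2go" <<'EOF'
!illustrative header line
Pfam:PF00069 Pkinase > GO:protein kinase activity ; GO:0004672
Pfam:PF00069 Pkinase > GO:ATP binding ; GO:0005524
Pfam:PF00069 Pkinase > GO:protein phosphorylation ; GO:0006468
Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930
EOF

# Split each line on ' ; ' and keep the GO identifier for one Pfam accession,
# then join the identifiers into a comma-separated list
go_ids=$(awk -F' ; ' '/^Pfam:PF00069 / { print $2 }' "$pfam2go" | paste -sd, -)
echo "PF00069 maps to: $go_ids"

rm "$pfam2go"
```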

Now we can map the protein domain matches to GO terms.

hmmer2go mapterms -i genes_orfs_hmmscan-pfamA.tblout -p pfam2go -o genes_orfs_hmmscan-pfamA_GO.tsv --map

This last command will create two output files: genes_orfs_hmmscan-pfamA_GO.tsv and genes_orfs_hmmscan-pfamA_GO_GOterm_mapping.tblout.

The first output file is a tab-delimited table with a description of each domain, including the GO terms and the associated functions. The second file is a two-column table with the sequence name in the first column and the GO terms associated with that sequence in the second column.
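The two-column mapping file lends itself to simple summaries, such as counting how many sequences carry each GO term. The sketch below assumes the columns are tab-separated and that multiple GO terms in the second column are separated by commas; the separator is not stated above, so adjust the split character if your file differs. The sample rows are made up.

```shell
# Hypothetical stand-in for genes_orfs_hmmscan-pfamA_GO_GOterm_mapping.tblout
mapping=$(mktemp)
printf 'gene1\tGO:0004672,GO:0005524\ngene2\tGO:0005524\n' > "$mapping"

# Split column 2 on commas and tally how many sequences list each GO term
counts=$(awk -F'\t' '
  { n = split($2, terms, ","); for (i = 1; i <= n; i++) count[terms[i]]++ }
  END { for (go in count) print go "\t" count[go] }
' "$mapping" | sort)
printf '%s\n' "$counts"

rm "$mapping"
```

For the sample rows this prints one line per GO term with the number of sequences annotated with it.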
