EDirect Wiki!
Entrez Direct (EDirect) is an advanced method for accessing the NCBI's suite of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command-line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process. EDirect also provides an argument-driven function that simplifies the extraction of data from document summaries or other results that are returned in structured XML format. This can eliminate the need for writing custom software to answer ad hoc questions. Queries can move seamlessly between EDirect commands and UNIX utilities or scripts to perform actions that cannot be accomplished entirely within Entrez.
EDirect will run on UNIX and Macintosh computers that have the Perl language installed, and under the Cygwin UNIX-emulation environment on Windows PCs. To install the EDirect software, copy the following commands and paste them into a terminal window:
cd ~
perl -MNet::FTP -e \
'$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1);
$ftp->login; $ftp->binary;
$ftp->get("/entrez/entrezdirect/edirect.zip");'
unzip -u -q edirect.zip
rm edirect.zip
export PATH=$PATH:$HOME/edirect
./edirect/setup.sh
This downloads several scripts into an "edirect" folder in the user's home directory, and allows immediate execution of programs in that location.
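A minimal sketch for confirming that the shell can now find the EDirect programs. `missing_cmds` is a hypothetical helper name, the program list is simply the set of commands used on this page, and the check relies only on the POSIX `command -v` builtin:

```shell
# Hypothetical helper: print each named command that cannot be found
# on PATH; print nothing when all of them are available.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Check the EDirect programs used in the examples on this page.
missing_cmds esearch elink efilter efetch xtract
```

Any name it prints means the PATH change has not taken effect in the current shell.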
The setup.sh script then downloads any missing Perl modules, and may print an additional command for updating the PATH environment variable in the user's configuration file. Copy that command, if present, and paste it into the terminal window to complete the installation process. The editing instructions will look something like:
echo "export PATH=\$PATH:\$HOME/edirect" >> $HOME/.bash_profile
If the EDirect scripts will be moved to another location, the configuration file can instead be modified manually using a text editor.
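Running the echo command more than once appends duplicate lines to the configuration file. A minimal sketch of an idempotent alternative, assuming a bash login shell that reads ~/.bash_profile; `add_edirect_path` is a hypothetical helper, not part of EDirect:

```shell
# Hypothetical helper: append the EDirect PATH line to a shell
# configuration file only if that exact line is not already present.
add_edirect_path() {
  conf="${1:-$HOME/.bash_profile}"   # assumption: bash login shell
  # Single quotes keep $PATH and $HOME unexpanded, matching the text
  # written by the echo command above.
  line='export PATH=$PATH:$HOME/edirect'
  # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
  if ! grep -qxF "$line" "$conf" 2>/dev/null; then
    echo "$line" >> "$conf"
  fi
}
```

Calling `add_edirect_path` with no argument edits ~/.bash_profile; passing another file name, for example `add_edirect_path ~/.bashrc`, edits that file instead.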
PubMed articles related to those on opsin gene conversion, filtered for "tetrachromacy":
esearch -db pubmed -query "opsin gene conversion" |
elink -related |
efilter -query "tetrachromacy"
Document summaries of articles related to papers by Garber ED in PNAS, filtered for "mouse":
esearch -db pubmed -query "Garber ED [AUTH] AND PNAS [JOUR]" |
elink -related |
efilter -query "mouse" |
efetch -format docsum
This generates an XML document summary set:
<eSummaryResult>
<DocumentSummarySet status="OK">
<DbBuild>Build150407-2207m.3</DbBuild>
<DocumentSummary>
<Id>19650888</Id>
<PubDate>2009 Aug 3</PubDate>
<EPubDate>2009 Aug 3</EPubDate>
<Source>BMC Microbiol</Source>
<Authors>
<Author>
<Name>Cano V</Name>
<AuthType>Author</AuthType>
<ClusterID></ClusterID>
</Author>
<Author>
<Name>Moranta D</Name>
...
Names, aliases and other designations of human genes matching "liver cancer":
esearch -db gene -query "Liver cancer AND Homo sapiens" |
efetch -format docsum |
xtract -pattern DocumentSummary -element Name OtherAliases OtherDesignations
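xtract prints one tab-separated row per DocumentSummary. A possible follow-up step, sketched under the assumption that the OtherAliases field lists aliases separated by commas; `aliases_per_line` is a hypothetical helper that turns each row into one name/alias pair per line:

```shell
# Hypothetical post-processing step: read xtract rows of the form
# Name<TAB>OtherAliases<TAB>OtherDesignations and print one
# "Name<TAB>alias" line per alias. Rows with no aliases are skipped.
# Assumption: aliases are separated by a comma plus optional spaces.
aliases_per_line() {
  awk -F'\t' '
    $2 != "" {
      n = split($2, a, /, */)
      for (i = 1; i <= n; i++) print $1 "\t" a[i]
    }'
}
```

It can be appended to the pipeline above as a final `| aliases_per_line` step.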
RefSeq genomic FASTA files for Leptospira alstonii assemblies:
wget `esearch -db assembly -query "Leptospira alstonii" |
efetch -format docsum |
xtract -pattern FtpPath -sep "\n" -element FtpPath |
grep GCF |
awk -F"/" '{print $0"/"$NF"_genomic.fna.gz"}'`
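The awk step builds each download URL by appending the last component of the FTP directory path plus `_genomic.fna.gz`. A minimal sketch of that step as a hypothetical `genomic_url` function, which additionally strips a trailing slash from the input path (an assumption added here, not something the original command does); the example path is made up for illustration:

```shell
# Hypothetical helper mirroring the awk step above: for each directory
# path on stdin, print "<path>/<last path component>_genomic.fna.gz".
genomic_url() {
  awk -F"/" '{
    sub(/\/$/, "")    # drop a trailing slash; awk then re-splits $0
    print $0 "/" $NF "_genomic.fna.gz"
  }'
}

echo "ftp://example.org/genomes/GCF_000000000.1_Example" | genomic_url
# prints ftp://example.org/genomes/GCF_000000000.1_Example/GCF_000000000.1_Example_genomic.fna.gz
```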
Nucleotide records matching "LKAM01", in FASTA format:
esearch -db nuccore -query "LKAM01" | efetch -format fasta
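Gene feature sequences of NC_000023 (human chromosome X) in FASTA format, using asn2fasta:
asn2fasta -id NC_000023 -feats gene_fasta
or using efetch:
efetch -db nuccore -id NC_000023 -format gene_fasta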
TaxId, scientific name and kingdom-to-genus lineage of several taxa, as comma-separated values ("(-)" marks a missing rank):
efetch -db taxonomy -id 9606,1234,81726 -format xml |
xtract -pattern Taxon -tab "," -first TaxId ScientificName \
  -group Taxon -KING "(-)" -PHYL "(-)" -CLSS "(-)" -ORDR "(-)" -FMLY "(-)" -GNUS "(-)" \
    -block "*/Taxon" -match "Rank:kingdom" -KING ScientificName \
    -block "*/Taxon" -match "Rank:phylum" -PHYL ScientificName \
    -block "*/Taxon" -match "Rank:class" -CLSS ScientificName \
    -block "*/Taxon" -match "Rank:order" -ORDR ScientificName \
    -block "*/Taxon" -match "Rank:family" -FMLY ScientificName \
    -block "*/Taxon" -match "Rank:genus" -GNUS ScientificName \
  -group Taxon -tab "," -element "&KING" "&PHYL" "&CLSS" "&ORDR" "&FMLY" "&GNUS"
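Because each missing rank is reported as "(-)", the output is easy to screen for incomplete lineages. A minimal sketch, with `incomplete_lineages` as a hypothetical helper that prints the TaxId (the first comma-separated field) of every row containing "(-)":

```shell
# Hypothetical follow-up: read the comma-separated rows produced by the
# xtract command above and print the first field (TaxId) of each row
# in which at least one rank was reported as "(-)".
incomplete_lineages() {
  awk -F',' 'index($0, "(-)") > 0 { print $1 }'
}
```

Scientific names that themselves contain commas will shift the later columns, but the TaxId in the first field is unaffected.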
All protein sequences from the latest complete bacterial and archaeal genome assemblies (user request):
esearch -db assembly -query '("Bacteria"[Organism] OR "Archaea"[Organism]) AND (latest[filter] AND "complete genome"[filter] AND all[filter] NOT anomalous[filter])' |
elink -target nuccore -batch |
elink -target protein -batch |
efetch -db protein -format fasta
WARNING: Large result set.
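One way to avoid an accidentally huge download is to check the record count before efetch runs. A minimal sketch, assuming the message passed between EDirect commands is an ENTREZ_DIRECT XML block containing a single `<Count>` element; `count_guard` is a hypothetical helper, not an EDirect command:

```shell
# Hypothetical guard: read the pipeline message from stdin, stop if its
# <Count> exceeds the limit given as $1, otherwise pass it on unchanged.
# Assumption: the message holds exactly one <Count>N</Count> element.
count_guard() {
  limit="$1"
  msg=$(cat)
  count=$(printf '%s\n' "$msg" |
    sed -n 's/.*<Count>\([0-9][0-9]*\)<\/Count>.*/\1/p' | head -n 1)
  if [ -z "$count" ]; then
    echo "count_guard: no <Count> element found" >&2
    return 1
  fi
  if [ "$count" -gt "$limit" ]; then
    echo "count_guard: $count records exceeds limit $limit" >&2
    return 1
  fi
  printf '%s\n' "$msg"   # forward the message to the next command
}
```

It would sit just before efetch, for example `... | elink -target protein -batch | count_guard 1000000 | efetch -db protein -format fasta` (the limit is arbitrary). When the limit is exceeded it prints a message to standard error and forwards nothing, so efetch receives no input.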