Salmon and kallisto perform transcript quantification directly from FASTQ files without traditional alignment, making them much faster than alignment-based methods.
# Salmon
conda install -c bioconda salmon
# kallisto
conda install -c bioconda kallisto

Tell your AI agent what you want to do:
- "Quantify my RNA-seq samples using Salmon"
- "Build a Salmon index with decoy sequences"
- "Run kallisto on my paired-end FASTQ files"
"Build a decoy-aware Salmon index from the human transcriptome and genome"
"Create a kallisto index from my transcripts.fa file"
"Quantify paired-end reads for sample1 using Salmon with bias correction"
"Run Salmon quant on all my FASTQ files in batch mode"
"Explain the columns in the Salmon quant.sf output file"
"What's the difference between TPM and NumReads in Salmon output?"
- Download or locate the reference transcriptome (Ensembl/GENCODE)
- Build the index (optionally with decoy sequences for Salmon)
- Run quantification on each sample with appropriate parameters
- Check mapping rates and quality metrics
- Organize output files for downstream analysis with tximport
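
The quantification step above can be sketched as a loop over samples. This is a minimal sketch, not a definitive pipeline: the directory layout (`fastq/<sample>_R1.fastq.gz` and `fastq/<sample>_R2.fastq.gz`), the variable names, and the `quantify_all` helper are assumptions made for illustration. It defaults to a dry run that only prints the commands, so the sample pairing can be checked before anything is executed.

```shell
# Hypothetical settings; override them in the environment as needed.
TOOL="${TOOL:-salmon}"            # salmon or kallisto
FASTQ_DIR="${FASTQ_DIR:-fastq}"   # assumed layout: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
INDEX="${INDEX:-salmon_index}"    # index directory (Salmon) or index file (kallisto)
OUT_DIR="${OUT_DIR:-quants}"      # one output subdirectory per sample
THREADS="${THREADS:-8}"
DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands only, 0 = run them

quantify_all() {
  local r1 r2 sample
  for r1 in "$FASTQ_DIR"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                      # glob matched nothing
    sample="$(basename "$r1" _R1.fastq.gz)"
    r2="$FASTQ_DIR/${sample}_R2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "skipping $sample: missing $r2" >&2
      continue
    fi
    case "$TOOL" in
      salmon)
        # -l A lets Salmon infer the library type
        set -- salmon quant -i "$INDEX" -l A -1 "$r1" -2 "$r2" \
          --gcBias --seqBias -p "$THREADS" -o "$OUT_DIR/$sample" ;;
      kallisto)
        # -b 100 generates bootstraps for sleuth
        set -- kallisto quant -i "$INDEX" -o "$OUT_DIR/$sample" \
          -b 100 -t "$THREADS" "$r1" "$r2" ;;
      *)
        echo "unknown TOOL: $TOOL" >&2
        return 1 ;;
    esac
    if [ "$DRY_RUN" = "1" ]; then
      echo "$*"
    else
      "$@"
    fi
  done
}
```

Calling `quantify_all` prints one command per complete read pair; setting `DRY_RUN=0` before the call runs them. Keeping one output directory per sample matches what tximport expects to be pointed at later.
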
# Human
wget https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
# Mouse
wget https://ftp.ensembl.org/pub/release-110/fasta/mus_musculus/cdna/Mus_musculus.GRCm39.cdna.all.fa.gz

# Human (GENCODE)
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.transcripts.fa.gz

# Get genome and transcriptome
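wget genome.fa.gz        # placeholder: substitute the full URL of the genome FASTA
wget transcripts.fa.gz   # placeholder: substitute the full URL of the transcriptome FASTA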
gunzip *.gz
# Extract chromosome names as decoys
grep "^>" genome.fa | cut -d " " -f 1 | sed 's/>//g' > decoys.txt
# Concatenate (transcriptome first!)
cat transcripts.fa genome.fa > gentrome.fa
# Build index
salmon index -t gentrome.fa -d decoys.txt -i salmon_index -p 8

| Column | Description |
|---|---|
| Name | Transcript ID |
| Length | Transcript length |
| EffectiveLength | Transcript length adjusted for the fragment length distribution and any modeled biases |
| TPM | Transcripts per million |
| NumReads | Estimated read count |

kallisto's abundance.tsv uses different column names:

| Column | Description |
|---|---|
| target_id | Transcript ID |
| length | Transcript length |
| eff_length | Effective length |
| est_counts | Estimated counts |
| tpm | Transcripts per million |
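
To make the column meanings concrete, here is a small sketch for inspecting a Salmon `quant.sf` file. It assumes the tab-separated layout from the table above with one header line; `top_tpm` and `recompute_tpm` are hypothetical helper names. The second helper recomputes TPM from `NumReads` and `EffectiveLength`, which illustrates how the two columns relate: each transcript's rate is `NumReads / EffectiveLength`, rescaled so the rates sum to one million. Expect small rounding differences against the reported TPM column.

```shell
# top_tpm FILE [N]: print the N transcripts with the highest TPM (default 5).
# Assumes columns: Name, Length, EffectiveLength, TPM, NumReads.
top_tpm() {
  local file="$1" n="${2:-5}"
  tail -n +2 "$file" \
    | sort -t "$(printf '\t')" -k4,4gr \
    | head -n "$n" \
    | cut -f1,4
}

# recompute_tpm FILE: derive TPM from NumReads / EffectiveLength.
# Transcripts with EffectiveLength <= 0 are skipped.
recompute_tpm() {
  awk -F '\t' '
    NR > 1 && $3 > 0 {
      rate[$1] = $5 / $3        # reads per effective base
      total += rate[$1]
      order[++n] = $1           # remember input order
    }
    END {
      for (i = 1; i <= n; i++) {
        tpm = (total > 0) ? rate[order[i]] / total * 1e6 : 0
        printf "%s\t%.2f\n", order[i], tpm
      }
    }' "$1"
}
```

For kallisto's `abundance.tsv` the same idea applies with the column order from its table (`est_counts` in column 4, `tpm` in column 5), so the field numbers would need adjusting.
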
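- TPM - Normalized for transcript length and sequencing depth; use for visualization and exploratory comparison across samples
- Counts - Estimated counts (NumReads / est_counts); import with tximport for DESeq2/edgeR, which model counts rather than TPM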
- Use decoy-aware Salmon index for best accuracy
- Enable bias correction with `--gcBias --seqBias` in Salmon
- Generate bootstraps (`-b 100`) if using sleuth for DE
- Check mapping rates - should be >70%
- Match transcriptome version to your GTF annotation
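
The mapping-rate check can be automated with a short sketch like the one below. It assumes each Salmon output directory contains `aux_info/meta_info.json` with a `percent_mapped` field, as recent Salmon versions write; the helper names and the way the 70% threshold is passed in are illustrative choices, not part of either tool.

```shell
# mapping_rate DIR: print percent_mapped from a Salmon output directory.
# Assumes DIR/aux_info/meta_info.json contains a "percent_mapped" entry.
mapping_rate() {
  sed -n 's/.*"percent_mapped"[[:space:]]*:[[:space:]]*\([0-9.eE+-]*\).*/\1/p' \
    "$1/aux_info/meta_info.json"
}

# check_mapping_rates THRESHOLD DIR...: flag samples below THRESHOLD percent.
# Returns non-zero if any sample is low or has no readable rate.
check_mapping_rates() {
  local threshold="$1"
  shift
  local dir rate status=0
  for dir in "$@"; do
    rate="$(mapping_rate "$dir")"
    if [ -z "$rate" ]; then
      echo "$dir: percent_mapped not found" >&2
      status=1
      continue
    fi
    # awk does the floating-point comparison; exit 0 means rate < threshold
    if awk -v r="$rate" -v t="$threshold" 'BEGIN { exit !(r + 0 < t + 0) }'; then
      echo "LOW  $dir  $rate"
      status=1
    else
      echo "OK   $dir  $rate"
    fi
  done
  return "$status"
}
```

A call such as `check_mapping_rates 70 quants/*` would list every sample directory with its rate, which makes low-mapping samples easy to spot before running tximport.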