Quick Start Guide

Basic Usage

1. Simple Simulation

Run with default parameters:

umi-simulator

This generates:

  • output.fastq.gz - Simulated reads with UMIs
  • output_stats.txt - Simulation statistics
  • output_truth.txt - Ground truth molecule counts
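A quick sanity check on the generated reads can be sketched in Python. This is a minimal example, not part of umi-simulator: the helper name `count_fastq_reads` is hypothetical, and it assumes only that `output.fastq.gz` is a standard gzipped FASTQ file with four lines per record.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file (hypothetical helper).

    Assumes the standard 4-line FASTQ record layout:
    header, sequence, '+' separator, quality string.
    """
    n_lines = 0
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError(
            f"{path}: {n_lines} lines is not a multiple of 4; file may be truncated"
        )
    return n_lines // 4
```

Calling `count_fastq_reads("output.fastq.gz")` after a default run returns the number of simulated reads, which can be checked against the figure reported in output_stats.txt.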

2. Custom Parameters

umi-simulator \
    --genes 1000 \
    --molecules 50000 \
    --pcr-cycles 10 \
    --read-length 150 \
    -o my_simulation

3. Using Real Genome Data

umi-simulator \
    --gtf genes.gtf \
    --fasta genome.fa \
    --use-real-sequences \
    --molecules 100000 \
    -o real_genome_sim

4. Paired-End Sequencing

umi-simulator \
    --paired-end \
    --fragment-length 300 \
    --read-length 75 \
    --molecules 50000 \
    -o paired_end_sim

This creates one FASTQ file per mate:

  • paired_end_sim_R1.fastq.gz
  • paired_end_sim_R2.fastq.gz

Python API

from umi_simulator import BulkRNAUMISimulator

# Create simulator
sim = BulkRNAUMISimulator(
    n_genes=1000,
    total_molecules=10000,
    umi_length=10,
    read_length=150,
    pcr_cycles=10,
    pcr_efficiency=0.7,
    random_seed=42
)

# Run simulation
sim.run_simulation(output_prefix="output/my_sim")

Common Use Cases

Testing UMI Deduplication

# Simulate with high PCR amplification
umi-simulator \
    --molecules 10000 \
    --pcr-cycles 15 \
    --pcr-efficiency 0.85 \
    -o high_pcr

# Ground truth is in high_pcr_truth.txt
# Compare with your pipeline's deduplicated counts
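The comparison step could look like the sketch below. The layout of `high_pcr_truth.txt` is not documented here, so this assumes a tab-separated file with a gene ID in the first column and an integer molecule count in the second; adjust `load_counts` if the real file differs. Both function names are hypothetical.

```python
import csv


def load_counts(path):
    """Load gene -> count from a tab-separated file.

    ASSUMPTION: column 1 is a gene ID and column 2 an integer count.
    Rows that do not match (headers, blank lines) are skipped.
    """
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2 or not row[1].strip().isdigit():
                continue
            counts[row[0]] = int(row[1])
    return counts


def compare_counts(truth, observed):
    """Summarize per-gene differences (observed minus truth)."""
    genes = sorted(set(truth) | set(observed))
    diffs = {g: observed.get(g, 0) - truth.get(g, 0) for g in genes}
    return {
        "genes": len(genes),
        "exact": sum(1 for d in diffs.values() if d == 0),
        "total_abs_error": sum(abs(d) for d in diffs.values()),
        "diffs": diffs,
    }
```

A positive difference for a gene means the pipeline reported more molecules than were simulated (under-collapsed duplicates); a negative one means distinct molecules were merged.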

Benchmarking Pipelines

# Generate exactly 100,000 reads
umi-simulator \
    --target-reads 100000 \
    --pcr-cycles 10 \
    -o benchmark

# The simulator back-calculates the number of starting molecules needed to reach the target
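To get a feel for the back-calculation, here is one plausible model, not necessarily the one umi-simulator uses. It assumes each molecule is duplicated with probability `pcr_efficiency` in every cycle, so the expected copies per molecule are `(1 + pcr_efficiency) ** pcr_cycles`, and that every amplified copy becomes a read. The simulator may subsample or model library preparation differently, so treat the result as an estimate.

```python
import math


def expected_amplification(pcr_cycles, pcr_efficiency):
    """Expected copies per starting molecule after PCR.

    ASSUMPTION: each copy duplicates with probability pcr_efficiency per cycle.
    """
    return (1.0 + pcr_efficiency) ** pcr_cycles


def molecules_for_target_reads(target_reads, pcr_cycles, pcr_efficiency):
    """Smallest molecule count whose expected read yield meets the target.

    ASSUMPTION: every amplified copy is sequenced (no subsampling).
    """
    amplification = expected_amplification(pcr_cycles, pcr_efficiency)
    return max(1, math.ceil(target_reads / amplification))
```

Under this model, `molecules_for_target_reads(100000, 10, 0.7)` gives the starting molecule count for the benchmark command above with the default PCR efficiency.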

Error Rate Testing

# Simulate elevated sequencing and UMI error rates
umi-simulator \
    --sequencing-error-rate 0.01 \
    --umi-error-rate 0.02 \
    --molecules 50000 \
    -o high_errors
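When choosing error rates, it helps to estimate how many UMIs will carry at least one error. The sketch below assumes the rates are per-base and that errors are independent across bases, which is not confirmed by this guide; the function name is hypothetical.

```python
def fraction_with_error(per_base_rate, length):
    """Probability that a sequence of `length` bases has at least one error.

    ASSUMPTION: errors occur independently at each base with the same rate.
    """
    if not 0.0 <= per_base_rate <= 1.0:
        raise ValueError("per_base_rate must be between 0 and 1")
    return 1.0 - (1.0 - per_base_rate) ** length
```

With `--umi-error-rate 0.02` and the default 10 bp UMI, `fraction_with_error(0.02, 10)` comes to roughly 18% of UMIs, so a deduplication method that does not tolerate mismatches would be expected to over-count noticeably on this dataset.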

Key Parameters

| Parameter | Description | Default |
|---|---|---|
| --genes | Number of genes | 100 |
| --molecules | Initial molecules | 1000 |
| --target-reads | Target read count (alternative to --molecules) | None |
| --pcr-cycles | PCR cycles | 10 |
| --pcr-efficiency | PCR efficiency | 0.7 |
| --umi-length | UMI length (bp) | 10 |
| --read-length | Read length (bp) | 100 |
| --paired-end | Enable paired-end mode | False |
| --fragment-length | Fragment length for PE (bp) | 300 |
| --gtf | GTF annotation file | None |
| --fasta | FASTA genome file | None |

Next Steps