Run with default parameters:

```bash
umi-simulator
```

This generates:

- `output.fastq.gz` - Simulated reads with UMIs
- `output_stats.txt` - Simulation statistics
- `output_truth.txt` - Ground truth molecule counts
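The ground truth file makes it possible to check a deduplication pipeline's per-gene counts against what was actually simulated. The sketch below is illustrative only: the two-column, tab-separated layout (gene ID, then count) is an assumption about `output_truth.txt`, not a documented format, and `load_counts` and `compare_counts` are hypothetical helper names. Adjust the parser to the real file layout.

```python
import csv
from typing import Dict


def load_counts(path: str) -> Dict[str, int]:
    """Load gene -> count from a tab-separated file.

    ASSUMPTION: column 1 is a gene ID and column 2 an integer count.
    The real truth-file layout may differ; check it before relying on this.
    """
    counts: Dict[str, int] = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines, header rows, and anything without a numeric count.
            if len(row) < 2 or not row[1].strip().isdigit():
                continue
            counts[row[0]] = int(row[1])
    return counts


def compare_counts(truth: Dict[str, int], observed: Dict[str, int]) -> Dict[str, float]:
    """Summarize agreement between true and observed per-gene counts."""
    genes = set(truth) | set(observed)  # a gene missing on one side counts as 0
    exact = sum(1 for g in genes if truth.get(g, 0) == observed.get(g, 0))
    abs_error = sum(abs(truth.get(g, 0) - observed.get(g, 0)) for g in genes)
    return {
        "genes": len(genes),
        "exact_matches": exact,
        "mean_abs_error": abs_error / len(genes) if genes else 0.0,
    }
```

Under these assumptions, `compare_counts(load_counts("output_truth.txt"), pipeline_counts)` would return a small summary dictionary, where `pipeline_counts` is a gene-to-count mapping produced by your own pipeline.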
Set custom parameters:

```bash
umi-simulator \
  --genes 1000 \
  --molecules 50000 \
  --pcr-cycles 10 \
  --read-length 150 \
  -o my_simulation
```

Use real sequences from a GTF annotation and a FASTA genome:

```bash
umi-simulator \
  --gtf genes.gtf \
  --fasta genome.fa \
  --use-real-sequences \
  --molecules 100000 \
  -o real_genome_sim
```

Simulate paired-end reads:

```bash
umi-simulator \
  --paired-end \
  --fragment-length 300 \
  --read-length 75 \
  --molecules 50000 \
  -o paired_end_sim
```

Creates:

- `paired_end_sim_R1.fastq.gz`
- `paired_end_sim_R2.fastq.gz`
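Before feeding paired-end output to a downstream tool, it can be worth confirming that the two files are in sync. The following is a minimal sketch, assuming standard four-line FASTQ records and that mates share a read name (optionally suffixed with `/1` and `/2`); the simulator's actual header format is not documented here, and `count_synced_pairs` is a hypothetical helper name.

```python
import gzip
from itertools import zip_longest
from typing import Iterator


def read_names(path: str) -> Iterator[str]:
    """Yield each read name: the header up to the first whitespace, without '@'."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header is the first line of each 4-line record
                fields = line[1:].split()
                yield fields[0] if fields else ""


def strip_mate_suffix(name: str) -> str:
    """Drop a trailing '/1' or '/2' mate marker, if present (assumed convention)."""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def count_synced_pairs(r1_path: str, r2_path: str) -> int:
    """Return the number of read pairs; raise ValueError if R1 and R2 disagree."""
    n_pairs = 0
    for name1, name2 in zip_longest(read_names(r1_path), read_names(r2_path)):
        if name1 is None or name2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if strip_mate_suffix(name1) != strip_mate_suffix(name2):
            raise ValueError(f"Name mismatch at pair {n_pairs + 1}: {name1} vs {name2}")
        n_pairs += 1
    return n_pairs
```

For the example above, the call would be `count_synced_pairs("paired_end_sim_R1.fastq.gz", "paired_end_sim_R2.fastq.gz")`.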
Use the Python API:

```python
from umi_simulator import BulkRNAUMISimulator

# Create simulator
sim = BulkRNAUMISimulator(
    n_genes=1000,
    total_molecules=10000,
    umi_length=10,
    read_length=150,
    pcr_cycles=10,
    pcr_efficiency=0.7,
    random_seed=42,
)

# Run simulation
sim.run_simulation(output_prefix="output/my_sim")
```

Simulate with high PCR amplification:
```bash
umi-simulator \
  --molecules 10000 \
  --pcr-cycles 15 \
  --pcr-efficiency 0.85 \
  -o high_pcr
```

Ground truth is in `high_pcr_truth.txt`. Compare it with your pipeline's deduplicated counts.

Generate exactly 100,000 reads:
```bash
umi-simulator \
  --target-reads 100000 \
  --pcr-cycles 10 \
  -o benchmark
```

The simulator back-calculates the number of molecules required.

Simulate high error rates:
```bash
umi-simulator \
  --sequencing-error-rate 0.01 \
  --umi-error-rate 0.02 \
  --molecules 50000 \
  -o high_errors
```

| Parameter | Description | Default |
|---|---|---|
| `--genes` | Number of genes | 100 |
| `--molecules` | Initial molecules | 1000 |
| `--target-reads` | Target read count (alternative to `--molecules`) | None |
| `--pcr-cycles` | PCR cycles | 10 |
| `--pcr-efficiency` | PCR efficiency | 0.7 |
| `--umi-length` | UMI length (bp) | 10 |
| `--read-length` | Read length (bp) | 100 |
| `--paired-end` | Enable paired-end mode | False |
| `--fragment-length` | Fragment length for PE (bp) | 300 |
| `--gtf` | GTF annotation file | None |
| `--fasta` | FASTA genome file | None |
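To build intuition for how `--pcr-cycles`, `--pcr-efficiency`, and `--target-reads` relate, the sketch below uses a common idealized PCR model in which every copy is duplicated with probability equal to the efficiency in each cycle, so one molecule yields `(1 + efficiency) ** cycles` copies on average. It also assumes every amplified copy becomes a read. Both points are assumptions for illustration, not a description of how the simulator actually back-calculates molecules, and the function names are hypothetical.

```python
import math


def expected_copies_per_molecule(pcr_cycles: int, pcr_efficiency: float) -> float:
    """Expected copies of one molecule after PCR under the idealized model."""
    return (1.0 + pcr_efficiency) ** pcr_cycles


def molecules_for_target_reads(
    target_reads: int, pcr_cycles: int, pcr_efficiency: float
) -> int:
    """Smallest molecule count whose expected read yield reaches the target.

    ASSUMPTION: all amplified copies are sequenced (no subsampling step).
    """
    copies = expected_copies_per_molecule(pcr_cycles, pcr_efficiency)
    return math.ceil(target_reads / copies)


# With the default 10 cycles at efficiency 0.7, one molecule yields about
# 1.7 ** 10 (roughly 201.6) copies, so 100,000 reads need about 497 molecules.
print(molecules_for_target_reads(100_000, 10, 0.7))  # prints 497
```

If the simulator subsamples reads after amplification, the real molecule count will differ from this estimate.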
- See examples/ for more detailed examples
- Read WORKFLOWS.md for integration testing
- Check INSTALLATION.md for setup help