Slow batches in "Getting Started" setup #408
Description
Hi there! I am planning to use Readfish for a project optimising selective sequencing for our particular use case. While getting it up and running on my machines, I am seeing slow batches (>9 s) with the "Getting Started" dataset, which the documentation says is a reason to get in touch. I have now set Readfish up on two systems, both running simulated playback. One is an Apple Silicon Mac (details available on request); the other is an Ubuntu workstation (described below). Both show similar batch times, which makes me suspect user error somewhere along the way, but I am struggling to diagnose it myself.
Other possibly useful information: running the "unblock-all" command gives fast batches, and running readfish validate on the TOML file appears to work fine.
Here are the details of my setup:
Ubuntu 22.04 x86_64
AMD Ryzen 9 5950X CPU (16 core/32 thread)
2 x NVIDIA 3090Ti GPU (CUDA and drivers up to date and working)
Minknow core 6.5.14
dorado_basecall_server and ont-pybasecall-client-lib 7.9.8
mappy 2.30
mappy-rs 0.0.7
fast5 file: PLSP57501_20170308_FNFAF14035_MN16458_sequencing_run_NOTT_Hum_wh1rs2_60428.fast5
ref genome: GCF_000001405.40_GRCh38.p14_genomic.fna (non-main chromosomes removed, contigs renamed to match the convention in the example, and converted to a minimap2 index)
model: dna_r10.4.1_e8.2_400bps_5khz_fast.cfg (judging from the fast5 file's metadata this seems to be the correct model for this file, but I am not certain)
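The reference preparation described above (keeping only the main chromosomes and renaming them) can be done with a short script. This is a minimal sketch in plain Python, assuming a RefSeq-style FASTA whose record ID is the first whitespace-delimited word of each header line. The function name `subset_fasta` and the `RENAME` mapping are hypothetical illustrations; check any accession-to-name mapping against the assembly report before relying on it. The resulting FASTA can then be indexed with `minimap2 -d` (or `mappy.Aligner(..., fn_idx_out=...)`).

```python
def subset_fasta(src: str, dst: str, rename: dict[str, str]) -> list[str]:
    """Copy records whose ID is a key of `rename` from src to dst, renaming them.

    The record ID is the first whitespace-delimited word after '>'.
    Returns the new names in the order they were written.
    """
    written: list[str] = []
    keep = False
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                parts = line[1:].split()
                seq_id = parts[0] if parts else ""
                keep = seq_id in rename
                if keep:
                    # Write a bare header with the new name only.
                    fout.write(f">{rename[seq_id]}\n")
                    written.append(rename[seq_id])
            elif keep:
                # Sequence lines of a kept record are copied unchanged.
                fout.write(line)
    return written


# Hypothetical example mapping from RefSeq accessions to UCSC-style names;
# extend it to every contig you want to keep and verify it first.
RENAME = {"NC_000020.11": "chr20", "NC_000021.9": "chr21"}
```

For example, `subset_fasta("GCF_000001405.40_GRCh38.p14_genomic.fna", "hg38_subset.fna", RENAME)` writes only the listed contigs; a streaming line-by-line copy keeps memory use low even for a whole-genome FASTA.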
Here is the text of my TOML file (modified slightly from the one in the documentation):
# Summary: TOML config file for real-time basecalling, enriching for
# two chromosomes (chr 20 and 21) from the human genome in a single region.
# This region applies to all channels across the whole flow cell.
# This example is configured for running with an R9.4.1 bulk file for testing.
# Genomic targets are specified directly in the toml file in the form
# "contig"
# or as:
# "contig,start,end,strand"
# for example, "chr2,0,100,+" or "chr2".
# If chr2 is specified, the entire contig will be considered a target, on both strands.
# These patterns can be mixed, both in the toml or in a .csv file.
# All of the below fields are explained in more detail in the documentation -
# https://looselab.github.io/readfish/toml.html
# Basecaller configuration
[caller_settings.dorado]
# ^^^^^^ - ".dorado" specifies our chosen basecaller
# If using dorado >7.3.9, this should be ".dorado".
# All other parameters are shared between the two basecallers.
# Guppy/Dorado base-calling configuration file name
config = "dna_r9.4.1_450bps_fast"
# Address of the guppy/dorado basecaller - The default address for guppy is ipc:///tmp/.guppy/5555.
address = "ipc:///tmp/.guppy/5555"
# Fastq output for individual reads. This is OPTIONAL - as these files can become quite large.
# Remove line to disable.
debug_log = "live_reads.fq"
# Aligner Configuration
[mapper_settings.mappy]
# ^^^^^^ - ".mappy" specifies mappy as the aligner. Use mappy_rs for the multithreaded rust version (required on PromethION)
# Alignment reference to use. Should be either FASTA or an MMI
fn_idx_in = "/ssd2/usernamehere/readfish_testing/hg38/hg38.mmi"
# Optional PAF output for live alignments.
# Remove line to disable.
debug_log = "live_alignments.paf"
# Number of threads for indexing (mappy and mappy-rs) and mapping (mappy-rs only)
n_threads = 4
# Region Configuration - see https://looselab.github.io/readfish/toml.html#analysis-regions for more information.
# Definitions of "unblock", "proceed" and "stop_receiving" are as follows
# proceed: Allow one more chunk to be captured, before trying to make another decision, i.e. proceed for now
# unblock: Unblock the read
# stop_receiving: Allow the read to be sequenced, and stop receiving signal chunks for it.
# This region will enrich for reads mapping to chr20 and chr21.
[[regions]]
name = "hum_test"
min_chunks = 1 # minimum number of chunks before a decision can be made
max_chunks = 4 # maximum number of chunks to use in decision making - after this perform the above_max_chunks action
targets = ["chr20", "chr21"] # Genomic targets for this region
single_on = "stop_receiving" # Action to take if there is one mapping on target.
multi_on = "stop_receiving" # Action to take if there is more than one mapping, with at least one target.
single_off = "unblock" # Action to take if there is one mapping and it is off target
multi_off = "unblock" # Action to take if there are multiple mappings, where all are off target.
no_seq = "proceed" # Action to take if there is no sequence information
no_map = "proceed" # Action to take if there is no mapping information
above_max_chunks = "unblock" # Action to take if the number of chunks received is above max_chunks
below_min_chunks = "proceed" # Action to take if the number of chunks received is below min_chunks
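The region comments above describe how each mapping outcome is turned into an action. Below is a simplified, hypothetical model of that logic, written only to illustrate what the `[[regions]]` fields mean; it is not readfish's implementation. It assumes whole-contig targets (no coordinates or strands), that fewer than `min_chunks` chunks always yields the `below_min_chunks` action, and that `above_max_chunks` only replaces an otherwise undecided "proceed". The names `RegionConfig`, `classify` and `decide` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class RegionConfig:
    targets: set[str]  # whole-contig targets only (simplification)
    min_chunks: int
    max_chunks: int
    actions: dict[str, str]


# Values copied from the [[regions]] table above.
HUM_TEST = RegionConfig(
    targets={"chr20", "chr21"},
    min_chunks=1,
    max_chunks=4,
    actions={
        "single_on": "stop_receiving",
        "multi_on": "stop_receiving",
        "single_off": "unblock",
        "multi_off": "unblock",
        "no_seq": "proceed",
        "no_map": "proceed",
        "above_max_chunks": "unblock",
        "below_min_chunks": "proceed",
    },
)


def classify(mapped_contigs: list[str], has_seq: bool, targets: set[str]) -> str:
    """Turn one read's alignments into a decision name such as 'single_on'."""
    if not has_seq:
        return "no_seq"
    if not mapped_contigs:
        return "no_map"
    on_target = any(contig in targets for contig in mapped_contigs)
    if len(mapped_contigs) == 1:
        return "single_on" if on_target else "single_off"
    return "multi_on" if on_target else "multi_off"


def decide(region: RegionConfig, mapped_contigs: list[str], has_seq: bool, chunks_seen: int) -> str:
    # Assumption: too few chunks always defers the decision.
    if chunks_seen < region.min_chunks:
        return region.actions["below_min_chunks"]
    action = region.actions[classify(mapped_contigs, has_seq, region.targets)]
    # Assumption: exceeding max_chunks only overrides an undecided "proceed".
    if action == "proceed" and chunks_seen > region.max_chunks:
        action = region.actions["above_max_chunks"]
    return action
```

Under these assumptions, a read that never maps keeps getting "proceed" until it exceeds four chunks and is then unblocked, while any read with a mapping to chr20 or chr21 is sequenced as soon as one chunk has been seen.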
Here is the output of the command specified in the "Getting Started" docs:
readfish targets --toml human_chr_selection.toml --device MS00000 --log-file test.log --experiment-name human_select_test
2025-10-29 16:30:57,297 readfish /home/usernamehere/miniforge3/envs/readfish/bin/readfish targets --toml human_chr_selection.toml --device MS00000 --log-file test.log --experiment-name human_select_test
2025-10-29 16:30:57,297 readfish chemistry=<Chemistry.SIMPLEX: 'simplex'>
2025-10-29 16:30:57,297 readfish command='targets'
2025-10-29 16:30:57,297 readfish debug_log=True
2025-10-29 16:30:57,297 readfish device='MS00000'
2025-10-29 16:30:57,297 readfish dry_run=False
2025-10-29 16:30:57,297 readfish experiment_name='human_select_test'
2025-10-29 16:30:57,297 readfish host='127.0.0.1'
2025-10-29 16:30:57,298 readfish log_file='test.log'
2025-10-29 16:30:57,298 readfish log_format='%(asctime)s %(name)s %(message)s'
2025-10-29 16:30:57,298 readfish log_level='info'
2025-10-29 16:30:57,298 readfish max_unblock_read_length_seconds=5
2025-10-29 16:30:57,298 readfish padding=0
2025-10-29 16:30:57,298 readfish port=None
2025-10-29 16:30:57,298 readfish throttle=0.4
2025-10-29 16:30:57,298 readfish toml='human_chr_selection.toml'
2025-10-29 16:30:57,298 readfish unblock_duration=0.1
2025-10-29 16:30:57,298 readfish wait_for_ready=120
2025-10-29 16:30:57,298 readfish Version=2024.3.0
2025-10-29 16:30:57,328 readfish.targets This readfish version (2024.3.0) is tested for compatibility with MinKNOW v6.0.0 to v6.0.0.
This version of minknow is 6.5.14.
If readfish fails please try to upgrade readfish.
If there isn't a newer version of readfish and readfish is failing, please open an issue:
https://github.com/LooseLab/readfish/issues
2025-10-29 16:30:57,369 readfish._read_until_client Protocol phase changed to PHASE_SEQUENCING
2025-10-29 16:30:57,369 readfish._read_until_client Protocol state changed to PROTOCOL_RUNNING
2025-10-29 16:30:57,370 readfish.targets eJydVttu3DgMffdXCM5Lg514kjTFbgNkgewWLQq0TdGmT0HW0NjyWIgtOZKcy9/3kJI9M70E2w6CwLakQ/LwkOKe+Dz2vXSPp+Ly4v07UVnT6LVodKdEY51wSnYHQfdKrKRXlew6bdYLoYzTVYtH2pTtiXBvRdU621tve+XFM7yI40MhTS2Oj/ZFgyURWiXasZdGrJXBNqGNkMIDBcacWmtrCmBdttqnVyGHodPAC1bANExIY1Tnhayc9Z4R71tLvnb2vhKq64psglAPsh+whMcY1ehUHYMajSHf73Vo4aL49LI4KY7EauxuNpEH5QM2Md4b8ldXIki3VgHmnRJ+UJVuNCBr7VQVukeKhzwKtu8iTvoAuB4oQuRwJOh1jhdYkP50++vCAz4slKnx5MAcbSNPUiALbGzd8eJwcXR4uPgjzwmDP+VE29uGMnBM4c6uLdi8AjocjlYQNIhc8avXtSJOZApsIUD5yoKU6ICP6VBeiUGGoJwBleALp3v9QPC8eTtsuMRZLSp/xxwwf+cwaRvetVLIFFZUV0ce1cPQSW3gBg72Fl9qFaTuJtjaVmOPEGQgQRwArQ1h8KfLZWetV51cFWvkcVwV2i4h17rRvl2SM0Ub+o7M/5Okq9ysBEbLruLX0qtAufZFbZ2s7TWnZfv3H//EgciL9TgMj/nMsRd2BGwLV8xcI8p9h4D0jCR1EU2Iv/8snhcvKUGUsNaOXU285skFTunuj0kEIw7JcLJXyEdSYispiysV7pVKuUA9bpzhPL4hv5evonVaO0jVvMtJFK4BfpZ6wZnIayNLx0VSnrw4XA2+bKQPpM/zunYKlZiyy+Qs642RRPsByQiJbeTYBSHTIRI3nyDR6qFCTpehH5aR4+UL/Ips2gw3fryF3HgNd26RiTCMgWG1qfWdrkfZUQuDkmNPwN/Fx8u3Fx/O38EnyR0E6qaYJ2lX1JluRx2U6KgqiLtPqrd3eIdMqRXV2ssVSbtWq3FddpZJ6vSdKqOx5jaPstdrg+j/3RVdj662LTp6f3xSc7xjW3P8IfmPzshmCvEFofBK6SK5tNqDcdQHOQaRuNEHcQdJUKqfOYVASTt4+YgWrUILbvYn16nsQF+DJmEqjnz0qhCfZ7UqzXp8ff758pw7mhHv37/NGlPq+qFEAYOWpff18VI6Yw+P+rlAy9Rel+36+V/8r+h7Tam8GIgl5O3j+evtjBK9MVRyy/9aWrbODbIhMx/GfgXXWbectEk26oFq4lliGPcXPx04vz+/bTbgM7jrHvczU044Z+IkY+f4BtvJPZLplfq1BrYnQcaj1/4g3onRUW6U2tDNwshExyvVaKMD70Fc+WhWna1uclwcg7OVUnXOIeQ+2AFKrRRKBALMuYtIAu7QnH2Wkfz2RDp0Sp0HPduCZTZbtaO5IcLpFpFDoHsVV4FqaDG4R6IHq728AaqJLauGcklzC6ELNSFzJMbeTwaTw6fiS3xg/RIj04ZdxyfHpl3JJQ9Rk17hE0VLZ8R8BuPGmsTFMaSch2+nDr4i44gzDUG1nzMfeNRJww09HRXZ1VVKzvV1Rr2ThIdRh1WeZ702ZTJ4Jo4QBz7ofuyFmUWYlhOJcuZrvm9RvVkvHzY4J4QjH36ME0uVrtAZCOkg79H1Gtwc8d5BGyIFxS6yQi2VWyZkxe1qmnnOxBUPG4d5GkSO8mvKybezUew7M5lZnO9Ky81gN4E5nT9nM+RxIMFovkgcj22suMQ6bWELRcYd7WeATyOyfgOmyG3sRRoEg+gU7hFemmxN3jcNWZsqamrR/9t7kooO/BVIEXyK48fQT2BTtfJZmm6TBY8g4iKkSxs2dorMWFw2t2Rl7gNbvyeDMHaup+12Q5Cw/LuQEy3biN/pb4eUnyB+r/yoBZomfZS02EBmPHyWO+W45f9v2Yjz7AYy+wpEEYGB
2025-10-29 16:30:57,376 readfish.targets Configuration description:
Region hum_test (control=False).
Region applies to section of flow cell (# = applied, . = not applied):
################################
################################
################################
################################
################################
################################
################################
################################
2025-10-29 16:30:57,378 readfish.targets Fetching Run Configuration
2025-10-29 16:30:57,380 readfish.targets Run Configuration Received
2025-10-29 16:30:57,380 readfish.targets run_id=dc014f3a-cf19-436c-9c7b-c0a2672b2e44
2025-10-29 16:30:57,380 readfish.targets break_reads_after_seconds=0.8
2025-10-29 16:30:57,380 readfish.targets sample_rate=5000
2025-10-29 16:30:57,380 readfish.targets Initialising Caller
2025-10-29 16:30:57,406 readfish.targets Caller initialised
2025-10-29 16:30:57,406 readfish.targets Utilising the Dorado base-caller plugin:
- config: dna_r9.4.1_450bps_fast
- address: ipc:///tmp/.guppy/5555
- priority: read_priority.high_priority
- client_name: Readfish_connection
2025-10-29 16:30:57,407 readfish.targets Initialising Aligner
2025-10-29 16:31:04,114 readfish.targets Aligner initialised
2025-10-29 16:31:04,115 readfish.targets Starting main loop
2025-10-29 16:31:04,116 readfish.targets Generating aligner description, if possible...
2025-10-29 16:31:17,587 readfish.targets Using the mappy plugin. Using reference: /ssd2/usernamehere/readfish_testing/hg38/hg38.mmi.
Region hum_test has targets on 2 contigs, with 2 found in the provided reference.
This region has 4 total targets (+ve and -ve strands), covering approximately 3.60% of the genome.
2025-10-29 16:31:17,590 readfish.targets readfish started in PHASE_SEQUENCING. Fully sequencing first read from each channel.
2025-10-29 16:31:26,942 readfish.targets 0087R/9.3519s; Avg: 0087R/9.3519s; Seq:87; Unb:0; Pro:0; Slow batches (>0.80s): 1/1
2025-10-29 16:31:45,731 readfish.targets 0308R/18.7877s; Avg: 0197R/14.0698s; Seq:371; Unb:5; Pro:19; Slow batches (>0.80s): 2/2
2025-10-29 16:32:03,404 readfish.targets 0244R/17.6709s; Avg: 0213R/15.2702s; Seq:447; Unb:45; Pro:147; Slow batches (>0.80s): 3/3
2025-10-29 16:32:18,194 readfish.targets 0304R/14.7888s; Avg: 0235R/15.1498s; Seq:454; Unb:85; Pro:404; Slow batches (>0.80s): 4/4
2025-10-29 16:32:33,713 readfish.targets 0236R/15.5160s; Avg: 0235R/15.2231s; Seq:456; Unb:115; Pro:608; Slow batches (>0.80s): 5/5
2025-10-29 16:32:47,399 readfish.targets 0228R/13.6795s; Avg: 0234R/14.9658s; Seq:462; Unb:141; Pro:804; Slow batches (>0.80s): 6/6
2025-10-29 16:33:02,270 readfish.targets 0187R/14.8696s; Avg: 0227R/14.9521s; Seq:462; Unb:179; Pro:953; Slow batches (>0.80s): 7/7
2025-10-29 16:33:20,471 readfish.targets 0290R/18.1985s; Avg: 0235R/15.3579s; Seq:466; Unb:230; Pro:1,188; Slow batches (>0.80s): 8/8
2025-10-29 16:33:36,124 readfish.targets 0259R/15.6499s; Avg: 0238R/15.3903s; Seq:469; Unb:273; Pro:1,401; Slow batches (>0.80s): 9/9
2025-10-29 16:33:53,906 readfish.targets 0248R/17.7796s; Avg: 0239R/15.6292s; Seq:472; Unb:324; Pro:1,595; Slow batches (>0.80s): 10/10
2025-10-29 16:34:11,717 readfish.targets 0237R/17.8087s; Avg: 0238R/15.8274s; Seq:477; Unb:376; Pro:1,775; Slow batches (>0.80s): 11/11
2025-10-29 16:34:26,587 readfish.targets 0307R/14.8674s; Avg: 0244R/15.7474s; Seq:481; Unb:437; Pro:2,017; Slow batches (>0.80s): 12/12
2025-10-29 16:34:41,612 readfish.targets 0219R/15.0224s; Avg: 0242R/15.6916s; Seq:483; Unb:480; Pro:2,191; Slow batches (>0.80s): 13/13
2025-10-29 16:34:56,617 readfish.targets 0221R/15.0028s; Avg: 0241R/15.6424s; Seq:483; Unb:524; Pro:2,368; Slow batches (>0.80s): 14/14
2025-10-29 16:35:12,502 readfish.targets 0256R/15.8825s; Avg: 0242R/15.6584s; Seq:485; Unb:548; Pro:2,598; Slow batches (>0.80s): 15/15
2025-10-29 16:35:28,760 readfish.targets 0227R/16.2562s; Avg: 0241R/15.6958s; Seq:486; Unb:590; Pro:2,782; Slow batches (>0.80s): 16/16
2025-10-29 16:35:42,821 readfish.targets 0215R/14.0592s; Avg: 0239R/15.5995s; Seq:488; Unb:631; Pro:2,954; Slow batches (>0.80s): 17/17
2025-10-29 16:35:57,341 readfish.targets 0192R/14.5182s; Avg: 0236R/15.5394s; Seq:490; Unb:663; Pro:3,112; Slow batches (>0.80s): 18/18
^C2025-10-29 16:36:01,571 readfish.targets Keyboard interrupt received, stopping readfish.
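To quantify how far these batches are from the threshold, the per-batch summary lines can be parsed. This is a minimal sketch, assuming the file written via `--log-file test.log` contains the same summary lines as the console output, and reading a field such as "0087R/9.3519s" as 87 reads handled in 9.3519 s (an interpretation of the log format, not documented readfish behaviour). The 0.80 s limit in these lines is numerically the same as the `break_reads_after_seconds=0.8` reported at start-up. The names `parse_batches` and `summarise` are hypothetical.

```python
import re
from dataclasses import dataclass
from statistics import mean

# Matches the per-batch summary lines printed by `readfish targets`.
BATCH_RE = re.compile(
    r"(?P<reads>\d+)R/(?P<secs>[\d.]+)s; "
    r"Avg: \d+R/[\d.]+s; "
    r"Seq:[\d,]+; Unb:[\d,]+; Pro:[\d,]+; "
    r"Slow batches \(>(?P<limit>[\d.]+)s\)"
)


@dataclass
class Batch:
    reads: int
    seconds: float
    limit: float


def parse_batches(lines) -> list[Batch]:
    """Extract one Batch per summary line; other log lines are ignored."""
    batches = []
    for line in lines:
        m = BATCH_RE.search(line)
        if m:
            batches.append(Batch(int(m["reads"]), float(m["secs"]), float(m["limit"])))
    return batches


def summarise(batches: list[Batch]) -> dict:
    """Mean/max batch time and how many batches exceeded the slow-batch limit."""
    if not batches:
        raise ValueError("no batch summary lines found")
    times = [b.seconds for b in batches]
    limit = batches[0].limit
    return {
        "batches": len(batches),
        "mean_s": mean(times),
        "max_s": max(times),
        "slow": sum(t > limit for t in times),
        "mean_over_limit": mean(times) / limit,
    }
```

For example, `with open("test.log") as fh: print(summarise(parse_batches(fh)))` reports the mean batch time as a multiple of the 0.8 s target, which makes runs with different references (such as the chr20-only one) easy to compare.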
Any chance you can point me in the right direction to speed things up?
Edit: I have also tried running with a reference that only contains the sequence for chr20. The batches were faster (c. 9s) but still very slow.