Skip to content

Latest commit

 

History

History
105 lines (78 loc) · 3.68 KB

File metadata and controls

105 lines (78 loc) · 3.68 KB

Selective Sequencing Experiments

Setting up

  1. Install conda, if not already installed, from here.
  2. Create a conda environment and activate.
    conda env create -f env.yml
    conda activate selectiveseq
  3. Install Metagraph prerequisites from here.
  4. Clone repo and setup.
    git clone <experiments repo>
    cd NASExperiments
    chmod +x setup.sh
    ./setup.sh
  5. By default, ont-pybasecall-client-lib=7.6.8 and ont-pyguppy-client-lib=6.5.7 are installed in the environment. To check if they are compatible with the basecaller vesion, run /opt/ont/dorado/bin/dorado_basecall_server --version and, if required, download the appropriate client library from here if you have Dorado 7.3.0 onwards or here if you have Guppy or Dorado upto version 7.2.x.

Running experiments on basecalled reads

In the experiments folder, so the following:

  1. Create a script (say my_tasks.sh) as follows. Use the variables OUTDIR for output files and TMPDIR for temporary files. This script will be imported by another script (task_runner.sh) and these variables would be defined in it. The functions should be called fn (n = 1, 2, 3, ...) and define ALL_TASKS="1-n". Collinearity, Metagraph, Spumoni, and Rawhash executable paths are all exported in task_runner.sh.
ALL_TASKS="1-3"

f1() {
	# task 1
	Collinearity ...
}

f2() {
	# task 2
	metagraph ...
}

f3() {
	# task 3
	spumoni ...
}
  1. Run the tasks using task runner as follows:
# Run all tasks from 'my_tasks.sh'
./task_runner.sh -t my_tasks.sh

# Can also omit the .sh extension from task file
./task-runner.sh -t my_tasks

# Run tasks 1-2 from the 'my_tasks.sh' file
./task_runner.sh -t my_tasks 1-2

# Run ./task_runner.sh -h for other formats to specify task numbers

Logs

  • The stdout and stderr logs are written in a file {TASK_NAME}_MMMDD_HHMMSS.log.
  • The output directory variable OUTDIR points to NASExperiments/out
  • TMPDIR points to NASExperiments/tmp

Running experiments on raw signals using Minknow Simulator

Run the server -

mksimserver --certs /scratch/NASExperiments/code/MinknoApiSimulator/certs \
--input /data/SimulatedDatasets/Zymo/signals/Sigs0_450.blow5 \
--input /data/SimulatedDatasets/Zymo/signals/Sigs1_450.blow5

On a different shell, run Readfish (assuming the .toml file is validated using readfish validate)-

export MINKNOW_API_USE_LOCAL_TOKEN="no"
export MINKNOW_SIMULATOR="true"
export MINKNOW_TRUSTED_CA=/scratch/NASExperiments/code/MinknoApiSimulator/certs/server.pem
export MINKNOW_API_CLIENT_CERTIFICATE_CHAIN=/scratch/NASExperiments/code/MinknoApiSimulator/certs/client.pem
export MINKNOW_API_CLIENT_KEY=/scratch/NASExperiments/code/MinknoApiSimulator/certs/client.key
 
readfish targets --wait-for-ready 5 \
--toml /scratch/NASExperiments/configs/rf_mm_zymo.toml \
--port 50051 --device MN12345 \
--log-file /scratch/NASExperiments/logs/readfish_test.log \
--experiment-name 'test.log'

Or run the NASExperiments/scripts/simulate_run.sh script after editing the following lines:

INPUT_SIGNAL=(
	...
)
CONFIG_TOML=...

Logs

Readfish creates a test_run_readfish.tsv file in the directory where its invoked. It contains the decision taken for each read. Two additional files are created in NASExperiments/logs - readfish_MMMDD_hhmmss.log and server_MMMDD_hhmmss.log which contains the stdout and stderr logs for readfish and mksimserver respectively.