Commit 81548a7

Release 1.3 (#3)
* # Changed mutational-context-range according to pore type and updated plots
  ## mutational context
  - r9 mut_context range is now 5
  - r10 mut_context range is now 9
  ## plots
  - *MeanDistAvgStdev* are now called *MeDAS* plots
  - added a MeDAS plot for excluding low coverage positions
  - added a MeDAS plot using sns.replot using different markers for positions with coverage below 10 and above
* switched segmentation algorithm to f5c
* Fixed range check
* Update tests according to new r9 range
* added --rna and --r10
* Fix hue in MeDAS coverage plot
* Script to filter .magnipore for coverage
* Add entry point of cov_filter.py
* Added description to cov_filter
* Make plots prettier
* Update gitignore
* Update READMEs and descriptions
* Set default coverage threshold to 10
* Update plots
* rename filter script
* sort imports
* Update
* Reduce Runtime, Multiprocessing
  - Using multiprocessing for model comparisons
  - Remove pandas dataframe
  - Added plotting script after comparison
* Update tests
* Update meta data
* Move progress print to subprocess to see real progress
* Multiprocessing Model Building
* Update tests according to code changes
* Update tests
* Add modules to namespace
* Remove slower multiprocessing
* Update gitignore
* Add argument for the coverage
* Remove comments
* Remove namespace
* Update tests
* Fix bugs
* remove logs from tests
* Updated test
* Update test
* Make gzip overwrite already existing files
* Remove duplicate command logs
* Improve plots
* Improve plots
* Improve plots
* Improve plots and add more description
* Fix count for `Positions with no data X, ...`
* Add magnicheck
* Change argument order
* Add total eval_pos to output
* Take strand into account
* Fix KeyError
* Fix Bug
* Fix output kmer value
* Add output
* Output more information
* Fix Bug
  - multiprocessing.value counters are now incremented using a Lock
* still trying to fix incrementation bug
* Reduce plotting size
* Bug fixes
* Switched argument order in f5c index command
* do not print warnings
* sort imports
* Fix usage message
* Add --help messages for all scripts
* Check tests
1 parent a9a19cf commit 81548a7
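
The commit message mentions that `multiprocessing.Value` counters are now incremented under a lock. The following is a minimal sketch of that general pattern, not the project's actual code; the function names `count_positions` and `run` and the worker counts are hypothetical.

```python
import multiprocessing as mp


def count_positions(counter, n):
    """Increment a shared counter n times, holding its lock for each update."""
    for _ in range(n):
        # `counter.value += 1` is a read-modify-write and is not atomic,
        # so concurrent workers can lose updates without the lock.
        with counter.get_lock():
            counter.value += 1


def run(workers=4, n=1000):
    """Start several worker processes that all update the same counter."""
    counter = mp.Value("i", 0)  # shared signed int, created with its own lock
    procs = [
        mp.Process(target=count_positions, args=(counter, n))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run())
```

Without the `with counter.get_lock():` block the final count can come out lower than `workers * n`, which matches the kind of incrementation bug the later commit entries refer to.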

25 files changed, +4379 -4002 lines changed

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -132,3 +132,9 @@ pull_request.md
 zenodo/trick_zenodo.py
 readme.md
 .vscode/settings.json
+local_test.sh
+test.py
+test.txt
+magnipore/nanosherlock_mp.py
+magnipore/nanosherlock_old.py
+tests/segmentation/test_*/log.txt

README.md

Lines changed: 6 additions & 58 deletions
@@ -148,64 +148,11 @@ magnipore --basecalls_first_sample basecalls_first_sample --basecalls_sec_sample
 
 Using the same reference sequence for both samples results in no reported mutations. Magnipore will only report potential modifications in this case. If you assume there are mutations between the samples, try to provide different reference sequences containing these mutations.
 
-### Help Message
+### Help Messages
 
-<details><summary>Click here to see help message:</summary>
+[Complete help messages can be found here!](help/help_messages.md)
 
-```bash
-usage: Magnipore [-h] [--guppy_bin GUPPY_BIN] [--guppy_model GUPPY_MODEL] [--guppy_device GUPPY_DEVICE] [-b1 FASTQ] [-b2 FASTQ] [-s1 TXT] [-s2 TXT] [-d] [-t THREADS] [-fr]
-[-mx {map-ont,splice,ava-ont}] [-mk MINIMAP2K] [--timeit] [--rna] [-v]
-raw_data_first_sample reference_first_sample label_first_sample raw_data_sec_sample reference_sec_sample label_sec_sample working_dir
-
-Required tools: see github https://github.com/JannesSP/magnipore
-
-positional arguments:
-raw_data_first_sample
-Parent directory of FAST5 files of first sample, can also be a single SLOW5 or BLOW5 file of first sample, that contains all reads, if FASTQs are
-provided
-reference_first_sample
-reference FASTA file of first sample, POSITIVE (+) or FORWARD strand, ATTENTION: can only contain a single sequence
-label_first_sample Name of the sample or pipeline run
-raw_data_sec_sample Parent directory of FAST5 files of second sample, can also be SLOW5 or BLOW5 file of second sample, that contains all reads, if FASTQs are provided
-reference_sec_sample reference FASTA file of second sample, POSITIVE (+) or FORWARD strand, ATTENTION: can only contain a single sequence
-label_sec_sample Name of the sample or pipeline run
-working_dir Path to write all output files
-
-optional arguments:
--h, --help show this help message and exit
---guppy_bin GUPPY_BIN
-Guppy binary (default: None)
---guppy_model GUPPY_MODEL
-Guppy model used for basecalling (default: None)
---guppy_device GUPPY_DEVICE
-Use the GPU to basecall "cuda:0" to use the GPU with ID 0 (default: cuda:0)
--b1 FASTQ, --basecalls_first_sample FASTQ
-Path to existing basecalls of first sample. Basecalls must be in one single file. (default: None)
--b2 FASTQ, --basecalls_sec_sample FASTQ
-Path to existing basecalls of second sample. Basecalls must be in one single file. (default: None)
--s1 TXT, --sequencing_summary_first_sample TXT
-Use, when sequencing summary is not next to your FASTQ file. Path to existing sequencing summary file of second sample. (default: None)
--s2 TXT, --sequencing_summary_sec_sample TXT
-Use, when sequencing summary is not next to your FASTQ file. Path to existing sequencing summary file of first sample. (default: None)
--d, --calculate_data_density
-Will calculate data density after building the models. Will increase runtime! (default: False)
--t THREADS, --threads THREADS
-Number of threads to use (default: 1)
--fr, --force_rebuild Run commands regardless if files are already present (default: False)
--mx {map-ont,splice,ava-ont}, --minimap2x {map-ont,splice,ava-ont}
--x parameter for minimap2 (default: map-ont)
--mk MINIMAP2K, --minimap2k MINIMAP2K
--k parameter for minimap2 (default: 14)
---timeit Measure and print time used by submodules (default: False)
--rna Use when data is rna (default: False)
--r10 Use when data is from R10.4.1 flowcell (default: False)
--km KMER_MODEL, --kmer_model KMER_MODEL
-custom kmer model file for f5c eventalign (default: None)
--v, --version show program's version number and exit
-```
-</details>
-
-#### required arguments:
+#### required arguments for magnipore:
 use either the basecalling arguments or provide basecalls
 - basecalling arguments:
 - guppy_bin : Path to guppy binary
@@ -215,8 +162,6 @@ use either the basecalling arguments or provide basecalls
 - basecalls_first_sample : Path
 - basecalls_sec_sample : Path
 
-For optional arguments see magnipore.py --help. Includes small number of mapping parameters and the option to skip basecalling.
-
 ## Output File Description
 
 <details><summary>Click here to see overview:</summary>
@@ -252,6 +197,9 @@ same for second sample:
 - 13: Running nanosherlock of the first sample failed
 - 14: Running nanosherlock of the second sample failed
 - 15: Number of provided reference sequences is not equal 1 or 2
+- 16: Unknown pore type
+- 17: Error in multiprocessing signal comparison
+- 18: Error in magniplot
 ---
 Errors of first sample:
 - 119: Cannot basecall .slow5/.blow5 with guppy
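
The commit message states that the mutational-context range now depends on the pore type (5 for r9, 9 for r10), and the README gains exit code 16 for an unknown pore type. A rough sketch of how these two pieces could fit together follows. The names `MUT_CONTEXT_RANGE`, `UNKNOWN_PORE_TYPE`, and `mut_context_range` are hypothetical, and the case-insensitive lookup is an assumption rather than documented behavior.

```python
import sys

# Values taken from the commit message; the table itself is a hypothetical illustration.
MUT_CONTEXT_RANGE = {"r9": 5, "r10": 9}
UNKNOWN_PORE_TYPE = 16  # exit code listed in the README for an unknown pore type


def mut_context_range(pore_type):
    """Return the mutational-context range for a pore type, or exit with code 16."""
    try:
        # Lower-casing the key is an assumption made for this sketch.
        return MUT_CONTEXT_RANGE[pore_type.lower()]
    except KeyError:
        print(f"Unknown pore type: {pore_type}", file=sys.stderr)
        sys.exit(UNKNOWN_PORE_TYPE)
```

Keeping the ranges in one table means a new pore type needs a single new entry, and every unsupported value ends in the same documented exit code.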

README.rst

Lines changed: 8 additions & 58 deletions
@@ -139,63 +139,13 @@ reported mutations. Magnipore will only report potential modifications
 in this case. If you assume there are mutations between the samples, try
 to provide different reference sequences containing these mutations.
 
-Help Message
-------------
+Help Messages
+-------------
 
-.. code:: bash
+`Complete help messages can be found here! <help/help_messages.md>`__
 
-usage: Magnipore [-h] [--guppy_bin GUPPY_BIN] [--guppy_model GUPPY_MODEL] [--guppy_device GUPPY_DEVICE] [-b1 FASTQ] [-b2 FASTQ] [-s1 TXT] [-s2 TXT] [-d] [-t THREADS] [-fr]
-[-mx {map-ont,splice,ava-ont}] [-mk MINIMAP2K] [--timeit] [--rna] [-v]
-raw_data_first_sample reference_first_sample label_first_sample raw_data_sec_sample reference_sec_sample label_sec_sample working_dir
-
-Required tools: see github https://github.com/JannesSP/magnipore
-
-positional arguments:
-raw_data_first_sample
-Parent directory of FAST5 files of first sample, can also be a single SLOW5 or BLOW5 file of first sample, that contains all reads, if FASTQs are
-provided
-reference_first_sample
-reference FASTA file of first sample, POSITIVE (+) or FORWARD strand, ATTENTION: can only contain a single sequence
-label_first_sample Name of the sample or pipeline run
-raw_data_sec_sample Parent directory of FAST5 files of second sample, can also be SLOW5 or BLOW5 file of second sample, that contains all reads, if FASTQs are provided
-reference_sec_sample reference FASTA file of second sample, POSITIVE (+) or FORWARD strand, ATTENTION: can only contain a single sequence
-label_sec_sample Name of the sample or pipeline run
-working_dir Path to write all output files
-
-optional arguments:
--h, --help show this help message and exit
---guppy_bin GUPPY_BIN
-Guppy binary (default: None)
---guppy_model GUPPY_MODEL
-Guppy model used for basecalling (default: None)
---guppy_device GUPPY_DEVICE
-Use the GPU to basecall "cuda:0" to use the GPU with ID 0 (default: cuda:0)
--b1 FASTQ, --basecalls_first_sample FASTQ
-Path to existing basecalls of first sample. Basecalls must be in one single file. (default: None)
--b2 FASTQ, --basecalls_sec_sample FASTQ
-Path to existing basecalls of second sample. Basecalls must be in one single file. (default: None)
--s1 TXT, --sequencing_summary_first_sample TXT
-Use, when sequencing summary is not next to your FASTQ file. Path to existing sequencing summary file of second sample. (default: None)
--s2 TXT, --sequencing_summary_sec_sample TXT
-Use, when sequencing summary is not next to your FASTQ file. Path to existing sequencing summary file of first sample. (default: None)
--d, --calculate_data_density
-Will calculate data density after building the models. Will increase runtime! (default: False)
--t THREADS, --threads THREADS
-Number of threads to use (default: 1)
--fr, --force_rebuild Run commands regardless if files are already present (default: False)
--mx {map-ont,splice,ava-ont}, --minimap2x {map-ont,splice,ava-ont}
--x parameter for minimap2 (default: map-ont)
--mk MINIMAP2K, --minimap2k MINIMAP2K
--k parameter for minimap2 (default: 14)
---timeit Measure and print time used by submodules (default: False)
--rna Use when data is rna (default: False)
--r10 Use when data is from R10.4.1 flowcell (default: False)
--km KMER_MODEL, --kmer_model KMER_MODEL
-custom kmer model file for f5c eventalign (default: None)
--v, --version show program's version number and exit
-
-required arguments:
-~~~~~~~~~~~~~~~~~~~
+required arguments for magnipore:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 use either the basecalling arguments or provide basecalls
 
@@ -207,9 +157,6 @@ use either the basecalling arguments or provide basecalls
 - basecalls_first_sample : Path
 - basecalls_sec_sample : Path
 
-For optional arguments see magnipore.py –help. Includes small number of
-mapping parameters and the option to skip basecalling.
-
 Output File Description
 =======================
 
@@ -252,6 +199,9 @@ Error Codes Explanation
 - 13: Running nanosherlock of the first sample failed
 - 14: Running nanosherlock of the second sample failed
 - 15: Number of provided reference sequences is not equal 1 or 2
+- 16: Unknown pore type
+- 17: Error in multiprocessing signal comparison
+- 18: Error in magniplot
 
 Errors of first sample:
 

conda.recipe/meta.yaml

Lines changed: 6 additions & 0 deletions
@@ -33,6 +33,9 @@ test:
 commands:
 - magnipore --help
 - nanosherlock --help
+- magnifilter --help
+- magniplot --help
+- magnicheck --help
 - pytest -vv
 
 about:
@@ -274,6 +277,9 @@ about:
 - 13: Running nanosherlock of the first sample failed
 - 14: Running nanosherlock of the second sample failed
 - 15: Number of provided reference sequences is not equal 1 or 2
+- 16: Unknown pore type
+- 17: Error in multiprocessing signal comparison
+- 18: Error in magniplot with error code
 
 ---
 Errors of first sample:
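
Exit codes 16 to 18 extend the documented error list. A wrapper script could translate a return code into its documented message, roughly as sketched below. The `EXIT_MESSAGES` table and the `explain_exit_code` helper are hypothetical names and not part of magnipore. Only the codes visible in this diff are covered, and treating 0 as success follows the usual process convention rather than anything stated here.

```python
# Hypothetical lookup table; the messages are copied from the README lines in this diff.
EXIT_MESSAGES = {
    13: "Running nanosherlock of the first sample failed",
    14: "Running nanosherlock of the second sample failed",
    15: "Number of provided reference sequences is not equal 1 or 2",
    16: "Unknown pore type",
    17: "Error in multiprocessing signal comparison",
    18: "Error in magniplot",
}


def explain_exit_code(code):
    """Return a readable message for an exit code, with a fallback for unlisted codes."""
    if code == 0:
        # Assumption: 0 means success, as is conventional for command-line tools.
        return "success"
    return EXIT_MESSAGES.get(code, f"undocumented exit code {code}")
```

A caller could pass the `returncode` attribute of a `subprocess.run` result to `explain_exit_code` to log a readable reason for a failed run.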
