Commit 5e57447

update docs
1 parent a7144ef commit 5e57447

4 files changed (+37, -6 lines)

docs/archive-lossy.md

Lines changed: 25 additions & 2 deletions
@@ -1,6 +1,8 @@
 # A Guide for Archiving Lossy Data
 
-Lossy compression of raw nanopore signal data can be a great way to save disk space without significantly impacting basecalling and modification calling accuracy. This makes it particularly suitable for archiving, especially if you are running short of available disk space. Naturally, one may be concerned that this conversion would significantly deteriorate the quality of their data. To remedy such concerns, this guide outlines a number of sanity checks which when successful give confidence in the lossy conversion.
+Lossy compression of raw nanopore signal data can be a great way to save disk space without significantly impacting basecalling and modification calling accuracy. This makes it particularly suitable for archiving, especially if you are running short of available disk space. For instance, our in-house storage systems at the Garvan Institute's long-read sequencing service were running out of disk space, so we applied lossy compression to all our historical datasets: the original BLOW5 files, which consumed X TB, were reduced to Y after conversion to lossy-compressed BLOW5.
+
+Naturally, one may be concerned that this conversion would significantly deteriorate the quality of the data. Extensive benchmarks demonstrating the negligible impact of this lossy compression strategy are presented in our [Genome Research publication](https://genome.cshlp.org/content/35/7/1574). To further address any such concerns, this guide outlines a number of sanity checks which, when successful, give confidence in the lossy conversion.
 
 ## The Conversion
 
@@ -247,4 +249,25 @@ Then proceed as normal, using
 ```bash
 SLOW5_FILE=SLOW5_SUB
 SLOW5_LOSSY_FILE=SLOW5_LOSSY_SUB
-```
+```
+
+---
+
+## Citation
+
+
+> Jayasooriya K, Jenner SP, Marasinghe P, Senanayake U, Saadat H, Taubman D, Ragel R, Gamaarachchi H, Deveson IW. A new compression strategy to reduce the size of nanopore sequencing data. Genome Research. 2025 Jul 1;35(7):1574-82. [http://doi.org/10.1101/gr.280090.124](http://doi.org/10.1101/gr.280090.124)
+
+```
+@article{jayasooriya2025new,
+title={A new compression strategy to reduce the size of nanopore sequencing data},
+author={Jayasooriya, Kavindu and Jenner, Sasha P and Marasinghe, Pasindu and Senanayake, Udith and Saadat, Hassaan and Taubman, David and Ragel, Roshan and Gamaarachchi, Hasindu and Deveson, Ira W},
+journal={Genome Research},
+volume={35},
+number={7},
+pages={1574--1582},
+year={2025},
+publisher={Cold Spring Harbor Lab}
+}
+```
+

docs/archive.md

Lines changed: 6 additions & 1 deletion
@@ -222,4 +222,9 @@ compare_basecalls (){
 }
 ```
 
-However, note that sometimes this test diff will cause false errors due to base-callers providing slightly different outputs in various circumstances (see https://github.com/hasindu2008/slow5tools/issues/70). We recently came through a situation where Guppy 4.4.1 on a system with multiple GPUs (GeForce 3090 and 3070) produced slightly different results, even on the same FAST5 input when run multiple times.
+However, note that this diff test may sometimes report false errors, because basecallers can produce slightly different outputs under various circumstances (see https://github.com/hasindu2008/slow5tools/issues/70). We recently came across a situation where Guppy 4.4.1, on a system with multiple GPUs (GeForce 3090 and 3070), produced slightly different results when run multiple times, even on the same FAST5 input.
+
+
+---
+
+As of 2025, you can further reduce the size of BLOW5 files by 30-40% using lossy compression, which has a negligible impact on basecalling and modification calling accuracy. Please refer to the page ["A Guide for Archiving Lossy Data"](archive-lossy.md) for more information.
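
To check the size reduction on your own data against the 30-40% figure mentioned above, a minimal bash sketch like the following compares an original BLOW5 file with its lossy counterpart. `percent_saved` and the file names are hypothetical; the helper uses only `wc` and `awk`, not slow5tools itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of slow5tools): report the percentage size
# reduction of a lossy BLOW5 file relative to its original.

percent_saved() {
    local orig=$1 lossy=$2
    local a b
    a=$(wc -c < "$orig")     # size of the original BLOW5 in bytes
    b=$(wc -c < "$lossy")    # size of the lossy BLOW5 in bytes
    # Reduction = (orig - lossy) / orig * 100, printed with one decimal place.
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}
```

Usage, with placeholder file names: `percent_saved reads.blow5 reads_lossy.blow5` prints the reduction as a percentage.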

docs/bits-lossy.md

Lines changed: 5 additions & 2 deletions
@@ -1,7 +1,10 @@
-# Bits eliminated in lossy compressions
+# Bits eliminated in lossy compression
+
+To reduce the size of BLOW5 files, we can perform non-reversible lossy compression using `slow5tools degrade`. More details about this compression strategy are available in our [Genome Research publication](https://genome.cshlp.org/content/35/7/1574).
 
 The number of bits eliminated when using `slow5tools degrade` with the default `-b auto` option is documented here.
-slow5tools version column indicates from which version the profile is available. `.` means not yet available.
+
+In the table below, the slow5tools version column indicates the version from which each profile is available. `.` means not yet available.
 
 | Experiment type | Sequencing Kit | Device | Sample frequency | slow5tools version | Bits eliminated |
 | --------------- | ---------------- | ------------------- | ---------------- | ------------------ | --------------- |

docs/commands.md

Lines changed: 1 addition & 1 deletion
@@ -295,7 +295,7 @@ See below for documentation on `degrade`-specific options. For documentation on
 * `-s, --sig-compress compression_type`:<br/>
 Specifies the raw signal compression method used for BLOW5 output. Note: the default value is ex-zd which differs in `view`.
 * `-b, --bits INT`:<br/>
-The number of least significant bits to zero then round for each raw signal data point [default value: "auto" (autodetected based on the file header and data)].
+The number of least significant bits to eliminate by rounding for each raw signal data point [default value: "auto" (autodetected based on the file header and data)]. The auto-detected number of bits is documented [here](bits-lossy.md).
 
 
 ## GLOBAL OPTIONS
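
The `-b, --bits` description (eliminating the least significant bits of each raw signal sample by rounding) can be illustrated with a small bash sketch. It assumes that eliminating b bits means replacing each sample with the nearest multiple of 2^b, ties rounded up, and clamped to the int16 range of raw samples. The exact rounding rule used by `slow5tools degrade` is not given here, so treat `round_lsb` as a hypothetical helper, not the tool's implementation.

```bash
#!/usr/bin/env bash
# Illustrative sketch only, NOT the slow5tools implementation.
# round_lsb (hypothetical) replaces a raw signal sample x with the nearest
# multiple of 2^b (ties rounded up), so the low b bits of the result are zero.

round_lsb() {
    local x=$1 b=$2
    if (( b == 0 )); then
        echo "$x"                      # nothing to eliminate
        return
    fi
    local step=$(( 1 << b ))           # 2^b
    local half=$(( step >> 1 ))        # 2^(b-1)
    # For x >= 0: q = floor((x + 2^(b-1)) / 2^b), i.e. round half up.
    local q=$(( (x + half) >> b ))
    local r=$(( q << b ))
    # Assumption: keep the result within int16 by rounding down instead.
    if (( r > 32767 )); then
        r=$(( r - step ))
    fi
    echo "$r"
}
```

For example, `round_lsb 1234 3` prints `1232`, a multiple of 8 whose low 3 bits are zero; zeroed low bits reduce the information each sample carries.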
