and approaches for increasing the resilience and shareability of biological
sequencing data,
described in Chapter [5](#chp-decentralizing).

## Methods

### Implementation

`sourmash` is a software package implemented in Python,
which provides the command-line interface and an API for data exploration,
and in Rust,
which implements the core data structures and the performance-critical code.

Both _Scaled_ and regular _MinHash_ sketches are available.
Hashes are calculated with the _MurmurHash3_ hash function
(taking the lower 64 bits of its 128-bit version) with $seed = 42$,
and sketches are stored in memory as sorted vectors.
Serialization to and deserialization from JSON are implemented with the `serde` crate,
and sketches can also track the abundance of each hash.

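A minimal Python sketch of a _Scaled MinHash_ with optional abundance tracking and a JSON round trip follows. It illustrates the ideas under stated assumptions and is not sourmash's implementation: `hash_kmer` substitutes BLAKE2b truncated to 64 bits for _MurmurHash3_ (so the hash values differ from sourmash's), reverse complements are ignored, and the class name, method names, and JSON field names are hypothetical.

```python
import hashlib
import json
from bisect import insort
from collections import Counter

HASH_SPACE = 2**64  # hash values are unsigned 64-bit integers
SEED = 42


def hash_kmer(kmer: str, seed: int = SEED) -> int:
    """64-bit hash of a k-mer.

    Stand-in only: BLAKE2b truncated to 8 bytes instead of MurmurHash3,
    so values will NOT match those produced by sourmash.
    """
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class ScaledMinHash:
    """Keeps every k-mer hash below max_hash = 2**64 // scaled (hypothetical class)."""

    def __init__(self, ksize: int, scaled: int, track_abundance: bool = False):
        self.ksize = ksize
        self.max_hash = HASH_SPACE // scaled
        self.track_abundance = track_abundance
        self.hashes: list[int] = []  # unique hashes, kept sorted
        self.abundance: Counter = Counter()  # only filled when tracking abundance
        self._seen: set[int] = set()

    def add_sequence(self, seq: str) -> None:
        # Simplification: raw k-mers are hashed; reverse complements are ignored.
        for i in range(len(seq) - self.ksize + 1):
            h = hash_kmer(seq[i : i + self.ksize])
            if h >= self.max_hash:
                continue  # outside the retained 1/scaled fraction of hash space
            if h not in self._seen:
                self._seen.add(h)
                insort(self.hashes, h)  # insert while keeping the vector sorted
            if self.track_abundance:
                self.abundance[h] += 1

    def to_json(self) -> str:
        # Hypothetical field names, loosely modeled on a signature record.
        record = {"ksize": self.ksize, "seed": SEED,
                  "max_hash": self.max_hash, "mins": self.hashes}
        if self.track_abundance:
            record["abundances"] = [self.abundance[h] for h in self.hashes]
        return json.dumps(record)

    @classmethod
    def from_json(cls, text: str) -> "ScaledMinHash":
        record = json.loads(text)
        mh = cls(record["ksize"], scaled=1, track_abundance="abundances" in record)
        mh.max_hash = record["max_hash"]  # restore the original threshold
        mh.hashes = list(record["mins"])
        mh._seen = set(mh.hashes)
        if mh.track_abundance:
            mh.abundance = Counter(dict(zip(mh.hashes, record["abundances"])))
        return mh
```

For example, `ScaledMinHash(ksize=31, scaled=1000)` keeps a hash only if it falls below $2^{64}/1000$, so with well-mixed hashes roughly one in a thousand distinct k-mers is retained. Keeping the hashes sorted means two sketches can be intersected or merged in a single linear pass.
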
The _LCA_ and _MHBT_ indices are implemented at the Python level.
The _MHBT_ supports multiple storage backends
(a hidden directory, Zip files, IPFS, and Redis),
chosen according to the requirements of each use case.
It is implemented as a specialization of the _SBT_
in which the Bloom filters in the leaf nodes are replaced with _Scaled MinHash_ sketches.

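The pruning idea behind an _SBT_-style index such as the _MHBT_ can be sketched as follows. Each internal node holds a Bloom filter containing every hash in its subtree, and each leaf holds a sketch. Because Bloom filters have no false negatives, the number of query hashes found in an internal node is an upper bound on the overlap of every leaf below it, so subtrees that cannot reach the threshold can be skipped. Assumptions: Python `frozenset`s stand in for _Scaled MinHash_ leaves, the filter size and double-hashing scheme are arbitrary, `BloomFilter`, `build_tree`, and `search` are hypothetical names, and storage backends are ignored. This is not sourmash's code.

```python
from dataclasses import dataclass, field


class BloomFilter:
    """Tiny Bloom filter over 64-bit hash values (illustrative parameters)."""

    def __init__(self, n_bits: int = 4096, n_funcs: int = 3):
        self.n_bits = n_bits
        self.n_funcs = n_funcs
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, h: int) -> list:
        # Double hashing: derive n_funcs bit positions from one hash value.
        step = (h >> 32) | 1
        return [(h + i * step) % self.n_bits for i in range(self.n_funcs)]

    def add(self, h: int) -> None:
        for p in self._positions(h):
            self.bits |= 1 << p

    def __contains__(self, h: int) -> bool:
        # May return false positives, never false negatives.
        return all(self.bits >> p & 1 for p in self._positions(h))


@dataclass
class Leaf:
    name: str
    hashes: frozenset  # stands in for a Scaled MinHash sketch


@dataclass
class Internal:
    bloom: BloomFilter
    children: list = field(default_factory=list)


def subtree_hashes(node) -> set:
    if isinstance(node, Leaf):
        return set(node.hashes)
    return set().union(*(subtree_hashes(c) for c in node.children))


def build_tree(leaves, n_bits: int = 4096):
    """Pair nodes bottom-up; each internal Bloom filter holds every hash below it."""
    level = list(leaves)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            group = level[i : i + 2]
            if len(group) == 1:
                parents.append(group[0])  # odd node out moves up unchanged
                continue
            bloom = BloomFilter(n_bits)
            for h in set().union(*(subtree_hashes(c) for c in group)):
                bloom.add(h)
            parents.append(Internal(bloom, group))
        level = parents
    return level[0]


def search(root, query: frozenset, threshold: float) -> list:
    """Names of leaves whose containment |Q & L| / |Q| is at least threshold."""
    needed = threshold * len(query)
    found = []

    def visit(node):
        if isinstance(node, Leaf):
            if len(query & node.hashes) >= needed:  # exact check at the leaf
                found.append(node.name)
            return
        # No false negatives, so this count upper-bounds every leaf below.
        if sum(h in node.bloom for h in query) < needed:
            return  # prune the whole subtree
        for child in node.children:
            visit(child)

    visit(root)
    return found
```

False positives in the internal filters can only cause extra subtrees to be visited; since the final check at each leaf is exact, they never produce wrong matches.
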
### Experiments

Experiments are implemented as `snakemake` workflows and use `conda` to manage dependencies,
so all results can be reproduced with a single command:
`snakemake --use-conda`.
This command downloads all data,
installs the dependencies, and generates the data used in the analysis.

The analysis and figure-generation code is contained in a Jupyter Notebook,
which can be executed in any environment that supports Jupyter,
including a local installation or Binder,
a service that deploys a live Jupyter environment on cloud instances.
Instructions are available at https://doi.org/10.5281/zenodo.4012667.