-This is a simple Python package with a single function `shuffle` that does a dinucleotide shuffle on input sequences.
+This Python package provides a minimal and efficient implementation for performing dinucleotide shuffles on one-hot-encoded sequences.

-A dinucleotide shuffle preserves the dinucleotide (doublet) frequencies of the input sequence while randomizing the order of the dinucleotides. This is useful for generating compositionally-matched random sequences.
+Dinucleotide shuffling preserves the dinucleotide (nucleotide pair) frequencies of the input sequence while randomizing the order of the pairs. This is particularly useful for generating random sequences that match the compositional properties of the original input.
+
+To ensure a uniform random sample from all possible shuffles, the algorithm uses the rank-one-update Kirchhoff matrix method described by [Colbourn et al.](https://doi.org/10.1006/jagm.1996.0014) for sampling random arborescences, combined with a random Eulerian walk on the dinucleotide transition graph. The core algorithm is implemented in Rust for performance, with Python bindings for easy integration.
+
+This package is lightweight, with NumPy as its only dependency.
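
The idea of combining a random arborescence with an Eulerian walk can be illustrated in pure Python. The sketch below is not the package's Rust implementation and does not use the Kirchhoff-matrix method; it swaps in the simpler Altschul-Erickson rejection approach, which picks a random "last edge" per symbol, retries until those edges form a tree leading to the final symbol, and then walks the shuffled edge lists. The names `dinuc_shuffle_sketch`, `dinuc_counts`, and `_reaches` are hypothetical, and the input is assumed to be a list of integer-coded symbols rather than a one-hot array.

```python
from collections import Counter

import numpy as np


def dinuc_counts(seq):
    """Count every adjacent symbol pair in a sequence."""
    return Counter(zip(seq[:-1], seq[1:]))


def _reaches(v, target, last_edge):
    """Follow the chosen last edges from v; True if the target is reached."""
    for _ in range(len(last_edge) + 1):
        if v == target:
            return True
        v = last_edge[v]
    return False  # a cycle prevented reaching the target


def dinuc_shuffle_sketch(seq, rng=None):
    """Shuffle an integer-coded sequence while preserving dinucleotide counts."""
    rng = np.random.default_rng() if rng is None else rng
    seq = list(seq)
    if len(seq) < 3:
        return seq[:]

    # Outgoing edges of the transition graph: successors of each symbol.
    edges = {}
    for a, b in zip(seq[:-1], seq[1:]):
        edges.setdefault(a, []).append(b)
    last = seq[-1]

    # Rejection-sample one last edge per symbol until they form a tree
    # (an arborescence) directed toward the final symbol.
    while True:
        last_edge = {
            v: succ[rng.integers(len(succ))]
            for v, succ in edges.items()
            if v != last
        }
        if all(_reaches(v, last, last_edge) for v in last_edge):
            break

    # Shuffle the remaining edges and keep the chosen last edge at the end.
    order = {}
    for v, succ in edges.items():
        rest = list(succ)
        if v in last_edge:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v in last_edge:
            rest.append(last_edge[v])
        order[v] = rest

    # Eulerian walk: start at the first symbol and consume edges in order.
    out = [seq[0]]
    used = {v: 0 for v in order}
    cur = seq[0]
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return out


original = [0, 1, 2, 3, 0, 0, 1, 3, 2, 2, 1, 0, 3, 1, 2, 0]
shuffled = dinuc_shuffle_sketch(original, np.random.default_rng(0))
```

The shuffled sequence keeps the first and last symbols of the input, and `dinuc_counts(shuffled)` equals `dinuc_counts(original)`. Rejection sampling is easy to read but may retry many times on unfavorable inputs, which is one reason a direct arborescence sampler is attractive.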
</span><span id="L-4"><a href="#L-4"><span class="linenos"> 4</span></a><span class="sd">For installation and usage instructions, check out the [GitHub repository](https://github.com/austintwang/dinuc_shuf).</span>
-</span><span id="shuffle-15"><a href="#shuffle-15"><span class="linenos">15</span></a><span class="sd"> A three-dimensional array of one-hot-encoded sequences with shape (num_seqs, seq_len, alphabet_size).</span>
+</span><span id="shuffle-15"><a href="#shuffle-15"><span class="linenos">15</span></a><span class="sd"> A three-dimensional array of one-hot-encoded sequences with shape (num_seqs, seq_len, alphabet_size). Cast to np.uint8 if it does not already have that dtype.</span>
</span><span id="shuffle-17"><a href="#shuffle-17"><span class="linenos">17</span></a><span class="sd"> A NumPy random number generator instance. If None, a new default generator instance will be used.</span>
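
The input layout described in the docstring can be produced with a small helper. This is a minimal sketch: `one_hot_encode` and the `"ACGT"` alphabet order are assumptions made for illustration, not part of the package. The commented-out call at the end is also hypothetical; in particular, the keyword name `rng` is assumed, since the docstring excerpt does not show parameter names.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed alphabet order, chosen for this example only


def one_hot_encode(seqs, alphabet=ALPHABET):
    """Encode equal-length strings as a (num_seqs, seq_len, alphabet_size) array."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    out = np.zeros((len(seqs), len(seqs[0]), len(alphabet)), dtype=np.uint8)
    for i, s in enumerate(seqs):
        for j, ch in enumerate(s):
            out[i, j, index[ch]] = 1  # exactly one active channel per position
    return out


batch = one_hot_encode(["ACGTAC", "GGATCC"])
rng = np.random.default_rng(0)  # a NumPy generator, as the docstring expects

# Hypothetical call, assuming the package is installed and that the
# generator is passed under the keyword name `rng`:
# from dinuc_shuf import shuffle
# shuffled = shuffle(batch, rng=rng)
```

Building the array as `np.uint8` up front matches the dtype the docstring says the input is cast to.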