Commit 5c8249c

Update README.md
1 parent 193dd6d commit 5c8249c

1 file changed: +0 -102 lines changed

README.md

Lines changed: 0 additions & 102 deletions
@@ -1,111 +1,9 @@
 # quickdna
 
-[![PyPI](https://img.shields.io/pypi/v/quickdna?style=flat-square)](https://pypi.org/project/quickdna/)
-
 Quickdna is a simple, fast library for working with DNA sequences. It is up to 100x faster than Biopython for some
 translation tasks, in part because it uses a native Rust module (via PyO3) for the translation. However, it exposes
 an easy-to-use, type-annotated API that should still feel familiar for Biopython users.
 
-*Quickdna is "pre-1.0" software. Its API is still evolving. For now, if you're interested in using quickdna, we suggest you depend on an [exact version](https://python-poetry.org/docs/dependency-specification/#exact-requirements) or [git `rev`](https://python-poetry.org/docs/dependency-specification/#git-dependencies), so that new releases don't break your code.*
-
-```python
-# These are the two main library types. Unlike Biopython, DnaSequence and
-# ProteinSequence are distinct, though they share a common BaseSequence base class
->>> from quickdna import DnaSequence, ProteinSequence
-
-# Sequences can be constructed from strs or bytes, and are stored internally as
-# ascii-encoded bytes.
->>> d = DnaSequence("taatcaagactattcaaccaa")
-
-# Sequences can be sliced just like regular strings, and return new sequence instances.
->>> d[3:9]
-DnaSequence(seq='tcaaga')
-
-# Many other Python operations are supported on sequences as well: len, iter,
-# ==, hash, concatenation with +, * a constant, etc. These operations are typed
-# when appropriate and will not allow you to concatenate a ProteinSequence to a
-# DnaSequence, for example
-
-# DNA sequences can be easily translated to protein sequences with `translate()`.
-# If no table=... argument is given, NCBI table 1 will be used by default...
->>> d.translate()
-ProteinSequence(seq='*SRLFNQ')
-
-# ...but any of the NCBI tables can be specified. A ValueError will be raised
-# for an invalid table.
->>> d.translate(table=22)
-ProteinSequence(seq='**RLFNQ')
-
-# This exists too! It's somewhat faster than Biopython, but not as dramatically as
-# `translate()`
->>> d[3:9].reverse_complement()
-DnaSequence(seq='TCTTGA')
-
-# This method will return a tuple of all (up to 6) possible translated reading frames:
-# (seq[:], seq[1:], seq[2:], seq.reverse_complement()[:], ...)
->>> d.translate_all_frames()
-(ProteinSequence(seq='*SRLFNQ'), ProteinSequence(seq='NQDYST'),
-ProteinSequence(seq='IKTIQP'), ProteinSequence(seq='LVE*S*L'),
-ProteinSequence(seq='WLNSLD'), ProteinSequence(seq='G*IVLI'))
-
-# translate_all_frames will return fewer than 6 frames for sequences of len < 5
->>> len(DnaSequence("AAAA").translate_all_frames())
-4
->>> len(DnaSequence("AA").translate_all_frames())
-0
-
-# There is a similar method, `translate_self_frames`, that only returns the
-# (up to 3) translated frames for this direction, without the reverse complement
-
-# The IUPAC ambiguity codes are supported as well.
-# Codons with N will translate to a specific amino acid if it is unambiguous,
-# such as GGN -> G, or the ambiguous amino acid code 'X' if there are multiple
-# possible translations.
->>> DnaSequence("GGNATN").translate()
-ProteinSequence(seq='GX')
-
-# The fine-grained ambiguity codes like "R = A or G" are accepted too, and
-# translation results are the same as Biopython. In the output, amino acid
-# ambiguity code 'B' means "either asparagine or aspartic acid" (N or D).
->>> DnaSequence("RAT").translate()
-ProteinSequence(seq='B')
-
-# To disallow ambiguity codes in translation, try: `.translate(strict=True)`
-```
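The frame ordering described in the removed example can be sketched in plain Python. This is not quickdna's implementation (the README says translation happens in a native Rust module); it is a minimal illustration that assumes the standard genetic code (NCBI table 1), handles only unambiguous A/C/G/T input, and uses hypothetical free functions that return plain strings instead of `DnaSequence` and `ProteinSequence` objects.

```python
from itertools import product

# Standard genetic code (NCBI table 1), with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand (result is upper case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons only; a trailing partial codon is ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def translate_all_frames(seq: str) -> tuple[str, ...]:
    """Forward offsets 0, 1, 2, then the same offsets on the reverse complement.

    Frames too short to contain a single codon are dropped, which is why
    short inputs yield fewer than six frames.
    """
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            frame = strand[offset:]
            if len(frame) >= 3:
                frames.append(translate(frame))
    return tuple(frames)
```

Under these assumptions, `translate_all_frames("taatcaagactattcaaccaa")` yields the same six protein strings as the README output, and the four-base and two-base inputs yield 4 and 0 frames respectively.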
-
-## Benchmarks
-
-For regular DNA translation tasks, quickdna is faster than Biopython. (See `benchmarks/bench.py` for the source.)
-Machines and workloads vary, however -- always benchmark!
-
-task                                       | time             | comparison
--------------------------------------------|------------------|-----------
-translate_quickdna(small_genome)           | 0.00306ms / iter |
-translate_biopython(small_genome)          | 0.05834ms / iter | 1908.90%
-translate_quickdna(covid_genome)           | 0.02959ms / iter |
-translate_biopython(covid_genome)          | 3.54413ms / iter | 11979.10%
-reverse_complement_quickdna(small_genome)  | 0.00238ms / iter |
-reverse_complement_biopython(small_genome) | 0.00398ms / iter | 167.24%
-reverse_complement_quickdna(covid_genome)  | 0.02409ms / iter |
-reverse_complement_biopython(covid_genome) | 0.02928ms / iter | 121.55%
-
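The comparison column appears to express the Biopython time as a percentage of the quickdna time for the same task; recomputing it from the rounded times in the table gives values that agree to within rounding. The sketch below shows one way to take such measurements yourself. The helper names `ms_per_iter` and `relative_percent` are hypothetical and are not taken from `benchmarks/bench.py`.

```python
import time
from typing import Callable


def ms_per_iter(fn: Callable[[], object], iterations: int = 1000) -> float:
    """Average wall-clock time of one call to fn, in milliseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed_seconds = time.perf_counter() - start
    return elapsed_seconds * 1000 / iterations


def relative_percent(baseline_ms: float, other_ms: float) -> float:
    """Express other_ms as a percentage of baseline_ms (200.0 means twice as slow)."""
    return other_ms / baseline_ms * 100


# Times copied from the reverse_complement(small_genome) rows of the table.
ratio = relative_percent(baseline_ms=0.00238, other_ms=0.00398)
```

To compare two libraries on your own data, wrap each call in a zero-argument function (for example a `lambda`), pass both to `ms_per_iter`, and feed the two results to `relative_percent`.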
-## Should you use quickdna?
-
-* Quickdna pros:
-  * It's quick!
-  * It's simple and small.
-  * It has type annotations, including a `py.typed` marker file for checkers like mypy or VS Code's Pyright.
-  * It makes a type distinction between DNA and protein sequences, preventing confusion.
-* Quickdna cons:
-  * It's newer and less battle-tested than Biopython.
-  * It's not yet 1.0 -- the API is liable to change in the future.
-  * It doesn't support reading FASTA files or many of the other tasks Biopython can do,
-    so you'll probably end up still using Biopython or something else to do those tasks.
-
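The type distinction listed among the pros can be illustrated with a small stand-alone sketch. The classes below are hypothetical stand-ins, not quickdna's actual `BaseSequence`, `DnaSequence`, or `ProteinSequence`; they only show how two sequence types sharing a base class can refuse to be concatenated with each other at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseSequenceSketch:
    """Hypothetical shared base class holding the raw sequence text."""

    seq: str

    def __add__(self, other: object):
        # Concatenate only when both operands have exactly the same type.
        # Returning NotImplemented makes Python raise TypeError for mixed types.
        if type(other) is not type(self):
            return NotImplemented
        return type(self)(self.seq + other.seq)


class DnaSketch(BaseSequenceSketch):
    """Hypothetical stand-in for a DNA sequence type."""


class ProteinSketch(BaseSequenceSketch):
    """Hypothetical stand-in for a protein sequence type."""


joined = DnaSketch("AC") + DnaSketch("GT")  # same type: allowed

try:
    DnaSketch("AC") + ProteinSketch("M")  # mixed types: rejected
    mixed_allowed = True
except TypeError:
    mixed_allowed = False
```

With annotated method signatures, a static checker could flag the mixed case before the code runs; the runtime check here is only the simplest way to demonstrate the idea.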
-## Installation
-
-Quickdna has prebuilt wheels for Linux (manylinux2010), OSX, and Windows available [on PyPI](https://pypi.org/project/quickdna/).
-
 ## Development
 
 Quickdna uses `PyO3` and `maturin` to build and upload the wheels, and `poetry` for handling dependencies. This is handled via
