rusty-dot

Fast dot plot comparisons of DNA sequences using an FM-index. Written in Rust with PyO3 Python bindings.

Features

- Read FASTA / gzipped FASTA files via needletail
- Build FM-indexes per sequence using rust-bio
- K-mer set intersection for efficient shared k-mer lookup
- Both-strand k-mer matching: forward (+) and reverse-complement (-) hits detected via compare_sequences_stranded
- Merge sequential k-mer runs into contiguous match blocks for both orientations:
  - Forward-strand co-linear diagonal merging (py_merge_kmer_runs)
  - RC anti-diagonal merging: standard inverted repeats (py_merge_rev_runs)
  - RC co-diagonal merging: both arms run in the same direction (py_merge_rev_fwd_runs)
  - Unified strand-aware entry point (py_merge_runs)
- PAF format output for alignment records
- FM-index serialization/deserialization with serde + postcard
- All-vs-all dotplot visualization with matplotlib:
  - Forward hits drawn in blue (configurable via dot_color)
  - Reverse-complement hits drawn in red (configurable via rc_color)
  - Sequence names rendered once, at the bottom of each column and left of each row
  - SVG vector output in addition to PNG/PDF via the format parameter
  - Minimum alignment length filter (min_length) to suppress short/spurious hits before rendering
- Cross-index comparisons between two sequence sets (e.g. two genome assemblies)
- Relative sequence scaling in dotplot subpanels
- Gravity-centre contig ordering for maximum collinearity
- PafAlignment.filter_by_min_length(): discard short alignment records from a loaded PAF file
- Full Python bindings via PyO3
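
To make the "shared k-mer lookup" and "merge sequential k-mer runs" features concrete, here is a minimal pure-Python sketch of the forward-strand case. It is an illustration only, not the library's implementation: rusty-dot does this in Rust on top of FM-indexes, and the function names below (kmer_positions, shared_hits, merge_forward_runs) are hypothetical. The sketch assumes that a "run" is a set of hits on the same diagonal whose start positions increase by exactly one.

```python
from collections import defaultdict


def kmer_positions(seq: str, k: int) -> dict:
    """Map each k-mer in seq to the 0-based start positions where it occurs."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def shared_hits(query: str, target: str, k: int) -> list:
    """Return sorted (query_start, target_start) pairs for k-mers present in both."""
    q_pos = kmer_positions(query, k)
    t_pos = kmer_positions(target, k)
    shared = q_pos.keys() & t_pos.keys()  # set intersection of the two k-mer sets
    return sorted(
        (q, t) for kmer in shared for q in q_pos[kmer] for t in t_pos[kmer]
    )


def merge_forward_runs(hits, k: int) -> list:
    """Merge hits on the same diagonal with consecutive starts into blocks.

    Each block is (q_start, q_end, t_start, t_end) with end-exclusive coordinates.
    """
    by_diagonal = defaultdict(list)
    for q, t in hits:
        by_diagonal[t - q].append(q)  # forward hits share a constant t - q

    blocks = []
    for diag, q_starts in by_diagonal.items():
        q_starts.sort()
        run_start = prev = q_starts[0]
        for q in q_starts[1:]:
            if q != prev + 1:  # gap on this diagonal: close the current run
                blocks.append((run_start, prev + k, run_start + diag, prev + diag + k))
                run_start = q
            prev = q
        blocks.append((run_start, prev + k, run_start + diag, prev + diag + k))
    return sorted(blocks)


# Two toy sequences sharing the 7-base substring "GACCGAT" at offset 2
query = "TTGACCGATA"
target = "GGGACCGATC"
hits = shared_hits(query, target, k=4)
blocks = merge_forward_runs(hits, k=4)
```

Four overlapping 4-mers collapse into a single 7 bp block here. Reverse-complement hits could be found the same way by looking up query k-mers in the reverse complement of the target and converting the coordinates back.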

Installation

Requirements:

Rust: See rust-lang.org
Python: >=3.9, <3.14

# Clone this project repo
git clone https://github.com/Adamtaranto/rusty-dot.git && cd rusty-dot

# Install maturin build tool
pip install maturin

# Build and install the Python package
maturin develop --release

Quick Start — single multi-FASTA index

Each sequence added to a SequenceIndex gets its own independent FM-index (rust-bio FM-indexes are read-only once built and cannot be extended).

Calling add_sequence or load_fasta multiple times accumulates sequences — it never merges or replaces the existing collection.

Re-using an existing sequence name emits a UserWarning and overwrites that entry.

If a FASTA file contains duplicate sequence names, load_fasta raises a ValueError before adding any sequences.

from rusty_dot import SequenceIndex
from rusty_dot.dotplot import DotPlotter

# Build an index from a multi-sequence FASTA file
# Each sequence in the file gets its own independent FM-index entry
idx = SequenceIndex(k=15)
names = idx.load_fasta("assembly.fasta")

# load_fasta accumulates: calling it again adds more sequences, keeps existing ones
# idx.load_fasta("more_sequences.fasta")   # would add to the same index

# List the sequences now held in the index
print(idx.sequence_names())   # ['contig1', 'contig2', 'contig3', ...]

# Print all pairwise PAF lines (every i ≠ j combination)
for line in idx.get_paf_all():
    print(line)

# Print PAF lines for one specific pair
for line in idx.get_paf("contig1", "contig2"):
    print(line)

# All-vs-all dotplot
# Forward (+) hits are drawn in blue, reverse-complement (-) hits in red.
# Sequence names appear once per column (bottom) and once per row (left).
plotter = DotPlotter(idx)
plotter.plot(output_path="all_vs_all.png", title="All vs All")

# Save as an SVG vector image instead of PNG
plotter.plot(output_path="all_vs_all.svg", title="All vs All")

# Filter out short alignments (< 500 bp) before plotting
plotter.plot(output_path="filtered.png", min_length=500)

# Single pairwise dotplot
plotter.plot_single("contig1", "contig2", output_path="pair.png")
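
Because load_fasta raises a ValueError when a FASTA file contains duplicate sequence names, it can be convenient to check a file beforehand. The sketch below is a standalone helper that does not call rusty-dot. It assumes the sequence name is the first whitespace-delimited token of each header line, which may not match exactly how the library derives names; fasta_names and duplicate_names are hypothetical helpers.

```python
from collections import Counter


def fasta_names(lines):
    """Collect sequence names from an iterable of FASTA lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip malformed headers that are just '>'
                names.append(tokens[0])
    return names


def duplicate_names(names):
    """Return the names that occur more than once, sorted alphabetically."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# In-memory example; for a real file, pass an open text handle instead,
# e.g. open("assembly.fasta") or gzip.open("assembly.fasta.gz", "rt").
example = [
    ">contig1 first copy\n",
    "ACGTACGT\n",
    ">contig2\n",
    "GGGTTTAA\n",
    ">contig1 second copy\n",
    "TTTTCCCC\n",
]
dupes = duplicate_names(fasta_names(example))
```

If dupes is non-empty, the file would need renaming or de-duplication before it is passed to load_fasta.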

All-vs-All Dotplot Between Two Genomes

Compare sequences from two separate FASTA files (e.g. two genome assemblies) and plot an all-vs-all grid with subpanels scaled by relative sequence length.

from rusty_dot.dotplot import DotPlotter
from rusty_dot.paf_io import CrossIndex, PafAlignment, PafRecord

# --- Build a cross-index for two assemblies ---
cross = CrossIndex(k=15)
cross.load_fasta("genome_a.fasta", group="a")   # query sequences (rows)
cross.load_fasta("genome_b.fasta", group="b")   # target sequences (columns)

# --- Sort contigs for maximum collinearity ---
# Option 1: via CrossIndex (delegates to SequenceIndex.optimal_contig_order)
q_sorted, t_sorted = cross.reorder_contigs()

# Option 2: via PafAlignment gravity-centre algorithm
# Retrieve all cross-group PAF lines
paf_lines = cross.get_paf_all()

records = [PafRecord.from_line(line) for line in paf_lines]
aln = PafAlignment.from_records(records)
q_sorted, t_sorted = aln.reorder_contigs(
    query_names=cross.query_names,
    target_names=cross.target_names,
)
# Unmatched contigs are placed at the end, sorted by descending length.

# --- Plot with relative scaling ---

plotter = DotPlotter(cross)
plotter.plot(
    query_names=q_sorted,
    target_names=t_sorted,
    output_path="cross_dotplot.png",
    scale_sequences=True,   # subplot size proportional to sequence length
    title="Genome A vs Genome B",
)

# Save as SVG vector image for publication-quality output
plotter.plot(
    query_names=q_sorted,
    target_names=t_sorted,
    output_path="cross_dotplot.svg",
    scale_sequences=True,
    title="Genome A vs Genome B",
)

# Suppress short alignments (e.g. < 500 bp) from the plot
plotter.plot(
    query_names=q_sorted,
    target_names=t_sorted,
    output_path="cross_dotplot_filtered.png",
    scale_sequences=True,
    min_length=500,
    title="Genome A vs Genome B (≥500 bp alignments)",
)
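
The gravity-centre ordering used above can be pictured as follows: each target contig is placed at the alignment-length-weighted mean of the query positions it aligns to, and targets are then sorted by that position. The sketch below illustrates this idea in plain Python. It is an assumption about the general technique rather than a copy of PafAlignment.reorder_contigs, and gravity_order with its argument layout is hypothetical. The handling of unmatched contigs (placed last, longest first) follows the comment in the example above.

```python
def gravity_order(target_lengths, query_offsets, alignments):
    """Order target contigs by the weighted mean query position of their hits.

    target_lengths: dict of target name -> length in bp
    query_offsets:  dict of query name -> offset of that query on a
                    concatenated query axis
    alignments:     iterable of (query_name, q_start, q_end, target_name)
    """
    weighted_sum = {}
    total_weight = {}
    for q_name, q_start, q_end, t_name in alignments:
        weight = q_end - q_start  # longer alignments pull harder
        midpoint = query_offsets[q_name] + (q_start + q_end) / 2
        weighted_sum[t_name] = weighted_sum.get(t_name, 0.0) + weight * midpoint
        total_weight[t_name] = total_weight.get(t_name, 0) + weight

    # Targets with at least one alignment, sorted by gravity centre
    matched = sorted(
        (t for t in target_lengths if total_weight.get(t, 0) > 0),
        key=lambda t: weighted_sum[t] / total_weight[t],
    )
    # Targets with no alignments go last, longest first
    unmatched = sorted(
        (t for t in target_lengths if total_weight.get(t, 0) == 0),
        key=lambda t: -target_lengths[t],
    )
    return matched + unmatched


target_lengths = {"t1": 100, "t2": 300, "t3": 200, "t4": 50}
query_offsets = {"q1": 0, "q2": 1000}
alignments = [
    ("q2", 0, 100, "t1"),    # gravity centre of t1 = 1050
    ("q1", 100, 300, "t2"),  # gravity centre of t2 = 200
]
order = gravity_order(target_lengths, query_offsets, alignments)
```

Here t2 sorts before t1 because its hits sit earlier on the query axis, while t3 and t4 have no hits and are appended by descending length.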

Filtering PAF Alignments by Length

Use PafAlignment.filter_by_min_length to remove short alignment records after loading a PAF file. This is particularly useful for producing cleaner visualisations, either when alignments were built by merging k-mer runs (merged blocks can be longer than the k-mer size, so a length threshold separates them from isolated k-mer hits) or when working with a pre-computed PAF file.

from rusty_dot.paf_io import PafAlignment

aln = PafAlignment.from_file("alignments.paf")

# Keep only alignments of at least 500 bp on the query
aln_long = aln.filter_by_min_length(500)
print(f"Records before: {len(aln)}, after: {len(aln_long)}")
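
For readers working with raw PAF lines (for example, the strings returned by get_paf_all) rather than a PafAlignment object, the same kind of filter can be sketched directly on the text. This assumes the standard PAF column layout, where columns 3 and 4 (0-based indexes 2 and 3) hold the query start and end, and it measures length on the query as in the example above. The helper names query_span and filter_paf_lines are hypothetical and not part of rusty-dot.

```python
def query_span(paf_line: str) -> int:
    """Return the aligned length on the query (query end minus query start)."""
    fields = paf_line.rstrip("\n").split("\t")
    return int(fields[3]) - int(fields[2])


def filter_paf_lines(lines, min_length: int) -> list:
    """Keep only PAF lines whose query span is at least min_length bp."""
    return [line for line in lines if query_span(line) >= min_length]


# Toy records: query name, query length, query start, query end, strand,
# target name, target length, target start, target end, matches,
# block length, mapping quality
paf_example = [
    "contig1\t5000\t100\t900\t+\tcontig2\t6000\t200\t1000\t800\t800\t255",
    "contig1\t5000\t2000\t2030\t-\tcontig2\t6000\t4000\t4030\t30\t30\t255",
]
kept = filter_paf_lines(paf_example, min_length=500)
```

Only the first record (800 bp on the query) survives a 500 bp threshold; the 30 bp hit is dropped.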

Writing PAF Lines to a File

# idx is the SequenceIndex built in the Quick Start
# All pairwise alignments within a single index
paf_lines = idx.get_paf_all()

# Or one specific pair
paf_lines = idx.get_paf("contig1", "contig2", merge=True)

with open("alignments.paf", "w") as f:
    for line in paf_lines:
        f.write(line + "\n")

Saving and Loading Indexes

from rusty_dot import SequenceIndex

# Save the current index (idx from the Quick Start) to a compact binary file
idx.save("my_index.bin")

# Load into a new index (k must match the saved index)
idx2 = SequenceIndex(k=15)
idx2.load("my_index.bin")
