Seed Aligner is a light-weight tool that detects a common seed region across genetic sequences and reorders them to start at the same point. It standardizes FASTA inputs for Covary without performing full multiple sequence alignment.

mahvin92/Seed-Aligner

Seed Aligner

Seed Aligner is a lightweight genetic preprocessing module designed to solve a fundamental issue in comparative genomic analysis: not all sequence assemblies in databases start at the same genomic region.

This variation, where some sequences start at the beginning of the genome, others internally, and others near the end, can disrupt alignment and embedding analyses such as those performed by Covary. Seed Aligner addresses this by locating a conserved “seed” consensus region across sequences and normalizing each sequence so that it starts at that region.


🌱 Overview

Unlike conventional Multiple Sequence Alignment (MSA) tools that align entire sequences, Seed Aligner focuses on identifying a short, conserved seed region and reassembling the sequences around it.

Key Features

  • 🔍 Seed Region Identification: Finds a consensus sequence (default = 100 nt, taken from the start of the reference) shared across all genomes.
  • 🔄 Sequence Reorientation: Repositions the fragments flanking the seed so that all sequences start consistently.
  • 🧩 MSA-Free Normalization: Reduces computational cost by skipping full alignments.
  • ☁️ Colab Compatible: Runs entirely on Google Colab as a Jupyter Notebook for fast prototyping.

⚙️ Workflow

  1. Input: Multi-FASTA file containing complete genomes.
  2. Seed Detection: Paste the reference sequence or assembly; the seed (by default the first 100 nt of the reference) is then located in each genome.
  3. Sequence Rearrangement:
    • If the genome starts after the seed → shift the 5′ fragment to the end.
    • If the genome starts before the seed → ensure seed alignment consistency.
  4. Output: Normalized FASTA file suitable for Covary input and other FASTA-based analyses.
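
The workflow above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's actual code: `parse_fasta`, `rotate_to_seed`, and `normalize_fasta` are hypothetical helper names, the seed is matched exactly, and rotating a record assumes the fragment before the seed can simply be moved to the end.

```python
from typing import Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from multi-FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rotate_to_seed(sequence: str, seed: str) -> str:
    """Move everything before the seed to the end so the record starts at the seed."""
    pos = sequence.find(seed)
    if pos == -1:
        raise ValueError("seed not found in sequence")
    return sequence[pos:] + sequence[:pos]


def normalize_fasta(text: str, seed: str) -> str:
    """Rotate every record in a multi-FASTA string to start at the seed."""
    records = [
        f">{header}\n{rotate_to_seed(seq.upper(), seed)}"
        for header, seq in parse_fasta(text)
    ]
    return "\n".join(records) + "\n"


# Toy input: record "a" starts after the seed, record "b" already starts at it.
fasta = ">a\nTTGAC\nGGGCCCAGTCC\n>b\nGGGCCCAGTCCTTGAC\n"
print(normalize_fasta(fasta, seed="GGGCCC"))
```

Both toy records come out as `GGGCCCAGTCCTTGAC`. Raising an error when the seed is missing is one possible design choice; a real pipeline might instead skip or flag such records.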

🧩 Example

Reference assembly: [SEED] ... AGTCC ... TTGAC

| Changes | Example (reorientation) |
| --- | --- |
| Original sequence | TTGAC... [SEED] ...AGTCC |
| Normalized output | [SEED] ...AGTCC...TTGAC |

Each sequence now starts at the seed region, just like the reference assembly.


🚀 Run in Google Colab

You can open the notebook directly in Google Colab:
