Burrows-Wheeler Transform (BWT): DNA Alignment and Analysis - Carleton College CS Senior Comps 2024

🧬 BWT Project 🧬

Welcome to our project repository! This is the home of our innovative work on the Burrows-Wheeler Transform (BWT), a groundbreaking approach to DNA sequence alignment and analysis. Dive into our world of genomics, algorithms, and computational biology!

Team Extraordinaire 👩‍💻👨‍💻

🚀 Project Components 🚀

C Implementation 🖥️

  • BWT Transform and Alignment: A robust C implementation complete with a sleek terminal-based user interface.
  • Other Alignment Algorithms 🔍
  • See the README.txt in the C folder for instructions.

GUI Magic 🌈

  • Visualize BWT: Experience the BWT transformation through our dynamic graphical interface.
  • πŸ“ How to Use: See the README.txt in the GUI folder for instructions.

Website Showcase 🌐

  • HTML & CSS Genius: Explore our project's website, crafted with care and coding prowess.

Radix Sort Experiments 🧪

  • Radix Sort Sandbox: Python and early C experimentation with the fascinating Radix Sort algorithm.
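
For readers new to the algorithm, here is a minimal sketch of least-significant-digit radix sort on non-negative integers. It is an illustration only, not the code from the sandbox folder; the function name `radix_sort` and the `base` parameter are assumptions made for this example.

```python
def radix_sort(values, base=10):
    """Sort non-negative integers with LSD radix sort (illustrative sketch)."""
    out = list(values)
    if not out:
        return out
    largest = max(out)
    exp = 1  # weight of the digit currently being sorted on
    while largest // exp > 0:
        # Stable bucket pass on the current digit.
        buckets = [[] for _ in range(base)]
        for v in out:
            buckets[(v // exp) % base].append(v)
        out = [v for bucket in buckets for v in bucket]
        exp *= base
    return out


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

Each pass is stable, so the ordering established by lower digits survives when higher digits are processed.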

Benchmarking Tools ⏱️

  • Performance Analysis: Python programs meticulously designed for benchmarking our algorithms.
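
A benchmarking script for this kind of project might look roughly like the sketch below: generate a reproducible random DNA string, then time a search function over several repeats and keep the best run. All names here (`random_dna`, `naive_find_all`, `best_time`) are hypothetical and are not taken from the benchmarking folder.

```python
import random
import time


def random_dna(length, seed=0):
    """Return a reproducible random string over A, C, G, T."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def naive_find_all(text, pattern):
    """Return every start index where pattern occurs in text."""
    last_start = len(text) - len(pattern)
    return [i for i in range(last_start + 1) if text.startswith(pattern, i)]


def best_time(func, *args, repeats=5):
    """Run func(*args) several times and return the fastest wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


genome = random_dna(100_000)
read = genome[5_000:5_030]  # a 30-character "read" known to occur in the genome
print(f"naive search: {best_time(naive_find_all, genome, read):.4f} s")
```

Taking the minimum over repeats reduces noise from other processes; the absolute numbers depend on the machine.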

Python BWT Implementations 🐍

  • Classic BWT.py: The core Python implementation of BWT transform, reversal, and pattern matching.
  • BWT with a Twist: Our special Numpy-enhanced version of the BWT transform.
  • Python Power: Check out our Python implementations of the Boyer-Moore and Naive alignment algorithms.
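
To give a feel for what the transform and its reversal do, here is a small textbook-style sketch. It sorts all rotations explicitly, which is simple but far too slow for real genomes, and it is not the code in BWT.py; the function names and the `$` sentinel are assumptions made for this example.

```python
def bwt_transform(text, sentinel="$"):
    """Return the BWT of text, using a sentinel that sorts before A, C, G, T."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    text += sentinel
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The BWT is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def bwt_inverse(bwt, sentinel="$"):
    """Rebuild the original text by repeatedly prepending the BWT and sorting."""
    table = [""] * len(bwt)
    for _ in range(len(bwt)):
        table = sorted(ch + row for ch, row in zip(bwt, table))
    # The row that ends with the sentinel is the original text.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


encoded = bwt_transform("banana")
print(encoded)               # annb$aa
print(bwt_inverse(encoded))  # banana
```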

🧬 The World of DNA and BWT

  • Human DNA, a vast universe of over 3 billion characters (A, C, G, T), poses complex challenges in genome sequencing. Our project tackles these challenges head-on, exploring efficient alignment of short DNA sequences to a reference genome – a crucial task given the sheer volume and potential imperfections in the data.

Why BWT? 🤔

  • A Compression Powerhouse: BWT is a reversible rearrangement of a text that tends to group identical characters into long runs, which makes repetitive sequences like human DNA much easier to compress and store.
  • Versatility: From file compression (think bzip2) to DNA, BWT's adaptability is nothing short of amazing.
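
One way to see the compression benefit is to count runs of identical characters before and after the transform. The sketch below uses a deliberately repetitive toy string; the helper names `bwt` and `run_length_encode` are assumptions for this example, and real genomes are far less regular than this input.

```python
from itertools import groupby


def bwt(text):
    """BWT of a text that already ends with a unique '$' sentinel."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """Collapse each run of identical characters into a (char, length) pair."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


text = "ACGTACGTACGT$"
transformed = bwt(text)

print(transformed)                          # TTT$AAACCCGGG
print(len(run_length_encode(text)))         # 13 runs before the transform
print(len(run_length_encode(transformed)))  # 5 runs after the transform
```

Fewer, longer runs are what a later run-length or entropy coding stage can exploit.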

🌟 The Project Mission

  • Decode the BWT and FM-Index: Understand the why and how behind these compression superheroes.
  • Reference Genome Realities: Delve into the creation and ethical considerations of reference genomes.
  • Aligners in Action: Run and analyze outputs from existing BWT-based aligners, comparing their efficacy.
  • Hands-On Implementation: We're not just studying; we're building! From exact matches to handling mismatches and gaps, we're on it.
  • Data Playground: Engaging with real and simulated datasets to test and refine our alignment algorithms.
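
As a rough illustration of how exact matching with an FM-index works, the sketch below builds a suffix array, the BWT, a first-column table, and full occurrence counts, then narrows a suffix-array range one pattern character at a time (backward search). It is a teaching sketch under simplifying assumptions (a naive suffix sort and uncompressed count tables) and is not the project's implementation; `build_fm_index` and `find_exact` are hypothetical names.

```python
from collections import Counter


def build_fm_index(text):
    """Build (suffix array, first-column offsets, occurrence table) for text.

    text must end with a unique '$' sentinel.
    """
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps around to the sentinel
    counts = Counter(text)
    # first[c] = number of characters in text that sort before c
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ


def find_exact(pattern, sa, first, occ):
    """Return the sorted start positions of pattern using backward search."""
    lo, hi = 0, len(sa)  # current half-open range of matching suffixes
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, first, occ = build_fm_index("ACGTACGTACGT$")
print(find_exact("ACGT", sa, first, occ))  # [0, 4, 8]
print(find_exact("GTA", sa, first, occ))   # [2, 6]
```

The search cost depends on the pattern length rather than the genome length, which is why this family of indexes suits aligning many short reads to one reference.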

Running Locally

  1. Compile and run our C implementation:
       gcc bwt.c -o bwt
       ./bwt