Developed a Convolutional Neural Network (CNN)-based AI model to classify DNA sequences as a proof of concept for how optimized models could accelerate genetic analysis, improve diagnostics, and support personalized treatment.
Collaborated with a PhD geneticist (University of the Andes) and a holder of a Master's degree in Applied Mathematics (National University of Colombia). Encoded genetic data using one-hot encoding and Pandas, optimized training with PyTorch DataLoaders, and evaluated performance.
Implemented with PyTorch, a machine learning framework. The model analyzes DNA sequences and classifies them based on discernible motifs.
- Python 3
- Jupyter Notebook (running it in Google Colab is recommended)
- Clone the repository to your local machine:
git clone https://github.com/felipe-jimenez-ai/ai-genomics.git && cd ai-genomics
This repository contains code for a genomics project utilizing artificial intelligence for the classification of DNA sequences. The code includes the following components:
- Gene sequence data is extracted from a CSV file using Pandas.
- The code provides functionality for generating simulated DNA sequences, but it is not used in the main code.
- Sequence labels are encoded using scikit-learn's LabelEncoder.
- DNA sequences are cleaned and converted to one-hot encoding using PyTorch.
- The data is split into training, validation, and test sets for model training and evaluation.
- PyTorch DataLoaders are prepared for efficient batch processing during training.
- A Convolutional Neural Network (CNN) is defined for classifying DNA sequences.
- Functions for training and validation loops are defined.
- The trained model is evaluated on a test set, and performance metrics are displayed.
- Matplotlib is used to plot training and validation loss curves.
- An example DNA sequence is provided, and the trained model predicts its class.
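
The components listed above can be sketched end to end in a few lines. The following is a minimal illustration, not the repository's actual code: the names `one_hot_encode`, `SmallDNAClassifier`, and `train_one_epoch`, the toy sequences, the fixed length of 10, and the layer sizes are all assumptions made for the example. It also assumes that cleaning means upper-casing and dropping any character outside `ACGT`, and that short sequences are zero-padded.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

BASES = "ACGT"  # assumed alphabet; any other character is dropped during cleaning
BASE_TO_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot_encode(sequence: str, length: int) -> torch.Tensor:
    """Clean a DNA string and return a (4, length) one-hot tensor."""
    cleaned = [b for b in sequence.upper() if b in BASE_TO_INDEX][:length]
    encoded = torch.zeros(len(BASES), length)
    for position, base in enumerate(cleaned):
        encoded[BASE_TO_INDEX[base], position] = 1.0
    return encoded  # positions past the end of the sequence stay all-zero


class SmallDNAClassifier(nn.Module):
    """Hypothetical 1D CNN; each convolution filter acts as a motif detector."""

    def __init__(self, num_classes: int, kernel_size: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(len(BASES), 16, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # strongest response of each filter
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).squeeze(-1))


def train_one_epoch(model, loader, criterion, optimizer) -> float:
    """Run one pass over the loader and return the mean batch loss."""
    model.train()
    total = 0.0
    for batch_inputs, batch_labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_inputs), batch_labels)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)


# Toy data standing in for the CSV contents (made up for illustration).
sequences = ["ACGTACGTAC", "TTTTGGGGCC", "acgtnnacgt", "GGGCCCAAAT"]
labels = torch.tensor([0, 1, 0, 1])  # already integer-encoded class labels
inputs = torch.stack([one_hot_encode(s, length=10) for s in sequences])

# Split into training and validation subsets, then batch the training data.
dataset = TensorDataset(inputs, labels)
train_set, val_set = random_split(
    dataset, [3, 1], generator=torch.Generator().manual_seed(0)
)
train_loader = DataLoader(train_set, batch_size=2, shuffle=True)

model = SmallDNAClassifier(num_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
epoch_loss = train_one_epoch(model, train_loader, nn.CrossEntropyLoss(), optimizer)

# Predict the class of a single example sequence.
model.eval()
with torch.no_grad():
    example = one_hot_encode("ACGTACGTAC", length=10).unsqueeze(0)
    predicted = model(example).argmax(dim=1)
```

Dropping invalid characters shifts the bases that follow them, so a real pipeline may prefer to keep unknown positions as all-zero columns instead. The adaptive max pooling layer lets the same model accept sequences of different lengths, which is one reason it is used in this sketch.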
Feel free to explore the code and adapt it to your genomics classification tasks. If you have any questions or suggestions, please open an issue.
Note: The code assumes the availability of PyTorch, scikit-learn, pandas, and matplotlib libraries. Make sure to install these dependencies before running the code.
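
A quick way to confirm that the dependencies named in the note are available is to check for them before running the notebook. The snippet below is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the repository. Note that scikit-learn is imported as `sklearn` and PyTorch as `torch`.

```python
import importlib.util

# Maps each import name to the name used when installing the package with pip.
REQUIRED = {
    "torch": "torch",
    "sklearn": "scikit-learn",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be imported."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```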