Using Principal Component Analysis (PCA) to understand human genetic variation based on data from the International Genome Sample Resource (IGSR)

Ryan-Rong-24/pca-genomes

This project reads human genetic data from the IGSR in VCF format, performs PCA with scikit-learn, and plots the results with Altair to show how human genetic variation relates to geographic location.

To download the data, run: download.sh

Then parse the data: vcf_to_matrix
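For readers unfamiliar with what the parsing step involves, here is a minimal sketch of turning VCF genotype calls into a sample-by-variant matrix of alternate-allele counts. It is not the repository's vcf_to_matrix script: the function name `genotype_matrix_from_vcf`, the sample IDs S1 to S3, and the choice to count missing calls as 0 are all assumptions made for illustration.

```python
import io


def genotype_matrix_from_vcf(handle):
    """Hypothetical helper: return (samples, variants, matrix).

    matrix has one row per sample and one column per variant; each entry is
    the number of non-reference alleles in that sample's GT call (0, 1 or 2
    for a diploid call).
    """
    samples, variants, columns = [], [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs start at the 10th column
            continue
        chrom, pos, var_id = fields[0], fields[1], fields[2]
        gt_index = fields[8].split(":").index("GT")  # FORMAT column
        counts = []
        for call in fields[9:]:
            gt = call.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            # Assumption: a missing allele (".") is treated like the reference.
            counts.append(sum(1 for a in alleles if a not in ("0", ".")))
        variants.append(var_id if var_id != "." else f"{chrom}:{pos}")
        columns.append(counts)
    # Transpose from variant-major to sample-major order.
    matrix = [list(row) for row in zip(*columns)]
    return samples, variants, matrix


# Tiny made-up VCF fragment, used only to exercise the function.
demo = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "22\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1\n"
    "22\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1|0\t0|0\t.|.\n"
)
samples, variants, matrix = genotype_matrix_from_vcf(io.StringIO(demo))
print(samples, variants, matrix)
```

Each row of the matrix is one individual, which is the layout scikit-learn expects for PCA input. Real IGSR files are large and usually compressed, so a practical version would stream from a gzip handle rather than build Python lists in memory.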

Finally, run the Python notebook plot_pca.ipynb
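The PCA step can be illustrated with a short sketch that uses scikit-learn on synthetic data. This is not the notebook's code: the two groups "A" and "B", their allele frequencies, and the matrix sizes are invented here, standing in for a real genotype matrix with one row per sample.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_variants = 200
n_per_group = 30

# Assumption: two synthetic populations with independent allele frequencies.
freq_a = rng.uniform(0.05, 0.95, n_variants)
freq_b = rng.uniform(0.05, 0.95, n_variants)

# Each genotype is an alternate-allele count (0, 1 or 2) drawn per variant.
group_a = rng.binomial(2, freq_a, size=(n_per_group, n_variants))
group_b = rng.binomial(2, freq_b, size=(n_per_group, n_variants))

genotypes = np.vstack([group_a, group_b]).astype(float)
labels = np.array(["A"] * n_per_group + ["B"] * n_per_group)

# PCA centers the columns internally and projects samples onto the top two axes.
pca = PCA(n_components=2)
components = pca.fit_transform(genotypes)

pc1_a = components[labels == "A", 0]
pc1_b = components[labels == "B", 0]
print("explained variance ratio:", pca.explained_variance_ratio_)
print("mean PC1 per group:", pc1_a.mean(), pc1_b.mean())
```

With frequencies this different, the two groups should fall on opposite sides of the first component. Real populations differ far less, so clusters in actual data overlap more. The resulting two columns, together with a population label per sample, are the kind of table a scatter plot in Altair could be built from.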

Results for Chromosome 22: res
