Project Work for Introduction to Bioinformatics Course | CSE 4893 United International University

tinykishore/snoRNA-disease-associations

A Unified Framework for Predicting snoRNA-Disease Associations through Linear Regression and Gradient Boosting

Ummay Maria Muna, Shanta Biswas, Riasat Azim

Small nucleolar RNAs (snoRNAs) are a class of small RNA molecules that guide chemical modifications of other RNAs and take part in diverse biological processes. Dysfunctions in snoRNAs contribute significantly to the onset and progression of complex diseases. However, conventional experimental methods for identifying snoRNA-disease associations are laborious and costly. To address this, we propose GBDT-LR, a model that combines gradient boosting decision trees (GBDT) with logistic regression (LR). K-means clustering is first used to screen negative samples; GBDT then extracts distinctive features, which are fed into an LR model that predicts association scores. The approach achieves 93% accuracy and an ROC AUC of 88%. By integrating available data and tools, this computational strategy efficiently predicts unknown associations between snoRNAs and diseases, and the results suggest that using GBDT for feature extraction followed by LR for prediction has significant potential for predicting complex disease associations.
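The three stages described above (k-means screening of negative samples, GBDT feature extraction, LR scoring) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the code from this repository: the function names `screen_negatives`, `fit_gbdt_lr`, and `association_scores` are hypothetical, the feature vectors are synthetic stand-ins, and the screening rule (drawing an equal number of negatives from each cluster of unlabeled pairs) is an assumed reading of "screening", since the abstract does not spell it out.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder


def screen_negatives(unlabeled, n_needed, n_clusters=4, seed=0):
    """Draw up to n_needed negatives, spread evenly over k-means clusters.

    Assumption: "screening" means sampling the same number of unlabeled
    pairs from every cluster so the negative set covers the feature space.
    """
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(unlabeled)
    per_cluster = n_needed // n_clusters
    chosen = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        take = min(per_cluster, len(members))  # small clusters yield fewer
        chosen.extend(rng.choice(members, size=take, replace=False))
    return unlabeled[np.array(chosen, dtype=int)]


def fit_gbdt_lr(X, y, seed=0):
    """Fit GBDT, one-hot encode the leaf each sample reaches, then fit LR."""
    gbdt = GradientBoostingClassifier(n_estimators=30, max_depth=3,
                                      random_state=seed).fit(X, y)
    # apply() returns (n_samples, n_estimators, 1) for binary problems.
    leaves = gbdt.apply(X)[:, :, 0]
    encoder = OneHotEncoder(handle_unknown="ignore").fit(leaves)
    lr = LogisticRegression(max_iter=1000).fit(encoder.transform(leaves), y)
    return gbdt, encoder, lr


def association_scores(model, X):
    """Return the LR probability of the positive (associated) class."""
    gbdt, encoder, lr = model
    leaves = gbdt.apply(X)[:, :, 0]
    return lr.predict_proba(encoder.transform(leaves))[:, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for snoRNA-disease pair feature vectors.
    positives = rng.normal(loc=1.0, size=(40, 8))
    unlabeled = rng.normal(loc=0.0, size=(200, 8))

    negatives = screen_negatives(unlabeled, n_needed=len(positives))
    X = np.vstack([positives, negatives])
    y = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])

    model = fit_gbdt_lr(X, y)
    print(association_scores(model, unlabeled[:5]))
```

For brevity the sketch fits GBDT and LR on the same samples. In practice the two stages are often trained on separate splits so that the LR weights are not fitted to leaves the trees have already memorized, and accuracy or ROC AUC would be measured on held-out pairs.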
