BioInformatics_146

# QBS 146 Project Progress

Our QBS 146 project has reached its halfway point, and we are pleased to provide an update on our progress so far. Our team, consisting of Bofan Chen, Caryn Butler, Mingliang Ge, Tanmay Shukla, and Xiaoqing Xia, has been working diligently to analyze the protein expression data and identify the proteins that are significant in the Down syndrome (trisomy) class of mice.

Data Cleaning and Preprocessing: Tanmay Shukla has meticulously checked the dataset for inconsistencies and missing values. He has successfully cleaned the data, ensuring its quality and reliability. Additionally, Tanmay has performed essential preprocessing steps such as scaling and normalization to prepare the data for analysis.

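
A minimal sketch of what such cleaning and scaling can look like, assuming the measurements sit in a NumPy array with one row per sample and one column per protein. Mean imputation for missing values and z-score standardization are assumptions chosen for illustration (the exact choices are not specified above), and `impute_and_standardize` is a hypothetical helper name.

```python
import numpy as np


def impute_and_standardize(X):
    """Fill missing values with each column's mean, then z-score every column.

    X is a 2-D array (samples x proteins) that may contain NaN.
    Mean imputation and z-scoring are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)      # per-protein mean, ignoring NaN
    rows, cols = np.where(np.isnan(X))     # positions of the missing values
    X[rows, cols] = col_means[cols]        # mean imputation
    col_std = X.std(axis=0)
    col_std[col_std == 0] = 1.0            # leave constant columns unscaled
    return (X - X.mean(axis=0)) / col_std


if __name__ == "__main__":
    # Toy data: 4 samples x 3 proteins with two missing measurements
    X = np.array([
        [0.50, 1.20, np.nan],
        [0.40, np.nan, 2.10],
        [0.60, 1.00, 2.30],
        [0.55, 1.10, 2.20],
    ])
    Z = impute_and_standardize(X)
    print(np.round(Z.mean(axis=0), 6))  # each column now has mean close to 0
    print(np.round(Z.std(axis=0), 6))   # and unit standard deviation
```

Mean imputation keeps every sample but shrinks variance for heavily incomplete proteins; dropping such proteins or using a model-based imputer are common alternatives.
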
Exploratory Data Analysis and Visualization: Tanmay Shukla and Xiaoqing Xia have taken the lead in visualizing the protein expression patterns using a range of data visualization techniques. These visualizations have revealed the distribution of, and relationships among, the protein expression levels, giving us a better understanding of the dataset.

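
One way to put a number on the relationships among protein expression levels that such plots show is a pairwise correlation matrix. The sketch below assumes a samples-by-proteins NumPy array and uses hypothetical protein names; it returns the most strongly correlated pair. A heatmap of the same matrix (for example with matplotlib's `imshow`) is a common visual companion.

```python
import numpy as np


def most_correlated_pair(X, names):
    """Return (name_a, name_b, r) for the pair of columns with the largest |Pearson r|."""
    corr = np.corrcoef(X, rowvar=False)    # proteins x proteins correlation matrix
    np.fill_diagonal(corr, 0.0)            # ignore each protein's correlation with itself
    i, j = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    return names[i], names[j], float(corr[i, j])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=200)
    # Hypothetical proteins: "P2" closely tracks "P1", "P3" is independent noise
    X = np.column_stack([a, a + 0.1 * rng.normal(size=200), rng.normal(size=200)])
    print(most_correlated_pair(X, ["P1", "P2", "P3"]))
```
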
Clustering Model Selection and Implementation: Allen Ge and Xiaoqing Xia have been responsible for exploring and applying various models to identify significant proteins. They have implemented three clustering algorithms (K-Medoids, DBSCAN, and OPTICS) along with two supervised classifiers (Random Forest and Decision Tree). Experimenting with these different algorithms has produced preliminary results that will guide our further analyses.

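
Of the clustering methods above, K-Medoids is the one that scikit-learn itself does not ship (it is available in the separate `scikit-learn-extra` package), so a from-scratch sketch may help show how it works. This uses the simple alternating ("Voronoi iteration") variant rather than full PAM, assumes Euclidean distance on a standardized samples-by-proteins array, and `k_medoids` is a hypothetical helper, not the team's code.

```python
import numpy as np


def k_medoids(X, k, max_iter=100, seed=0):
    """Alternating K-Medoids: assign samples to the nearest medoid, then re-pick each medoid.

    Returns (medoid_indices, labels). Uses Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise distances via |x|^2 + |y|^2 - 2 x.y, which needs n^2 memory rather than n^2 * p
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(D, 0.0)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)      # nearest medoid for every sample
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue                              # keep the old medoid if a cluster empties
            # New medoid: the member with the smallest total distance to its cluster
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break                                     # converged: medoids stopped changing
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated synthetic groups standing in for expression profiles
    X = np.vstack([rng.normal(0, 0.3, size=(20, 2)), rng.normal(5, 0.3, size=(20, 2))])
    medoids, labels = k_medoids(X, k=2)
    print(medoids, labels)
```

Because every medoid is an actual sample, K-Medoids is less pulled by outliers than K-means, which averages coordinates.
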
Alternative Clustering Models: Bofan Chen and Caryn Butler have focused on exploring additional clustering models such as K-means Clustering, Agglomerative Hierarchical Clustering, and Fuzzy C-means Clustering. They have implemented these models to analyze the data further and extract meaningful insights. These alternative approaches will give us a more comprehensive understanding of the protein expression patterns.

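
Fuzzy C-means differs from K-means in giving each sample a degree of membership in every cluster rather than a single hard label. Below is a compact NumPy sketch of the standard alternating updates, assuming Euclidean distance and the common fuzzifier m = 2; the function name and defaults are illustrative, not taken from the team's notebooks.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy C-means.

    Returns (centers, U) where U[i, j] is the membership of sample i in
    cluster j and every row of U sums to 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # random initial memberships, rows sum to 1
    for _ in range(max_iter):
        W = U ** m                              # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center, floored to avoid division by zero
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        d = np.fmax(d, 1e-12)
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 0.3, size=(20, 2)), rng.normal(5, 0.3, size=(20, 2))])
    centers, U = fuzzy_c_means(X, c=2)
    print(np.round(centers, 2))
    print(U.argmax(axis=1))    # hard labels, if needed, from the highest membership
```

Samples whose memberships are split roughly evenly between clusters are natural candidates for closer inspection, which a hard K-means assignment would hide.
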
Implemented ANN-based Clustering Models: Tanmay has further extended our analysis by developing and implementing several Artificial Neural Network (ANN) and Deep Neural Network (DNN) models for clustering the protein expression data. These models use neural networks to uncover complex patterns and relationships within the dataset, and we expect them to add insights and refine our understanding of the significant proteins in Down syndrome mice.


## Automation and Fine Tuning

Created a pipeline for multi-model comparison: Tanmay has developed a pipeline that streamlines the comparison of multiple models. It lets us efficiently evaluate and compare all of the fine-tuned models, including K-Medoids, DBSCAN, OPTICS, Random Forest, Decision Tree, Artificial Neural Network (ANN), and Deep Neural Network (DNN) models.

Looking ahead to the remaining time, our plan is as follows:

Model Evaluation and Comparison: We will continue to evaluate and compare the performance of each model, including the ANN and DNN models developed by Tanmay. We will assess their accuracy, precision, recall, F1-score, and other relevant metrics to identify the most effective model for identifying significant proteins. For the clustering models, computing these metrics first requires matching each cluster to a known class.

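
A sketch of the kind of comparison loop the pipeline and this evaluation step describe, shown for the two supervised models (Decision Tree and Random Forest) because accuracy, precision, recall, and F1 apply to them directly. It uses scikit-learn with synthetic data from `make_classification` standing in for the protein matrix; the held-out split, model settings, and macro averaging are assumptions, and `compare_models` is a hypothetical helper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_models(models, X, y, test_size=0.25, seed=0):
    """Fit each model on a stratified training split and score it on the held-out split.

    Returns {model_name: {metric_name: value}}; precision, recall, and F1 are macro-averaged.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    results = {}
    for name, model in models.items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
            "recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
            "f1": f1_score(y_test, y_pred, average="macro", zero_division=0),
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in data, not the project's dataset
    X, y = make_classification(n_samples=400, n_features=20, n_informative=5, random_state=0)
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    for name, scores in compare_models(models, X, y).items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

Keeping every model behind the same `fit`/`predict` interface is what lets one loop score them all; a single train/test split is the simplest choice, and cross-validation would give steadier estimates.
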
Feature Selection and Significance Assessment: Once we have identified the best-performing model, we will apply feature selection techniques to narrow down the set of significant proteins. This step will identify the proteins that differ most substantially between control and trisomy mice. By focusing on these proteins, we aim to gain insight into the molecular mechanisms underlying Down syndrome.

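
As a simple baseline for ranking proteins by how strongly they separate control from trisomy mice, the sketch below computes Welch's t-statistic for each protein and sorts by absolute value. This is a swapped-in univariate technique for illustration, not necessarily the feature selection method the team will adopt; it assumes a samples-by-proteins array, a boolean `is_trisomy` mask, and hypothetical protein names. In practice the statistics would also be turned into p-values (for example with `scipy.stats.ttest_ind(..., equal_var=False)`) and corrected for multiple testing.

```python
import numpy as np


def rank_proteins_by_welch_t(X, is_trisomy, names):
    """Rank proteins by |Welch t| between trisomy and control samples, largest first."""
    X = np.asarray(X, dtype=float)
    is_trisomy = np.asarray(is_trisomy, dtype=bool)
    a, b = X[is_trisomy], X[~is_trisomy]
    # Welch's t: mean difference over sqrt(var_a / n_a + var_b / n_b), sample variances (ddof=1)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / se
    order = np.argsort(-np.abs(t))
    return [(names[i], float(t[i])) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n = 30
    is_trisomy = np.array([True] * n + [False] * n)
    X = rng.normal(size=(2 * n, 3))
    X[is_trisomy, 1] += 1.5    # hypothetical protein "P2" is shifted in the trisomy group
    for name, t in rank_proteins_by_welch_t(X, is_trisomy, ["P1", "P2", "P3"]):
        print(f"{name}: t = {t:.2f}")
```
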
Interpretation and Conclusion: We will interpret the results obtained from the selected model and the significant-protein analysis. Our goal is to provide meaningful biological interpretations and draw conclusions about the protein expression patterns associated with Down syndrome. We will relate our findings to the existing literature and discuss their implications for potential drug treatments.

As of now, we have not encountered any significant problems or obstacles. However, we anticipate potential challenges during the evaluation and comparison of the models, and we will address any issues that arise by working through them together as a team.
