A curated portfolio showcasing my journey in artificial intelligence and machine learning. This repository contains a collection of projects demonstrating my skills in areas like natural language processing, computer vision, and data analysis.
- Added SQL Student Mental Health Analysis, demonstrating comprehensive data analysis using PostgreSQL to examine international students' mental health indicators. Based on a DataCamp dataset.
- Implemented advanced SQL queries analyzing relationships between language proficiency, length of stay, and mental health outcomes.
- Added the 003_llms project, demonstrating how to fine-tune a Llama 2 model using Parameter-Efficient Fine-Tuning (PEFT) with LoRA.
- Developed scripts for training, evaluation, and comparison of the fine-tuned model against its base version.
- Authored a comprehensive guide (llm_finetuning_guide.md) on the step-by-step process of fine-tuning LLMs.
This project demonstrates the process of fine-tuning a Llama 2 language model using Parameter-Efficient Fine-Tuning (PEFT), specifically with Low-Rank Adaptation (LoRA). It includes scripts for fine-tuning, evaluation, and comparison, along with a detailed guide on the methodology.
Key Activities:
- Fine-tuning a meta-llama/Llama-2-7b-hf model on a custom JSON dataset.
- Utilizing LoRA for efficient training by adding adapter layers instead of training the full model.
- Scripts to evaluate the fine-tuned model's performance and compare its outputs against the base model.
- A comprehensive guide (llm_finetuning_guide.md) explaining the concepts from data preparation to model inference.
Technologies Used: Python, PyTorch, Hugging Face (transformers, peft, trl, datasets)
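To make the LoRA idea concrete, here is a minimal sketch in plain Python (no PyTorch or peft) of the low-rank update it relies on: a frozen weight matrix W is left untouched, and only two small factors B and A are trained, with the effective weight taken as W + (alpha / r) * B @ A. The function names, toy matrices, and the 4096 x 4096 example size are illustrative assumptions, not code from this project's scripts.

```python
# Illustrative sketch of the LoRA update: W_eff = W + (alpha / r) * B @ A.
# All names and sizes are hypothetical; this is not the project's training code.


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in), so its row count is the rank
    scale = alpha / r
    delta = matmul(b, a)  # shape (d_out, d_in), rank at most r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_param_counts(d_out, d_in, r):
    """Compare parameters updated by full fine-tuning vs. a rank-r adapter."""
    full = d_out * d_in
    lora = r * (d_out + d_in)
    return full, lora


# Toy example: a frozen 2x3 weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]      # shape (d_out=2, r=1)
A = [[0.5, 0.0, 1.0]]   # shape (r=1, d_in=3)
W_eff = lora_effective_weight(W, B, A, alpha=2.0)

# Parameter counts for a single 4096 x 4096 projection with rank 8.
full, lora = trainable_param_counts(4096, 4096, 8)
```

In this sketch the rank-8 adapter trains 65,536 values for a matrix that holds 16,777,216, which is the source of the efficiency gain. In practice B is typically initialized to zeros so that the adapted model starts out identical to the base model.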
This project focuses on building a deep learning model to detect signs of tuberculosis in chest X-ray images. It serves as a practical application of computer vision for a real-world medical problem.
Key Activities:
- Baseline Model: A Convolutional Neural Network (CNN) was built from scratch to establish a performance baseline.
- Comparative Modeling: Systematically tested and evaluated multiple architectures (Custom CNN, MobileNetV2, ResNet50) and techniques (Transfer Learning, Fine-Tuning) to diagnose performance issues.
- Data Preprocessing: Implemented a robust data pipeline using ImageDataGenerator for normalization and data augmentation.
- Comprehensive Evaluation: Analyzed model performance not just with accuracy, but with professional diagnostic metrics like Sensitivity, Specificity, PPV, NPV, and the ROC/AUC score.
Technologies Used: Python, TensorFlow, Keras, NumPy, Matplotlib, Scikit-learn
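The diagnostic metrics listed above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those formulas in plain Python; the function name and the example counts are made up for illustration and are not results from this project.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute screening metrics from binary confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a class has no samples.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
```

Reporting these alongside accuracy matters for screening tasks because a model can score high accuracy on an imbalanced dataset while still missing many positive cases, which shows up as low sensitivity.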
This project involves building a prototype of an auto-complete system using N-gram language models. It's an assignment from Coursera's "Natural Language Processing with Probabilistic Models" course.
Key Features:
- Text preprocessing and tokenization.
- N-gram counting and probability estimation.
- Perplexity calculation for model evaluation.
Technologies Used: Python, NLTK, Pandas, NumPy
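The three features above can be sketched with a small bigram model. This is a simplified illustration using only the standard library, assuming add-k smoothing and a three-sentence toy corpus; the function names are hypothetical and the code is not the assignment's solution.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their contexts over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
    # Words the model can predict: all corpus tokens plus the end marker.
    vocab = {tok for sent in sentences for tok in sent} | {"</s>"}
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab_size, k=1.0):
    """Add-k smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + k) / (contexts[prev] + k * vocab_size)


def perplexity(tokens, contexts, bigrams, vocab_size, k=1.0):
    """Perplexity of one tokenized sentence under the bigram model."""
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = sum(
        math.log(bigram_prob(prev, word, contexts, bigrams, vocab_size, k))
        for prev, word in zip(padded, padded[1:])
    )
    return math.exp(-log_prob / (len(padded) - 1))


def suggest(prev, contexts, bigrams, vocab, k=1.0):
    """Return the most probable next word after `prev`."""
    return max(
        sorted(vocab),  # sorted so ties break deterministically
        key=lambda w: bigram_prob(prev, w, contexts, bigrams, len(vocab), k),
    )


# Toy corpus, for illustration only.
corpus = [["i", "like", "cats"], ["i", "like", "dogs"], ["you", "like", "cats"]]
contexts, bigrams, vocab = train_bigram(corpus)
next_word = suggest("like", contexts, bigrams, vocab)
seen_ppl = perplexity(["i", "like", "cats"], contexts, bigrams, len(vocab))
unseen_ppl = perplexity(["cats", "like", "i"], contexts, bigrams, len(vocab))
```

Lower perplexity means the model finds a sentence less surprising, so a word order seen in training should score lower than a scrambled one.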
An exploratory data analysis (EDA) of four datasets: roller coasters, Netflix movies, FRED economic data, and student mental health data. This project involves data cleaning, preparation, visualization, and SQL analysis to uncover insights.
Key Projects:
- Student Mental Health Analysis (SQL): A comprehensive analysis using PostgreSQL to examine the relationship between international students' length of stay and mental health indicators (depression, social connectedness, acculturative stress). Based on a DataCamp dataset.
- FRED Economic Data Analysis: Analysis of economic indicators from the Federal Reserve database.
- Netflix Movie Analysis: Exploration of Netflix content library characteristics.
- Roller Coaster Analysis: Statistical analysis of roller coaster features and trends.
Key Findings (Student Mental Health):
- Language proficiency significantly impacts depression levels among international students.
- Length of stay shows varying correlations with different mental health metrics.
- Academic level and age groups display distinct mental health patterns.
- International students face unique challenges compared to domestic students.
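The length-of-stay analysis boils down to a grouped aggregate query. The sketch below shows the general shape of such a query, run through Python's built-in sqlite3 module rather than PostgreSQL so that it is self-contained. The table name, column names, and rows are assumptions made up for illustration; they are not the DataCamp schema or data, and the output says nothing about the real findings.

```python
import sqlite3

# In-memory SQLite database standing in for the project's PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "student_type TEXT, stay_years INTEGER, "
    "depression REAL, connectedness REAL, acculturative_stress REAL)"
)

# Hypothetical toy rows, for illustration only.
toy_rows = [
    ("Inter", 1, 10.0, 40.0, 70.0),
    ("Inter", 1, 6.0, 36.0, 80.0),
    ("Inter", 2, 9.0, 38.0, 60.0),
    ("Dom", 1, 5.0, 42.0, None),
]
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?)", toy_rows)

# Average each indicator per length of stay, international students only.
query = """
SELECT stay_years,
       COUNT(*) AS n_students,
       ROUND(AVG(depression), 2) AS avg_depression,
       ROUND(AVG(connectedness), 2) AS avg_connectedness,
       ROUND(AVG(acculturative_stress), 2) AS avg_stress
FROM students
WHERE student_type = 'Inter'
GROUP BY stay_years
ORDER BY stay_years DESC;
"""
results = conn.execute(query).fetchall()
conn.close()
```

Each row in `results` holds one length-of-stay group with its student count and average scores, which is the form needed to compare indicators across groups.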
Key Findings (FRED Economic Data):
- The S&P 500 index shows significant growth over time, with notable fluctuations during economic events.
- The national unemployment rate exhibits cyclical patterns, with sharp increases during recessions.
- State-level unemployment data reveals regional disparities in economic performance, particularly during the COVID-19 pandemic.
Technologies Used: Python, Pandas, Matplotlib, Seaborn, fredapi, Plotly, PostgreSQL, SQLAlchemy
002_NLP: Natural Language Processing Projects (In Progress)
This folder contains various Natural Language Processing (NLP) projects, including a voice assistant and a sentiment analysis model.
Key Projects:
- IMDb Sentiment Analysis with TensorFlow: A Jupyter Notebook demonstrating sentiment analysis on movie reviews, covering data preprocessing, model building, training, and evaluation.
- Python Voice Assistant: A simple voice-controlled assistant capable of recognizing voice commands and performing basic tasks.
Technologies Used: Python, TensorFlow, Keras, NumPy, Pandas, Matplotlib, Seaborn, SpeechRecognition, pyttsx3, PyAudio
To run these projects, clone the repository and install the required dependencies for each project as listed in their respective README files.
git clone https://github.com/your-username/AI-Learning-Journey.git
cd AI-Learning-Journey
For any questions or collaborations, please feel free to reach out.