From 710833e494e8c4c59e9acc230095a60436c9d139 Mon Sep 17 00:00:00 2001 From: psanyalaich Date: Wed, 23 Oct 2024 16:50:10 +0530 Subject: [PATCH 1/2] Create README --- .../Book Recommendation System/README.md | 33 +++++++++++++ .../Restaurant Recommender/README.md | 49 +++++++++++++++++++ 2 files changed, 82 insertions(+) create mode 100644 Recommendation Models/Book Recommendation System/README.md create mode 100644 Recommendation Models/Restaurant Recommender/README.md diff --git a/Recommendation Models/Book Recommendation System/README.md b/Recommendation Models/Book Recommendation System/README.md new file mode 100644 index 000000000..a1466eba4 --- /dev/null +++ b/Recommendation Models/Book Recommendation System/README.md @@ -0,0 +1,33 @@ +# Book Recommendation System + +## Introduction +The Book Recommendation System project aims to understand the distribution of book ratings and publication years, and to recommend well-rated books. + +## Prerequisites +- Python 3.x +- NumPy +- Pandas +- Matplotlib +- Seaborn + +To install: ``pip install numpy pandas matplotlib seaborn`` + +## Methodology +1. **Data Loading, Exploration, and Cleaning**: The datasets are loaded and inspected to understand their structure, including the shape, column names, and basic statistics. They are then cleaned to prepare them for analysis. +2. **Univariate Analysis**: Univariate analysis was conducted to examine the distribution of ratings and publication years. +3. **Summary Statistics**: Summary statistics were generated for the year of publication. +4. **Handling Outliers and Anomalies**: Anomalies in the publication year were identified and handled. +5. **Boxplot Visualization**: A boxplot was generated to visualize the distribution of publication years. +6. **Filtering Data**: Books with publication years beyond 2022 were filtered out of the dataset. + +## Results +1. **Rating Distribution**: The distribution of ratings indicated user preferences and the popularity of books. +2. 
**Publication Years**: The analysis of publication years showed trends in book releases, with certain years having a higher frequency of publications. +3. **Data Quality**: Identification of missing values and duplicates helped to maintain data integrity for further analysis. +4. **Visualizations**: + - **Count Plot**: A count plot of book ratings showcased user rating patterns. + - **Bar Chart**: A bar chart of books by year of publication highlighted trends in literature over the decades. + +## Conclusion +- This project provides valuable insights into the dynamics of book ratings and publication trends. +- By utilizing data visualization and analysis techniques, we can identify user preferences, detect anomalies in the dataset, and ensure data quality for future analyses. diff --git a/Recommendation Models/Restaurant Recommender/README.md b/Recommendation Models/Restaurant Recommender/README.md new file mode 100644 index 000000000..8aace7036 --- /dev/null +++ b/Recommendation Models/Restaurant Recommender/README.md @@ -0,0 +1,49 @@ +# Restaurant Recommender + +## Introduction +- This project analyzes Zomato restaurant data for Lucknow, India. +- The dataset includes various attributes such as restaurant name, address, locality, cuisine, average cost, highlights, and customer ratings. +- The goal of this project is to explore the dataset and identify the top 20 restaurants by aggregate rating. +- The analysis aims to provide insights into the best restaurants in the area based on customer feedback. + +## Prerequisites +- Python 3.x +- Jupyter Notebook / Google Colab +- pandas +- numpy +- geopandas +- matplotlib
- seaborn + +To install: `pip install pandas numpy geopandas matplotlib seaborn` + +## Methodology +1. **Exploratory Data Analysis**: Key characteristics of the data were examined, including checking for missing or duplicate entries. 
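Such checks can be sketched in pandas as follows (a minimal illustration, not the project's actual notebook; a small hypothetical frame stands in for the Zomato data, reusing column names mentioned in this README):

```python
import pandas as pd

# Hypothetical stand-in for a slice of the Zomato restaurant data
df = pd.DataFrame({
    "name": ["Barbeque Nation", "Farzi Cafe", "Farzi Cafe"],
    "aggregate_rating": [4.9, 4.7, 4.7],
    "votes": [1200, 800, 800],
    "average_cost_for_one": [700, None, None],
})

print(df.isna().sum())        # missing values per column
print(df.duplicated().sum())  # count of exact duplicate rows
```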
Summary statistics were used to understand variables like `average_cost_for_one`, `votes`, and `aggregate_rating`. +2. **Top 20 Restaurants**: The dataset was sorted by `aggregate_rating` to identify the top-rated restaurants. The top 20 were filtered and analyzed further to understand common patterns in ratings, cuisines, and locations. +3. **Data Visualization**: Various visualizations were created to illustrate relationships between key variables such as *restaurant rating*, *number of votes*, and *average cost*. +4. **Recommendation System Evaluation**: + - **Precision**: Precision measures the accuracy of the recommendations. It indicates what proportion of the recommended items were relevant to the user. Here, a precision of 0.4 means that 40% of the recommended restaurants were relevant to the user. + - **Recall**: Recall measures the coverage of the relevant items in the recommendations. It indicates what proportion of the relevant items were successfully recommended. A recall of approximately 0.67 means that 67% of the relevant restaurants were included in the recommendations. + - Both values lie between 0 and 1, with higher values indicating better performance. A precision of 0.4 and a recall of 0.67 therefore suggest that the recommendations cover a significant portion of the relevant items, but there is room for improvement in accuracy. + - **Error Metrics**: + - **Mean Squared Error (MSE)**: Measures the average of the squared differences between the actual and predicted values. A lower MSE indicates better model performance. + - **Mean Absolute Error (MAE)**: Represents the average magnitude of the prediction errors, with smaller values preferred. + - **Root Mean Squared Error (RMSE)**: The square root of the MSE, expressed in the same units as the ratings; lower values are better. +5. **Clustering Analysis**: The "Elbow Method" is a technique to determine the optimal number of clusters for a K-Means clustering algorithm. 
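In scikit-learn terms, the distortion curve can be sketched as follows (a hedged example, not the project's actual code; a synthetic matrix stands in for the numeric restaurant features):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted runs
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for a numeric restaurant feature matrix
X, _ = make_blobs(n_samples=200, centers=4, random_state=42)

k_values = range(1, 10)
distortions = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    distortions.append(km.inertia_)  # within-cluster sum of squared distances

plt.plot(list(k_values), distortions, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Distortion (inertia)")
plt.title("Elbow Method")
```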
It looks for an "elbow" point in the plot where the distortion starts to decrease at a slower rate. The number of clusters corresponding to this point is considered optimal for clustering the data. Plotting the distortion for each value of k makes this point easy to identify. + +## Results +- **Top-rated Restaurants**: + - Barbeque Nation (Rating: 4.9) + - Pirates of Grill (Rating: 4.8) + - Farzi Café (Rating: 4.7) +- **Popular Cuisines**: + - North Indian + - Mughlai + - Continental + - Modern Indian +- **Location Insights**: Restaurants in high-demand localities such as *Gomti Nagar* and *Chowk* have higher ratings and more customer votes. + +## Conclusion +- This project provides an in-depth analysis of restaurant data from Zomato, specifically focusing on customer preferences and ratings in Lucknow. +- The top restaurants are distinguished by their cuisines and prime locations, offering valuable insights for food enthusiasts and restaurant owners. +- Additionally, the evaluation metrics provide a framework for understanding the effectiveness of the recommendation system and areas for future improvement. From bda744a50df976ee0d8724a4d238f6b9533b4446 Mon Sep 17 00:00:00 2001 From: psanyalaich Date: Fri, 25 Oct 2024 20:06:22 +0530 Subject: [PATCH 2/2] Create README.md --- .../Stress Level Detection/README.md | 41 +++++++++++++++++++ 1 file changed, 41 insertions(+) create mode 100644 Detection Models/Stress Level Detection/README.md diff --git a/Detection Models/Stress Level Detection/README.md b/Detection Models/Stress Level Detection/README.md new file mode 100644 index 000000000..02d6749a2 --- /dev/null +++ b/Detection Models/Stress Level Detection/README.md @@ -0,0 +1,41 @@ +# Stress Level Detection +- The Stress Level Detection project aims to predict stress levels based on various physiological and demographic features using machine learning algorithms. 
+- The dataset used in this project contains information on individuals, including their age, heart rate, sleep hours, and gender. +- The goal is to classify individuals into different stress levels using models such as Logistic Regression, Random Forest, and Support Vector Machines (SVM). + +## Prerequisites +- Python +- Pandas +- NumPy +- Seaborn +- Matplotlib +- Scikit-learn +- Imbalanced-learn + +To install: `pip install pandas numpy seaborn matplotlib scikit-learn imbalanced-learn` + +## Dataset +The dataset used for this project is a CSV file named `stress_data.csv`, which includes the following columns: +- `Gender`: Gender of the individual (categorical) +- `Age`: Age of the individual (numerical) +- `HeartRate`: Heart rate of the individual (numerical) +- `SleepHours`: Number of hours the individual sleeps (numerical) +- `StressLevel`: Level of stress (categorical, target variable) + +## Usage +- Mount your Google Drive to access the dataset. +- Load the dataset using Pandas. +- Perform data cleaning, including handling missing values. +- Encode categorical variables and normalize numerical features. +- Split the data into training, validation, and test sets. +- Conduct exploratory data analysis (EDA) to visualize data distributions and correlations. +- Train models using Logistic Regression, Random Forest, and SVM. +- Evaluate the models using classification reports and accuracy scores. +- Use SMOTE to address class imbalance and re-evaluate the models. + +## Results +- Logistic Regression, Random Forest, and SVM models were trained and evaluated. +- SMOTE was applied to balance the dataset, resulting in improved accuracy for the SVM model. + +## Conclusion +This project demonstrates the process of detecting stress levels using machine learning techniques. \ No newline at end of file