This project predicts the success of Google Play Store apps using machine learning. The goal is to classify apps by features such as category, rating, size, type, price, and content rating to estimate their potential success.
- Source: Kaggle - Google Play Store Dataset
- Total entries: ~10,000 apps
- Features include:
  - App name
  - Category
  - Rating
  - Reviews
  - Size
  - Installs
  - Type (Free/Paid)
  - Price
  - Content Rating
  - Genres
  - Last Updated
  - Android Version
- Python 3.8+
- Libraries:
  - pandas, numpy
  - matplotlib, seaborn, plotly
  - scikit-learn (ML models, preprocessing)
  - xgboost (for gradient boosting)
FAMILY, GAME, TOOLS, and BUSINESS are the most frequent app categories.
Most apps are rated for "Everyone", followed by "Teen" and "Mature 17+".
A large majority of apps are free. Paid apps are fewer and often in specialized categories.
Ratings are mostly concentrated between 4.0 and 4.7. There are a few outliers below 3.0.
Free apps span a wide range of ratings, while paid apps tend to be rated slightly higher on average.
- Removed duplicate and null values
- Handled inconsistent formats (like '1,000+' → 1000)
- Converted categorical variables using Label Encoding and One-Hot Encoding
- Normalized/Standardized numeric columns
- Feature selection based on correlation and variance
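The format clean-up described above can be sketched with a few small helpers. This is a minimal sketch, not the notebook's actual code: the function names `parse_installs`, `parse_price`, and `parse_size_mb` are hypothetical, and the sketch assumes the raw strings look like `'1,000+'`, `'$4.99'`, and `'19M'` / `'512k'` / `'Varies with device'`.

```python
from typing import Optional


def parse_installs(raw: str) -> int:
    """Turn an install bucket such as '1,000+' into the integer 1000."""
    return int(raw.replace(",", "").replace("+", "").strip())


def parse_price(raw: str) -> float:
    """Turn a price string such as '$4.99' or '0' into a float."""
    return float(raw.replace("$", "").strip())


def parse_size_mb(raw: str) -> Optional[float]:
    """Convert a size string to megabytes; return None when no size is given."""
    text = raw.strip()
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1024  # kilobytes to megabytes
    return None  # assumed placeholder values such as 'Varies with device'


if __name__ == "__main__":
    print(parse_installs("1,000+"))
    print(parse_price("$4.99"))
    print(parse_size_mb("512k"))
```

With pandas, helpers like these could be applied column by column, for example `df["Installs"].map(parse_installs)`, where `df` stands for the loaded dataset.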
Several classification models were trained to predict app success:
| Model | Accuracy | F1 Score |
|---|---|---|
| Logistic Regression | 88% | 87% |
| Random Forest | 90% | 89% |
| XGBoost Classifier | 92% | 91% |
- The XGBoost model achieved the highest accuracy and F1 score.
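A comparison like the one in the table could be set up roughly as follows. This is only a sketch under stated assumptions: it uses synthetic data from `make_classification` as a stand-in for the preprocessed app features, the hyperparameters are illustrative defaults rather than the project's settings, and the scores it prints will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed app features (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Illustrative hyperparameters, not the project's actual configuration.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
try:
    # Only added when the xgboost package is installed.
    from xgboost import XGBClassifier

    models["XGBoost Classifier"] = XGBClassifier(random_state=42)
except ImportError:
    pass

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = (
        accuracy_score(y_test, predictions),
        f1_score(y_test, predictions),
    )
    print(f"{name}: accuracy={results[name][0]:.3f}, f1={results[name][1]:.3f}")
```

Fitting every model on the same train/test split keeps the accuracy and F1 numbers comparable across models.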
- Accuracy: Percentage of correctly classified apps
- F1 Score: Harmonic mean of precision and recall
- Confusion Matrix: To evaluate false positives and false negatives
- Classification Report: Precision, recall, and F1-score per class
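To make the relationship between these metrics concrete, the sketch below derives accuracy, precision, recall, and F1 from the four cells of a binary confusion matrix. The helper name `binary_metrics` and the counts in the example are illustrative assumptions, not results from this project.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts: 40 true positives, 10 false positives,
    # 5 false negatives, 45 true negatives.
    for metric, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
        print(f"{metric}: {value:.3f}")
```

In practice, scikit-learn's `confusion_matrix` and `classification_report` in `sklearn.metrics` report the same quantities per class.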
git clone https://github.com/your-username/google-playstore-app-prediction.git
cd google-playstore-app-prediction
pip install -r requirements.txt
jupyter notebook Google-play-store\ Prediction.ipynb
Make sure to download the dataset from Kaggle and place it in the appropriate directory.
- Deploy the model using Streamlit or Flask
- Add more app metadata (e.g., app permissions, developer info)
- Use deep learning models for regression-based install prediction
- Create a dashboard to visualize predictions live
This project is open-source and available under the MIT License.
- Kaggle for providing the dataset
- Scikit-learn and XGBoost for model training