An exploratory data analysis of 11,914 vehicles that identifies the key factors influencing car prices and the market's segmentation patterns. This project was completed as part of the Samsung Innovation Campus AI Track.
Key Findings:
- Identified a strong correlation between engine horsepower and price (r ≈ 0.7) and a moderate one between cylinder count and price (r ≈ 0.5)
- Discovered bimodal market structure: mainstream vs. luxury segments
- Engineered new features and performed extensive data cleaning
- Built interactive Power BI dashboards for business intelligence
Tech Stack:
- Data Processing: Python, Pandas, NumPy
- Visualization: Power BI, Matplotlib, Seaborn
- Analysis: Statistical Analysis, Correlation Analysis, Feature Engineering
- Tools: Jupyter Notebook, Git, VS Code
Insights:
- Market Segmentation: Clear separation between high-volume mainstream brands (Toyota, Ford) and low-volume luxury brands (Bugatti, Rolls-Royce)
- Price Drivers: Engine HP (r ≈ 0.7) and cylinder count (r ≈ 0.5) are the strongest technical predictors of price
- Data Distribution: Prices are heavily right-skewed, so the median ($30,680) is more representative than the mean ($41,930)
- Business Impact: Provided actionable recommendations for marketing segmentation and product strategy
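
The correlation and skew figures above could be reproduced along these lines. This is a minimal sketch, not the notebook's actual code: the column names (`engine_hp`, `engine_cylinders`, `price`) are assumptions, and the tiny DataFrame is synthetic stand-in data, so its numbers will not match the reported values.

```python
import pandas as pd

# Synthetic stand-in rows, NOT the project's dataset.
# Column names are assumed for illustration only.
df = pd.DataFrame({
    "engine_hp": [130, 150, 180, 200, 300, 450, 1001],
    "engine_cylinders": [4, 4, 4, 6, 6, 8, 16],
    "price": [18_000, 22_000, 27_000, 31_000, 45_000, 90_000, 1_500_000],
})

# Pearson correlation of each technical feature with price.
price_corr = df.corr(method="pearson")["price"].drop("price")

# In a right-skewed distribution a few very expensive cars pull the
# mean above the median, and the sample skewness is positive.
mean_price = df["price"].mean()
median_price = df["price"].median()
skewness = df["price"].skew()

print(price_corr.sort_values(ascending=False))
print(f"mean={mean_price:,.0f}  median={median_price:,.0f}  skew={skewness:.2f}")
```

On the real data, the same three calls (`DataFrame.corr`, `Series.median`, `Series.skew`) would be run on the cleaned DataFrame loaded in the notebook.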
How to Run:
- Clone the repository:
git clone https://github.com/Shrouk-Sharaf/car-price-analysis.git
- Install dependencies:
pip install -r requirements.txt
- Run the analysis:
jupyter notebook notebooks/car_data_analysis.ipynb
