This project explores the Auto MPG dataset from the UCI Machine Learning Repository, which contains specifications and fuel efficiency data for cars manufactured between 1970–1982.
The goal is to:
- Understand trends in fuel efficiency across years, origins, and engine types
- Uncover relationships between MPG, engine displacement, horsepower, weight, and acceleration
- Identify patterns that explain how automotive engineering and market trends evolved during the dataset period
This notebook demonstrates data cleaning, visual exploration, and insight generation — essential skills for any data analyst or data scientist.
The dataset contains 398 entries with the following attributes:
| Column | Description | 
|---|---|
| mpg | Miles per gallon (fuel efficiency) | 
| cylinders | Number of engine cylinders | 
| displacement | Engine displacement (cubic inches) | 
| horsepower | Engine horsepower | 
| weight | Vehicle weight (lbs) | 
| acceleration | Time to accelerate from 0–60 mph (seconds) | 
| model year | Year of manufacture | 
| origin | 1 = American, 2 = European, 3 = Japanese | 
| car name | Model name of the car | 
- 
Data Loading & Inspection - Checked data types, null values, and initial structure
 
- 
Data Cleaning - Replaced missing horsepowervalues (originally '?') with median values
- Detected and treated outliers using the IQR method
 
- Replaced missing 
- 
Exploratory Data Analysis - Univariate analysis: Histograms, KDE plots for each numeric feature
- Categorical analysis: Count plots for cylinders,origin, andmodel year
- Bivariate analysis: Scatter plots and correlations to explore feature relationships
 
- 
Feature Scaling - Applied MinMax scaling for normalized visualizations (only for comparison, not used in core insights)
 
- 
Fuel Efficiency Trends: Fuel efficiency improved in later years, influenced by the 1973 oil crisis and changing regulations. 
- 
Engine Size vs MPG: Cars with higher displacement and horsepower tend to have lower MPG. 
- 
Vehicle Origins: - American cars dominate the dataset but tend to be less fuel-efficient
- Japanese cars consistently show higher MPG
- European cars occupy a middle ground
 
- 
Acceleration: Most vehicles have moderate acceleration times (14–16 seconds), reflecting balanced engineering priorities of the era. 
Some of the key plots include:
- MPG Distribution — skewed right, indicating fewer highly fuel-efficient cars
- Displacement Distribution — smaller engines more common, but still relatively large by modern standards
- Horsepower Distribution — most cars between 70–110 HP, with a secondary performance peak
- Origin Counts — custom-colored (Red = USA, Blue = Europe, Yellow = Japan)
- Correlation Heatmap — shows strong negative correlation between weight and MPG
- Develop a predictive regression model for MPG
- Explore clustering to group vehicles by performance and efficiency
- Compare trends to modern datasets for historical context
- Python 3
- Pandas, NumPy
- Matplotlib, Seaborn
- Jupyter Notebook
Author: Srijan Sen 📧 [email protected] | 🔗 LinkedIn | 💻 GitHub
If you want, I can also add inline preview images of your plots into this README so it visually pops on GitHub. Would you like me to do that?