- Grocery stores are a vital part of everyday life, providing us with food and other essentials as we need them.
- People often use grocery delivery applications to order products, which makes it easy to shop from home.
- Each transaction made through these applications is recorded in detail, creating a valuable dataset.
- This project looks at sample supermarket transaction data to understand how well the supermarket is performing.
- The dataset is sourced from Kaggle and simulates grocery sales in the Indian state of Tamil Nadu.
- The dataset includes columns that provide detailed information about each transaction at the supermarket.
- Link to the Dataset : Supermarket Sales Dataset
- To analyze supermarket sales data and identify key factors for improving profitability and operational efficiency.
This project requires Jupyter Notebook, which you can install and launch from the terminal.
- Install the Notebook
pip install notebook
- Run the Notebook
jupyter notebook
NumPy
- Go to the terminal and run this command
pip install numpy
Pandas
- Go to the terminal and run this command
pip install pandas
Matplotlib
- Go to the terminal and run this command
pip install matplotlib
Seaborn
- Go to the terminal and run this command
pip install seaborn
- Clone this repository to your local machine by using the following command :
git clone https://github.com/themrityunjaypathak/Supermarket-Sales-Analysis.git
Importing Libraries
- Importing necessary libraries like numpy, pandas, matplotlib and seaborn.
Reading CSV File
- Reading CSV file by using pd.read_csv() method.
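As a minimal sketch of this step, the snippet below reads CSV text with pd.read_csv(). The two sample rows and the column names are made up for illustration and stand in for the real Kaggle file, whose path is not given here.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the real Kaggle CSV file.
csv_text = """order_id,customer_name,category,sales
OD1,Harish,Oil & Masala,1254
OD2,Sudha,Beverages,749
"""

# pd.read_csv accepts a file path or any file-like object.
# With the downloaded dataset, pass the path to the CSV file instead.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
```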
Overview of the Dataset
- Information about shape and size of the dataset.
- Types of columns present in the dataset (numerical, categorical, text).
- Detailed info about the dataset using df.info() method.
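The overview steps above could look roughly like this. The small DataFrame and its column names are assumptions used only to keep the sketch runnable; the real dataset has more rows and columns.

```python
import pandas as pd

# Hypothetical sample; the column names are assumptions, not the real schema.
df = pd.DataFrame({
    "customer_name": ["Harish", "Sudha", "Hussain"],
    "category": ["Snacks", "Beverages", "Snacks"],
    "sales": [1254, 749, 2360],
    "discount": [0.12, 0.18, 0.21],
})

print(df.shape)  # (number of rows, number of columns)
print(df.size)   # total number of cells (rows * columns)

# Split the columns into numerical and non-numerical ones.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, other_cols)

# Column names, non-null counts, dtypes and memory usage.
df.info()
```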
Handling Null values in the Dataset
- This dataset does not contain any null values.
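One common way to confirm this is to count missing values per column. The sketch below uses a made-up sample with assumed column names; on the real data the same calls would be run on the full DataFrame.

```python
import pandas as pd

# Hypothetical sample with no missing values.
df = pd.DataFrame({
    "category": ["Snacks", "Beverages", "Bakery"],
    "sales": [1254, 749, 2360],
})

# Number of null values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# True if any cell in the DataFrame is null.
has_nulls = bool(df.isnull().values.any())
print(has_nulls)
```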
Unique values in each Categorical Column
- Unique customer names in the data.
- Unique product categories in the data.
- Unique product sub-categories in the data.
- Unique cities in the data.
- Unique regions in the data.
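A possible way to list the unique values is shown below. The column names (customer_name, category, sub_category, city, region) and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample; column names are assumed.
df = pd.DataFrame({
    "customer_name": ["Harish", "Sudha", "Harish", "Ridhesh"],
    "category": ["Snacks", "Beverages", "Snacks", "Bakery"],
    "sub_category": ["Chocolates", "Health Drinks", "Cookies", "Cakes"],
    "city": ["Vellore", "Krishnagiri", "Vellore", "Salem"],
    "region": ["North", "South", "North", "West"],
})

cat_cols = ["customer_name", "category", "sub_category", "city", "region"]

# Print the distinct values of every categorical column.
for col in cat_cols:
    print(col, df[col].unique())

# Number of distinct values per column.
unique_counts = df[cat_cols].nunique()
print(unique_counts)
```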
Changing DataType of Columns
- Modifying the datatype of order_date column to pandas datetime format.
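A minimal sketch of the conversion, assuming order_date is stored as text. The sample dates and their format are made up; the real file may use a different format and need the format argument of pd.to_datetime().

```python
import pandas as pd

# Hypothetical sample with dates stored as strings.
df = pd.DataFrame({"order_date": ["2017-11-08", "2017-06-12", "2016-10-11"]})

# Convert the text column to pandas datetime values.
df["order_date"] = pd.to_datetime(df["order_date"])

print(df.dtypes)
```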
Utilizing existing information to create new Columns
- Extracting year, month and date from the order_date column.
- Deriving discount_amount from the discount percent column.
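The new columns could be derived roughly as follows. This sketch assumes the discount is stored as a fraction (0.10 for 10%) in a column named discount and that discount_amount is sales multiplied by that fraction; the exact formula used in the notebook is not stated here, and the names year, month, date and sales are likewise assumed.

```python
import pandas as pd

# Hypothetical sample; column names and the discount convention are assumptions.
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2017-11-08", "2016-06-12"]),
    "sales": [1000, 500],
    "discount": [0.10, 0.20],
})

# Date parts come from the .dt accessor of a datetime column.
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month_name()
df["date"] = df["order_date"].dt.day

# Assumed formula: discount amount = sales * discount fraction.
df["discount_amount"] = df["sales"] * df["discount"]

print(df)
```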
Statistical Analysis
- No. of products sold in each category.
- No. of products sold in each sub category.
- No. of products sold in each city.
- No. of products sold in each region.
- No. of products sold each year, month and date.
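Counts like these can be computed with value_counts(). The sketch below assumes each row represents one product sold and uses made-up rows with assumed column names; the same pattern applies to sub category, city, year, month and date.

```python
import pandas as pd

# Hypothetical sample; each row is treated as one product sold.
df = pd.DataFrame({
    "category": ["Snacks", "Snacks", "Beverages", "Bakery"],
    "region": ["West", "East", "West", "West"],
})

# Number of rows per category and per region, largest first.
category_counts = df["category"].value_counts()
region_counts = df["region"].value_counts()

# The same counts expressed as a percentage of all rows.
category_share = df["category"].value_counts(normalize=True) * 100

print(category_counts)
print(region_counts)
print(category_share)
```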
Data Visualization
- No. of products sold in each category.
- No. of products sold in each sub category.
- No. of products sold in each city.
- No. of products sold in each region.
- No. of products sold each year.
- No. of products sold each month.
- No. of products sold each date.
- Total sales in each category.
- Total sales in each sub category.
- Total sales in each region.
- Total sales in each city.
- Total sales in each month.
- Total sales in each year.
- Total profit in each category.
- Total profit in each sub category.
- Total profit in each region.
- Total profit in each city.
- Total profit in each month.
- Total profit in each year.
- Customers with highest amount of total sales.
- Customers with highest profit on their purchase.
- Total discount availed by customers.
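As one illustration of the charts listed above, the sketch below plots total sales per region as a bar chart with Matplotlib. The data and column names are made up, and the notebook may well use Seaborn or different chart types instead.

```python
import matplotlib

# Non-interactive backend so the sketch runs without a display.
# Drop this line inside Jupyter, where figures render inline.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; column names are assumed.
df = pd.DataFrame({
    "region": ["West", "East", "West", "South"],
    "sales": [1200, 800, 600, 400],
})

# Aggregate sales per region and sort from highest to lowest.
sales_by_region = df.groupby("region")["sales"].sum().sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(sales_by_region.index, sales_by_region.values)
ax.set_title("Total sales by region")
ax.set_xlabel("Region")
ax.set_ylabel("Total sales")
fig.tight_layout()
```

The same groupby-then-plot pattern covers the other charts by swapping the grouping column (category, city, month, year) and the aggregated column (sales, profit, discount_amount).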
Analyzed purchasing patterns of 9,000+ customers of a Supermarket.
- More than 15% of the products sold were Snacks.
- Shows that Snacks are a convenient choice and a major source of revenue.
- More than 32% of total sales came from the West region of the supermarket.
- Suggests that West region is a strong-performing area as compared to others.
- Health and Soft drinks were the most profitable sub-categories in beverages.
- Shows that both types of drinks perform well among customers.
- November was the most profitable month contributing about 15% of the total annual profits.
- Makes it an ideal time for running promotions and special offers.
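Percentage figures like the ones above can be derived from grouped totals. The sketch below uses made-up numbers and assumed column names, so its output does not reproduce the reported results; it only shows the calculation.

```python
import pandas as pd

# Hypothetical sample; the values are illustrative, not the real data.
df = pd.DataFrame({
    "region": ["West", "East", "West", "South"],
    "month": ["November", "June", "November", "March"],
    "sales": [1200, 800, 600, 400],
    "profit": [300, 100, 200, 150],
})

# Share of total sales contributed by each region, in percent.
region_share = (df.groupby("region")["sales"].sum() / df["sales"].sum() * 100).round(1)

# Month with the highest total profit and its share of all profit.
profit_by_month = df.groupby("month")["profit"].sum()
top_month = profit_by_month.idxmax()
top_month_share = round(profit_by_month.max() / profit_by_month.sum() * 100, 1)

print(region_share)
print(top_month, top_month_share)
```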