This project focuses on analyzing a jewelry e-commerce dataset to extract meaningful insights about user purchases, product categories, and sales performance.
Python (Pandas) is used for data cleaning and preprocessing, while SQL queries handle complex data retrieval, aggregation, joining, and optimization.
- Cleaned and preprocessed raw sales data with Python to prepare it for database import
- Designed and created relational database tables (
users
,products
, andjewelry_sales
) - Performed advanced SQL queries including filtering, grouping, aggregation, and multiple types of JOINs
- Used subqueries and created SQL views to simplify and speed up analysis
- Optimized query performance by adding indexes on key columns
- Extracted key business insights such as total revenue, popular categories, and material-wise sales
Field Name | Description |
---|---|
event_time | Timestamp of the event |
user_id | Unique user identifier |
item_id | Unique item identifier |
quantity | Quantity of items in the event |
product_id | Product identifier |
category | Product category (e.g., Ring, Necklace) |
is_purchase | Flag indicating purchase (1 = purchase) |
price | Price per item |
session_id | User session identifier |
unknown_flag | Unknown data flag |
color | Item color |
material | Item material (e.g., Gold, Silver) |
gem | Gemstone type |
The dataset was cleaned using Pandas to ensure consistent column names and prepare for further SQL analysis.
import pandas as pd
df = pd.read_csv("jewelry.csv")
df.columns = [
"event_time", "user_id", "item_id", "quantity", "product_id",
"category", "is_purchase", "price", "session_id",
"unknown_flag", "color", "material", "gem"
]
df.to_csv("jewelry_cleaned.csv", index=False)
Python: Data cleaning and CSV manipulation using Pandas SQL: Complex query writing including JOINs, subqueries, grouping, aggregation Database Design: Creating normalized tables and views for modular querying Query Optimization: Using indexes to speed up database performance Data Analysis: Extracting actionable business insights from sales data Data Visualization: (Optional) Plotting charts with Matplotlib if applicable