A comprehensive SQL project demonstrating advanced SQL concepts including Joins, CTEs (Common Table Expressions), Window Functions, and Aggregations to answer key business questions about customers and orders.
Want to start immediately without setup? Use free public datasets!
-
Google BigQuery (Recommended - No setup!):
- Go to BigQuery Console
- Open
bigquery-public-data.thelook_ecommercedataset - Run queries from
05_bigquery_examples.sql - Free: 1 TB queries/month
-
Kaggle Datasets: Download e-commerce data and adapt queries (see
00_free_datasets_guide.md) -
Mode Analytics: Free SQL tutorial with pre-loaded datasets (see
00_free_datasets_guide.md)
Or use the included sample data - See "Getting Started" section below.
This project analyzes customer behavior, order patterns, and revenue trends using SQL. It includes sample data and queries that demonstrate various SQL techniques commonly used in data analysis.
00_free_datasets_guide.md- Guide to free datasets (Kaggle, Mode Analytics, BigQuery)01_schema.sql- Database schema creation (tables, indexes)02_sample_data.sql- Sample data insertion scripts03_analysis_queries.sql- SQL queries demonstrating core concepts (clean, commented)04_business_questions.sql- Business-focused analysis queries (clean, commented)05_bigquery_examples.sql- Queries adapted for Google BigQuery public datasetsREADME.md- This file
The project uses four main tables:
-
customers- Customer information- customer_id, first_name, last_name, email, registration_date, city, country
-
products- Product catalog- product_id, product_name, category, price, cost
-
orders- Order headers- order_id, customer_id, order_date, status
-
order_items- Order line items- order_item_id, order_id, product_id, quantity, unit_price
- INNER JOIN - Get orders with customer details
- LEFT JOIN - Include customers with no orders
- Multiple Joins - Combine data from multiple tables
- Simple CTEs for readability
- Multiple CTEs chained together
- CTEs with aggregations and calculations
- ROW_NUMBER() - Rank orders within customers
- RANK() & DENSE_RANK() - Product sales rankings
- LAG() & LEAD() - Compare values across rows
- PARTITION BY - Calculate averages within groups
- PERCENT_RANK() & CUME_DIST() - Distribution analysis
- Running totals - Cumulative calculations
- SUM, AVG, COUNT, MIN, MAX - Basic aggregations
- GROUP BY - Group data by categories
- HAVING - Filter aggregated results
- Conditional aggregations - CASE statements in aggregations
- Top 10 customers by total revenue
- Top customers by order frequency
- Top customers by average order value
- Monthly cohort retention rates
- Repeat customer rate analysis
- Time between orders (retention patterns)
- Monthly revenue trends
- Month-over-month growth rates
- Revenue by product category
- Cumulative revenue (YTD)
- Product performance analysis
- Customer acquisition analysis
- Profit margin calculations
- SQL database system (PostgreSQL, MySQL, SQL Server, SQLite, etc.)
- SQL client or command-line tool
-
Create the database schema:
-- Run 01_schema.sql to create tables -
Insert sample data:
-- Run 02_sample_data.sql to populate tables -
Run analysis queries:
-- Run 03_analysis_queries.sql for concept demonstrations -- Run 04_business_questions.sql for business insights
- Go to BigQuery Console
- Open
bigquery-public-data.thelook_ecommercedataset - Run queries from
05_bigquery_examples.sql - Free tier: 1 TB queries/month
- Download e-commerce datasets from Kaggle
- Import CSV files into your database
- Adapt queries from
03_analysis_queries.sqlto match your dataset
- Sign up for free Mode Analytics account
- Access pre-loaded tutorial datasets
- Adapt queries to Mode's dataset structure
See 00_free_datasets_guide.md for detailed instructions!
The SQL syntax is written for PostgreSQL. For other databases, you may need to adjust:
- MySQL/SQL Server: Replace
DATE_TRUNC('month', date)withDATE_FORMAT(date, '%Y-%m-01')orDATETRUNC(month, date) - SQL Server: Replace
DATEDIFF(day, date1, date2)withDATEDIFF(day, date1, date2)(same) - SQLite: Replace
DATE_TRUNCwithstrftime('%Y-%m', date) || '-01'andDATEDIFFwithjulianday(date2) - julianday(date1) - String concatenation: Replace
||withCONCAT()for MySQL
-- See Q1 in 04_business_questions.sql-- See Q7 in 04_business_questions.sql-- See Q4 in 04_business_questions.sqlAfter working through this project, you will understand:
- How to use different types of JOINs effectively
- When and how to use CTEs for complex queries
- How window functions can provide powerful analytical capabilities
- How to aggregate data for business insights
- How to answer common business questions with SQL
- The sample data includes 12 customers, 10 products, and 25 orders
- Data spans from January 2023 to July 2023
- All queries are designed to be educational and demonstrate best practices
✅ Multiple JOIN types demonstrated
✅ CTEs for complex query organization
✅ Comprehensive window function examples
✅ Various aggregation techniques
✅ Real-world business question solutions
✅ Clean, well-commented SQL queries (every query includes purpose and explanation)
✅ Free dataset integration (Kaggle, Mode Analytics, Google BigQuery)
✅ BigQuery-specific examples included