Olist

Olist is a project we worked on as part of the course Business Intelligence and Big Data Analysis, taught by Damianos Chatziadoniou at Athens University of Economics and Business. The main objective was to construct a data warehouse and perform data mining tasks on a publicly available dataset.

Summary

In this project we created a data warehouse and performed two data mining tasks on a sales dataset from Olist, a Brazilian e-commerce business. For the data warehouse, we implemented an ETL process, modeled the data as a star schema and created an OLAP server. We then used analysis and data mining to extract insights from the data and visualised them with Microsoft Power BI. We used Microsoft tools for the data storage, ETL process, OLAP server, analysis and visualisation, and Python for the data mining.

About the company

The Olist store is an e-commerce business headquartered in Sao Paulo, Brazil. This firm acts as a single point of contact between various small businesses and the customers who wish to buy their products.

About the dataset

It is a public dataset with information on over 100k orders made from 2016 to 2018 at multiple marketplaces in Brazil. You can find it on Kaggle: Brazilian E-Commerce Public Dataset by Olist.

Extract-Transform-Load Process

First, we created staging tables in the SQL Server instance we used for storage. We then built an ETL process that extracts the data from the CSV files, transforms it (e.g. aggregation, filtering) and loads it into the data warehouse (SQL tables). For the ETL process we used the SSIS extension of Visual Studio.
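The snippet below is a minimal Python sketch of the same extract, transform and load steps. It is not the SSIS package from this project: it uses an in-memory SQLite database in place of SQL Server, and the sample rows, the `stg_order_items` table name and the filtering rule are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for one of the raw CSV files.
RAW_CSV = """order_id,product_id,price,freight_value
o1,p1,10.50,2.00
o1,p2,20.00,3.50
o2,p1,,1.00
o3,p3,5.00,0.50
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing price and cast numeric fields."""
    clean = []
    for row in rows:
        if not row["price"]:
            continue  # filtering step (assumed rule)
        clean.append(
            (row["order_id"], row["product_id"],
             float(row["price"]), float(row["freight_value"]))
        )
    return clean


def load(conn, rows):
    """Load: insert the cleaned rows into a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_order_items ("
        "order_id TEXT, product_id TEXT, price REAL, freight_value REAL)"
    )
    conn.executemany("INSERT INTO stg_order_items VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server database
load(conn, transform(extract(RAW_CSV)))

loaded_count = conn.execute("SELECT COUNT(*) FROM stg_order_items").fetchone()[0]
# Aggregation step: total price per order.
order_totals = dict(
    conn.execute("SELECT order_id, SUM(price) FROM stg_order_items GROUP BY order_id")
)
print(loaded_count, order_totals)
```

Keeping the three steps as separate functions mirrors the way an SSIS data flow separates source, transformation and destination components.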

Star Schema

We constructed the star schema shown below to model our data. It consists of 1 fact table and 6 dimensions: customer, product, seller, order status, date and order. The fact table has one row for each item purchased, approximately 100,000 rows in total. It contains the following measures: the freight value of each item, the price of each item and the review score of each order (the same for all items of an order). Modeling the data this way is essential for analysing large datasets quickly (e.g. running aggregations) with the help of OLAP servers.
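To make the fact/dimension split concrete, here is a small sketch of a star schema in SQLite with only two of the six dimensions. The table names, column names and rows are hypothetical and do not reproduce the actual warehouse tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes (hypothetical columns).
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- The fact table holds one row per purchased item: keys plus measures.
CREATE TABLE fact_order_item (
    product_key   INTEGER REFERENCES dim_product(product_key),
    date_key      INTEGER REFERENCES dim_date(date_key),
    price         REAL,
    freight_value REAL,
    review_score  INTEGER
);

INSERT INTO dim_product VALUES (1, 'toys'), (2, 'books');
INSERT INTO dim_date VALUES (20180105, 2018, 1), (20180210, 2018, 2);
INSERT INTO fact_order_item VALUES
    (1, 20180105, 10.0, 2.0, 5),
    (2, 20180105, 20.0, 3.0, 5),
    (1, 20180210, 15.0, 2.5, 4);
""")

# A typical star-join: aggregate a measure, sliced by dimension attributes.
revenue_rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.price) AS revenue
    FROM fact_order_item AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(revenue_rows)
```

Every analytical query follows the same shape: join the fact table to the dimensions of interest, then group by their attributes.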

OLAP server

We used the data from the data warehouse, as modeled in the star schema, to create a cube. Our multidimensional cube consists of 1 fact table and 6 dimensions. We created a hierarchy of the date attributes for the Date dimension and added 2 calculated members to the cube. We then used the cube for our analysis of the dataset. The OLAP server was created with the SSAS extension of Visual Studio.
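The idea of rolling measures up a date hierarchy, and of a calculated member derived from base measures, can be sketched in plain Python. This is only an illustration of the concept, not the SSAS cube: the year > quarter > month levels, the `avg_price` calculated member and the sample rows are all assumptions.

```python
from collections import defaultdict
from datetime import date

# Made-up fact rows: (order date, price, freight value).
facts = [
    (date(2018, 1, 5), 10.0, 2.0),
    (date(2018, 2, 10), 15.0, 2.5),
    (date(2018, 4, 1), 20.0, 3.0),
    (date(2017, 12, 24), 30.0, 4.0),
]

LEVELS = ("year", "quarter", "month")  # assumed hierarchy levels


def member(day, level):
    """Return the hierarchy path (a tuple) that a date falls under at a level."""
    path = (day.year, (day.month - 1) // 3 + 1, day.month)
    return path[: LEVELS.index(level) + 1]


def roll_up(rows, level):
    """Aggregate the base measures at one level of the date hierarchy."""
    cells = defaultdict(lambda: {"price": 0.0, "freight": 0.0, "items": 0})
    for day, price, freight in rows:
        cell = cells[member(day, level)]
        cell["price"] += price
        cell["freight"] += freight
        cell["items"] += 1
    # Hypothetical calculated member, computed from the aggregated measures.
    for cell in cells.values():
        cell["avg_price"] = cell["price"] / cell["items"]
    return dict(cells)


by_year = roll_up(facts, "year")
by_quarter = roll_up(facts, "quarter")
print(by_year[(2018,)])
print(by_quarter[(2018, 1)])
```

An OLAP server precomputes and caches aggregates like these, which is what makes drilling down from year to quarter to month fast on large fact tables.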

Data Mining Tasks

Given our dataset and the business model of Olist, we ran two data mining models. First, we performed customer segmentation using clustering and the RFM (Recency-Frequency-Monetary) method. Second, we classified each seller as Silver, Gold or Platinum based on the number of products, customers and sales each had. To achieve that, we created a classification model using the KNN classifier. For the data mining models we used Python (pandas, sklearn, matplotlib).
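The two ideas can be sketched with the standard library alone. This is a simplified illustration, not the code in data_mining.ipynb (which uses pandas and sklearn): the sample rows, the snapshot date, the pre-scaled seller features, the labels and k=3 are all assumptions, and the clustering step that would consume the RFM table is omitted.

```python
import math
from collections import Counter
from datetime import date

# --- RFM table: one (recency, frequency, monetary) triple per customer ---
orders = [  # made-up rows: (customer_id, order_date, amount)
    ("c1", date(2018, 8, 1), 100.0),
    ("c1", date(2018, 8, 20), 50.0),
    ("c2", date(2018, 3, 15), 30.0),
]
snapshot = date(2018, 9, 1)  # assumed reference date for recency


def rfm_table(rows, today):
    """Recency = days since last order, frequency = order count, monetary = total spend."""
    table = {}
    for customer, day, amount in rows:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days = (today - day).days
        recency = days if recency is None else min(recency, days)
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table


rfm = rfm_table(orders, snapshot)

# --- KNN: label a seller by majority vote of its k nearest neighbours ---
# Features are (products, customers, sales), assumed already scaled to [0, 1];
# without scaling, the sales column would dominate the distance.
labelled_sellers = [
    ((0.1, 0.1, 0.1), "Silver"),
    ((0.2, 0.1, 0.2), "Silver"),
    ((0.5, 0.5, 0.5), "Gold"),
    ((0.6, 0.5, 0.4), "Gold"),
    ((0.9, 0.9, 0.9), "Platinum"),
    ((0.8, 1.0, 0.9), "Platinum"),
]


def knn_predict(train, query, k=3):
    """Return the most common label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


tier_mid = knn_predict(labelled_sellers, (0.55, 0.5, 0.5))
tier_top = knn_predict(labelled_sellers, (0.85, 0.95, 0.9))
print(rfm, tier_mid, tier_top)
```

In a pandas and sklearn workflow, the RFM table would typically come from a groupby on customer, and the vote would be handled by a fitted KNN classifier rather than a hand-written function.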

Reports

As a last step, we created reports in Power BI to showcase the results of our analysis and data mining tasks. For example, the following report shows the sales of 2018 by month, day of the week and product category. We observed that revenue is lowest in the last quarter of the year.
