This repository contains a personal project focused on building an end-to-end data pipeline for the used car market in Morocco. The pipeline automates data collection, cleaning, transformation, storage in an AWS RDS PostgreSQL database, and visualization through a Grafana dashboard.
Important
Feel free to submit a pull request if you believe any changes are necessary.
The data scraping phase is managed by scripts in avito/ and moteur/
- Web Scraping: Use Python's BeautifulSoup to extract data from the two used car marketplaces.
- Attributes Collected: Scrape key attributes such as price, car_model, car_company, year, and km, among others.
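The scraping step can be sketched roughly as follows. This is a minimal illustration, not the code in avito/ or moteur/: the HTML snippet, the CSS class names, and the helpers `to_int`, `text_of`, and `parse_listings` are all hypothetical, and the real marketplaces use different markup. It assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup; the real marketplaces use different tags and classes.
SAMPLE_HTML = """
<div class="listing">
  <span class="company">Dacia</span>
  <span class="model">Logan</span>
  <span class="year">2018</span>
  <span class="km">85 000 km</span>
  <span class="price">95 000 DH</span>
</div>
<div class="listing">
  <span class="company">Renault</span>
  <span class="model">Clio</span>
  <span class="year">2020</span>
  <span class="km">40 000 km</span>
  <span class="price">130 000 DH</span>
</div>
"""


def to_int(text):
    """Keep only the digits of a string such as '95 000 DH'; None if there are none."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


def text_of(card, css_class):
    """Return the stripped text of the first matching <span>, or '' if it is missing."""
    node = card.find("span", class_=css_class)
    return node.get_text(strip=True) if node else ""


def parse_listings(html):
    """Turn every listing card in the page into a dict of attributes."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="listing"):
        rows.append({
            "car_company": text_of(card, "company"),
            "car_model": text_of(card, "model"),
            "year": to_int(text_of(card, "year")),
            "km": to_int(text_of(card, "km")),
            "price": to_int(text_of(card, "price")),
        })
    return rows


listings = parse_listings(SAMPLE_HTML)
```

Converting price and km to integers at scrape time is one possible design choice; the raw strings could also be stored and parsed later during cleaning.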
The data cleaning and transformation phase is handled by functions in db/
- Data Cleaning: Remove rows with a null car_company, normalize car_company and car_model values, and delete inappropriate data using SQL.
- Transformation: Structure data for relational storage and ensure compatibility with the PostgreSQL schema.
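To make the cleaning rules above concrete, here is a small sketch in plain Python. The project performs part of this work in SQL, so this only illustrates the idea; the function names `normalize_name` and `clean_rows` are hypothetical, and lowercasing plus whitespace collapsing is an assumed normalization rule, not necessarily the one used in db/.

```python
def normalize_name(value):
    """Collapse whitespace and lowercase a name; return None for missing or empty values."""
    if value is None:
        return None
    cleaned = " ".join(value.split()).lower()
    return cleaned or None


def clean_rows(rows):
    """Drop rows without a car_company and normalize car_company and car_model."""
    cleaned = []
    for row in rows:
        company = normalize_name(row.get("car_company"))
        if company is None:
            continue  # rows with a null or empty car_company are removed
        model = normalize_name(row.get("car_model"))
        cleaned.append({**row, "car_company": company, "car_model": model})
    return cleaned


raw_rows = [
    {"car_company": "  DACIA ", "car_model": "Logan", "price": 95000},
    {"car_company": None, "car_model": "Clio", "price": 130000},
    {"car_company": "", "car_model": "208", "price": 110000},
    {"car_company": "Renault", "car_model": " Clio  4 ", "price": 120000},
]
clean = clean_rows(raw_rows)
```

Here two of the four sample rows survive, because the other two have no usable car_company.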
The data storage phase is executed by the insert script in db/
- AWS RDS Integration: Load cleaned and structured data into an AWS RDS PostgreSQL database.
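The loading step can be illustrated with parameterized inserts. The sketch below uses the standard library's `sqlite3` with an in-memory database as a stand-in for PostgreSQL on AWS RDS, so that it runs without a server or credentials. The table name `cars`, its columns, and the `load_rows` helper are assumptions for illustration and may not match the project's actual schema.

```python
import sqlite3

# Assumed schema for illustration; the real PostgreSQL schema may differ.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS cars (
    car_company TEXT NOT NULL,
    car_model   TEXT,
    year        INTEGER,
    km          INTEGER,
    price       INTEGER
)
"""

# Named placeholders keep values out of the SQL string itself.
INSERT_ROW = """
INSERT INTO cars (car_company, car_model, year, km, price)
VALUES (:car_company, :car_model, :year, :km, :price)
"""


def load_rows(conn, rows):
    """Create the table if needed, insert all rows in one transaction, return the row count."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(CREATE_TABLE)
        conn.executemany(INSERT_ROW, rows)
    return conn.execute("SELECT COUNT(*) FROM cars").fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for the RDS PostgreSQL connection
rows = [
    {"car_company": "dacia", "car_model": "logan", "year": 2018, "km": 85000, "price": 95000},
    {"car_company": "renault", "car_model": "clio", "year": 2020, "km": 40000, "price": 130000},
]
total = load_rows(conn, rows)
```

Against PostgreSQL the same pattern applies with a driver such as psycopg2, where the connection would point at the RDS endpoint and named placeholders are written as `%(car_company)s` rather than `:car_company`.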
The data visualization phase is implemented through grafana/
- Grafana Integration: Configure an interactive dashboard to visualize market trends, pricing, and more for every car_company.
- Grafana Cloud: Leverage the cloud-hosted service for dynamic, real-time data exploration.
- Clone the repository:
git clone https://github.com/L1xus/CarsDash.git
cd CarsDash
- Run docker-compose
docker-compose up --build
- Create the dashboard
cd grafana
docker-compose up --build
- Import the Cars Dashboard JSON file into Grafana.
- Add support for filtering by car_model and year.
- Automate the pipeline to run every month.
- Enhance data quality by creating a machine learning model to filter car listings.
- Consider a better option to make the dashboard public.