The project implements an ETL (Extract, Transform, Load) process to gather and analyze flight data from Expedia. Using Selenium for web scraping, the data is systematically extracted, organized, and stored in a structured CSV file format. The collected data is then transformed and loaded into an interactive Power BI dashboard, where it is visualized to uncover insightful trends and patterns in flight pricing.
The primary objective of this project is to provide actionable insights into flight prices by automating data collection and presenting it in an intuitive and interactive dashboard. This end-to-end solution highlights the complete workflow, from data extraction and transformation to visualization, demonstrating the integration of web scraping, data organization, and business intelligence tools.
- Web Scraping: Collects flight data (e.g., origin, destination, airline, price, stopovers) from Expedia.
- Data Storage: Saves scraped data into a structured CSV file.
- Data Visualization: Utilizes Power BI to create an interactive dashboard for analyzing flight prices.
Example of flight data, generated on csv file.
Price (MXN) | Flight Time | Stopover | Stopover Place | Airline | Departure Time | Date | Destination | Origin | Flight Type | Class |
---|---|---|---|---|---|---|---|---|---|---|
3378 | 1 h 47 min | Direct | - | Aeromexico (operated by Aerolitoral) | 16:33 | 18/01/2025 | Ciudad de México | Tepic | Day flight | Economic |
9592 | 10 h 41 min | 1 stop | 4 h 48 min in TIJ | Volaris | 20:36 | 18/01/2025 | Ciudad de México | Tepic | Night flight | Economic |
9592 | 9 h 34 min | 1 stop | 3 h 28 min in TIJ | Volaris | 20:36 | 18/01/2025 | Ciudad de México | Tepic | Night flight | Economic |
The dashboard presents the following insights:
- Lowest flight prices by day.
- Daily average lowest price by airline.
- Price trends for different destinations and airlines.
- Interactive filters for origin, destination, airline, and month.
- Python 3.7+
- Required Python libraries:
pandas
,selenium
,undetected-chromedriver
- Power BI Desktop for creating and modifying dashboards
- Clone the repository:
git clone https://github.com/danielsaed/Expedia_flight_prices_scraping.git cd Expedia_flight_prices_scraping
- Install dependencies:
pip install -r requirements.txt
- Run the scraper script:
python main.py
- The scraped data will be saved in a
flights_data.csv
file.
- Scrape flight data using the provided script.
- Open the Power BI file and load the updated CSV data.
- Interact with the dashboard to analyze flight prices.
- Data Collection: Scrape flight data using Python scripts.
- Data Storage: Save the collected data into a CSV file.
- Visualization: Use Power BI to create a dashboard for insights.
This project is licensed under the MIT License. See the LICENSE file for more details.
Special thanks to Expedia for the data and Power BI for enabling data visualization.