This project analyzes Delhivery's logistics delivery dataset to understand delivery performance, route efficiency, and operational patterns.
The analysis transforms raw segment-level logistics data into trip-level insights that can inform delivery efficiency and route planning.
The objective is to analyze delivery operations and identify the factors that affect delivery time, using statistical analysis and data visualization.
Key goals include:
- Cleaning and preprocessing logistics data
- Aggregating segment-level data into trip-level insights
- Performing exploratory data analysis (EDA)
- Detecting and handling outliers
- Engineering meaningful features
- Performing hypothesis testing
- Extracting business insights and recommendations
The dataset contains detailed logistics delivery information including:
- Trip creation timestamps
- Source and destination centers
- Actual delivery time
- OSRM estimated time and distance
- Segment-level travel metrics
The dataset covers deliveries between 12 September 2018 and 8 October 2018.
Tools and libraries used:
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- SciPy
Data cleaning steps:
- Converted time columns to datetime format
- Removed rows with missing values
- Verified data consistency
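The cleaning steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper `clean_segments` and the column names `trip_creation_time` and `actual_time` are assumptions made for the example.

```python
import pandas as pd


def clean_segments(df: pd.DataFrame, time_cols: list[str]) -> pd.DataFrame:
    """Parse timestamp columns and drop rows with missing values (hypothetical helper)."""
    out = df.copy()
    for col in time_cols:
        # errors="coerce" turns unparseable strings into NaT, so dropna() removes them too
        out[col] = pd.to_datetime(out[col], errors="coerce")
    return out.dropna().reset_index(drop=True)


# Tiny made-up sample; column names are assumed, not taken from the dataset
raw = pd.DataFrame({
    "trip_creation_time": ["2018-09-12 00:00:16", "2018-09-12 00:01:00", "not a date"],
    "actual_time": [14.0, None, 20.0],
})
clean = clean_segments(raw, ["trip_creation_time"])
print(clean.dtypes)
```

Coercing bad timestamps to `NaT` before calling `dropna()` keeps the two cleaning steps in one pass; whether dropping or imputing is appropriate depends on how many rows are affected.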
Raw segment-level records were aggregated into:
- Segment-level summaries
- Trip-level summaries
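One way to sketch this two-stage aggregation with pandas is shown below. The grouping keys (`trip_uuid`, `source_center`, `destination_center`), the metric columns, and the choice of `sum` as the aggregation are all assumptions for illustration; the appropriate aggregation depends on whether each metric is cumulative or per-row in the raw data.

```python
import pandas as pd

# Made-up segment rows; all column names here are assumed for the example
rows = pd.DataFrame({
    "trip_uuid": ["t1", "t1", "t1", "t2"],
    "source_center": ["A", "A", "B", "C"],
    "destination_center": ["B", "B", "C", "D"],
    "segment_actual_time": [10.0, 15.0, 30.0, 40.0],
    "segment_osrm_time": [8.0, 12.0, 25.0, 35.0],
})

# Stage 1: one row per (trip, source, destination) leg
leg_keys = ["trip_uuid", "source_center", "destination_center"]
legs = rows.groupby(leg_keys, as_index=False).agg(
    segment_actual_time=("segment_actual_time", "sum"),
    segment_osrm_time=("segment_osrm_time", "sum"),
)

# Stage 2: one row per trip, with a count of legs
trips = legs.groupby("trip_uuid", as_index=False).agg(
    segment_actual_time=("segment_actual_time", "sum"),
    segment_osrm_time=("segment_osrm_time", "sum"),
    n_legs=("source_center", "size"),
)
print(trips)
```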
New features created:
- Trip duration
- Month, day, weekday, hour
- Source city and state
- Destination city and state
- Average delivery speed
- Time prediction error
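The features listed above could be derived roughly as follows. Everything specific here is an assumption made for the sketch: the column names, the units (minutes and kilometres), the `"City_Place_Code (State)"` format of `source_name`, and the definitions of speed (distance over actual time) and prediction error (actual minus OSRM time).

```python
import pandas as pd

# Single made-up trip; column names, units, and name format are assumptions
trips = pd.DataFrame({
    "trip_creation_time": pd.to_datetime(["2018-09-12 08:30:00"]),
    "od_start_time": pd.to_datetime(["2018-09-12 09:00:00"]),
    "od_end_time": pd.to_datetime(["2018-09-12 11:00:00"]),
    "source_name": ["Anand_VUNagar_DC (Gujarat)"],
    "actual_time": [120.0],    # assumed minutes
    "osrm_time": [90.0],       # assumed minutes
    "osrm_distance": [60.0],   # assumed kilometres
})

# Trip duration in minutes from start and end timestamps
duration = trips["od_end_time"] - trips["od_start_time"]
trips["trip_duration_min"] = duration.dt.total_seconds() / 60

# Calendar features from the creation timestamp
created = trips["trip_creation_time"]
trips["month"] = created.dt.month
trips["day"] = created.dt.day
trips["weekday"] = created.dt.dayofweek   # Monday = 0
trips["hour"] = created.dt.hour

# City is the text before the first underscore or space; state is inside the parentheses
parts = trips["source_name"].str.extract(
    r"^(?P<source_city>[^_ ]+).*\((?P<source_state>[^)]+)\)$"
)
trips = trips.join(parts)

# Average speed in km/h and signed prediction error in minutes
trips["avg_speed_kmph"] = trips["osrm_distance"] / (trips["actual_time"] / 60)
trips["time_error_min"] = trips["actual_time"] - trips["osrm_time"]
print(trips.iloc[0])
```

The same extraction pattern would apply to the destination column, assuming it follows the same naming format.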
Exploratory analysis covered:
- Univariate analysis
- Bivariate analysis
- Multivariate analysis
Key visualizations included:
- Delivery time distribution
- Distance distribution
- Route type comparison
- Correlation heatmaps
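As an illustration of the last item, a correlation heatmap can be drawn with Seaborn along these lines. The data below is synthetic and the column names are assumed; the coefficients it produces say nothing about the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic trip-level metrics, generated only to have something to plot
rng = np.random.default_rng(42)
distance = rng.uniform(5, 500, size=100)
trips = pd.DataFrame({
    "osrm_distance": distance,
    "osrm_time": distance * 1.2 + rng.normal(0, 5, size=100),
    "actual_time": distance * 2.0 + rng.normal(0, 20, size=100),
})

# Pearson correlation between every pair of numeric columns
corr = trips.corr()

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation between trip-level metrics (synthetic data)")
fig.tight_layout()
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the colour scale comparable across different heatmaps.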
Outliers were identified using boxplots and treated using the IQR method.
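A minimal sketch of the IQR rule is given below. The source does not say whether outliers were capped or dropped, so both options are shown; the helper `iqr_bounds`, the conventional multiplier of 1.5, and the sample values are assumptions for the example.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles (hypothetical helper)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up delivery times with one extreme value
times = pd.Series([10, 12, 11, 13, 12, 11, 300], name="actual_time", dtype=float)
low, high = iqr_bounds(times)
is_outlier = (times < low) | (times > high)

clipped = times.clip(lower=low, upper=high)  # option 1: cap values at the fences
filtered = times[~is_outlier]                # option 2: drop outlying rows
print(low, high, int(is_outlier.sum()))
```

Capping preserves the row count, which matters when the rows feed later paired comparisons; dropping is simpler when the extreme values are likely data errors.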
Encoding and scaling:
- Categorical variables were encoded using one-hot encoding
- Numerical variables were normalized to the [0, 1] range using MinMaxScaler
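These two steps might look like the following with pandas and scikit-learn. The column names and the category values are placeholders chosen for the example, not confirmed details of the dataset.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up trips; column names and route_type values are assumptions
trips = pd.DataFrame({
    "route_type": ["FTL", "Carting", "FTL"],
    "actual_time": [100.0, 50.0, 150.0],
    "osrm_distance": [40.0, 10.0, 70.0],
})

# One-hot encode the categorical column into 0/1 indicator columns
encoded = pd.get_dummies(trips, columns=["route_type"], dtype=int)

# Rescale each numeric column to [0, 1] as (x - min) / (max - min)
num_cols = ["actual_time", "osrm_distance"]
scaler = MinMaxScaler()
encoded[num_cols] = scaler.fit_transform(encoded[num_cols])
print(encoded)
```

If the scaled features feed a model, fitting the scaler on training data only and reusing it on held-out data avoids leaking information between the two.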
Statistical hypothesis testing was conducted to validate relationships between aggregated delivery metrics.
Tests performed:
- Actual delivery time vs OSRM estimated time
- Actual time vs segment-level actual time
- OSRM distance vs segment OSRM distance
- OSRM time vs segment OSRM time
Paired t-tests were used for statistical validation.
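A paired t-test of this kind can be run with `scipy.stats.ttest_rel`, as sketched below. The data is synthetic (the 1.8 multiplier is invented purely to produce a visible gap), and the one-sided alternative and the 0.05 significance level are assumptions rather than documented choices of the analysis.

```python
import numpy as np
from scipy import stats

# Synthetic paired observations: one OSRM estimate and one actual time per trip
rng = np.random.default_rng(0)
osrm_time = rng.uniform(30, 300, size=200)
actual_time = osrm_time * 1.8 + rng.normal(0, 10, size=200)  # invented relationship

# H0: mean(actual - osrm) = 0; H1: mean(actual - osrm) > 0
result = stats.ttest_rel(actual_time, osrm_time, alternative="greater")

alpha = 0.05  # assumed significance level
reject_null = result.pvalue < alpha
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3g}, reject H0: {reject_null}")
```

The test is paired because each trip contributes both values, so it operates on the per-trip differences rather than treating the two columns as independent samples.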
Key findings:
- Actual delivery times are significantly higher than OSRM estimated times.
- Delivery time strongly correlates with delivery distance.
- Major logistics hubs include Bengaluru, Mumbai, and Gurgaon.
- The highest delivery activity occurs in Maharashtra, Karnataka, and Haryana.
- Delays may accumulate between delivery segments, possibly due to hub processing or other operational handling.
Recommendations:
- Improve route prediction models by incorporating traffic and operational delays.
- Optimize high-volume logistics corridors.
- Improve processing efficiency at intermediate logistics hubs.
- Focus operational improvements on high-volume cities such as Bengaluru, Mumbai, and Gurgaon.