Analyze traffic accident data to identify patterns related to road conditions, weather, and time of day. Visualize accident hotspots and contributing factors.
The analysis uses the US Accidents dataset from Kaggle, which contains about 7.7 million accident records across 49 states. Because the full dataset is large (29.86 GB), a random sample of 200,000 records was analyzed instead. The analysis followed these steps:
- Data Loading & Sampling - Loaded 200,000 random accident records for efficient processing
- Time Feature Extraction - Extracted hour, day, month, and day of week from timestamps
- Pattern Analysis - Identified trends across different time periods and conditions
- Geographic Analysis - Determined accident hotspots by city and state
- Weather Impact Assessment - Analyzed weather conditions, temperature, and visibility during accidents
- Road Features Analysis - Examined the presence of traffic signals, junctions, crossings, etc.
- Visualization - Created 13 comprehensive visualizations showing patterns and contributing factors
Python libraries including pandas, matplotlib, seaborn, and numpy were used for data processing, analysis, and visualization.
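As a rough sketch of the loading and sampling step, the CSV can be read in chunks so that the full file never has to fit in memory at once. The function name `sample_csv`, the `frac` and `chunksize` values, and the seed are illustrative assumptions, not the notebook's actual code, and the result only approximates a uniform random sample.

```python
import pandas as pd


def sample_csv(source, n_rows, frac=0.05, chunksize=100_000, seed=42):
    """Draw an approximately uniform random sample from a large CSV.

    `source` is a path or file-like object accepted by pandas.read_csv.
    All default values here are assumptions chosen for illustration.
    """
    parts = []
    # Read the file chunk by chunk and keep a small random share of each chunk.
    for i, chunk in enumerate(pd.read_csv(source, chunksize=chunksize)):
        parts.append(chunk.sample(frac=frac, random_state=seed + i))
    pooled = pd.concat(parts, ignore_index=True)
    # Trim the pooled rows down to the requested sample size.
    n = min(n_rows, len(pooled))
    return pooled.sample(n=n, random_state=seed).reset_index(drop=True)
```

For example, `sample_csv("US_Accidents_March23.csv", 200_000)` would keep about 5% of each chunk (roughly 385,000 of the ~7.7 million rows) before trimming to 200,000 records.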
Kaggle - US Accidents Dataset (2016-2023)
- URL: https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents
- Original Size: ~7.7 million records (29.86 GB)
- Sample Used: 200,000 records
- Time Period: February 2016 - March 2023
- Coverage: 49 US states
- Attributes: 47 features including location, time, weather conditions, road features, and severity
- Peak Accident Hours: 7-9 AM and 4-6 PM (rush hours)
- Most Dangerous Day: Friday shows highest accident frequency
- Seasonal Pattern: Winter months (Dec-Feb) have more accidents than other seasons
- Top Accident States: California, Texas, and Florida lead in accident counts
- Weather Impact: Most accidents occur in fair/clear weather (likely reflecting higher traffic volume)
- Visibility: Lower visibility is associated with higher accident severity
- Road Features: Traffic signals and junctions are common accident locations
- Time of Day: Afternoon (12-5 PM) has more accidents than any other time-of-day period
- Severity Distribution: Majority of accidents are Severity 2 and 3
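The time-based findings above could be derived along the following lines. This is a minimal sketch: the column name `Start_Time`, the helper names, and all time-of-day bin edges except the 12-5 PM afternoon window are assumptions rather than the notebook's confirmed code.

```python
import pandas as pd


def add_time_features(df, time_col="Start_Time"):
    """Return a copy of df with hour, day-of-week, month, and time-of-day columns."""
    out = df.copy()
    # Unparseable timestamps become NaT instead of raising an error.
    ts = pd.to_datetime(out[time_col], errors="coerce")
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.day_name()
    out["month"] = ts.dt.month
    # Bin edges are assumptions; only "Afternoon (12-5 PM)" comes from the findings.
    out["time_of_day"] = pd.cut(
        out["hour"],
        bins=[0, 6, 12, 17, 21, 24],
        labels=["Early Morning", "Morning", "Afternoon", "Evening", "Night"],
        right=False,  # each bin includes its left edge, e.g. 12:00 to 16:59
    )
    return out


def peak_counts(df):
    """Count accidents per hour, weekday, and time-of-day bucket."""
    return {
        "by_hour": df["hour"].value_counts().sort_index(),
        "by_day": df["day_of_week"].value_counts(),
        "by_time_of_day": df["time_of_day"].value_counts(),
    }
```

Calling `peak_counts(add_time_features(df))["by_hour"].idxmax()` would then give the hour with the most accidents in the sample.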
The analysis includes 13 visualizations:
- Accidents by hour of day (line graph)
- Accidents by day of week (bar chart)
- Accidents by month (bar chart)
- Time of day distribution (bar chart)
- Heatmap showing hour vs day of week patterns
- Top 10 cities with most accidents
- Top 10 states with most accidents
- Accident severity distribution
- Weather conditions during accidents
- Temperature distribution
- Visibility distribution
- Road features present (traffic signals, junctions, etc.)
- Severity by time of day (stacked bar)
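As an illustration of the data behind the hour vs day of week heatmap, the sketch below builds a 7 x 24 table of accident counts. The column name `Start_Time` and the helper name `hour_by_weekday_table` are assumptions made for this example.

```python
import pandas as pd

DAY_ORDER = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def hour_by_weekday_table(df, time_col="Start_Time"):
    """Cross-tabulate accident counts by weekday (rows) and hour (columns)."""
    ts = pd.to_datetime(df[time_col], errors="coerce").dropna()
    days = ts.dt.day_name().rename("day_of_week")
    hours = ts.dt.hour.rename("hour")
    table = pd.crosstab(days, hours)
    # Keep every weekday and hour, in calendar order, even when the count is zero.
    return table.reindex(index=DAY_ORDER, columns=range(24), fill_value=0)
```

The resulting table could then be passed to `seaborn.heatmap(table)` to draw the chart.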
- Python 3.8+
- Pandas - Data manipulation and sampling
- Matplotlib - Data visualization
- Seaborn - Statistical visualizations and heatmaps
- NumPy - Numerical operations
- Jupyter Notebook - Interactive analysis
| Metric | Value | Description |
|---|---|---|
| Records Analyzed | 200,000 | Sample from full dataset |
| Date Range | 2016-2023 | 7+ years of data |
| States Covered | 49 | Nearly all US states |
| Peak Hour | 5:00 PM | Evening rush hour |
| Most Dangerous Day | Friday | Highest accident frequency |
| Top State | California | Most accidents recorded |
| Average Severity | ~2.3 | On scale of 1-4 |
| Common Severity | 2 | Most frequent level |
| Weather Conditions | 10+ types | Including clear, rain, snow, fog |
| Road Features | 13 types | Traffic signals, junctions, etc. |
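Several of the metrics in the table above can be recomputed from the sample with a few pandas calls. The following is a sketch only; the column names `Start_Time`, `Severity`, and `State` and the helper name `summary_metrics` are assumptions.

```python
import pandas as pd


def summary_metrics(df, time_col="Start_Time", severity_col="Severity",
                    state_col="State"):
    """Compute headline statistics for a sample of accident records."""
    ts = pd.to_datetime(df[time_col], errors="coerce")
    # mode() returns all tied values in sorted order; iloc[0] takes the first.
    return {
        "records": len(df),
        "peak_hour": int(ts.dt.hour.mode().iloc[0]),
        "busiest_day": ts.dt.day_name().mode().iloc[0],
        "top_state": df[state_col].mode().iloc[0],
        "average_severity": round(float(df[severity_col].mean()), 2),
        "most_common_severity": int(df[severity_col].mode().iloc[0]),
    }
```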
- Download the dataset from Kaggle
- Upload US_Accidents_March23.csv to Jupyter Notebook
- Run each cell sequentially
- Wait 2-3 minutes for initial data loading
- View all visualizations and insights
Notebook: accident_analysis.ipynb
Morning (7-9 AM) and evening (4-6 PM) rush hours show significant spikes in accidents, suggesting traffic volume as a major contributing factor.
Weekdays show higher accident rates than weekends, with Friday being the peak day, likely due to end-of-week fatigue and higher traffic.
Winter months experience more accidents, possibly due to adverse weather conditions and reduced visibility.
Accidents are concentrated in highly populated states and urban areas with dense traffic networks.
Most accidents occur during fair weather, which suggests that traffic volume matters more than weather conditions alone.
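The weather and visibility observations could be checked with a sketch like the one below. The column names `Weather_Condition`, `Visibility(mi)`, and `Severity`, the helper names, and the visibility band edges are all assumptions. Note that a high share of fair-weather accidents says nothing by itself about risk, since it does not account for how often each condition occurs.

```python
import pandas as pd


def weather_share(df, weather_col="Weather_Condition"):
    """Share of accidents per weather condition, largest first."""
    return df[weather_col].value_counts(normalize=True)


def severity_by_visibility(df, vis_col="Visibility(mi)", severity_col="Severity"):
    """Accident count and mean severity within assumed visibility bands."""
    bands = pd.cut(
        df[vis_col],
        bins=[0, 1, 5, float("inf")],
        labels=["under 1 mi", "1 to 5 mi", "5 mi or more"],
        right=False,  # each band includes its lower edge
    )
    return df.groupby(bands, observed=False)[severity_col].agg(["count", "mean"])
```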
Internship Task 4 - Data Science Internship