Automobile insurance fraud detection

I'm using several machine learning models to predict fraudulent claims. Because the target variable is imbalanced, I used random oversampling to balance the dataset.
The dataset contains 40 columns:
0 months_as_customer
1 age
2 policy_number
3 policy_bind_date
4 policy_state
5 policy_csl
6 policy_deductable
7 policy_annual_premium
8 umbrella_limit
9 insured_zip
10 insured_sex
11 insured_education_level
12 insured_occupation
13 insured_hobbies
14 insured_relationship
15 capital-gains
16 capital-loss
17 incident_date
18 incident_type
19 collision_type
20 incident_severity
21 authorities_contacted
22 incident_state
23 incident_city
24 incident_location
25 incident_hour_of_the_day
26 number_of_vehicles_involved
27 property_damage
28 bodily_injuries
29 witnesses
30 police_report_available
31 total_claim_amount
32 injury_claim
33 property_claim
34 vehicle_claim
35 auto_make
36 auto_model
37 auto_year
38 fraud_reported
39 _c39
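
The random oversampling step mentioned above can be sketched roughly as follows. This is a minimal, standard-library illustration and not necessarily the code used in this project: it assumes `fraud_reported` is the target column, and the helper name `random_oversample`, the "Y"/"N" labels, and the toy rows are all made up for the example. In practice a library such as imbalanced-learn (`RandomOverSampler`) does the same job.

```python
import random
from collections import Counter


def random_oversample(rows, target="fraud_reported", seed=42):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their target label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[target], []).append(row)

    largest = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the largest class.
        balanced.extend(rng.choices(group, k=largest - len(group)))

    rng.shuffle(balanced)
    return balanced


# Toy data for illustration only: 5 "Y" rows and 15 "N" rows.
rows = [
    {"total_claim_amount": 1000 * i, "fraud_reported": "Y" if i % 4 == 0 else "N"}
    for i in range(1, 21)
]

balanced = random_oversample(rows)
counts = Counter(row["fraud_reported"] for row in balanced)
print(counts)
```

Oversampling is usually applied to the training split only, after the train/test split, so that duplicated rows do not leak into the evaluation data.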