Group project for Data Science course at HU Berlin (WS 1617)

phinguyen44/BADS_Project_WS1617

BADS Project

bads-ws1718-group21_ created by GitHub Classroom

Members: Claudia Günther, Phi Nguyen, Julian Winkel

Project Details

Next Steps for BADS Group Assignment

A) Final Data Cleaning

  • remove implausible values (e.g. ages above/below a plausible range)
  • impute missing values (median, mean or via ML) -> see notes from DAII (Claudia); the same imputation also has to be applied to the unknown data set
  • Consider the EM (Expectation Maximization) algorithm for imputing the NAs
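
A minimal sketch of the cleaning step in Python using only the standard library (the project's own code may well be in another language). It assumes age is a numeric column with `None` for missing values. The bounds 16 and 100 and the helper names `flag_implausible`, `fit_median` and `impute` are hypothetical. The median is fitted on the training data only and then reused for the unknown data set, so both sets are treated identically.

```python
import statistics
from typing import List, Optional


def flag_implausible(values: List[Optional[float]], lo: float, hi: float) -> List[Optional[float]]:
    """Turn values outside [lo, hi] into None so they are handled as missing."""
    return [v if v is not None and lo <= v <= hi else None for v in values]


def fit_median(values: List[Optional[float]]) -> float:
    """Median of the observed (non-missing) values; fit on the training set only."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a median from")
    return statistics.median(observed)


def impute(values: List[Optional[float]], fill: float) -> List[float]:
    """Replace every None with the fitted fill value."""
    return [fill if v is None else v for v in values]


# Usage: the age bounds 16..100 are an illustrative assumption, not project values.
train_age = flag_implausible([25, 130, None, 40, 5, 33], lo=16, hi=100)
unknown_age = flag_implausible([None, 50, 250], lo=16, hi=100)
median_age = fit_median(train_age)             # estimated on training data only
train_age = impute(train_age, median_age)
unknown_age = impute(unknown_age, median_age)  # same value reused for the unknown set
```

Median imputation is shown because it is robust to outliers that survive the cleaning step. An EM-based or model-based imputation would replace `fit_median`/`impute`, but it should follow the same pattern: fit on the training data, apply to both data sets.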

B) Feature Creation: Create additional, useful variables

  • Age
  • Delivery time
  • Not returned (NA for delivery date)
  • cluster brands/sizes? -> see lecture on feature creation
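
The first three features could be derived as in this sketch. It assumes the raw data provides a date of birth, an order date and a possibly missing delivery date as `datetime.date` values. The function names are hypothetical, and measuring age at the order date is an assumption rather than a project decision.

```python
from datetime import date
from typing import Optional


def age_at_order(birth: Optional[date], order: date) -> Optional[int]:
    """Completed years between date of birth and order date; None if birth date is unknown."""
    if birth is None:
        return None
    years = order.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the order year.
    if (order.month, order.day) < (birth.month, birth.day):
        years -= 1
    return years


def delivery_days(order: date, delivery: Optional[date]) -> Optional[int]:
    """Days between order and delivery; None when the delivery date is NA."""
    if delivery is None:
        return None
    return (delivery - order).days


def delivery_missing(delivery: Optional[date]) -> int:
    """Binary flag: 1 if the delivery date is NA, else 0."""
    return int(delivery is None)
```

A negative delivery time would point to a data error and could be fed back into the implausible-value check in A).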

C) Exploratory Data Analysis, in parallel with B)

  • summary stats
  • find useful patterns in data
  • mean return rate by item (brand, size), customer (age, title, state) and delivery (time, season) characteristics
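
The mean return rate per group is the share of returned items within each category level. Here is a standard-library sketch that assumes each row is a dict with a 0/1 target column. The column names `brand`, `size` and `return` and the toy rows are made up for illustration. With pandas, the equivalent would be `df.groupby("brand")["return"].mean()`.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def mean_return_by(rows: Iterable[Mapping], key: str, target: str = "return") -> Dict[Any, float]:
    """Share of returned items (target == 1) for each level of `key`."""
    sums: Dict[Any, int] = defaultdict(int)
    counts: Dict[Any, int] = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[target]
        counts[row[key]] += 1
    return {level: sums[level] / counts[level] for level in counts}


# Hypothetical toy rows; the column names are assumptions, not the course data schema.
rows = [
    {"brand": "A", "size": "M", "return": 1},
    {"brand": "A", "size": "L", "return": 0},
    {"brand": "B", "size": "M", "return": 1},
]
```

Levels with very few orders give noisy rates, so it is worth reporting the group counts alongside the means.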

D) Compare known (training) and unknown data set

  • similarities & differences: what does this mean for our prediction?
  • do the data sets belong to the same population? See lecture notes on this => the answer is probably yes
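
One way to check whether the known and unknown data come from the same population is to compare the distribution of each numeric variable across the two sets. The sketch below computes the two-sample Kolmogorov-Smirnov statistic by hand. The function name is hypothetical. SciPy's `scipy.stats.ks_2samp` returns the same statistic together with a p-value. Categorical variables would instead need a comparison of level frequencies.

```python
import bisect
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the empirical CDFs of a and b (0 = same ECDF, 1 = completely separated)."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The ECDFs only change at observed values, so checking those points suffices.
    for x in sorted(set(a_sorted) | set(b_sorted)):
        fa = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(fa - fb))
    return gap
```

Values close to 0 suggest similar distributions. With large samples, even small gaps become statistically significant, so the size of the statistic is often more informative than the p-value.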

E) Baseline Model creation

  • simple logistic regression (from individual assignment)
  • neural network (make sure to standardize variables)
  • Random forest (sensitive to variables with too many levels) + bagging (or gradient boosted trees, usually slightly better than RF)
  • Variable selection: mix of filters (remove very poor variables first) and wrappers (stepwise backward deletion)
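
A baseline logistic regression can be sketched with the standard library alone. First, standardize the features using means and standard deviations estimated on the training data. Then fit the weights by batch gradient descent on the log-loss. The neural network bullet calls for the same standardization. The learning rate, epoch count and all names are illustrative assumptions. In practice, a library implementation (e.g. scikit-learn's `LogisticRegression` combined with `StandardScaler`) would be the usual choice.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardize_fit(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Column means and standard deviations, estimated on the training data."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds


def standardize_apply(X, means, sds) -> Matrix:
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def sigmoid(z: float) -> float:
    # Numerically stable form: never calls exp on a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Plain logistic regression fitted by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b) -> List[float]:
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


# Toy usage: one feature, classes separated between x = 3 and x = 4.
X_train = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_train = [0, 0, 0, 1, 1, 1]
means, sds = standardize_fit(X_train)
w, b = fit_logistic(standardize_apply(X_train, means, sds), y_train)
probs = predict_proba(standardize_apply([[1.0], [6.0]], means, sds), w, b)
```

The means and standard deviations come from the training data and are reused unchanged for new data. Recomputing them on the unknown set would shift the feature scale the model was trained on.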

F) Model evaluation & comparison

  • N-fold cross-validation -> calculate average AUC and plot
  • ROC
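
The evaluation loop could look like the following sketch. It uses stratified k-fold splits, so every fold contains both returns and non-returns. AUC is computed as the probability that a random positive case outscores a random negative one, and ROC points are produced for plotting. `fit` and `predict` are placeholder callables for whichever model is being evaluated, and all names are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2);
    this equals the normalized Mann-Whitney U statistic."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs for every score threshold."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs both classes")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, y_true) if s >= thr and t == 1)
        fp = sum(1 for s, t in zip(scores, y_true) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def stratified_folds(y: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split row indices into k folds with roughly equal class shares per fold."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_val_auc(X, y, fit: Callable, predict: Callable,
                  k: int = 5, seed: int = 0) -> Tuple[float, List[float]]:
    """Train on k-1 folds, score the held-out fold; return (mean AUC, per-fold AUCs)."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = predict(model, [X[i] for i in test_idx])
        aucs.append(auc([y[i] for i in test_idx], scores))
    return mean(aucs), aucs
```

The pairwise AUC is quadratic in the fold size, which is acceptable for a sketch. For the full data set, a rank-based formula or scikit-learn's `roc_auc_score` is faster. The per-fold AUC list is what would be averaged and plotted.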

G) Create heterogeneous ensemble model

  • average results from all models or build a logistic regression on the outputs of all models (stacking) => check which gives the best results
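
The simplest heterogeneous ensemble averages the base models' predicted return probabilities. This sketch supports optional weights (for example, proportional to each model's cross-validated AUC; that weighting is an assumption, not a project choice). The function name is hypothetical and the model outputs are invented numbers.

```python
from typing import List, Optional, Sequence


def average_ensemble(predictions: Sequence[Sequence[float]],
                     weights: Optional[Sequence[float]] = None) -> List[float]:
    """Combine per-model probability vectors by (weighted) averaging.

    predictions[m][i] is model m's predicted return probability for row i.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must score the same rows")
    if weights is None:
        weights = [1.0] * len(predictions)
    total = float(sum(weights))
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total
            for i in range(n)]


# Hypothetical outputs of three base models (e.g. logit, random forest, neural net).
logit = [0.20, 0.70, 0.55]
forest = [0.10, 0.80, 0.35]
net = [0.30, 0.60, 0.50]
ensemble = average_ensemble([logit, forest, net])
```

The stacking alternative replaces fixed weights with a logistic regression fitted on out-of-fold predictions of the base models. Fitting it on in-sample predictions would tend to overfit.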

H) Predictions
