xgboost loss functions
xgboost is an extremely fast R package for learning nonlinear machine learning models via gradient boosting algorithms. It supports several kinds of outputs via the objective argument. However, it is currently missing:
- For censored outputs (left, right, and interval censoring), Accelerated Failure Time (AFT, https://en.wikipedia.org/wiki/Accelerated_failure_time_model) losses with Gaussian or Logistic noise distributions.
- For count data with an upper bound, the binomial loss, i.e. the negative log likelihood of the binomial distribution (https://en.wikipedia.org/wiki/Binomial_distribution).
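For reference, here is a minimal base-R sketch of the two missing losses (the function names are ours, for illustration only): the binomial negative log likelihood for k successes in n trials, and the Gaussian AFT loss, which models the log of the survival time as Gaussian and handles censoring through interval probabilities.

```r
# Sketch of the two missing losses in base R (names are ours, for
# illustration only). The score f is the real-valued boosting margin.

# Binomial negative log likelihood (up to the constant binomial
# coefficient), with success probability p = plogis(f).
binomial_nll <- function(k, n, f) {
  p <- plogis(f)
  -(k * log(p) + (n - k) * log(1 - p))
}

# Gaussian AFT loss for an interval-censored log-time in [lower, upper]:
# exact observations (lower == upper) use the density, censored ones use
# the probability mass of the interval; pnorm(-Inf)=0 and pnorm(Inf)=1
# handle one-sided censoring automatically.
gaussian_aft_nll <- function(lower, upper, f, sigma = 1) {
  ifelse(lower == upper,
         -dnorm(lower, mean = f, sd = sigma, log = TRUE),
         -log(pnorm(upper, f, sigma) - pnorm(lower, f, sigma)))
}
```

The Logistic AFT loss is analogous, with plogis/dlogis in place of pnorm/dnorm.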
Other R packages such as gbm implement the Cox loss for boosting with censored outputs. However, gbm supports neither the AFT nor the binomial loss.
Other R packages such as glmnet implement the binomial loss for regularized linear models. However, glmnet fits linear models, which may not be as accurate as boosting for some applications/data sets.
Figure out a method for passing these outputs to xgboost. In both cases (binomial/censored) the outputs can be represented as a 2-column matrix. Typically in R:
- censored outputs would be specified via Surv(lower.limit, upper.limit, type="interval2")
- binomial/count outputs would be specified as in glmnet, via a two-column matrix of counts or proportions ("the second column is treated as the target class")
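For example, the two representations could be constructed as follows (toy data; Surv() comes from the recommended survival package):

```r
library(survival)  # recommended package shipped with R; provides Surv()

# Censored outputs: equal finite limits = exact observation, an NA limit
# = one-sided (left/right) censoring, differing finite limits = interval
# censoring.
lower <- c(1.5, 2.0,  NA, 3.0)
upper <- c(1.5,  NA, 4.0, 5.0)
y.censored <- Surv(lower, upper, type = "interval2")

# Binomial/count outputs, as in glmnet: two columns of counts, with the
# second column treated as the target class.
y.count <- cbind(failures  = c(3, 0, 5),
                 successes = c(7, 10, 5))
```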
In xgboost, implement the binomial loss for count outputs, and the Gaussian/Logistic AFT losses for censored outputs.
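As a starting point, xgboost already lets users plug in custom losses: xgb.train() accepts an obj function with signature function(preds, dtrain) returning list(grad, hess). Below is a hedged sketch of the binomial loss in that interface; make_binomial_obj and the single shared n.trials are our assumptions for illustration (a real implementation would need a per-observation trial count).

```r
# Gradient and Hessian of the binomial negative log likelihood with
# respect to the margin f, where p = plogis(f):
#   d/df   NLL = n*p - k
#   d2/df2 NLL = n*p*(1-p)
binomial_grad <- function(k, n, f) n * plogis(f) - k
binomial_hess <- function(k, n, f) {
  p <- plogis(f)
  n * p * (1 - p)
}

# Sketch of a custom xgboost objective (make_binomial_obj is our name,
# not part of xgboost). The DMatrix label holds only one column, so here
# the successes k go in the label and n.trials is captured by closure.
make_binomial_obj <- function(n.trials) {
  function(preds, dtrain) {
    k <- xgboost::getinfo(dtrain, "label")
    list(grad = binomial_grad(k, n.trials, preds),
         hess = binomial_hess(k, n.trials, preds))
  }
}
```

Such an objective could then be passed as xgb.train(params, dtrain, nrounds, obj = make_binomial_obj(10)); the AFT losses would follow the same pattern with their own gradient/Hessian pairs.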
- Docs
- Tests
Mentors, please explain how this project will produce a useful package for the R community.
Students, please contact mentors below after completing at least one of the tests below.
MENTORS: fill in this part. Each project needs two mentors. One should be an expert R programmer with previous package development experience, and the other can be a domain expert in some other field or application area (optimization, bioinformatics, machine learning, data viz, etc.). Ideally one of the two mentors should have previous experience with GSOC (either as a student or mentor). Please provide contact info for each mentor, along with qualifications. Example:
- Toby Hocking <[email protected]> is the author of R packages X and Y.
- Other Dev <[email protected]> is an expert in machine learning, and has previous GSOC experience with NAME_OF_OPEN_SOURCE_ORGANIZATION in 2015-2016.
Students, please do one or more of the following tests before contacting the mentors above.
MENTORS: write several tests that potential students can do to demonstrate their capabilities for this particular project. Ask some hard questions that will give you insight about how the students write code to solve problems. You’ll see that the harder the questions that you ask, the easier it will be for you to choose between the students that apply for your project! Please modify the suggestions below to make them specific for your project.
- Easy: something that any useR should be able to do, e.g. download some existing package listed in the Related Work, and run it on some example data.
- Medium: something a bit more complicated. You can encourage students to write a script or some functions that show their R coding abilities.
- Hard: Can the student write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the student write in that other language?
Students, please post a link to your test results here.