Integrated oversampling
Sequence classification problems are ubiquitous and arise whenever the data exhibits spatio-temporal structure. Examples include traffic prediction, earthquake prediction, and even predicting the outcome of auction systems such as those in financial markets. Recurrent neural networks, such as Long Short-Term Memory (LSTM) networks, are well suited to these types of problems. Often, however, the classes in the sequence are strongly imbalanced, and the challenge is how to resample the training set while preserving its temporal structure. Integrated oversampling provides a solution to this problem.
Hong Cao, Xiao-Li Li, David Yew-Kwong Woon, and See-Kiong Ng, "Integrated Oversampling for Imbalanced Time Series Classification", IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 12, Dec. 2013.
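To make the imbalance problem concrete, the following minimal sketch builds fixed-width windows from a toy series and balances the classes by duplicating whole minority-class windows. It is a naive baseline for illustration only, not the integrated oversampling method of the paper cited above. The function names `make_windows` and `oversample_windows`, the window width, and the toy data are all hypothetical. The sketch is in Python for brevity; the same logic carries over to R.

```python
import random
from collections import Counter


def make_windows(series, labels, width):
    """Slice a series into overlapping windows of `width` steps.

    Each window is labelled with the class of its final time step
    (an assumption made for this illustration).
    """
    windows = []
    for end in range(width, len(series) + 1):
        windows.append((series[end - width:end], labels[end - 1]))
    return windows


def oversample_windows(windows, seed=0):
    """Randomly duplicate whole windows until every class is as large
    as the biggest one. Values inside a window are never reordered."""
    rng = random.Random(seed)
    by_class = {}
    for window, label in windows:
        by_class.setdefault(label, []).append(window)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((w, label) for w in group + extra)
    rng.shuffle(balanced)  # shuffles the order of windows, not their contents
    return balanced


# Toy data: a rare event (label 1) occurs at two of twelve time steps.
series = [0.1, 0.2, 0.1, 0.9, 0.2, 0.1, 0.3, 0.2, 0.8, 0.1, 0.2, 0.1]
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]

windows = make_windows(series, labels, width=3)
balanced = oversample_windows(windows)

counts_before = Counter(label for _, label in windows)
counts_after = Counter(label for _, label in balanced)
print("before:", dict(counts_before))
print("after:", dict(counts_after))
```

Because resampling operates on whole windows, the ordering of observations inside each training sample stays intact. The drawback of this baseline is that it only repeats existing minority windows instead of generating new ones.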
The goal of this project is to implement, assess, and refine the method of integrated oversampling. The technique will be demonstrated with LSTMs applied to various imbalanced time series data sets, including traffic prediction and high-frequency trading.
An integrated oversampling package will support the application of LSTMs and other RNNs to real-world time series problems plagued by class imbalance.
Each project needs 2 mentors. One should be an expert R programmer with previous package development experience, and the other can be a domain expert in some other field or application area (optimization, bioinformatics, machine learning, data viz, etc). Ideally one of the two mentors should have previous experience with GSOC (either as a student or mentor).
List several tests that potential students can do to demonstrate their capabilities for this particular project. Ask some hard questions that will give you insight into how the students write code to solve problems. The harder the questions you ask, the easier it will be to choose between the students who apply for your project. Please modify the suggestions below to make them specific to your project.
Easy: something that any useR should be able to do, e.g. download an existing package listed in the Related Work and run it on some example data.
Medium: something a bit more complicated. You can encourage students to write a script or some functions that show their R coding abilities.
Hard: can the student write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the student write in that other language?
Solutions of tests