Integrated oversampling
Sequence classification problems are ubiquitous and arise whenever the data exhibits spatio-temporal structure. Examples include traffic prediction, earthquake prediction, and even predicting the outcome of auction systems such as those in the financial markets. Recurrent neural networks (RNNs), such as Long Short-Term Memory (LSTM) networks, are well suited to these types of problems. Often, however, the classes in the sequence are strongly imbalanced, and the challenge is how to resample the training set while preserving the temporal structure. Integrated oversampling provides a solution to this problem.
- Cao H., Li X.-L., Woon D. Y.-K. and Ng S.-K. (2013) Integrated Oversampling for Imbalanced Time Series Classification, IEEE Transactions on Knowledge and Data Engineering, vol 25 (12).
- Liang G., Zhang C. (2012) A Comparative Study of Sampling Methods and Algorithms for Imbalanced Time Series Classification. In: Thielscher M., Zhang D. (eds) AI 2012: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol 7691. Springer, Berlin, Heidelberg.
The goal of this project is to implement, assess and refine the method of integrated oversampling. The technique will be demonstrated with LSTMs applied to several imbalanced time series data sets, including traffic prediction and high-frequency trading.
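To make the resampling problem concrete, the sketch below shows one simple way to rebalance a labelled time series without scrambling the order of observations: the series is cut into fixed-width windows, and whole windows of the rarer class are duplicated at random. This is a plain random-oversampling baseline, not the integrated oversampling method of the papers cited above, and the names `make_windows` and `oversample_windows`, the window width and the toy data are all assumptions made for illustration. Python is used here only to keep the example short and self-contained.

```python
import random
from collections import Counter


def make_windows(series, labels, width):
    """Slice a series into overlapping windows.

    Each window keeps its observations in time order and takes the label
    of its final time step (an assumed labelling convention).
    """
    windows = []
    for end in range(width, len(series) + 1):
        windows.append((tuple(series[end - width:end]), labels[end - 1]))
    return windows


def oversample_windows(windows, seed=0):
    """Duplicate whole windows of the rarer classes at random until every
    class has as many windows as the most frequent class."""
    rng = random.Random(seed)
    by_class = {}
    for window, label in windows:
        by_class.setdefault(label, []).append(window)
    target = max(len(ws) for ws in by_class.values())
    balanced = []
    for label, ws in by_class.items():
        extra = [rng.choice(ws) for _ in range(target - len(ws))]
        balanced.extend((w, label) for w in ws + extra)
    # Shuffling reorders the windows only; the order inside each window
    # is left untouched.
    rng.shuffle(balanced)
    return balanced


# Toy imbalanced series: label 1 marks a rare event (hypothetical data).
series = [0.1, 0.2, 0.1, 0.9, 0.2, 0.1, 0.3, 0.2, 0.8, 0.1]
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

windows = make_windows(series, labels, width=3)
balanced = oversample_windows(windows)

print(Counter(label for _, label in windows))   # class counts before
print(Counter(label for _, label in balanced))  # class counts after
```

In practice the series would be split chronologically into training and test periods first, and only the training windows would be oversampled, so that duplicated windows cannot leak into the evaluation set.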
An integrated oversampling package will support the application of LSTMs and other RNNs to real-world time series problems plagued by class imbalance.
Please contact Matthew Dixon if you are a student interested in this project.
Applicants must be able to demonstrate:
- Ability to quickly identify technical problems and communicate them clearly, both orally and in writing, using R Markdown and LaTeX.
- Mathematically oriented software engineering experience, preferably in industry.
- Ability to work to deadlines in a collaborative project with mentors and potentially other students.
- Solid background in statistics and computation, including time series analysis, data structures, algorithms and text mining.
- Experience in applying machine learning and forecasting methods in R.
- Experience with Rcpp and statistical computing in C++.
- Ability to develop software on Windows and remote Linux platforms using SSH and GitHub.