Skip to content

dkn22/RecessionClassifier

Repository files navigation

LDA

An implementation of latent Dirichlet allocation with variational inference, hyperparameter optimization, selection of K through cross-validation. The algorithm was used to predict US recessions from monetary policy texts. My research shows the algorithm is predictive up to a quarter and accuracy can be as high as 90% or more, if we augment topic features with macroeconomic indices.

Thesis can be found here.

  • preprocess.py: a Parser class to pre-process text
    • minutes_ngram_map.py: a dictionary of mappings from n-grams to unigrams for common economic phrases
  • topicmodel.py: an LDA class for mean-field variational inference
  • classifier.py: classes for discriminative classifier (via sklearn Logistic Regression) and generative classifier (based on LDA topic model from topicmodel.py)
  • evaluation.py: various utility functions and functions to compute/evaluate models based on the area-under-the-curve (AUC) and associated asymptotically normal hypothesis testing
  • plotter.py: functions to visualize classification performance of a model (ROC curve, confusion matrix, etc.)

The data used to produce visualizations and classification output can be found in train_df.csv and test_df.csv.

The stored latent variables can be found in the "Stored Latents" folder; these were used to obtain the exact results reported in the Jupyter notebooks.

About

Topic modelling for predicting US recessions

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors