This repository was archived by the owner on Jul 29, 2019. It is now read-only.

Michael-Bar/Credit_scoring_farmers-prototype

One Acre Fund predictive model

A prototype proof-of-concept model that predicts the probability that a group of farmers will default at the repayment deadline. The SQL SP release version is here.

Overview

A group-level default event is defined as any single client within a lending circle defaulting on their loan at the repayment deadline (OAF provides loans without repayment plans and with just a single repayment due date, so "days-past-due" etc. are not applicable here). If a single member of a group defaults, then her entire group (6-12 members) is black-listed for subsequent seasons. This has significant implications for client growth. The aim of this model is to identify groups at risk of default, and flag them for action from OAF loan officers and/or call centres.
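The group-level definition above can be illustrated with a minimal sketch. The record layout (`group_id`, `owed`, `repaid`) and the rule that a client defaults when `repaid < owed` at the deadline are assumptions made for illustration only; the source does not specify either.

```python
from collections import defaultdict


def group_default_labels(loans):
    """Map each group id to True if any single member defaulted."""
    labels = defaultdict(bool)
    for loan in loans:
        # Assumption: a client has defaulted if they have repaid less
        # than they owe when the single repayment deadline passes.
        defaulted = loan["repaid"] < loan["owed"]
        # One defaulting member marks the whole group as a default event.
        labels[loan["group_id"]] = labels[loan["group_id"]] or defaulted
    return dict(labels)


# Hypothetical records, one per client.
loans = [
    {"group_id": "G1", "client_id": 1, "owed": 100.0, "repaid": 100.0},
    {"group_id": "G1", "client_id": 2, "owed": 80.0, "repaid": 60.0},
    {"group_id": "G2", "client_id": 3, "owed": 90.0, "repaid": 90.0},
]
labels = group_default_labels(loans)
```

Here group `G1` is labelled as a default because one of its two members fell short, while `G2` is not.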

The prototype model is currently in use across Kenya, Rwanda and Tanzania, representing >85% of our clients. Model scoring yields a cross-validation AUC of 0.90 to 0.95, depending on the country.

Breakdown of model

The model can be broken down into 3 (automated) stages: data cleaning, model building and, finally, predictions.

Data cleaning

Data are first cleaned and then summarized at the group level. This is also where new features are constructed (e.g. looking at group cohesion through spreads on repayment dates and amounts). At this point we check for multi-collinearity amongst our variables and drop any variables with high correlation.
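One simple way to carry out a correlation-based drop like this is sketched below. It is not the repository's implementation: the 0.9 threshold, the greedy keep-first rule and the feature names are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / sqrt(vx * vy)


def drop_correlated(features, threshold=0.9):
    """Greedily keep a feature only if it is not highly correlated
    with any feature that has already been kept."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical group-level features (one value per group).
features = {
    "mean_repaid": [10.0, 20.0, 30.0, 40.0],
    "total_repaid": [61.0, 119.0, 181.0, 239.0],
    "repay_date_spread": [5.0, 1.0, 4.0, 2.0],
}
kept = drop_correlated(features)
```

In this toy data `total_repaid` is almost a linear function of `mean_repaid`, so it is dropped, while `repay_date_spread` is retained.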

Model building

With clean data we can start model building and testing. First, there are options for recursive-feature extraction and model stability checks (which increase runtime). Once these checks are complete, a small random forest (RF) is grown, and nodesize and mtry are optimized for minimal out-of-bag (OOB) error using a mixed gradient descent and grid-search approach.
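The tuning step can be sketched roughly as a coarse grid search followed by a greedy local search over the two integer parameters. This stands in for the mixed approach described above and is not a true gradient descent. The `oob_error` callable is hypothetical: in practice it would fit a small RF with the given `mtry` and `nodesize` and return its OOB error, whereas here a synthetic function is used so the sketch runs on its own.

```python
from itertools import product


def tune(oob_error, mtry_values, nodesize_values, max_steps=50):
    """Coarse grid search, then greedy +/-1 steps while the error improves."""
    # Stage 1: evaluate every point on the coarse grid.
    best = min(product(mtry_values, nodesize_values),
               key=lambda p: oob_error(*p))
    best_err = oob_error(*best)

    # Stage 2: local descent on the integer lattice around the grid optimum.
    for _ in range(max_steps):
        m, n = best
        improved = False
        for cand in ((m - 1, n), (m + 1, n), (m, n - 1), (m, n + 1)):
            if cand[0] < 1 or cand[1] < 1:
                continue  # both parameters must stay positive
            err = oob_error(*cand)
            if err < best_err:
                best, best_err, improved = cand, err, True
        if not improved:
            break
    return best, best_err


def fake_oob(mtry, nodesize):
    """Synthetic stand-in for an OOB error surface (minimum at 4, 7)."""
    return 0.1 + 0.01 * (mtry - 4) ** 2 + 0.002 * (nodesize - 7) ** 2


best_params, best_err = tune(fake_oob, [2, 6, 10], [1, 5, 10])
```

The coarse grid keeps the number of forest fits small, and the local steps then refine the parameters between grid points.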

Model tuning

Once parameters are optimised on this smaller RF, a larger RF is grown with the aim of establishing an optimal cutoff point between classes:

[Figure: cutoff]

The aim of this cutoff is to minimise false positives while maximising true positives:

[Figure: cutoff]

However, given the importance of preventing default and subsequent black-listing, false negatives are penalized more heavily than false positives (the exact ratio of penalties depends on the historical default rates).
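A cost-weighted cutoff of this kind can be sketched as follows. The 5:1 penalty ratio and the toy scores are assumptions for illustration; the source only says that the ratio depends on historical default rates.

```python
def best_cutoff(probs, labels, fn_cost=5.0, fp_cost=1.0):
    """Return the cutoff (and its cost) that minimises the weighted
    count of false negatives and false positives.

    A group is flagged as at risk when its predicted probability >= cutoff.
    """
    best_c, best_cost = None, float("inf")
    for c in sorted(set(probs)):
        # Defaulting groups that the cutoff fails to flag.
        fn = sum(1 for p, y in zip(probs, labels) if y and p < c)
        # Non-defaulting groups that the cutoff flags anyway.
        fp = sum(1 for p, y in zip(probs, labels) if not y and p >= c)
        cost = fn_cost * fn + fp_cost * fp
        if cost < best_cost:
            best_c, best_cost = c, cost
    return best_c, best_cost


# Hypothetical predicted default probabilities and true outcomes.
probs = [0.05, 0.20, 0.35, 0.40, 0.70, 0.90]
labels = [False, False, True, False, True, True]
cutoff, cost = best_cutoff(probs, labels)
```

With false negatives weighted five times as heavily, the chosen cutoff accepts one false positive rather than miss the defaulting group scored at 0.35.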

Finally, the training process generates informative plots, such as a variable importance plot and partial-dependency plots.

The last part of the process involves applying the model to cleaned data from the current year. These predictions are then shared with loan officers and call centre staff for follow-up.
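As a rough illustration of this final step, the sketch below turns scored groups into a list for follow-up. The function name, the group ids and the choice to sort by descending risk are assumptions, not details from the source.

```python
def flag_groups(scores, cutoff):
    """Return (group_id, probability) pairs at or above the cutoff,
    highest predicted risk first."""
    flagged = [(g, p) for g, p in scores.items() if p >= cutoff]
    return sorted(flagged, key=lambda gp: gp[1], reverse=True)


# Hypothetical predicted default probabilities for current-year groups.
scores = {"G1": 0.12, "G2": 0.81, "G3": 0.47, "G4": 0.35}
follow_up = flag_groups(scores, cutoff=0.35)
```

Sorting by risk lets staff with limited time start with the groups most likely to default.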

Planned updates

  • Loan officer fraud detection
  • Individual level predictions
  • Hybrid approach combining predictions of final % repaid with default probability
  • Default hotspot mapping for decision making

About

Pre-SQL integration version
