One Acre Fund predictive model

Prototype proof-of-concept model generating predicted probabilities that groups of farmers will default on the repayment deadline. The SQL SP release version is here.

Overview

A group-level default event is defined as any single client within a lending circle defaulting on their loan at the repayment deadline (OAF provides loans without repayment plans and with just a single repayment due date, so "days-past-due" etc. are not applicable here). If a single member of a group defaults, then her entire group (6-12 members) is black-listed for subsequent seasons. This has significant implications for client growth. The aim of this model is to identify groups at risk of default, and flag them for action from OAF loan officers and/or call centres.

The prototype model is currently in action across Kenya, Rwanda and Tanzania, representing >85% of our clients. Model scoring yields a cross-validation AUC score of 0.9 - 0.95 (depending on the country).

Breakdown of model

The model can be broken down into 3 (automated) stages, data cleaning, model building and finally predictions.

Data cleaning

Data are first cleaned and then summarized at the group level, this is also where new features are constructed (e.g. looking at group cohesion through spreads on repayment dates and amounts). At this point we check for multi-collinearity amongst our variables and drop any variables with high correlation:

Model building

With clean data we can start model building and testing. First there are options for recursive-feature extraction and model stability checks (which increase runtime). Once these checks are complete a small RF is grown and nodesize and mtry optimized for minimal OOB error (a mixed gradient descent and grid-search approach):

Once parameters are optimised on this smaller RF, a larger RF is grown with the aim of establishing an optimal cutoff point between classes:

The aim of this cutoff is to minimise false positives and maximise true positives:

However, given the importance of preventing default and subsequent black-listing, false negatives are penalized more heavily than false positives (the exact ratio of penalties depends on the historical default rates).

Finally, the training process generates informative plots such as the variable importance plot:

And some partial-dependency plots:

The last part of the process involves applying the model to cleaned data from the current year. These predictions are then shared with loan officers and call centre staff for follow up

Planned updates

Loan officer fraud detection
Individual level predictions
Hybrid approach combining predictions of final % repaid with default probability
Default hotspot mapping for decision making

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
Data_cleaning.R		Data_cleaning.R
README.md		README.md
RUN_ME_ONLY.r		RUN_ME_ONLY.r
eval_performance.R		eval_performance.R
predict_default.R		predict_default.R
train_model.R		train_model.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

One Acre Fund predictive model

Overview

Breakdown of model

Data cleaning

Model building

Planned updates

About

Uh oh!

Releases

Packages

Languages

Michael-Bar/Credit_scoring_farmers-prototype

Folders and files

Latest commit

History

Repository files navigation

One Acre Fund predictive model

Overview

Breakdown of model

Data cleaning

Model building

Planned updates

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages