Reducing Entry Into Homelessness through ML-Informed Rental Assistance

Repository for a project to prioritize individuals currently facing eviction for rental assistance based on their predicted risk of future homelessness. This is an ongoing collaboration between Carnegie Mellon University (CMU) and the Allegheny County Department of Human Services (ACDHS). This work began as a project for the Data Science for Social Good Summer Fellowship 2022 at CMU in Pittsburgh (the final code from the DSSG fellowship is available here).

Data

We use data from ACDHS' rich data warehouse that includes individual-level information on previous evictions, homeless spells, interactions with mental, behavioral, and physical health services, address changes, demographic information, as well as enrollment in a variety of other ACDHS and state programs. We refresh the data from the warehouse on a weekly basis. This data is not publicly available; researchers interested in collaborating with ACDHS should contact them directly.

Formulation of the Modeling Problem

Using the historical data described above, we trained a series of models that predict an individual's risk of experiencing homelessness within 12 months of the prediction, among all those with a recent eviction filing. Using these predictions, every week we provide ACDHS with a list of individuals who are at highest risk of experiencing homelessness in the near future, among those currently facing eviction.

Cohort

The cohort is individuals in Allegheny County

who have had an eviction filing or a disposition in the four months prior to the prediction date
are not currently homeless

We define "currently homeless" individuals to be anyone who has interacted with homelessness services (as defined below) in the four months prior to the prediction date.

Outcome Label

Individuals belong to the class of interest (label=1) if they interact with homelessness services at least once within 12 months of the prediction date. The relevant homelessness services include emergency shelters, street outreach programs, transitional housing, and other re-housing programs. For each weekly list, we predict the top k individuals (k is based on intervention capacity) in terms of their risk of interacting with homelessness services within 12 months.

Alternate formulations

We use the above cohort and label definitions in our experiments, but alternate formulations we have explored are documented in pipeline\configs\README.md.

Methodology

Define cohort based on problem formulation
Define outcome label based on formulation
Define training and validation sets over time
Define and generate predictors
Train Models on each training set and score all individuals in the corresponding validation set
Evaluate all models for each validation time according to each metric (PPV at top k)
Select "Best" model based on results over time
Explore the high performing models to understand who they rank high, how they compare to the cohort, and important predictors
Check and/or correct for bias issues

Repository Structure

To build our predictive models, we use Triage, an open-sourced ML pipeline tool built and maintained by our team. Triage enables the modeling parameters to be set in a YAML configuration file, and the full configuration is defined in this YAML config file (feature definitions are defined separately).

Triage

We are using Triage to build and select models. Some background and tutorials:

Tutorial on Google Colab: a quick tutorial hosted on Google Colab (no setup necessary) for users who are completely new to Triage
Dirty Duck Tutorial: a more in-depth walk-through of Triage's functionality and concepts using sample data
QuickStart Guide: try Triage on your own project and data
Suggested workflow
Understanding the configuration file

How to run

Requirements

Triage is installed and the data is in a PostgreSQL database
Place the experiment config YAML in pipeline/configs folder
The database credentials and fed through environmental variables. The required environmental variables:
- PGUSER (username for the db)
- PGHOST (database server)
- PGDATABASE (database name)
- PGROLE (shared database role, if any)
- PGPORT (relevant port of the database server)
- PGPASSWORD (password to authenticate access to the db server)
- PYTHONPATH (should be the path to the base folder)
These values could be set on the .bashrc or .bash_profile files in the users home folder, or could be loaded through a .env file (see .env_sample) placed in the base folder. That file needs to contain the above fields (format: field=value).
If a .env file is used, prior to running the pipeline, can load the variables using the following command

 export $(xargs < .env)

Running the pipeline

The pipeline is executed using the run.py script, which takes a config file (filename only, not full path; the file should be located in pipeline/configs) as a parameter. The following command can be used to execute the pipeline for the specified config.

python run.py -c <config_filename> -n <number of jobs>

Other tips:

You can change the number of jobs across which you would like Triage to parallelize the process. If not provided, this runs a single threaded experiment (integer).
A bash script placed in the base folder that loads the environmental variables and runs the pipeline could make the process easier.

Project Team

Alice Lai
Kasun Amarasinghe
Rayid Ghani

Data Science for Social Good Fellowship Team (2022):

Joachim Baumann, DSSG Fellow
Arun Frey, DSSG Fellow
Abigail Smith, DSSG Fellow
Catalina Vajiac, DSSG Fellow
Kit Rodolfa, DSSG Mentor
Adolfo De Unanue, DSSG Mentor
Kasun Amarasinghe, DSSG Mentor

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
pipeline		pipeline
.env_sample		.env_sample
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Reducing Entry Into Homelessness through ML-Informed Rental Assistance

Data

Formulation of the Modeling Problem

Cohort

Outcome Label

Alternate formulations

Methodology

Repository Structure

Triage

How to run

Requirements

Running the pipeline

Project Team

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

License

dssg/acdhs_housing_public

Folders and files

Latest commit

History

Repository files navigation

Reducing Entry Into Homelessness through ML-Informed Rental Assistance

Data

Formulation of the Modeling Problem

Cohort

Outcome Label

Alternate formulations

Methodology

Repository Structure

Triage

How to run

Requirements

Running the pipeline

Project Team

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages