Prediction Engineering
This issue describes how to use Compose to write the problem definition component in Cardea.
Compose is a machine learning tool for automated prediction engineering. It allows you to structure prediction problems and generate labels for supervised learning. We can use Compose to search for the cutoff times for a specific prediction problem (e.g. length of stay) and return label_times.
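To make the idea of cutoff times concrete, here is a toy, pure-Python stand-in for the label search. It does not use Compose's API: the record layout, the `search_label_times` helper, and the `long_stay` labeling function are all hypothetical. It only illustrates what a row of label_times holds, namely an instance id, a cutoff time, and a label.

```python
from collections import defaultdict
from datetime import datetime


def search_label_times(records, target, time_index, labeling_function):
    """Toy label search: one (instance, cutoff time, label) row per instance."""
    groups = defaultdict(list)
    for record in records:
        groups[record[target]].append(record)

    label_times = []
    for instance_id, rows in groups.items():
        rows.sort(key=lambda r: r[time_index])
        label_times.append({
            target: instance_id,
            # Assumption: the cutoff is the earliest record of the instance.
            "time": rows[0][time_index],
            "label": labeling_function(rows),
        })
    return label_times


def long_stay(rows):
    # Hypothetical length-of-stay label: the stay spans more than 3 days.
    return (rows[-1]["time"] - rows[0]["time"]).days > 3


records = [
    {"patient": "a", "time": datetime(2020, 1, 1)},
    {"patient": "a", "time": datetime(2020, 1, 6)},
    {"patient": "b", "time": datetime(2020, 2, 1)},
    {"patient": "b", "time": datetime(2020, 2, 2)},
]
label_times = search_label_times(records, "patient", "time", long_stay)
```

Each prediction problem then differs only in its labeling function and in which columns identify the instance and the time index.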
The component should be easily adaptable to support multiple prediction problems:
- appointment no show
- mortality prediction
- length of stay
- etc
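One possible way to keep the component adaptable is a small registry that maps problem names to their definition functions. This is only a sketch: `PREDICTION_PROBLEMS`, `register_problem`, and `get_problem` are hypothetical names, and the registered functions are placeholders rather than real problem definitions.

```python
PREDICTION_PROBLEMS = {}


def register_problem(name):
    """Decorator that records a problem definition function under a name."""
    def decorator(function):
        PREDICTION_PROBLEMS[name] = function
        return function
    return decorator


def get_problem(name):
    """Look up a registered problem, listing the known names on failure."""
    try:
        return PREDICTION_PROBLEMS[name]
    except KeyError:
        known = ", ".join(sorted(PREDICTION_PROBLEMS))
        raise ValueError(f"Unknown problem '{name}'. Known problems: {known}")


@register_problem("appointment_no_show")
def appointment_no_show(es):
    # Placeholder body; a real definition would return the labeling
    # function, the dataframe, and the task metadata.
    return None, None, {"entity": "appointment"}


@register_problem("length_of_stay")
def length_of_stay(es):
    # Placeholder body, same contract as above.
    return None, None, {"entity": "encounter"}


problem = get_problem("appointment_no_show")
```

Adding a new prediction problem then means writing one function and registering it, without touching the labeler class.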
Design
There are two main parts that we need to define, plus supporting helpers:
- A class whose main function is generating label times
- Functions that define the prediction problem in mind
- Helper functions used to create the prediction problem
Design of data_labeler.py
class DataLabeler:
    """Class that defines the prediction problem.

    This class supports the generation of `label_times`, which
    is fundamental to the feature generation phase as well
    as to specifying the target labels.

    Args:
        function (callable):
            Function that defines the prediction problem. It should return a
            tuple of the labeling function, the dataframe, and the metadata
            of the prediction task (including the name of the target entity).
    """

    def __init__(self, function):
        self.function = function

    def generate_label_times(self, es, *args, **kwargs):
        """Searches the data to calculate label times.

        Args:
            es (featuretools.EntitySet):
                Entityset holding the data to search and extract labels from.

        Returns:
            composeml.LabelTimes:
                Calculated labels with cutoff times.
        """
        pass

Design of a prediction function (e.g. appointment_no_show.py)
def appointment_no_show(es):
    def missed(ds, **kwargs):
        # Labeling function: True if any record in the slice is a no-show.
        return 'noshow' in ds["status"].values

    meta = {
        # values to define prediction task
        "entity": "appointment",
        "time_index": "created",
        "type": "classification",
        "num_examples_per_instance": 1
    }

    # denormalize is one of the helper functions still to be written.
    df = denormalize(es, entities=['Appointment'])
    return missed, df, meta
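As a rough sketch of how the two parts could fit together, `generate_label_times` might unpack the tuple and hand it to a label maker. This is an assumption, not a settled design. The `label_maker_factory` hook, `FakeLabelMaker`, and `toy_problem` are hypothetical and exist only so the sketch runs without Compose. The default branch assumes `composeml.LabelMaker` takes the target as its first argument plus `time_index` and `labeling_function`, and that `search` takes the dataframe and `num_examples_per_instance`; check this against the installed version.

```python
class DataLabeler:
    """Sketch of the labeler: wraps a problem definition function."""

    def __init__(self, function, label_maker_factory=None):
        self.function = function
        # Hypothetical hook: lets callers swap in another label maker.
        self.label_maker_factory = label_maker_factory

    def _make_label_maker(self, labeling_function, meta):
        factory = self.label_maker_factory
        if factory is None:
            # Assumption: composeml is installed and accepts these arguments.
            import composeml
            factory = composeml.LabelMaker
        return factory(
            meta["entity"],
            time_index=meta["time_index"],
            labeling_function=labeling_function,
        )

    def generate_label_times(self, es, *args, **kwargs):
        labeling_function, df, meta = self.function(es)
        label_maker = self._make_label_maker(labeling_function, meta)
        return label_maker.search(
            df, meta["num_examples_per_instance"], *args, **kwargs
        )


class FakeLabelMaker:
    """Stand-in for Compose: applies the labeling function to all the data."""

    def __init__(self, target, time_index, labeling_function):
        self.target = target
        self.time_index = time_index
        self.labeling_function = labeling_function

    def search(self, df, num_examples_per_instance, **kwargs):
        return [(self.target, self.labeling_function(df))]


def toy_problem(es):
    # Toy problem definition following the (function, data, meta) contract.
    def has_noshow(rows):
        return "noshow" in rows

    meta = {
        "entity": "appointment",
        "time_index": "created",
        "num_examples_per_instance": 1,
    }
    return has_noshow, es["statuses"], meta


labeler = DataLabeler(toy_problem, label_maker_factory=FakeLabelMaker)
result = labeler.generate_label_times({"statuses": ["booked", "noshow"]})
```

Keeping the label maker behind a factory means the labeler can be unit tested without real data, while the default path still defers to Compose.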