Reproducible example:
import pandas as pd
import featuretools as ft
from featuretools.variable_types import IPAddress
from autonormalize import autonormalize as an
input_df = pd.DataFrame(
    {
        'ip_address': ['128.101.101.101', '1.120.0.0', '17.86.21.0', '23.1.23.255'],
        'length': [900, 60, 20, 30],
        'city': ['adl', 'syd', 'adl', 'syd'],
        'country': ['aus', 'aus', 'aus', 'aus'],
        'is_threat': [True, False, False, False]
    }
)
variable_types = {'ip_address': IPAddress}
es = ft.EntitySet()
es.entity_from_dataframe(entity_id='data',
                         dataframe=input_df,
                         index='index',
                         variable_types=variable_types,
                         make_index=True)
Column ip_address is set to dtype featuretools.variable_types.IPAddress:
print(es['data'].variables)
[<Variable: index (dtype = index)>,
<Variable: length (dtype = numeric)>,
<Variable: city (dtype = categorical)>,
<Variable: country (dtype = categorical)>,
<Variable: is_threat (dtype = boolean)>,
<Variable: ip_address (dtype = ip)>]
After normalisation, ip_address reverts to categorical:
normalized_es = an.normalize_entity(es)
for entity in normalized_es.entity_dict:
    print(f"Entity: {entity}")
    print(normalized_es.entity_dict[entity].variables)
Entity: index
[<Variable: index (dtype = index)>,
<Variable: length (dtype = numeric)>,
<Variable: city (dtype = id)>,
<Variable: is_threat (dtype = boolean)>,
<Variable: ip_address (dtype = categorical)>]
Entity: city
[<Variable: city (dtype = index)>, <Variable: country (dtype = categorical)>]
To get the desired features, the variable types need to be preserved so that the right primitives can be applied when running dfs. Is this the intended behaviour, or do the variable types need to be set manually again after normalisation?
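If the types do have to be set manually again, one workaround is to record the types before normalisation and re-apply them to every entity that still contains those columns. The sketch below is hypothetical and library-free: it models each entity's schema as a plain dict of column name to type name (mirroring the printed dtypes above) rather than calling featuretools or autonormalize, and `restore_variable_types` is an illustrative name, not an existing API. It skips columns that normalisation turned into an `index` or `id`, because those now carry the relationship between entities.

```python
# Hypothetical sketch: schemas are plain dicts {entity_id: {column: type_name}},
# NOT the featuretools or autonormalize API.

KEY_TYPES = {"index", "id"}  # types created by normalisation for keys


def restore_variable_types(original_types, normalized_schema):
    """Return a new schema in which columns listed in original_types get
    their pre-normalisation type back, unless normalisation made them a key."""
    restored = {}
    for entity_id, columns in normalized_schema.items():
        restored[entity_id] = {}
        for column, dtype in columns.items():
            if column in original_types and dtype not in KEY_TYPES:
                restored[entity_id][column] = original_types[column]
            else:
                restored[entity_id][column] = dtype
    return restored


# Types as printed in the issue, before and after normalisation.
original_types = {"ip_address": "ip", "city": "categorical"}
normalized = {
    "index": {"index": "index", "length": "numeric", "city": "id",
              "is_threat": "boolean", "ip_address": "categorical"},
    "city": {"city": "index", "country": "categorical"},
}

restored = restore_variable_types(original_types, normalized)
print(restored["index"]["ip_address"])  # ip
print(restored["index"]["city"])        # id (key type is left untouched)
```

With featuretools 0.x, the resulting mapping could presumably be applied per entity through `Entity.convert_variable_type`; that method's exact signature should be checked against the documentation for the installed version.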