Open
Hi @matthewwardrop,
I noticed a change in the naming of treatment-encoded categoricals in formulaic 1.1.0.
In some circumstances, treatment-encoded variables are now named C(state)[2] rather than C(state)[T.2], as they were before.
Is this an intended change in behavior?
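For downstream code that matches on these column names, the difference can be bridged without touching formulaic. This is a minimal sketch that assumes only the two name patterns shown in this issue; `strip_treatment_prefix` is a hypothetical helper, not part of formulaic's API:

```python
import re

# Hypothetical helper (not part of formulaic): rewrites the pre-1.1.0 style
# 'C(state)[T.2]' into the 1.1.0 style 'C(state)[2]'.
# Caveat: a level whose own label starts with "T." would also be rewritten.
_T_PREFIX = re.compile(r"\[T\.")


def strip_treatment_prefix(name: str) -> str:
    """Drop the 'T.' marker that directly follows an opening bracket."""
    return _T_PREFIX.sub("[", name)


# Names copied from the outputs reported in this issue.
old_names = ["C(state)[T.1]", "C(state)[T.2]", "C(state)[T.3]"]
new_names = ["C(state)[1]", "C(state)[2]", "C(state)[3]"]

normalized = [strip_treatment_prefix(n) for n in old_names]
print(normalized == new_names)
```

Names already in the 1.1.0 style pass through unchanged, so the helper can be applied regardless of the installed version.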
import pandas as pd
import formulaic
from formulaic import Formula
# Formulaic >= 1.1.0
print(formulaic.__version__)
# 1.1.0
data = pd.read_csv("https://raw.githubusercontent.com/py-econometrics/pyfixest/refs/heads/master/pyfixest/did/data/df_het.csv")
data[["state", "unit"]].dtypes
# np.int64, np.int64
mm = Formula("~ -1 + C(state) + C(unit)").get_model_matrix(data)
mm.model_spec.column_names[0:5]
# ('C(state)[1]', 'C(state)[2]', 'C(state)[3]', 'C(state)[4]', 'C(state)[5]')
# Formulaic < 1.1.0
print(formulaic.__version__)
# 1.0.2
data = pd.read_csv("https://raw.githubusercontent.com/py-econometrics/pyfixest/refs/heads/master/pyfixest/did/data/df_het.csv")
mm = Formula("~ -1 + C(state) + C(unit)").get_model_matrix(data)
mm.model_spec.column_names[0:5]
# ('C(state)[T.1]','C(state)[T.2]','C(state)[T.3]',

Best, Alex
Labels: enhancement