Add support to ModelSpec for extracting metadata about factor levels used in specific columns. #238

@s3alfisc

Description

Hi @matthewwardrop,

I noticed that formulaic 1.1.0 changes how treatment-encoded categoricals are named. In some circumstances, the columns are now called C(state)[2] rather than C(state)[T.2], as they were in earlier versions.

Is this change intended?

import pandas as pd 
import formulaic
from formulaic import Formula

# Formulaic >= 1.1.0 
print(formulaic.__version__)
# 1.1.0
data = pd.read_csv("https://raw.githubusercontent.com/py-econometrics/pyfixest/refs/heads/master/pyfixest/did/data/df_het.csv")
data[["state", "unit"]].dtypes
# np.int64, np.int64

mm = Formula("~ -1 +  C(state) + C(unit)").get_model_matrix(data)
mm.model_spec.column_names[0:5]
# ('C(state)[1]', 'C(state)[2]', 'C(state)[3]', 'C(state)[4]', 'C(state)[5]')

# Formulaic < 1.1.0 
print(formulaic.__version__)
# 1.0.2
data = pd.read_csv("https://raw.githubusercontent.com/py-econometrics/pyfixest/refs/heads/master/pyfixest/did/data/df_het.csv")

mm = Formula("~ -1 +  C(state) + C(unit)").get_model_matrix(data)
mm.model_spec.column_names[0:5]
# ('C(state)[T.1]','C(state)[T.2]','C(state)[T.3]',
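
Until the naming question is settled, downstream code that matches on column names could normalize both spellings. The following is a minimal sketch and not part of formulaic: `strip_treatment_prefix` is a hypothetical helper, and it assumes the only difference between the two versions is the `T.` prefix inside the brackets, as in the output above.

```python
import re

# Matches the "[T." that opens a treatment-coded level label,
# e.g. the "[T." in "C(state)[T.2]".
_TREATMENT_PREFIX = re.compile(r"\[T\.")


def strip_treatment_prefix(name: str) -> str:
    """Map 'C(state)[T.2]' to 'C(state)[2]'; leave 'C(state)[2]' unchanged.

    Hypothetical helper (not a formulaic API). Assumes the 'T.' prefix is
    the only naming difference between versions.
    """
    return _TREATMENT_PREFIX.sub("[", name)


# Column names as reported for formulaic 1.0.2 and 1.1.0 respectively.
old_names = ("C(state)[T.1]", "C(state)[T.2]", "C(state)[T.3]")
new_names = ("C(state)[1]", "C(state)[2]", "C(state)[3]")

normalized_old = tuple(strip_treatment_prefix(n) for n in old_names)
normalized_new = tuple(strip_treatment_prefix(n) for n in new_names)
```

One caveat of this approach: a factor level whose label itself begins with `T.` would also be rewritten, so the helper is only safe when level labels do not start with that prefix.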

Best, Alex

Metadata

Labels

enhancement: New feature or request
