Description
Hi team,
First of all, thanks to the team for building such a good package for us to use.
I followed the example Counterfactual with Reinforcement Learning (CFRL) on Adult Census to build my own CFRL explainer.
I have a dataset that mixes numerical, binary, and categorical features.
I trained a random forest classifier as the predictor model and ran CounterfactualRLTabular to generate counterfactuals for the features I am interested in. Below is part of the code showing how I specify the candidate features and the immutable features:
```python
ranges = {'num_image': [1, 16],
          'num_alternative_image': [0, 6],
          'num_market_bullets': [5, 19]}
```
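For comparison with the `ranges` dictionary above: my reading of the `CounterfactualRLTabular` docstring is that each entry is a pair `[a_low, a_high]` of fractions, with `a_low` in [-1, 0] and `a_high` in [0, 1], rather than absolute feature values. A feature that may only increase would use `[0, 1]`, one that may only decrease would use `[-1, 0]`, and the change applied to the feature is clipped to `[a_low * (max_val - min_val), a_high * (max_val - min_val)]`, where the min and max presumably come from the training data. The helper below is hypothetical (not part of alibi) and only illustrates that arithmetic. Please treat the alibi documentation as the authority on the exact semantics.

```python
def absolute_change_bounds(rel_range, feat_min, feat_max):
    """Hypothetical helper: map a relative range [a_low, a_high] to absolute change bounds.

    Assumes the reading of the alibi docstring described above:
    a_low in [-1, 0], a_high in [0, 1], scaled by the feature's (max - min).
    """
    a_low, a_high = rel_range
    if not (-1.0 <= a_low <= 0.0 <= a_high <= 1.0):
        raise ValueError("expected -1 <= a_low <= 0 <= a_high <= 1")
    span = feat_max - feat_min
    return a_low * span, a_high * span


# Assumed training range of num_image: 1..16; the feature may only increase.
print(absolute_change_bounds([0.0, 1.0], 1, 16))  # (0.0, 15.0)
```

Under this reading, absolute values such as `[1, 16]` fall outside the expected interval, so it may be worth double-checking how the values in `ranges` are interpreted.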
```python
import numpy as np
from alibi.explainers import CounterfactualRLTabular

explainer = CounterfactualRLTabular(predictor=predictor,
                                    encoder=heae.encoder,
                                    decoder=heae.decoder,
                                    latent_dim=LATENT_DIM,
                                    encoder_preprocessor=heae_preprocessor,
                                    decoder_inv_preprocessor=heae_inv_preprocessor,
                                    coeff_sparsity=COEFF_SPARSITY,
                                    coeff_consistency=COEFF_CONSISTENCY,
                                    category_map=cate_map,
                                    feature_names=model_attr,
                                    # ranges=ranges,
                                    immutable_features=immutable_features,
                                    train_steps=TRAIN_STEPS,
                                    batch_size=BATCH_SIZE,
                                    backend="tensorflow")
explainer = explainer.fit(X=X_train.to_numpy())

# Keep the test instances that the predictor assigns to class 1.
X_positive = X_test[np.argmax(predictor(X_test), axis=1) == 1]
X = X_positive[:1000]
Y_t = np.array([0])  # target class for the counterfactuals

# Indices 20, 21, 22 are num_image, num_alternative_image, num_market_bullets.
# Using the feature names as keys gave me an error for some reason.
C = [{20: [1, 10], 21: [0, 6], 22: [5, 10]}]
explanation = explainer.explain(X, Y_t, C)
```
After getting the counterfactuals as a DataFrame, I compared it with the original DataFrame and looked at the columns that differ. avg_delivery_days is immutable, yet it also changes, although only very slightly. For num_image, num_alternative_image, and num_market_bullets the changes are also minimal. Can I conclude that the changed features play an important role in predicting the label (> 0.4 or <= 0.4), given that such small changes flip the label? Did I use the right counterfactual method for my use case?
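One way to make this comparison more systematic is to summarize, per feature, how many counterfactuals changed it and by how much, using a tolerance so that floating-point noise is separated from real edits, and to report how often the label actually flipped. The sketch below uses NumPy only; the function names are hypothetical, the toy arrays stand in for the original and counterfactual data, and the 0.4 threshold is the label rule described in this issue.

```python
import numpy as np


def summarize_changes(X_orig, X_cf, feature_names, atol=1e-6):
    """Per-feature share of rows changed by more than atol, and mean absolute change."""
    X_orig = np.asarray(X_orig, dtype=float)
    X_cf = np.asarray(X_cf, dtype=float)
    diff = np.abs(X_cf - X_orig)
    changed = diff > atol
    summary = {}
    for j, name in enumerate(feature_names):
        if changed[:, j].any():  # only report features that changed somewhere
            summary[name] = {
                "frac_rows_changed": float(changed[:, j].mean()),
                "mean_abs_change": float(diff[:, j].mean()),
            }
    return summary


def flip_rate(y_orig, y_cf, threshold=0.4):
    """Share of rows whose label (score > threshold) differs before and after."""
    lab_orig = np.asarray(y_orig) > threshold
    lab_cf = np.asarray(y_cf) > threshold
    return float(np.mean(lab_orig != lab_cf))


# Toy data: the third column changes only by float noise (below atol).
X_orig = np.array([[3.0, 2.0, 10.0], [5.0, 1.0, 8.0]])
X_cf = np.array([[3.0, 4.0, 10.0 + 1e-9], [5.0, 1.0, 8.0]])
names = ["avg_delivery_days", "num_image", "num_market_bullets"]
print(summarize_changes(X_orig, X_cf, names))
# {'num_image': {'frac_rows_changed': 0.5, 'mean_abs_change': 1.0}}
print(flip_rate([0.6, 0.5], [0.3, 0.45]))  # 0.5
```

A high flip rate with small changes says the model is sensitive to those changes for the selected instances; it describes the model's local behavior rather than a global feature importance.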

For tabular data, do I always need an encoder and a decoder? If a feature is already binary, should I include it in category_map when calling the function below?
```python
from alibi.explainers.backends.cfrl_tabular import get_he_preprocessor

heae_preprocessor, heae_inv_preprocessor = get_he_preprocessor(X=X_train,
                                                               feature_names=model_attr,
                                                               category_map=cate_map,
                                                               feature_types=feature_types)
```
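If the binary columns are treated as categorical features with two categories (one possible choice, not necessarily the recommended one), the `category_map` can be built programmatically instead of by hand. The sketch below assumes, as in the Adult example, that `category_map` maps a column index to a list of category labels and that each categorical column is already label-encoded as integers 0..k-1. The function name and the feature names `is_prime` and `color` are hypothetical.

```python
import numpy as np


def build_category_map(X, feature_names, categorical_names):
    """Map column index -> list of category labels for the given categorical columns.

    Assumes each listed column is label-encoded as integers 0..k-1.
    Every column gets at least two levels, so an all-zero binary column
    still maps to ['0', '1'].
    """
    X = np.asarray(X)
    category_map = {}
    for name in categorical_names:
        idx = feature_names.index(name)
        n_levels = max(int(X[:, idx].max()) + 1, 2)
        category_map[idx] = [str(level) for level in range(n_levels)]
    return category_map


feature_names = ["num_image", "is_prime", "color"]
X = np.array([[3, 0, 2], [5, 1, 0], [7, 1, 1]])
print(build_category_map(X, feature_names, ["is_prime", "color"]))
# {1: ['0', '1'], 2: ['0', '1', '2']}
```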
Another question: what can I use for the environment model when the black box is a regression model, such as a boosted regression model?
If I use the following:
```python
explainer = CounterfactualRLTabular(predictor=predictor,
                                    encoder=heae.encoder,
                                    decoder=heae.decoder,
                                    latent_dim=LATENT_DIM,
                                    encoder_preprocessor=heae_preprocessor,
                                    decoder_inv_preprocessor=heae_inv_preprocessor,
                                    coeff_sparsity=COEFF_SPARSITY,
                                    coeff_consistency=COEFF_CONSISTENCY,
                                    category_map=cate_map,
                                    feature_names=model_attr,
                                    # ranges=ranges,
                                    immutable_features=immutable_features,
                                    train_steps=TRAIN_STEPS,
                                    batch_size=BATCH_SIZE,
                                    backend="tensorflow")
```
but with the predictor replaced by the boosted regression model, what other changes do I need to make? Since a regression model's prediction is continuous, how can I customize the reward function?
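Two hedged sketches for the regression case. First, because the label here is defined by thresholding at 0.4, one workaround is to wrap the regressor so it behaves like a two-class classifier returning one-hot rows, which is the kind of output the classification examples work with. Second, if the continuous target is kept, a reward could score a counterfactual by how close its prediction lands to a target value. I believe `CounterfactualRLTabular` accepts a custom reward through a `reward_func` argument, but I have not verified its exact signature, so the reward below is only a standalone NumPy function. `as_classifier`, `closeness_reward`, and the toy regressor are hypothetical names used for illustration.

```python
import numpy as np


def as_classifier(regressor_predict, threshold=0.4):
    """Wrap a regression predict function as a 2-class predictor with one-hot outputs.

    Class 1 means prediction > threshold; class 0 means prediction <= threshold.
    """
    def predictor(X):
        y = np.asarray(regressor_predict(X)).reshape(-1)
        labels = (y > threshold).astype(int)
        return np.eye(2)[labels]  # shape (N, 2)
    return predictor


def closeness_reward(y_cf, y_target, tol=0.05):
    """Reward 1.0 where the counterfactual prediction is within tol of the target, else 0.0."""
    y_cf = np.asarray(y_cf, dtype=float).reshape(-1)
    y_target = np.asarray(y_target, dtype=float).reshape(-1)
    return (np.abs(y_cf - y_target) <= tol).astype(float)


# Toy regressor standing in for the boosted model: score = row mean.
toy_predict = lambda X: np.asarray(X, dtype=float).mean(axis=1)
clf_like = as_classifier(toy_predict)
print(clf_like(np.array([[0.2, 0.4], [0.6, 0.8]])))   # row means 0.3 and 0.7
print(closeness_reward([0.38, 0.50], [0.40, 0.40]))
```

With the wrapper, `Y_t` could stay a class index as in the classification setup; the continuous reward is only relevant if the explainer is configured to use a custom reward.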
Sorry for all these questions. I am a beginner in RL and still learning everything, so please forgive me if my questions sound dumb.
Thanks for your time and help.