immutable feature changes when using  #1022

@Berlyli866

Hi team,
First of all, thank you for building such a good package for us to use.

I followed the example "Counterfactual with Reinforcement Learning (CFRL) on Adult Census" to build my own CFRL explainer.

My data set is a mix of numerical, binary, and categorical features.
I trained a random forest classifier as the predictor and ran CounterfactualRLTabular to generate counterfactuals for the features I am interested in. Below is the part of the code where I specify the candidate features and the immutable features:


ranges = {
    'num_image': [1, 16],
    'num_alternative_image': [0, 6],
    'num_market_bullets': [5, 19],
}

from alibi.explainers import CounterfactualRLTabular
explainer = CounterfactualRLTabular(predictor=predictor,
                                    encoder=heae.encoder,
                                    decoder=heae.decoder,
                                    latent_dim=LATENT_DIM,
                                    encoder_preprocessor=heae_preprocessor,
                                    decoder_inv_preprocessor=heae_inv_preprocessor,
                                    coeff_sparsity=COEFF_SPARSITY,
                                    coeff_consistency=COEFF_CONSISTENCY,
                                    category_map=cate_map,
                                    feature_names=model_attr,
                                    #ranges=ranges,
                                    immutable_features=immutable_features,
                                    train_steps=TRAIN_STEPS,
                                    batch_size=BATCH_SIZE,
                                    backend="tensorflow")


explainer = explainer.fit(X=X_train.to_numpy())

X_positive = X_test[np.argmax(predictor(X_test), axis=1) == 1]
X = X_positive[:1000]
Y_t = np.array([0])
# Column indices: 20 = num_image, 21 = num_alternative_image, 22 = num_market_bullets.
# If I use the feature names as keys instead, I get an error.
C = [{20: [1, 10], 21: [0, 6], 22: [5, 10]}]
explanation = explainer.explain(X, Y_t, C)

After I get the counterfactual df, I compare it with the original df and get the differing columns shown below. avg_delivery_days is immutable, yet it still changes, although only by a tiny amount. The changes in 'num_image', 'num_alternative_image', and 'num_market_bullets' are also minimal. Can I say that the changed features play an important role in predicting the label (>0.4 or <=0.4), since a small change flips the label? Did I use the right counterfactual function for my use case?
Screenshot 2024-10-19 at 18 19 57
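
To make the comparison above reproducible without the screenshot, here is a minimal sketch of how the changed columns could be listed together with the size of each change. It assumes the original and counterfactual rows are available as numeric arrays with the same column order; `changed_columns`, the toy values, and the tolerance `atol` are illustrative and not part of alibi.

```python
import numpy as np

def changed_columns(X_orig, X_cf, feature_names, atol=1e-6):
    """Return {feature_name: max absolute change} for columns that differ by more than atol."""
    X_orig = np.asarray(X_orig, dtype=float)
    X_cf = np.asarray(X_cf, dtype=float)
    # Largest absolute difference per column across all rows.
    max_delta = np.abs(X_cf - X_orig).max(axis=0)
    return {name: float(d) for name, d in zip(feature_names, max_delta) if d > atol}

# Toy data (hypothetical values, only to show the shape of the report).
names = ["avg_delivery_days", "num_image", "num_market_bullets"]
X_orig = np.array([[3.0, 4.0, 7.0], [5.0, 2.0, 9.0]])
X_cf = np.array([[3.0004, 5.0, 7.0], [5.0, 2.0, 9.0]])

report = changed_columns(X_orig, X_cf, names)

# Any immutable feature that shows up in the report was modified.
immutable = ["avg_delivery_days"]
violations = {k: v for k, v in report.items() if k in immutable}
print(report)
print(violations)
```

Reporting the magnitude alongside the column name helps separate floating-point round-trip noise from a real change to an immutable feature.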

For tabular data, do I always need an encoder and a decoder? If a feature is already binary, should I include it in category_map in the call below?

heae_preprocessor, heae_inv_preprocessor = get_he_preprocessor(X=X_train, feature_names=model_attr, category_map=cate_map, feature_types=feature_types)
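
For reference, this is a sketch of what including a binary feature in the category map could look like, assuming the layout alibi documents for category_map (column index mapped to the list of possible values). The helper `build_category_map` and the column names `is_prime` and `color` are hypothetical. Whether a binary feature is better treated as a two-value categorical or as a numerical feature is exactly the open question here.

```python
def build_category_map(columns, categorical_values):
    """Build {column index: [category labels]} from {column name: [values]}."""
    index_of = {name: i for i, name in enumerate(columns)}
    return {
        index_of[name]: [str(v) for v in values]
        for name, values in categorical_values.items()
    }

# Hypothetical columns: one numerical, one binary, one multi-class categorical.
columns = ["avg_delivery_days", "is_prime", "color"]
categorical_values = {
    "is_prime": [0, 1],                  # binary feature treated as two categories
    "color": ["red", "green", "blue"],
}

cate_map_demo = build_category_map(columns, categorical_values)
print(cate_map_demo)
```

Columns that do not appear in the map (here `avg_delivery_days`) would be the ones handled as numerical.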

Another question I have is which models I can use as the environment model (the predictor), for example a boosted regression model or another regression-type black-box model.
If I use

explainer = CounterfactualRLTabular(predictor=predictor,
                                    encoder=heae.encoder,
                                    decoder=heae.decoder,
                                    latent_dim=LATENT_DIM,
                                    encoder_preprocessor=heae_preprocessor,
                                    decoder_inv_preprocessor=heae_inv_preprocessor,
                                    coeff_sparsity=COEFF_SPARSITY,
                                    coeff_consistency=COEFF_CONSISTENCY,
                                    category_map=cate_map,
                                    feature_names=model_attr,
                                    #ranges=ranges,
                                    immutable_features=immutable_features,
                                    train_steps=TRAIN_STEPS,
                                    batch_size=BATCH_SIZE,
                                    backend="tensorflow")

but replace the predictor with the boosted regression model, what other changes do I need to make? Since the prediction of a regression model is continuous, how can I customize the reward function?
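
To make the question concrete, this is the kind of reward I have in mind for a regression predictor: a sketch only, assuming the explainer accepts a custom callable that takes the predicted and target values (the alibi documentation lists a `reward_func` keyword, but I have not verified it against my installed version). `make_regression_reward` and `tolerance` are my own hypothetical names.

```python
import numpy as np

def make_regression_reward(tolerance):
    """Return a reward function that pays 1.0 when the prediction is within
    `tolerance` of the target value and 0.0 otherwise."""
    def reward(Y_pred, Y_target):
        Y_pred = np.asarray(Y_pred, dtype=float).reshape(-1)
        Y_target = np.asarray(Y_target, dtype=float).reshape(-1)
        return (np.abs(Y_pred - Y_target) <= tolerance).astype(float)
    return reward

# Toy check: the first prediction is close enough to the target, the second is not.
reward = make_regression_reward(tolerance=0.05)
rewards = reward([0.38, 0.70], [0.40, 0.40])
print(rewards)
```

A sparse 0/1 reward mirrors the classification case; a dense alternative such as the negative absolute error would be another option if the sparse signal turns out too weak for training.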

Sorry for all these questions. I am a beginner in RL and still learning everything, so please forgive me if my questions sound dumb.

Thanks for your time and help.
