Replies: 2 comments 1 reply
But the implementation is simple: for every feature dimension, we sample Z according to that feature's distribution in the training set (I only experimented on categorical features, in which case we just sample by the class probabilities of the categorical feature), and then apply the mask F using that formula. So for each feature dimension, when F=0 you get the randomly sampled Z from the marginal, which means the feature is not important; when F=1 you recover X, which means the feature dimension is important. We can then threshold out the unimportant dimensions just as we threshold out unimportant edges.
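Here is a minimal sketch of what I mean, assuming the blend formula is X' = Z + (X − Z) * F (reconstructed from the F=0/F=1 behavior above — not a verbatim quote of the paper's equation), with all function names being illustrative:

```python
import numpy as np

def sample_marginal(X_train, rng):
    """Sample Z feature-wise from the training-set marginals.
    Each column is a categorical feature; each dimension is drawn
    independently using its empirical class probabilities."""
    n, d = X_train.shape
    Z = np.empty_like(X_train)
    for j in range(d):
        vals, counts = np.unique(X_train[:, j], return_counts=True)
        Z[:, j] = rng.choice(vals, size=n, p=counts / counts.sum())
    return Z

def apply_feature_mask(X, Z, F):
    """Blend X and Z with mask F: F=1 keeps X (important),
    F=0 replaces it with the marginal sample Z (unimportant)."""
    return Z + (X - Z) * F

rng = np.random.default_rng(0)
X_train = rng.integers(0, 3, size=(100, 4)).astype(float)  # toy categorical data
X = X_train[:5]
Z = sample_marginal(X_train, rng)[:5]
F = np.array([1.0, 0.0, 1.0, 0.0])  # per-dimension feature mask
X_masked = apply_feature_mask(X, Z, F)
important = F > 0.5  # threshold out unimportant dimensions
```

With a hard 0/1 mask like this, dimensions where F=1 pass through X unchanged and dimensions where F=0 are replaced by the marginal sample Z; during training F would be continuous and only thresholded afterwards.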