2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: kernelshap
Title: Kernel SHAP
Version: 0.8.1
Version: 0.9.0
Authors@R: c(
person("Michael", "Mayer", , "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0009-0007-2540-9629")),
14 changes: 13 additions & 1 deletion NEWS.md
@@ -1,4 +1,16 @@
# kernelshap 0.8.1
# kernelshap 0.9.0

### Bug fix

With input from Mario Wuethrich, and from Ian Covert and his [repo](https://github.com/iancovert/shapley-regression),
we have fixed a bug in how `kernelshap()` calculates Kernel weights.

- The resulting differences are typically very small.
- Models with interactions of order up to two were unaffected.
- Exact Kernel SHAP now provides identical results to exact permutation SHAP.

Fixed in [#168](https://github.com/ModelOriented/kernelshap/pull/168), which also adds
unit tests against Python's "shap" package.
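
To see the agreement in action, here is a minimal sketch (toy data and prediction function, not from the package):

```r
library(kernelshap)

X <- data.frame(x1 = (1:100) / 100, x2 = log(1:100), x3 = sin(1:100))

# Toy prediction function with a three-way interaction (interactions of
# order > 2 are where the old weighting mattered)
pf <- function(model, newdata) newdata[, 1] * newdata[, 2] * newdata[, 3]

ks <- kernelshap(pf, head(X), bg_X = X, pred_fun = pf)  # exact Kernel SHAP
ps <- permshap(pf, head(X), bg_X = X, pred_fun = pf)    # exact permutation SHAP
# ks and ps now agree
```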

### API

30 changes: 9 additions & 21 deletions R/kernelshap.R
@@ -8,17 +8,11 @@
#' Otherwise, an almost exact hybrid algorithm combining exact calculations and
#' iterative paired sampling is used, see Details.
#'
#' Note that (exact) Kernel SHAP is only an approximation of (exact) permutation SHAP.
#' Thus, for up to eight features, we recommend [permshap()]. For more features,
#' [permshap()] tends to be inefficient compared to the optimized hybrid strategy
#' of Kernel SHAP.
#'
#' @details
#' The pure iterative Kernel SHAP sampling as in Covert and Lee (2021) works like this:
#'
#' 1. A binary "on-off" vector \eqn{z} is drawn from \eqn{\{0, 1\}^p}
#' such that its sum follows the SHAP Kernel weight distribution
#' (normalized to the range \eqn{\{1, \dots, p-1\}}).
#' 1. A binary "on-off" vector \eqn{z} is drawn from \eqn{\{0, 1\}^p} according to
#' a special weighting logic.
#' 2. For each \eqn{j} with \eqn{z_j = 1}, the \eqn{j}-th column of the
#' original background data is replaced by the corresponding feature value \eqn{x_j}
#' of the observation to be explained.
@@ -33,17 +27,13 @@
#'
#' This is repeated multiple times until convergence, see CL21 for details.
#'
#' A drawback of this strategy is that many (at least 75%) of the \eqn{z} vectors will
#' have \eqn{\sum z \in \{1, p-1\}}, producing many duplicates. Similarly, at least 92%
#' of the mass will be used for the \eqn{p(p+1)} possible vectors with
#' \eqn{\sum z \in \{1, 2, p-2, p-1\}}.
#' This inefficiency can be fixed by a hybrid strategy, combining exact calculations
#' with sampling.
#' To avoid the re-evaluation of identical coalition vectors, we have implemented
#' a hybrid strategy, combining exact calculations with sampling.
#'
#' The hybrid algorithm has two steps:
#' 1. Step 1 (exact part): There are \eqn{2p} different on-off vectors \eqn{z} with
#' \eqn{\sum z \in \{1, p-1\}}, covering a large proportion of the Kernel SHAP
#' distribution. The degree 1 hybrid will list those vectors and use them according
#' \eqn{\sum z \in \{1, p-1\}}.
#' The degree 1 hybrid will list those vectors and use them according
#' to their weights in the upcoming calculations. Depending on \eqn{p}, we can also go
#' a step further to a degree 2 hybrid by adding all \eqn{p(p-1)} vectors with
#' \eqn{\sum z \in \{2, p-2\}} to the process etc. The necessary predictions are
@@ -96,12 +86,10 @@
#' worse than the hybrid strategy and should therefore only be used for
#' studying properties of the Kernel SHAP algorithm.
#' - `1`: Uses all \eqn{2p} on-off vectors \eqn{z} with \eqn{\sum z \in \{1, p-1\}}
#' for the exact part, which covers at least 75% of the mass of the Kernel weight
#' distribution. The remaining mass is covered by random sampling.
#' for the exact part. The remaining mass is covered by random sampling.
#' - `2`: Uses all \eqn{p(p+1)} on-off vectors \eqn{z} with
#' \eqn{\sum z \in \{1, 2, p-2, p-1\}}. This covers at least 92% of the mass of the
#' Kernel weight distribution. The remaining mass is covered by sampling.
#' Convergence usually happens in the minimal possible number of iterations of two.
#' \eqn{\sum z \in \{1, 2, p-2, p-1\}}. The remaining mass is covered by sampling.
#' Convergence usually happens very fast.
#' - `k>2`: Uses all on-off vectors with
#' \eqn{\sum z \in \{1, \dots, k, p-k, \dots, p-1\}}.
#' @param m Even number of on-off vectors sampled during one iteration.
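
As a sketch of steps 1-3 above (all names hypothetical, not package code), a sampled coalition vector could be evaluated like this:

```r
# Estimate v(z) for one on-off vector z (logical, length p), a 1-row
# observation x, background data bg, and a prediction function pred()
eval_coalition <- function(z, x, bg, pred) {
  masked <- bg
  masked[, z] <- x[rep(1L, nrow(bg)), z, drop = FALSE]  # switch "on" features to x
  mean(pred(masked))  # averaging predictions estimates v(z)
}
```

These v(z) estimates then enter the constrained weighted least-squares step that produces the SHAP values.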
2 changes: 1 addition & 1 deletion R/permshap.R
@@ -32,7 +32,7 @@
#' Furthermore, the 2p on-off vectors with sum(z) in {1, p-1} are evaluated only once,
#' similar to the degree 1 hybrid in [kernelshap()] (but covering less weight).
#'
#' @param exact If `TRUE`, the algorithm will produce exact SHAP values
#' @param exact If `TRUE`, the algorithm produces exact SHAP values
#' with respect to the background data.
#' The default is `TRUE` for up to eight features, and `FALSE` otherwise.
#' @param low_memory If `FALSE` (default up to p = 15), the algorithm evaluates p
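
For illustration, a hypothetical call that forces the exact algorithm (objects `fit`, `X`, and `bg_X` assumed to exist):

```r
# Exact permutation SHAP, even for more than eight features
ps <- permshap(fit, X, bg_X = bg_X, exact = TRUE)
```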
76 changes: 34 additions & 42 deletions R/utils_kernelshap.R
@@ -107,8 +107,7 @@ solver <- function(A, b, constraint) {
# to Kernel SHAP weights -> (m x p) matrix.
# The argument S can be used to restrict the range of sum(z).
sample_Z <- function(p, m, feature_names, S = 1:(p - 1L)) {
# First draw s = sum(z) according to Kernel weights (renormalized to sum 1)
probs <- kernel_weights(p, S = S)
probs <- kernel_weights(p, per_coalition_size = TRUE, S = S)
N <- S[sample.int(length(S), m, replace = TRUE, prob = probs)]

# Then, conditional on that number, set random positions of z to 1
@@ -144,8 +143,8 @@ input_sampling <- function(p, m, deg, feature_names) {
S <- (deg + 1L):(p - deg - 1L)
Z <- sample_Z(p = p, m = m / 2, feature_names = feature_names, S = S)
Z <- rbind(Z, !Z)
w_total <- if (deg == 0L) 1 else 1 - 2 * sum(kernel_weights(p)[seq_len(deg)])
w <- w_total / m
w <- if (deg == 0L) 1 else 1 - prop_exact(p, deg = deg)
w <- w / m
list(Z = Z, w = rep.int(w, m), A = crossprod(Z) * w)
}

@@ -159,33 +158,10 @@
input_exact <- function(p, feature_names) {
Z <- exact_Z(p, feature_names = feature_names)
Z <- Z[2L:(nrow(Z) - 1L), , drop = FALSE]
# Each Kernel weight(j) is divided by the number of vectors z having sum(z) = j
w <- kernel_weights(p) / choose(p, 1:(p - 1L))
list(Z = Z, w = w[rowSums(Z)], A = exact_A(p, feature_names = feature_names))
}

#' Exact Matrix A
#'
#' Internal function that calculates exact A.
#' Notice the difference from the off-diagonals in the Supplement of
#' Covert and Lee (2021). Credits to David Watson for figuring out the correct formula,
#' see our discussions in https://github.com/ModelOriented/kernelshap/issues/22
#'
#' @noRd
#' @keywords internal
#'
#' @param p Number of features.
#' @param feature_names Feature names.
#' @returns A (p x p) matrix.
exact_A <- function(p, feature_names) {
S <- 1:(p - 1L)
c_pr <- S * (S - 1) / p / (p - 1)
off_diag <- sum(kernel_weights(p) * c_pr)
A <- matrix(
data = off_diag, nrow = p, ncol = p, dimnames = list(feature_names, feature_names)
)
diag(A) <- 0.5
A
kw <- kernel_weights(p, per_coalition_size = FALSE) # Kernel weights for all subsets
w <- kw[rowSums(Z)] # Corresponding weight for each row in Z
w <- w / sum(w)
list(Z = Z, w = w, A = crossprod(Z, w * Z))
}
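
As a small worked check of this weighting (a sketch, not package code): for p = 3, all six proper coalitions end up with the same weight:

```r
p <- 3
S <- 1:(p - 1)
probs <- (p - 1) / (choose(p, S) * S * (p - S))  # per-mask Kernel weight by coalition size
sizes <- c(1, 1, 1, 2, 2, 2)                     # sizes of the six proper masks
w <- probs[sizes]
w / sum(w)  # 1/6 for every mask
```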

# List all length p vectors z with sum(z) in {k, p - k}
@@ -227,23 +203,39 @@ input_partly_exact <- function(p, deg, feature_names) {
stop("p must be >=2*deg")
}

kw <- kernel_weights(p)
Z <- w <- vector("list", deg)
kw <- kernel_weights(p, per_coalition_size = FALSE)

Z <- vector("list", deg)
for (k in seq_len(deg)) {
Z[[k]] <- partly_exact_Z(p, k = k, feature_names = feature_names)
n <- nrow(Z[[k]])
w_tot <- kw[k] * (2 - (p == 2L * k))
w[[k]] <- rep.int(w_tot / n, n)
}
w <- unlist(w, recursive = FALSE, use.names = FALSE)
Z <- do.call(rbind, Z)

w <- kw[rowSums(Z)]
w_target <- prop_exact(p, deg = deg) # How much of total weight to spend here
w <- w / sum(w) * w_target
list(Z = Z, w = w, A = crossprod(Z, w * Z))
}

# Kernel weights normalized to a non-empty subset S of {1, ..., p-1}
kernel_weights <- function(p, S = seq_len(p - 1L)) {
probs <- (p - 1L) / (choose(p, S) * S * (p - S))
probs / sum(probs)
# Kernel weight distribution
#
# `per_coalition_size = TRUE` is required, e.g., when one wants to sample random masks
# according to the Kernel SHAP distribution: pick a coalition size as per
# these weights, then randomly place the "on" positions. `per_coalition_size = FALSE`
# gives the weights used when all masks have been enumerated and each mask is
# weighted according to its number of "on" positions.
kernel_weights <- function(p, per_coalition_size, S = seq_len(p - 1L)) {
const <- if (per_coalition_size) 1 else choose(p, S)
probs <- (p - 1) / (const * S * (p - S)) # could drop the numerator
return(probs / sum(probs))
}

# How much Kernel SHAP weight do coalitions of size
# {1, ..., deg} or {p-deg, ..., p-1} have?
prop_exact <- function(p, deg) {
if (deg == 0) {
return(0)
}
w <- kernel_weights(p, per_coalition_size = TRUE)
w_total <- 2 * sum(w[seq_len(deg)]) - w[deg] * (p == 2 * deg)
return(w_total)
}
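
A short usage sketch of the two helpers above (assuming they are in scope):

```r
p <- 10
kernel_weights(p, per_coalition_size = TRUE)   # P(sum(z) = s) for s = 1, ..., p - 1
kernel_weights(p, per_coalition_size = FALSE)  # weight of a single mask of size s
prop_exact(p, deg = 1)  # share of total weight covered exactly by the degree 1 hybrid
```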
23 changes: 10 additions & 13 deletions README.md
@@ -15,20 +15,18 @@

The package contains three functions to crunch SHAP values:

- **`permshap()`**: Permutation SHAP algorithm of [1]. Recommended for models with up to 8 features, or if you don't trust Kernel SHAP. Both exact and sampling versions are available.
- **`kernelshap()`**: Kernel SHAP algorithm of [2] and [3]. Recommended for models with more than 8 features. Both exact and (pseudo-exact) sampling versions are available.
- **`permshap()`**: Permutation SHAP algorithm of [1]. Both exact and sampling versions are available.
- **`kernelshap()`**: Kernel SHAP algorithm of [2] and [3]. Both exact and (pseudo-exact) sampling versions are available.
- **`additive_shap()`**: For *additive models* fitted via `lm()`, `glm()`, `mgcv::gam()`, `mgcv::bam()`, `gam::gam()`, `survival::coxph()`, or `survival::survreg()`. Exponentially faster than the model-agnostic options above, and recommended if possible.

To explain your model, select an explanation dataset `X` (up to 1000 rows from the training data, feature columns only) and apply the recommended function. Use {shapviz} to visualize the resulting SHAP values.
To explain your model, select an explanation dataset `X` (up to 1000 rows from the training data, feature columns only). Use {shapviz} to visualize the resulting SHAP values.

**Remarks on `permshap()` and `kernelshap()`**

- Both algorithms need a representative background dataset `bg_X` to calculate marginal means (up to 500 rows from the training data). In cases with a natural "off" value (like MNIST digits), this can also be a single row with all values set to the off value. If unspecified, 200 rows are randomly sampled from `X`.
- Exact Kernel SHAP is an approximation to exact permutation SHAP. Since exact calculations are usually sufficiently fast for up to eight features, we recommend `permshap()` in this case. With more features, `kernelshap()` switches to a comparably fast, almost exact algorithm with faster convergence than the sampling version of permutation SHAP.
That is why we recommend `kernelshap()` in this case.
- For models with interactions of order up to two, SHAP values of permutation SHAP and Kernel SHAP agree,
and the implemented sampling versions provide the same results as the exact versions.
In the presence of interactions of order three or higher, this is no longer the case.
- Exact Kernel SHAP gives identical results to exact permutation SHAP. Both algorithms are fast up to 8 features.
With more features, `kernelshap()` switches to an almost exact algorithm with faster convergence than the sampling version of permutation SHAP.
- For models with interactions of order up to two, the sampling versions provide the same results as the exact versions.
- For additive models, `permshap()` and `kernelshap()` give the same results as `additive_shap()`,
provided the full training data is used as background data.
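
For example, a background dataset could be selected like this (sketch; `train` stands in for your training data):

```r
# Up to 500 representative rows from the training data
bg_X <- train[sample(nrow(train), 500), ]
```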

@@ -89,13 +87,12 @@ ps
[1,] 1.1913247 0.09005467 -0.13430720 0.000682593
[2,] -0.4931989 -0.11724773 0.09868921 0.028563613

# Kernel SHAP gives very slightly different values because the model contains
# interactions of order > 2:
# Indeed, Kernel SHAP gives the same:
ks <- kernelshap(fit, X, bg_X = bg_X)
ks
# log_carat clarity color cut
# [1,] 1.1911791 0.0900462 -0.13531648 0.001845958
# [2,] -0.4927482 -0.1168517 0.09815062 0.028255442
log_carat clarity color cut
[1,] 1.1913247 0.09005467 -0.13430720 0.000682593
[2,] -0.4931989 -0.11724773 0.09868921 0.028563613

# 4) Analyze with {shapviz}
ps <- shapviz(ps)
63 changes: 63 additions & 0 deletions backlog/compare_with_python2.R
@@ -0,0 +1,63 @@
library(kernelshap)

n <- 100

X <- data.frame(
x1 = (1:n) / 100,
x2 = log(1:n),
x3 = sqrt(1:n),
x4 = sin(1:n),
x5 = ((1:n) / 100)^2,
x6 = cos(1:n)
)
head(X)

# Prediction function independent of a fitted model:
# four-way interaction plus two additive terms
pf <- function(model, newdata) {
x <- newdata
x[, 1] * x[, 2] * x[, 3] * x[, 4] + x[, 5] + x[, 6]
}
ks <- kernelshap(pf, head(X), bg_X = X, pred_fun = pf)
ks # -1.196216 -1.241848 -0.9567848 3.879420 -0.33825 0.5456252
es <- permshap(pf, head(X), bg_X = X, pred_fun = pf)
es # -1.196216 -1.241848 -0.9567848 3.879420 -0.33825 0.5456252

set.seed(10)
kss <- kernelshap(
pf,
head(X, 1),
bg_X = X,
pred_fun = pf,
hybrid_degree = 0,
exact = FALSE,
m = 9000,
max_iter = 100,
tol = 0.0005
)
kss # -1.198078 -1.246508 -0.9580638 3.877532 -0.3241824 0.541247

set.seed(2)
ksh <- kernelshap(
pf,
head(X, 1),
bg_X = X,
pred_fun = pf,
hybrid_degree = 1,
exact = FALSE,
max_iter = 10000,
tol = 0.0005
)
ksh # -1.191981 -1.240656 -0.9516264 3.86776 -0.3342143 0.5426642

set.seed(1)
ksh2 <- kernelshap(
pf,
head(X, 1),
bg_X = X,
pred_fun = pf,
hybrid_degree = 2,
exact = FALSE,
m = 10000,
max_iter = 10000,
tol = 0.0001
)
ksh2 # 1.195976 -1.241107 -0.9565121 3.878891 -0.3384621 0.5451118
21 changes: 12 additions & 9 deletions cran-comments.md
@@ -1,9 +1,13 @@
# kernelshap 0.8.0
# kernelshap 0.9.0

Dear CRAN team
We have found a bug in the weighting logic of Kernel SHAP.

The package (finally!) contains a sampling version of permutation SHAP. In contrast
to other implementations, it iterates until convergence, and standard errors are provided.
This update comes with a fix that has been tested against two other implementations.

I am aware that the last release of {kernelshap} was not long ago, but I would still love to see
this fixed before the (well-deserved) summer break.

Thanks a lot!

## Checks

@@ -18,9 +22,8 @@ R Under development (unstable) (2025-07-05 r88387 ucrt)

### Revdep OK

survex 1.2.0  ── E: 0 | W: 0 | N: 0
XAItest 1.0.1 ── E: 1 | W: 0 | N: 0
SEMdeep 1.0.0 ── E: 1 | W: 1 | N: 0

OK: 3
BROKEN: 0