The explain() function now returns a matrix instead of a tibble, which makes more sense since Shapley values are ALWAYS numeric; data frames (and tibbles) are really only necessary when the data are heterogeneous. In essence, the output from explain() will act like an R matrix but with class structure c("explain", "matrix", "array"); you can always convert the results to a tibble using tibble::as_tibble(result).
Two new data sets, titanic and titanic_mice, were added to the package; see the corresponding help pages for details.
The plotting functions have all been deprecated in favor of the (far superior) shapviz package by @Mayer79 (grid.arrange() is also no longer imported from gridExtra). Consequently, the output from explain() no longer needs to have its own "explain" class (only an ordinary c("matrix", "array") object is returned).
The explain() function gained three new arguments:
baseline, which defaults to NULL, contains the baseline to use when adjusting Shapley values to meet the efficiency property. If NULL and adjust = TRUE, it defaults to the average training prediction (i.e., the average prediction over X).
shap_only, which defaults to TRUE, determines whether to return a matrix of Shapley values containing the baseline as an attribute (TRUE) or a list containing the Shapley values, corresponding feature values, and baseline (FALSE); setting it to FALSE is a convenience when using the shapviz package.
parallel, which defaults to FALSE, determines whether to compute Shapley values in parallel (across features) using any suitable parallel backend supported by foreach.
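As a rough illustration of how these three arguments fit together, the sketch below fits a linear model to mtcars and requests the list output. The model, the data, the nsim value, and the pfun wrapper are assumptions chosen for illustration, and the whole call is guarded so that it is skipped when fastshap is not installed.

```r
# Illustrative sketch only: the model, data, and `pfun` helper are assumptions.
if (requireNamespace("fastshap", quietly = TRUE)) {
  fit <- lm(mpg ~ ., data = mtcars)
  X <- subset(mtcars, select = -mpg)  # feature columns only

  # Prediction wrapper: must return a numeric vector of predictions
  pfun <- function(object, newdata) {
    predict(object, newdata = newdata)
  }

  set.seed(101)  # Monte Carlo estimates depend on the random seed
  ex <- fastshap::explain(
    fit, X = X, pred_wrapper = pfun, nsim = 50,
    adjust = TRUE,      # adjust estimates to satisfy the efficiency property
    baseline = NULL,    # NULL with adjust = TRUE: average prediction over X
    shap_only = FALSE,  # return a list rather than a bare matrix
    parallel = FALSE    # TRUE needs a registered foreach backend
  )
  str(ex, max.level = 1)
}
```

With shap_only = TRUE (the default) the same call would instead return the matrix of Shapley values alone, carrying the baseline as an attribute.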
The X and newdata arguments of explain() should now work with tibbles (#20).
Minor change to explain.lgb.Booster() to support breaking changes in lightgbm v4.0.0. (Thanks to @jameslamb and @Mayer79.)
The dependency on matrixStats has been removed in favor of using R’s internal apply() and var() functions.
The dependency on plyr, which has been retired, has been removed in favor of using foreach directly.
Removed the CXX_STD=CXX11 flag, which raises the minimum required R version to R >= 3.6.
slowtests/ directory (for now).
The force_plot() function should now be compatible with shap (>=0.36.0); thanks to @hfshr and @jbwoillard for reporting (#12).
Fixed minor name repair issue caused by tibble.
explain() should now be MUCH faster at explaining a single observation, especially when nsim is relatively large (e.g., nsim >= 1000).
The default method of explain() gained a new logical argument called adjust. When adjust = TRUE (and nsim > 1), the algorithm will adjust the sum of the estimated Shapley values to satisfy the efficiency property; that is, to equal the difference between the model’s prediction for that sample and the average prediction over all the training data. This option is experimental and follows the same approach as in shap (#6).
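To make the efficiency adjustment concrete, here is a simplified base R sketch. It is not the package's source: the helper name adjust_to_efficiency, the simulated contributions, and the rule of spreading the gap in proportion to each estimate's variance are assumptions made for illustration, and the actual correction used by fastshap and shap may differ in detail.

```r
# Simplified sketch (not the package's source): spread the efficiency gap
# across features in proportion to the variance of each Monte Carlo estimate.
adjust_to_efficiency <- function(phi_sims, fx, baseline) {
  # phi_sims: nsim x p matrix of simulated contributions for ONE observation
  stopifnot(nrow(phi_sims) > 1)  # variances need at least two replications
  phi_avg <- colMeans(phi_sims)
  phi_var <- apply(phi_sims, MARGIN = 2, FUN = var)
  gap <- (fx - baseline) - sum(phi_avg)  # amount by which efficiency fails
  if (sum(phi_var) == 0) {
    return(phi_avg + gap / length(phi_avg))  # no variance info: split evenly
  }
  phi_avg + gap * phi_var / sum(phi_var)
}

# Hypothetical simulated contributions: 100 replications for 3 features
set.seed(1)
sims <- matrix(rnorm(300, mean = rep(c(2, -1, 0.5), each = 100)), ncol = 3)
phi <- adjust_to_efficiency(sims, fx = 11.2, baseline = 10)
sum(phi)  # equals fx - baseline up to floating-point error
```

The variance estimate in the sketch is also a plausible reason why the adjustment only applies when nsim > 1: a single replication gives no information about how noisy each estimate is.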
New (experimental) function for constructing force plots (#7) to help visualize prediction explanations. The function is also a generic, which means additional methods can be added.
Function explain() became a generic and gained a new logical argument, exact, for computing exact Shapley contributions for linear models (Linear SHAP, which assumes independent features) and boosted decision trees (Tree SHAP). Currently, only "lm", "glm", and "xgb.Booster" objects are supported (#2, #3).
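For intuition about the linear case, the following base R sketch computes Linear SHAP contributions by hand for a purely additive "lm" fit, using the standard formula phi_ij = beta_j * (x_ij - mean(x_j)) under the independent-features assumption. The model and the chosen predictors are assumptions for illustration; the sketch does not call the package and does not cover interactions or transformed terms.

```r
# Linear SHAP sketch for an additive linear model (independent features assumed)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
X <- mtcars[, c("wt", "hp", "qsec")]
beta <- coef(fit)[colnames(X)]

# phi[i, j] = beta_j * (x_ij - mean(x_j))
centered <- sweep(as.matrix(X), MARGIN = 2, STATS = colMeans(X))
phi <- sweep(centered, MARGIN = 2, STATS = beta, FUN = "*")

# Efficiency check: each row sums to the prediction minus the average prediction
pred <- predict(fit, newdata = X)
all.equal(unname(rowSums(phi)), unname(pred - mean(pred)))
```

Because the contributions have a closed form here, no Monte Carlo sampling is involved and the efficiency property holds without any adjustment.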
Minor improvements to package documentation.
Removed unnecessary legend from contribution plots.
Tweak imports (in particular, use the @importFrom Rcpp sourceCpp tag).
Fixed a typo in the package description; Shapley was misspelled as Shapely (fixed by Dirk Eddelbuettel in #1).
You can now specify type = "contribution" in the call to autoplot.fastshap() to plot the explanation for a single instance (controlled by the row_num argument).
autoplot.fastshap() gained some useful new arguments:
color_by, for specifying an additional feature to color by in dependence plots (i.e., whenever type = "dependence");
smooth, smooth_color, smooth_linetype, smooth_size, and smooth_alpha, for adding/controlling a smoother in dependence plots (i.e., whenever type = "dependence");
..., which can be used to pass on additional parameters to geom_col() (when type = "importance") or geom_point() (when type = "dependence").
Function fastshap() was renamed to explain().
Functions explain() and explain_column() (not currently exported) now throw an error whenever the inputs X and newdata do not inherit from the same class.
Fixed a bug in the C++ source that gave more weight to extreme permutations.
Fixed a bug in the C++ source that caused doubles to be incorrectly converted to integers.
Fixed a bug in autoplot.fastshap() when type = "importance"; in particular, the function incorrectly used sum(|Shapley value|) instead of mean(|Shapley value|).
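The difference between the two importance measures can be seen in a small base R sketch. The Shapley value matrix below is hypothetical, made up purely for illustration, and the variable names are not taken from the package.

```r
# Hypothetical Shapley value matrix: 4 observations x 2 features
shap <- matrix(c(0.5, -0.5, 1.0, -1.0,
                 0.1,  0.1, -0.1, 0.3),
               ncol = 2, dimnames = list(NULL, c("x1", "x2")))

imp_mean <- colMeans(abs(shap))  # mean(|Shapley value|), the corrected measure
imp_sum  <- colSums(abs(shap))   # sum(|Shapley value|), grows with sample size
imp_sum / nrow(shap)             # identical to imp_mean
```

Within a single data set the two measures rank features identically; the mean simply keeps the scale comparable across data sets with different numbers of rows.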