How Shapley values work

A common concern in machine learning (ML) solutions is that apparent predictive power is coming from a problematic source.1 For example, a model might learn to predict burrito quality from latitude and longitude. In this case, the actual signal is likely coming from a particular city or neighborhood having citizens who are good at making burritos. Since there is nothing inherently better about an otherwise identical burrito constructed at 30°N instead of 40°N, this model will not generalize well to unobserved space or time.

There is an inherent tension between models with high predictive power (high variance or high VC models, where VC is the Vapnik-Chervonenkis dimension, a measure of model capacity) and models with high explanatory power (high bias or low VC models). At one extreme we can imagine a model which ignores all observed features and always predicts the average value for a dataset2 -- it is very easy to understand where every prediction came from, but none of them will be very informative. At the other extreme, we can imagine building a simulation of the entire universe to predict how, after about 14 billion years, an individual burrito will be received. No matter what the model says, understanding how it reached that particular conclusion will be very difficult.

Explainable AI (XAI) is the name given to a collection of tools for understanding high VC models (for a good introduction, see Christoph Molnar's "Interpretable Machine Learning" book). Typically, these tools fall into one of three categories:

  1. Tools which make internal model parameters more legible
  2. Tools which explain how predictions change over small perturbations
  3. Tools which build a surrogate model of the inference model

What do we mean by a surrogate model? This is a lower VC (more explainable) model which is not attempting to predict the world -- it is attempting to predict the behavior of the high VC model we are using for inference. We could, for example, imagine training a linear model -- with the same features as our inference model -- but predicting the output of the inference model. We would then be able to directly examine the coefficients of the linear model to tell us the effect that each feature has on the final prediction.
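
As a minimal sketch of that idea, assuming scikit-learn estimators and a synthetic stand-in for real data (none of the names below come from the original post):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression

    # Synthetic stand-in for real training data.
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 3))
    y = 2.5 + X @ np.array([0.1, 0.5, 0.2]) + rng.normal(scale=0.1, size=500)

    # The high-VC model we actually use for inference.
    inference_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    # The surrogate is fit to the inference model's *outputs*, not to the real labels.
    surrogate = LinearRegression().fit(X, inference_model.predict(X))

    # Each coefficient estimates the average effect of one feature on the
    # inference model's predictions.
    print(surrogate.coef_)

This naive least-squares surrogate is exactly what the next paragraph pushes back on.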

Fitting this surrogate with least-squares regression is problematic for a number of reasons (among them, correlated features and the fact that a single global line may track the high VC model poorly around any particular prediction), but it turns out we can assign a contribution value to each feature using a different formula, in a way that is still additive. That formula was derived by the mathematician Lloyd Shapley in cooperative game theory, as a way of dividing a coalition's payout fairly among its players, and so these contributions are called Shapley values.

The construction of Shapley values works like this: given N features in a model predicting a single outcome, the contribution of feature i to that prediction is the amount the prediction changes when the real value of feature i is included, averaged over every possible subset of the other features -- first within each subset size, then across sizes (which is equivalent to averaging over every order in which the features could be revealed).
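
Written as (deliberately inefficient) code, the construction looks something like this. This is a minimal sketch, assuming the model is exposed as a black-box function of feature values and that features outside the current subset are filled in with "average" background values, as the worked examples below do; the function names are mine, not from any library:

    from itertools import combinations
    from statistics import mean

    def shapley_value(predict, x, background, i):
        """Brute-force Shapley value of feature i for a single example x.

        predict    -- black-box model: dict of feature values -> predicted number
        x          -- dict of the real feature values for this example
        background -- dict of 'average' feature values used for features left out
        i          -- the feature whose contribution we want
        """
        others = [f for f in x if f != i]
        per_size = []  # average marginal contribution at each subset size
        for size in range(len(others) + 1):
            marginals = []
            for subset in combinations(others, size):
                # features in the subset take their real values; the rest stay average
                base = {f: (x[f] if f in subset else background[f]) for f in x}
                with_i = dict(base, **{i: x[i]})          # reveal feature i's real value
                without_i = dict(base, **{i: background[i]})
                marginals.append(predict(with_i) - predict(without_i))
            per_size.append(mean(marginals))
        return mean(per_size)

Averaging within each subset size and then across sizes reproduces the factorial weights in Shapley's original formula.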

Let's look at a simple example. Let's say that I have a linear model that predicts burrito quality (up to five stars) from cost in dollars, the quality of the salsa, and the uniformity of the burrito:

quality = 2.5 + 0.1*cost + 0.5*salsa + 0.2*uniformity

If we want to know the Shapley value for good salsa in a longitudinally segmented (poorly uniform) burrito that costs 7 USD, we average the following differences:

  1. The difference between an average burrito and one with good salsa
  2. The average of two differences: between the average seven dollar burrito and a seven dollar burrito with good salsa, and between an average poorly uniform burrito and a poorly uniform burrito with good salsa
  3. The difference between a seven dollar, poorly uniform, but otherwise average burrito and the same burrito with good salsa

The math looks something like this (we'll assume our prices go from 0 to 10 and the quality rankings go from 0 to 1, and that all are uniformly distributed, to make the math simple; the 2.5 intercept shows up in every prediction and cancels out of every difference, so it is left out below):

(0.1*5 + 0.5*1 + 0.2*0.5) - (0.1*5 + 0.5*0.5 + 0.2*0.5) = 0.25
((0.1*7 + 0.5*1 + 0.2*0.5) - (0.1*7 + 0.5*0.5 + 0.2*0.5) + (0.1*5 + 0.5*1 + 0.2*0.0) - (0.1*5 + 0.5*0.5 + 0.2*0.0)) / 2 = 0.25
(0.1*7 + 0.5*1 + 0.2*0.0) - (0.1*7 + 0.5*0.5 + 0.2*0.0) = 0.25
(0.25 + 0.25 + 0.25) / 3 = 0.25

So out of the predicted value for this whole burrito (2.5 + 0.7 + 0.5 + 0.0 = 3.7), having good salsa is pushing the value of the burrito up by 0.25, on our arbitrary ranking scale. You'll notice that this is exactly half of the linear coefficient for salsa -- this is not an accident. When extracting Shapley values from a model that is already globally linear, the contribution of a feature is just its coefficient multiplied by how far the feature's value sits from its dataset average. Since we stated that our values were uniformly distributed, the average salsa score is 0.5, so a perfect salsa contributes half of its maximum possible effect: 0.5 * (1 - 0.5) = 0.25.
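
If you would rather not trust the hand arithmetic, here is a small self-contained sketch of the same three steps (the variable and function names are only for this illustration, and the "average" burrito comes from the uniform-distribution assumption above):

    # The linear model from above, plus the 'average' and the actual burrito.
    def quality(cost, salsa, uniformity):
        return 2.5 + 0.1 * cost + 0.5 * salsa + 0.2 * uniformity

    avg = {"cost": 5.0, "salsa": 0.5, "uniformity": 0.5}
    real = {"cost": 7.0, "salsa": 1.0, "uniformity": 0.0}

    def salsa_marginal(revealed):
        """How much the prediction changes when real salsa joins the revealed features."""
        base = {f: (real[f] if f in revealed else avg[f]) for f in avg}
        return quality(**{**base, "salsa": real["salsa"]}) - quality(**{**base, "salsa": avg["salsa"]})

    step1 = salsa_marginal(set())                                            # no other features
    step2 = (salsa_marginal({"cost"}) + salsa_marginal({"uniformity"})) / 2  # one other feature
    step3 = salsa_marginal({"cost", "uniformity"})                           # both other features
    print((step1 + step2 + step3) / 3)   # prints (approximately) 0.25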

Since this was mathematically simple, but not very interesting, let's look at a more realistic example. Let's imagine that I trained a Random Forest to predict burrito quality (which I did), and needed to calculate some explanations for its predictions by hand. In this case the formulation will look a bit different, since we aren't calculating the burrito score ourselves, but giving different feature values to our model and letting it return a score.

RandomForest(good_salsa) - RandomForest(average_salsa) = 2.78 - 2.57 ≈ 0.2
(RandomForest(cost + good_salsa) - RandomForest(cost + average_salsa) + RandomForest(nonuniform + good_salsa) - RandomForest(nonuniform + average_salsa)) / 2 = ((2.83 - 2.56) + (2.78 - 2.19)) / 2 = 0.43
RandomForest(cost + nonuniform + good_salsa) - RandomForest(cost + nonuniform + average_salsa) = (2.57 - 1.27) = 1.3
(0.2 + 0.43 + 1.3) / 3 ≈ 0.64
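
The mechanics of handing "average" values to the model aren't shown in the calculation above, but one common way to form these coalition predictions is to substitute dataset averages for every feature that is left out and then ask the fitted model for a score. A minimal sketch, assuming a scikit-learn RandomForestRegressor and a synthetic stand-in for the burrito data (so the numbers it prints will not match the ones above):

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    # Synthetic stand-in for the real burrito dataset.
    rng = np.random.default_rng(0)
    train = pd.DataFrame({
        "cost": rng.uniform(0, 10, 500),
        "salsa": rng.uniform(0, 1, 500),
        "uniformity": rng.uniform(0, 1, 500),
    })
    target = 2.5 + 0.1 * train["cost"] + 0.5 * train["salsa"] + 0.2 * train["uniformity"]
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(train, target)

    background = train.mean()                      # the "average" burrito
    this_burrito = pd.Series({"cost": 7.0, "salsa": 1.0, "uniformity": 0.0})

    def coalition_prediction(revealed):
        """Predict with only the revealed features set to their real values."""
        row = background.copy()
        row[list(revealed)] = this_burrito[list(revealed)]
        return model.predict(row.to_frame().T)[0]

    # e.g. the term written above as RandomForest(cost + good_salsa):
    print(coalition_prediction(["cost", "salsa"]))

Permuting or resampling the left-out features from the training data, rather than fixing them at their means, is another common choice; either way, the model is only ever queried, never inspected.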

There are a few interesting things to note in these numbers. First, there isn't much difference between a totally average burrito and one that happens to have good salsa but is otherwise unremarkable. The biggest effect we see is for salsa saving an otherwise reasonably-priced but sloppy burrito. The non-linear nature of the random forest seems to be picking up these nuances, but what we want to know is: for this particular burrito, what good was the salsa?

0.64 stars, ignoring the effect of anything else in the burrito.

This is substantially larger than the effect we got from the linear regression, which was 0.25. The high bias of simple models makes the explanation of a prediction easy, but if the prediction itself is inaccurate, the explanation is really only useful for diagnosing the failure of the model.

In this example, we've kept the feature set quite small, because the number of comparisons needed to calculate a single Shapley value grows exponentially with the number of features. In some cases, it is possible to speed up these calculations in a white-box setting, where you can inspect the internal parameters of the model. For example, given the distribution of values for salsa quality and the coefficient from the linear model, we could have calculated the Shapley value directly as an average. In Python, there is a library called SHAP that does this for a small number of estimator types (principally linear and tree-based models).
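
For reference, this is roughly what that looks like with the shap package and a tree-based model; the data below is again a synthetic stand-in, so the printed contribution is illustrative only:

    import numpy as np
    import pandas as pd
    import shap
    from sklearn.ensemble import RandomForestRegressor

    # Synthetic stand-in for the burrito data.
    rng = np.random.default_rng(0)
    X = pd.DataFrame({
        "cost": rng.uniform(0, 10, 500),
        "salsa": rng.uniform(0, 1, 500),
        "uniformity": rng.uniform(0, 1, 500),
    })
    y = 2.5 + 0.1 * X["cost"] + 0.5 * X["salsa"] + 0.2 * X["uniformity"]
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    # TreeExplainer exploits the tree structure to compute Shapley values quickly.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)   # one row per burrito, one column per feature

    # Contribution of salsa to the first burrito's predicted quality:
    print(shap_values[0, X.columns.get_loc("salsa")])

For models SHAP cannot handle in a white-box way, its KernelExplainer falls back to a slower, sampling-based approximation.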


  1. this article is based on a lightning talk given at SciPy 2019, viewable here 

  2. in practice, these kinds of models are very useful for debugging and benchmarking, and are available in scikit-learn under the names DummyRegressor and DummyClassifier