User:IssaRice/Shapley value: Difference between revisions

From Machinelearning
No edit summary
No edit summary
 
(21 intermediate revisions by the same user not shown)
Line 11: Line 11:


We throw in the function f because the elements of X might not be numbers. or even if they ''are'' numbers, you might want to apply some weighting other than the default one (the identity function).
We throw in the function f because the elements of X might not be numbers. or even if they ''are'' numbers, you might want to apply some weighting other than the default one (the identity function).
Now, let's take the ugly-ass formula for the shapley value that you always see:
<math>\sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\ (n - |S| - 1)!}{n!} (v(S \cup \{i\}) - v(S))</math>
how is ''that'' supposed to be an average? well first of all, we said above that the shapley value is averaging over all ''sequences'' of ways to add the n players. one way to formalize the concept of a "sequence" or "ordering" is to use permutations. a permutation is just a function that reorders the elements of of a set. so each sequence corresponds to a permutation. we can recover a sequence <math>(x_1, x_2, \ldots, x_n)</math> by defining the permutation <math>\sigma(k) := x_k</math>.


So in what sense is the shapley value an average? if <math>N = \{1, \ldots, n\}</math> is the set of players, then we can define the set of all permutations <math>\mathrm{Sym}(N)</math> on <math>N</math>. (This is also denoted as <math>\mathrm{Sym}(n)</math> and called the "symmetric group of degree n" since <math>N = \{1, \ldots, n\}</math> is the "default" set of size n.)
So in what sense is the shapley value an average? if <math>N = \{1, \ldots, n\}</math> is the set of players, then we can define the set of all permutations <math>\mathrm{Sym}(N)</math> on <math>N</math>. (This is also denoted as <math>\mathrm{Sym}(n)</math> and called the "symmetric group of degree n" since <math>N = \{1, \ldots, n\}</math> is the "default" set of size n.)


the Shapley value is <math>\frac{1}{|\mathrm{Sym}(n)|} \sum_{\sigma \in \mathrm{Sym}(n)} f_i(\sigma(1), \ldots, \sigma(n))</math>
since the shapley value is an ''average'' and we are in particular averaging over all sequences, we want to rewrite the formula as something that looks like:
 
<math>\frac{1}{|\mathrm{Sym}(n)|} \sum_{\sigma \in \mathrm{Sym}(n)} f(\sigma)</math>
 
And in fact, at this point we know enough to convert our verbal understanding into a formula like the one above.
 
<math>\varphi_i(v) = \frac{1}{|\mathrm{Sym}(n)|} \sum_{\sigma \in \mathrm{Sym}(n)} (v(\{k : \sigma(k) < \sigma(i)\} \cup \{i\}) - v(\{k : \sigma(k) < \sigma(i)\}))</math>
 
a relevant fact is that the size of <math>\mathrm{Sym}(n)</math> is <math>n!</math>.
 
 
There is another way to look at this. or rather, a way to extend our understanding. a common thing done in algebra is to [https://en.wikipedia.org/wiki/Symmetrization#n_variables ''symmetrize''] a function by adding up all the permutations of the variables. a symmetric function is one in which you can interchange any of the variables and the function will stay the same. not in the trivial sense that <math>f(x,y)</math> is "the same" as <math>f(y,x)</math> since x and y are "just variables". but rather, in the sense that <math>f(x,y)=f(y,x)</math>. in symbols:
 
* we are NOT saying <math>(x,y) \mapsto f(x,y) = (y,x) \mapsto f(y,x)</math>. this is trivially true for all functions!
* but rather: <math>(x,y) \mapsto f(x,y) = (x,y) \mapsto f(y,x)</math>. or in other words: <math>\forall x \forall y [f(x,y) = f(y,x)]</math> (this is not trivially true! it's false for many functions including <math>f(x,y) := x-y</math>)
 
in the case of the shapley value, the "marginal contribution" function is NOT symmetric. so the naive fix that we would hope would work is to symmetrize it by adding all the possible permutations of the variables, forming a new function.
 
wait, what? what even ''is'' the "marginal contribution function"?? for a player i of interest, it's the function that gives player i's marginal contribution, given an arbitrary sequence of players as input. let's say we are given a sequence <math>(x_1, x_2, \ldots, x_n)</math>. what's player i's marginal contribution in this sequence? well, if x1 = i, then player i is the first player to join, so the marginal contribution is <math>v(\{i\}) - v(\emptyset) = v(\{i\})</math>. if x2=i, then the marginal contribution of player i is <math>v(\{x_1, i\}) - v(\{x_1\})</math>. and so on. in general, if <math>x_j = i</math> then player i's marginal contribution is <math>v(\{x_1, \ldots, x_{j-1}, i\}) - v(\{x_1, \ldots, x_{j-1}\})</math>.
 
as i said, this function, which we can call <math>f_i</math>, is not symmetric. but we can symmetrize <math>f_i</math> by adding up all the possible orderings of the input variables:
 
<math>\sum_{\sigma \in \mathrm{Sym}(n)} f_i(x_{\sigma(1)}, \ldots, x_{\sigma(n)})</math>
 
given a permutation <math>\sigma \in \mathrm{Sym}(n)</math> and a function <math>f : X^n \to \mathbf R</math>, we can define the permutation of the function <math>\sigma^* : (X^n \to \mathbf R) \to X^n \to \mathbf R</math> by:
 
<math>\sigma^*(f) := (x_1, \ldots, x_n) \mapsto f(x_{\sigma(1)}, \ldots, x_{\sigma(n)})</math>
 
By an abuse of notation, we can drop the star in <math>\sigma^*</math> and just call the resulting extension <math>\sigma</math>. Using this new notation, we can rewrite the symmetrization as
 
<math>\sum_{\sigma \in \mathrm{Sym}(n)} \sigma(f_i)</math>
 
There are now two problems:
 
# the expression above is in terms of variables <math>x_1, \ldots, x_n</math> or in other words it's a ''function'', not a number; we need to evaluate it somehow
# the expression is not ''normalized''. the total value of the grand coalition is <math>v(\{1, \ldots, n\})</math>. we need to make sure that when we add up all the shapley values we give to each player, that that sum is equal to <math>v(\{1, \ldots, n\})</math>.
 
problem 1 is easy to solve. because we symmetrized the function, we can evaluate the function on ''any'' ordering of the players!
 
<math>\left(\sum_{\sigma \in \mathrm{Sym}(n)} \sigma(f_i)\right) (1, \ldots, n)</math>
 
is one such way to get a number. but really, the ordering of (1, ..., n) does not matter.
 
as for problem 2, we want to find a value of <math>\alpha</math> such that the following is satisfied:
 
<math>v(\{1,\ldots,n\}) = \alpha \sum_{i=1}^n \left(\sum_{\sigma \in \mathrm{Sym}(n)} \sigma(f_i)\right) (1, \ldots, n)</math>
 
so just rearrange to solve for <math>\alpha</math>, and we are done. at least, conceptually. it would be nice if we could prove that <math>\alpha = \frac{1}{|\mathrm{Sym}(n)|}</math>. this is not hard to do (just interchange the two sums. for fixed sigma, the form of f_i means you can telescope-sum it to get <math>v(N)-v(\emptyset) = v(N)</math> in each inner sum. see [https://en.wikipedia.org/wiki/Shapley_value#Efficiency wikipedia] for details.)
 
So the fully honest shapley value expression is the following:
 
<math>\frac{v(\{1,\ldots,n\})}{\sum_{i=1}^n \left(\sum_{\sigma \in \mathrm{Sym}(n)} \sigma(f_i)\right) (1, \ldots, n)} \left(\sum_{\sigma \in \mathrm{Sym}(n)} \sigma(f_i)\right) (1, \ldots, n)</math>

Latest revision as of 21:15, 8 April 2023

most expositions of the Shapley value SUCK BALLS because they try to sum over the subsets excluding the playing in question (usually called "player i"). so here we go, here's a TRUE REDPILLED exposition of the shapley value!

first of all, what's the shapley value even trying to do? once we understand it in words, we can just convert our verbal understanding into symbols. and then we will be done.

...


So, the shapley value is an average. but what kind of average? an arithmetic average. well, an arithmetic average takes a specific form. it looks like this. if you're averaging the elements of some set , then the arithmetic average is

We throw in the function f because the elements of X might not be numbers. or even if they are numbers, you might want to apply some weighting other than the default one (the identity function).


Now, let's take the ugly-ass formula for the shapley value that you always see:

how is that supposed to be an average? well first of all, we said above that the shapley value is averaging over all sequences of ways to add the n players. one way to formalize the concept of a "sequence" or "ordering" is to use permutations. a permutation is just a function that reorders the elements of of a set. so each sequence corresponds to a permutation. we can recover a sequence by defining the permutation .

So in what sense is the shapley value an average? if is the set of players, then we can define the set of all permutations on . (This is also denoted as and called the "symmetric group of degree n" since is the "default" set of size n.)

since the shapley value is an average and we are in particular averaging over all sequences, we want to rewrite the formula as something that looks like:

And in fact, at this point we know enough to convert our verbal understanding into a formula like the one above.

a relevant fact is that the size of is .


There is another way to look at this. or rather, a way to extend our understanding. a common thing done in algebra is to symmetrize a function by adding up all the permutations of the variables. a symmetric function is one in which you can interchange any of the variables and the function will stay the same. not in the trivial sense that is "the same" as since x and y are "just variables". but rather, in the sense that . in symbols:

  • we are NOT saying . this is trivially true for all functions!
  • but rather: . or in other words: (this is not trivially true! it's false for many functions including )

in the case of the shapley value, the "marginal contribution" function is NOT symmetric. so the naive fix that we would hope would work is to symmetrize it by adding all the possible permutations of the variables, forming a new function.

wait, what? what even is the "marginal contribution function"?? for a player i of interest, it's the function that gives player i's marginal contribution, given an arbitrary sequence of players as input. let's say we are given a sequence . what's player i's marginal contribution in this sequence? well, if x1 = i, then player i is the first player to join, so the marginal contribution is . if x2=i, then the marginal contribution of player i is . and so on. in general, if then player i's marginal contribution is .

as i said, this function, which we can call , is not symmetric. but we can symmetrize by adding up all the possible orderings of the input variables:

given a permutation and a function , we can define the permutation of the function by:

By an abuse of notation, we can drop the star in and just call the resulting extension . Using this new notation, we can rewrite the symmetrization as

There are now two problems:

  1. the expression above is in terms of variables or in other words it's a function, not a number; we need to evaluate it somehow
  2. the expression is not normalized. the total value of the grand coalition is . we need to make sure that when we add up all the shapley values we give to each player, that that sum is equal to .

problem 1 is easy to solve. because we symmetrized the function, we can evaluate the function on any ordering of the players!

is one such way to get a number. but really, the ordering of (1, ..., n) does not matter.

as for problem 2, we want to find a value of such that the following is satisfied:

so just rearrange to solve for , and we are done. at least, conceptually. it would be nice if we could prove that . this is not hard to do (just interchange the two sums. for fixed sigma, the form of f_i means you can telescope-sum it to get in each inner sum. see wikipedia for details.)

So the fully honest shapley value expression is the following: