User:IssaRice/Aumann's agreement theorem


Aumann's agreement theorem states that if two agents have the same prior and their posteriors for some event are common knowledge, then those posteriors must be equal.[1] The proof of the theorem is simple; almost all the difficulty of understanding the theorem is in understanding the formalism and how it applies to real-life situations.

Information partitions

Actually, can't the information states overlap? What if, when Bob rolls a 1, Alice is told "1 or 3", and when Bob rolls a 2 she is told "2 or 3" (with the rest of the situation remaining the same)? Then it seems like I((2,1)) = \{(2,1),(2,3)\} and I((2,2)) = \{(2,2),(2,3)\}, which overlap.

Similarly, in Geanakoplos's example on p. 261,[2] the agent is conveniently told whether the number is even or odd, which yields a partition. But what if the agent is told whether the number is in 1–7 or in 2–8? It seems like this information-partition setup restricts the examples we can formalize, and the examples people consider seem to be conveniently formalizable.

In this case, if the true number happens to be in 2–7, then the agent is told both "1–7" and "2–8", so he can deduce it is in 2–7. Similarly, if he is told only "1–7", then he can tell the number is 1, for if it weren't 1, he would have also been told "2–8". So the information partition ends up being \{ \{1\}, \{2,\ldots,7\}, \{8\} \}. So a rational agent can make their possible information states into a partition by reasoning about which clues are communicated. A general version of this is probably some basic theorem in game theory or a related field, but I am highly ignorant.

Actually, the above operation "refines" the overlapping collection of information states until it becomes an actual partition: group together the states that trigger exactly the same set of clues. The result is the coarsest partition in which each block is a subset of every clue it intersects.
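To make this concrete, here is a minimal sketch (mine, not from [2]) of the refinement operation, assuming the agent is always told every clue that applies to the true state:

# Turn a collection of possibly-overlapping "clues" (a cover) into a
# genuine partition: two states land in the same block iff they trigger
# exactly the same set of clues.
def refine_to_partition(states, clues):
    groups = {}
    for s in states:
        signature = frozenset(i for i, c in enumerate(clues) if s in c)
        groups.setdefault(signature, set()).add(s)
    return list(groups.values())

# The 1-8 example above, with clues "1-7" and "2-8":
states = range(1, 9)
clues = [set(range(1, 8)), set(range(2, 9))]
print(refine_to_partition(states, clues))
# [{1}, {2, 3, 4, 5, 6, 7}, {8}]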

"which means we both know what the intersection is" [1] I don't think Aumann assumes we know which element of the other's partition we are in, so I don't think we know what the intersection is.

Join and meet

Given two partitions of a set, the join is the coarsest common refinement, and the meet is the finest common coarsening.

TODO give more formal definition

TODO give imperative definition [3]

TODO [2] gives a definition by defining I\vee J at each \omega

[4]
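To partially address the TODOs above, here is a minimal imperative sketch of the two operations, under the convention used on this page (join = coarsest common refinement, meet = finest common coarsening). The function bodies are my own, not taken from [2], [3], or [4].

# Partitions are represented as lists of frozensets (the blocks).
def join(P, Q):
    """Coarsest common refinement: all nonempty pairwise intersections."""
    return [p & q for p in P for q in Q if p & q]

def meet(P, Q):
    """Finest common coarsening: merge blocks that overlap, transitively."""
    merged = []
    for b in [set(b) for b in P + Q]:
        overlapping = [m for m in merged if m & b]
        for m in overlapping:
            merged.remove(m)
            b = b | m
        merged.append(b)
    return merged

P = [frozenset({1, 2}), frozenset({3, 4})]
Q = [frozenset({1}), frozenset({2, 3}), frozenset({4})]
print(join(P, Q))  # [{1}, {2}, {3}, {4}] (as frozensets)
print(meet(P, Q))  # [{1, 2, 3, 4}]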

"So this “common knowledge” condition is tantamount to mind melding" [2]. But I think:

  • mind melding = join
  • common knowledge = meet

In fact, [3] gives an example where directly exchanging information leads to a different equilibrium than exchanging posteriors back and forth. So common knowledge and mind melding can't be the same thing.

TODO Aumann assumes the join of the partitions consists of nonempty events; it would be good to show where in his proof this assumption is used.

--

Warning: there are actually two conventions for what "meet" and "join" mean. This happens because, given two partitions P, Q where P is coarser than Q (i.e. Q is finer than P), you can write this either as P \leq Q ("P is coarser than Q") or as Q \leq P ("Q is finer than P").

From order theory, we already have the convention that:

  • \wedge = inf = meet
  • \vee = sup = join

This tells us things like a \leq b \iff a\wedge b = a \iff a \vee b = b, and a \wedge b \leq a \leq a \vee b.

But what this doesn't tell you is what \leq should mean (or alternatively, what \wedge or \vee should mean), because order theory works with many different kinds of partial orders. Specifying any one of \leq, \wedge, or \vee "locks in" the meanings of the other two. So we need to do one of two things:

  • Decide that \leq means "is coarser than". This is equivalent to deciding that "meet" means "finest common coarsening", because we need P \wedge Q \leq P,Q. This is the approach Aumann took in his original paper.
  • Decide that \leq means "is finer than". This is equivalent to deciding that "meet" means "coarsest common refinement". This is the approach many Wikipedia pages implicitly take (e.g. they draw Hasse diagrams of partitions with the finer partitions on the bottom). It also makes sense because \wedge behaves like \cap and \vee like \cup: "join" becomes the operation that takes the union of the partitions (and then merges overlapping blocks to form a valid partition), and "meet" becomes the operation that takes all possible intersections between blocks of the two partitions. Tyrrell McAllister gives another reason here, which is to work by analogy with the poset of sets (where we've already decided that \leq means \subseteq rather than \supseteq). Just to give one example, this book defines meet as the coarsest common refinement.

See more discussion about this here.

Common knowledge

Statement of theorem

Clarifying what each agent knows

  • the other's information partition?
  • the element of the other's information partition?
  • the intersection of the two elements of the information partitions?
  • the element of the meet of the information partitions?
  • the other's posterior?
  • the set of worlds where the other's posterior is equal to their current posterior?

If I'm Alice with information partition I, then I can know (I\wedge J)(\omega) without knowing what J(\omega) is, right? So we can build common knowledge and exchange posteriors without exchanging our exact information states.

Mechanics of reaching agreement

In the idealized agreement theorem setup, the process by which two agents, Alice and Bob, come to agree on their posteriors for some event A looks like this:

  1. Alice and Bob each have an information partition.
  2. They each make some observation, updating their posteriors. At this point, their posteriors for A might be the same, but might not.
  3. Alice and Bob tell each other their posteriors.
  4. After hearing the other's posterior, each modifies their own information partition. (The underlying sample space remains the same, but the partitioning changes, so that the posterior can change.)
  5. Repeat (2)–(4) until E = \{\omega \in \Omega : \Pr(A\mid I(\omega)) = q_1 \text{ and } \Pr(A\mid J(\omega)) = q_2\} becomes common knowledge for some constants q_1,q_2 \in [0,1]. In other words, keep exchanging posteriors until they become common knowledge. (In step (2), after the first round, the observation is the other's professed posterior.)
  6. At this point, Aumann's agreement theorem applies, and we conclude q_1 = q_2. Alice and Bob agree!

What are the actual mechanics of observing the other's posterior and updating on it? I think what happens is that Bob gives Alice some subset of \Omega, and Alice gives Bob some other subset of \Omega, and each intersects the received set with their current information set; the resulting set becomes the new information set. Specifically, suppose q_2 = \Pr(A \mid J(\omega)). Then Bob gives Alice the set \{\omega \in \Omega : \Pr(A \mid J(\omega)) = q_2\}. This way, Alice can carry out the update without knowing what J(\omega) is, or even what q_2 is.
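Here is a minimal simulation of this exchange process on a finite state space with a uniform prior, where in each round both agents announce simultaneously and each refines their partition by the set the other announced. The particular \Omega, partitions, and event below are toy inputs of mine (in the style of Geanakoplos and Polemarchakis), not from Aumann's paper.

from fractions import Fraction

def block_of(partition, omega):
    return next(b for b in partition if omega in b)

def posterior(A, block):
    # Uniform prior, so conditioning is just counting.
    return Fraction(len(A & block), len(block))

def split_by(partition, announced):
    """Refine each block by the announced set of states."""
    out = []
    for b in partition:
        for piece in (b & announced, b - announced):
            if piece:
                out.append(piece)
    return out

def exchange(Omega, A, I, J, omega, max_rounds=10):
    for t in range(max_rounds):
        q1 = posterior(A, block_of(I, omega))
        q2 = posterior(A, block_of(J, omega))
        print(f"round {t}: Alice {q1}, Bob {q2}")
        if q1 == q2:
            return q1
        # Each announces the set of states consistent with their own
        # posterior; the other conditions on (splits by) that set.
        E1 = {w for w in Omega if posterior(A, block_of(I, w)) == q1}
        E2 = {w for w in Omega if posterior(A, block_of(J, w)) == q2}
        I, J = split_by(I, E2), split_by(J, E1)

Omega = set(range(1, 10))
I0 = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
J0 = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9}]
exchange(Omega, A={3, 4}, I=I0, J=J0, omega=1)
# round 0: Alice 1/3, Bob 1/2
# round 1: Alice 1/3, Bob 1/2
# round 2: Alice 1/3, Bob 1/3

Note that in round 1 neither announced posterior changes, yet the announcements are still informative; this multi-step dynamic is what the convergence question below is about.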

Maybe knowing J and q_2 is equivalent to knowing the set above, in which case Alice can receive J and q_2 and compute the set on her own.

Another confusion I have: Aumann's theorem only says what happens once we reach step (6). But it seems like what we want to know, in addition, is that steps (1)–(5) converge, so that we can apply the theorem at all. Without knowing about convergence, it would seem like the theorem just doesn't apply in many cases. Aumann's paper concludes with "Our result implies that the process of exchanging information on the posteriors for A will continue until these posteriors are equal." If the posteriors ever become common knowledge, then sure, but what if they never become common knowledge?

Given I_0 and J_0, can we write down I_1 and J_1 in general? It seems like what we want is I_1 = J_1 = I_0 \vee J_0, but then I_1 = J_1. Is this okay? It seems like convergence then happens in a single step; maybe the examples I'm considering are too simple. Aumann's second coin toss example seems to involve more than one exchange, so I should study that in detail. [4] uses the auxiliary sets a_t, b_t to track updates rather than changing the information partitions.

There is a paper called something like "we can't disagree forever", which might be similar to the above. Ok, I just checked the paper,[5] and it looks like this is exactly what the paper is about.

Hal Finney's example

This example is given by Hal Finney in a LessWrong comment.[6]

Let Alice and Bob be two agents. Each rolls a die, and knows what they rolled. In addition to this, each knows whether the other rolled something in the range 1–3 versus 4–6. As an example, suppose Alice rolls a 2 and Bob rolls a 3. Then Alice knows that the outcome is one of (2,1), (2,2), or (2,3), and Bob knows that the outcome is one of (1,3), (2,3), or (3,3).

Given the above description, what are the possible states of knowledge of Alice?

I = \{\{1\}\times\{1,2,3\}, \ldots, \{6\}\times\{1,2,3\}, \{1\}\times \{4,5,6\}, \ldots, \{6\}\times \{4,5,6\}\}

In the possible worlds notation, if we let I be Alice's information partition and J be Bob's information partition, we would write:

  • \omega = (2,3)
  • I(\omega) = \{2\}\times\{1,2,3\} = \{(2,1), (2,2), (2,3)\}
  • J(\omega) = \{1,2,3\}\times\{3\} = \{(1,3), (2,3), (3,3)\}
  • (I\wedge J)(\omega) = \{1,2,3\} \times \{1,2,3\}

For reference, here is the sum of the two rolls at each outcome (rows index Alice's roll, columns index Bob's roll):

\begin{array}{c|cccccc}
& 1 & 2 & 3 & 4 & 5 & 6 \\
\hline
1 & 2 & 3 & 4 &  5 &  6 &  7 \\
2 & 3 & 4 & 5 &  6 &  7 &  8 \\
3 & 4 & 5 & 6 &  7 &  8 &  9 \\
4 & 5 & 6 & 7 &  8 &  9 & 10 \\
5 & 6 & 7 & 8 &  9 & 10 & 11 \\
6 & 7 & 8 & 9 & 10 & 11 & 12
\end{array}

Now let A \subset \Omega be an arbitrary event, and let q_1,q_2 \in [0,1] be constants. We now define the event

E = \{\omega \in \Omega : \Pr(A \mid I(\omega)) = q_1 \text{ and } \Pr(A \mid J(\omega)) = q_2\}

One of the assumptions in the agreement theorem is that E is common knowledge. This seems like a pretty strange requirement, since it seems like the posterior probability of A can never change no matter what else the agents condition on in addition to E. For example, what if we bring in agent 3 and make the posteriors common knowledge again?

What if we take E' = \{\omega \in \Omega : \Pr(A \mid I(\omega)) = q_1\} and say that agent 1 knows E'?

In the form of E above, we can change A to be any subset of \Omega and q_1, q_2 to be any numbers in [0,1]. We can also set the state of the world to be any \omega\in\Omega. The agreement theorem says that as we vary these parameters, if we ever find that (I\wedge J)(\omega) \subset E, then we must have q_1 = q_2.

Let X\colon \Omega \to \mathbf R be a random variable defined by X(x,y) = x+y. In other words, X is the sum of the values of the two dice.

Some example parameter settings, all with \omega = (2,3):

  • A = \{2 \leq X \leq 6\}, q_1 = 1, q_2 = 1. Given these parameters, E = (I\wedge J)(\omega), so E is common knowledge. This satisfies the requirement of the agreement theorem, and indeed 1 = 1.
  • A = \{X = 4\}, q_1 = 1/3, q_2 = 1/3. Given these parameters, E = (I\wedge J)(\omega), so E is common knowledge. This satisfies the requirement of the agreement theorem, and indeed 1/3 = 1/3.
  • A = \{X = 5\}, q_1 = 1/3, q_2 = 1/3. Given these parameters, E = \{2,3\}\times\{2,3\} \cup \{(1,4),(4,1)\}, which is not a superset of (I\wedge J)(\omega), so E is not common knowledge. Nonetheless, 1/3 = 1/3. (Is this a case of mutual knowledge that is not common knowledge? Actually not even that: (2,1) \in I((2,3)) but (2,1) \notin E, so Alice does not know E.)
  • A = \{X = 6\}, q_1 = 0, q_2 = 1/3. Given these parameters, E = \{(1,3),(2,3),(3,4),(3,5),(6,1),(6,2)\}, which is not a superset of (I\wedge J)(\omega), so E is not common knowledge.
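The rows above can be checked mechanically. Here is a small sanity-check script for this table, again with a uniform prior over \Omega (the helper names I, J, post, E are mine):

from fractions import Fraction
from itertools import product

Omega = set(product(range(1, 7), repeat=2))
lo, hi = {1, 2, 3}, {4, 5, 6}

def I(w):  # Alice: her exact roll, plus Bob's roll to within lo/hi
    a, b = w
    return {(a, y) for y in (lo if b in lo else hi)}

def J(w):  # Bob: symmetric
    a, b = w
    return {(x, b) for x in (lo if a in lo else hi)}

def post(A, info):
    return Fraction(len(A & info), len(info))

def E(A, q1, q2):
    return {w for w in Omega
            if post(A, I(w)) == q1 and post(A, J(w)) == q2}

meet_block = {(x, y) for x in lo for y in lo}  # (I wedge J)((2,3))

for A, q1, q2 in [
    ({w for w in Omega if 2 <= sum(w) <= 6}, 1, 1),
    ({w for w in Omega if sum(w) == 4}, Fraction(1, 3), Fraction(1, 3)),
    ({w for w in Omega if sum(w) == 5}, Fraction(1, 3), Fraction(1, 3)),
    ({w for w in Omega if sum(w) == 6}, 0, Fraction(1, 3)),
]:
    e = E(A, q1, q2)
    print(sorted(e), "common knowledge:", meet_block <= e)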

Alice knows she rolled a 2 and that Bob rolled something between 1 and 3. Now suppose Alice is additionally told that Bob did not roll a 1. Then Alice's posterior probability of the event X = 4 is 1/2. How does this affect the agreement theorem? Alice's information partition changes; in particular, her partition of Bob's roll becomes \{\{1\}, \{2,3\}, \{4,5,6\}\}. Bob's information partition remains the same, so now (I\wedge J)((2,3)) = \{1,2,3\}\times\{2,3\}. For the event X = 4, we get \Pr(X=4 \mid I((2,3))) = 1/2 and \Pr(X=4 \mid J((2,3))) = 1/3, and the set E = \{(1,2), (1,3), (2,2), (2,3)\} is not a superset of (I\wedge J)((2,3)), so the agreement theorem doesn't apply. And that's good, because 1/2 \ne 1/3.

Another thing: can't Bob deduce that Alice must have rolled a 1 or a 2 from knowing that her posterior for X=4 is 1/2? If so, it seems like the meet reduces to \{(1,2), (1,3), (2,2), (2,3)\}, which is the same as E, so the agreement theorem applies again, with the posterior of X=4 being 1/2 for both. Actually, this might be the whole point of Aumanning, because otherwise the theorem just doesn't apply in most cases.

Aumann's first coin flip example

This example is from Aumann's original paper.[1]

Let's hope I get this right. Let A be the event "heads on the second toss", H the event "heads on the first toss", and T the event "tails on the first toss". Let B be the random variable representing the bias of the coin (its probability of heads), and b a specific value; B is distributed uniformly on [0,1].

\begin{align}\Pr(A\mid H) &= \int_0^1 \Pr(A \mid H, B=b)\cdot \Pr(B=b\mid H) \, db \\
&= \int_0^1 b \cdot \frac{\Pr(H\mid B=b) \cdot \Pr(B=b)}{\Pr(H)} \, db \\
&= \int_0^1 b \cdot 2b\,db \\
&= 2/3
\end{align}

and

\begin{align}
\Pr(A\mid T) &= \int_0^1 \Pr(A\mid T, B=b) \cdot \Pr(B=b\mid T) \, db \\
&= \int_0^1 b \cdot \frac{\Pr(T \mid B=b) \cdot \Pr(B=b)}{\Pr(T)} \, db \\
&= \int_0^1 b \cdot 2(1-b)\,db \\
&= 1/3
\end{align}
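As a quick numeric sanity check of the two integrals (assuming sympy is available):

import sympy as sp

b = sp.symbols('b')
print(sp.integrate(b * 2 * b, (b, 0, 1)))        # 2/3, i.e. Pr(A | H)
print(sp.integrate(b * 2 * (1 - b), (b, 0, 1)))  # 1/3, i.e. Pr(A | T)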

Aumann's second coin flip example

This example is from Aumann's original paper.[1]

[7]

References

Things to read if I ever come back to this: