User:IssaRice/Belief propagation and cognitive biases

  • "Several cognitive biases can be seen as confusion between probabilities and likelihoods, most centrally base-rate neglect." [1] [2]
    • confusing p-values with Pr(null hypothesis | data) seems like another instance of this.
    • confidence interval vs credible interval also does this flipping about the conditional bar.
    • it seems like for some kinds of evidence (like car alarms going off) we automatically take the base rate into account (e.g. I find myself ignoring car alarms because they've never turned out to be real thefts).
  • "It’s also a useful analogy for aspects of group epistemics, like avoiding double counting as messages pass through the social network." [3]
    • it seems like one reason group epistemics is hard is that we can't control the structure of the social graph: it could be a very dense graph or have unexpected edges in it, which makes accounting for evidence (and avoiding double counting of evidence) difficult. But if we could fully control the social graph and instruct people to pass the lambda and pi messages according to the belief propagation algorithm, then it seems like we could do gossip the right way. (A toy example of double counting is sketched after this list.)
  • https://www.readthesequences.com/Fake-Causality discusses belief propagation and its relation to fake explanations, e.g. "Alas, human beings do not use a rigorous algorithm for updating belief networks. We learn about parent nodes from observing children, and predict child nodes from beliefs about parents. But we don’t keep rigorously separate books for the backward-message and forward-message. We just remember that phlogiston is hot, which causes fire to be hot. So it seems like phlogiston theory predicts the hotness of fire. Or, worse, it just feels like phlogiston makes the fire hot." and "Speaking of “hindsight bias” is just the nontechnical way of saying that humans do not rigorously separate forward and backward messages, allowing forward messages to be contaminated by backward ones."
  • Also look at the general structure of rationalizations, and see if belief propagation can help us understand them.
  • I think a polytree graph can illuminate the halo effect/horn effect.
  • Maybe https://en.wikipedia.org/wiki/Berkson%27s_paradox is relevant. The page even says "The effect is related to the explaining away phenomenon in Bayesian networks." (A quick simulation of this selection effect is sketched after this list.)
    • From Pearl's Causality (p. 17): "At first glance, readers might find it a bit odd that conditioning on a node not lying on a blocked path may unblock the path. However, this corresponds to a general pattern of causal relationships: observations on a common consequence of two independent causes tend to render those causes dependent, because information about one of the causes tends to make the other more or less likely, given that the consequence has occurred. This pattern is known as selection bias or Berkson’s paradox in the statistical literature (Berkson 1946) and as the explaining away effect in artificial intelligence (Kim and Pearl 1983). For example, if the admission criteria to a certain graduate school call for either high grades as an undergraduate or special musical talents, then these two attributes will be found to be correlated (negatively) in the student population of that school, even if these attributes are uncorrelated in the population at large. Indeed, students with low grades are likely to be exceptionally gifted in music, which explains their admission to the graduate school."
  • Can the ideas in "Beware surprising and suspicious convergence" be explained in terms of belief propagation/explaining away/causal diagrams/something else in this general area?
  • https://en.wikipedia.org/wiki/Blocking_effect
  • Fundamental attribution error? The simplified DAG would look like: situational influence → observed action ← personality. The evidence feeds into the "observed action" node, which propagates upwards to the "situational influence" and "personality" nodes. I think the bias is that the "personality" node gets updated too much. Can belief propagation give insight into this? Actually, I think the problem is that when we observe others, the causal graph we mentally draw looks like "observed action ← personality", but if we're the ones in the situation it looks like "situational influence → observed action". So it's more a problem of using an incomplete causal graph than a problem with propagating updates. (A rough numerical sketch is after this list.)
  • This one might be too simple, but I think the idea of screening off can be visualized in a Bayesian network. Not sure where the belief propagation would come in, though... Related here are [4]/stereotyping. (A small numerical check of screening off is after this list.)
  • Hindsight bias seems like an evidence node misfiring and causing updates in the graph? See also https://www.lesswrong.com/posts/TiDGXt3WrQwtCdDj3/do-we-believe-everything-we-re-told
  • Hindsight bias is also weird because it involves events happening over time.
  • Buckets error and flinching away from truth: I think you can formulate a probabilistic version of my comment using Bayes nets and belief prop. (In that case, there still may or may not be causality involved; I think all you need are the independence relationships.)
    • [5] the sour grapes/tolerification seems pretty similar, but the steps go like this: (1) initially, one has stored X ⇒ Y (example: X = grapes unreachable, Y = grapes sour). (2) the world shows you X in a way that's undeniable (this is contrasted with the buckets error situation, where someone merely asserts/brings to attention X). (3) one does the modus ponens, obtaining Y. Here, Y is undesirable (the world would be better with sweeter grapes!), but even more undesirable is ¬(X ⇒ Y) (i.e. X ∧ ¬Y, where the grapes are both sweet and unreachable), and by (2), we cannot deny X. So we pick the best of the undesirable choices and stick with Y.
And why is X ∧ ¬Y so undesirable? Because there is another implication, (X ∧ ¬Y) ⇒ Z, stored in your brain! And Z says "the world is intolerable". So to deny Z you must deny X ∧ ¬Y. This is still different from a buckets error, because the implication is true.
I think a network for this situation looks like X → Y and X → Z ← Y. So it's still a DAG, but there is now a loop (in the undirected skeleton). Or maybe X → Z ← Y alone is sufficient, i.e. the update on sourness only happens via the tolerability node.
There is something funny going on at the Z node, I think. Like it is failing to update, and sending the opposite message to Y or something. I'll need to work out the calculation to be sure.
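
A minimal numerical sketch of the probabilities-vs-likelihoods point above, with made-up disease/test numbers: the likelihood Pr(positive test | disease) can be high while the posterior Pr(disease | positive test) stays low once the base rate is taken into account.

    # Toy numbers (assumed for illustration): a rare condition and a decent test.
    base_rate = 0.001          # Pr(disease)
    sensitivity = 0.95         # Pr(positive | disease)  -- the likelihood
    false_positive = 0.05      # Pr(positive | no disease)

    # Bayes' rule: Pr(disease | positive) = Pr(positive | disease) Pr(disease) / Pr(positive)
    p_positive = sensitivity * base_rate + false_positive * (1 - base_rate)
    posterior = sensitivity * base_rate / p_positive

    print(f"Pr(positive | disease) = {sensitivity:.3f}")   # high likelihood (0.950)
    print(f"Pr(disease | positive) = {posterior:.3f}")     # low posterior (about 0.019)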
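
A toy sketch of the double-counting point about group epistemics above; the two-person setup and the numbers are assumptions, not from the linked post. If two people both update on the same observation and one then also treats the other's report as independent evidence, the group ends up more confident than the single observation warrants.

    # Hypothesis H with even prior odds; one shared observation with likelihood ratio 3:1 for H.
    prior_odds = 1.0
    likelihood_ratio = 3.0

    # Correct accounting: the shared observation is counted once, however many people saw it.
    correct_odds = prior_odds * likelihood_ratio

    # Double counting: Bob updates on the observation AND on Alice's report of her update,
    # as if her report were independent evidence, so the same likelihood ratio enters twice.
    double_counted_odds = prior_odds * likelihood_ratio * likelihood_ratio

    def to_prob(odds):
        return odds / (1 + odds)

    print(f"correct posterior:        {to_prob(correct_odds):.2f}")         # 0.75
    print(f"double-counted posterior: {to_prob(double_counted_odds):.2f}")  # 0.90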
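
A quick simulation of the graduate-school example from the Pearl quote above; the admission rule and trait distributions are made up for illustration. Grades and musical talent are independent in the population at large but come out negatively correlated among admitted students.

    import random

    random.seed(0)

    def corr(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
        vx = sum((x - mx) ** 2 for x in xs) / n
        vy = sum((y - my) ** 2 for y in ys) / n
        return cov / (vx * vy) ** 0.5

    # Grades and musical talent: independent in the population at large.
    grades = [random.random() for _ in range(100_000)]
    music = [random.random() for _ in range(100_000)]

    # Admission requires high grades OR special musical talent (thresholds are arbitrary).
    admitted = [(g, m) for g, m in zip(grades, music) if g > 0.8 or m > 0.8]
    grades_adm = [g for g, m in admitted]
    music_adm = [m for g, m in admitted]

    print("correlation in population:", round(corr(grades, music), 3))          # approximately 0
    print("correlation among admitted:", round(corr(grades_adm, music_adm), 3)) # clearly negative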
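
A rough sketch of the fundamental attribution error bullet above, with made-up conditional probabilities. Under the full graph situational influence → observed action ← personality, observing a rude act updates the personality node only modestly; if the situational node is dropped from the graph (here modeled as implicitly assuming a calm situation), the same observation pushes the personality posterior much higher.

    from itertools import product

    # All numbers below are made-up toy probabilities.
    p_situation = {0: 0.5, 1: 0.5}    # Pr(stressful situation)
    p_personality = {0: 0.9, 1: 0.1}  # Pr(rude disposition)
    # Pr(acts rudely | situation, personality)
    p_action = {(0, 0): 0.05, (1, 0): 0.40, (0, 1): 0.60, (1, 1): 0.90}

    # Full graph (situation -> action <- personality): posterior on personality given a rude act.
    joint = {(s, p): p_situation[s] * p_personality[p] * p_action[(s, p)]
             for s, p in product([0, 1], repeat=2)}
    posterior_full = sum(v for (s, p), v in joint.items() if p == 1) / sum(joint.values())

    # Incomplete graph: the observer leaves out the situation node, here modeled as
    # implicitly assuming a calm situation (s = 0) and attributing the act to personality alone.
    num = p_personality[1] * p_action[(0, 1)]
    posterior_no_situation = num / (num + p_personality[0] * p_action[(0, 0)])

    print(f"Pr(rude disposition | rude act), full graph:         {posterior_full:.2f}")          # ~0.27
    print(f"Pr(rude disposition | rude act), situation omitted:  {posterior_no_situation:.2f}")  # ~0.57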
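
A small numerical check of the screening-off bullet above, on a toy chain A → B → C with made-up probabilities: once B is observed, additionally learning C does not move the belief about A at all.

    from itertools import product

    # Toy chain A -> B -> C; all numbers are made up.
    p_a = {0: 0.7, 1: 0.3}
    p_b_given_a = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
    p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

    def joint(a, b, c):
        return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

    def pr_a1(keep):
        """Pr(A=1 | keep), where keep is a predicate selecting the conditioning event."""
        terms = [(a, joint(a, b, c)) for a, b, c in product([0, 1], repeat=3) if keep(a, b, c)]
        total = sum(w for _, w in terms)
        return sum(w for a, w in terms if a == 1) / total

    # B screens A off from C: once B is known, conditioning on C changes nothing.
    print(pr_a1(lambda a, b, c: b == 1))             # Pr(A=1 | B=1)       -> 0.6
    print(pr_a1(lambda a, b, c: b == 1 and c == 1))  # Pr(A=1 | B=1, C=1)  -> 0.6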

possibly related