Let’s talk about Bayes’ Theorem and why you ought to know it. It isn’t only useful for programmers: it helps you reason clearly about probabilities. That’s especially handy for, say, understanding statistics in the news, where numbers are often abused to sensationalize stories.
To put it very simply, Bayes’ Theorem relates conditional probabilities: it lets you update your estimate of how likely an event is, given evidence you’ve already observed.
`P(A|B) = (P(B|A) * P(A))/(P(B))`
A very quick rundown of the notation: `P(X)` is the probability (a number between 0 and 1) that we will detect an event `X`, and `P(X|Y)` is the probability that we will detect an event `X` given that we have already detected another event, `Y`.
That’s the small, beautiful form. With real data, though, you usually have no direct measurement of `P(B)` on its own. What you’re after is the posterior, `P(Prediction|Data)`, but it’s rarely the case that you’ve observed all of the possible data. Luckily, there’s a handy relation (the law of total probability) that will get you `P(B)`:
`P(B) = P(B|A) * P(A) + P(B|¬A) * P(¬A)`
Here the notation `¬` is logical negation, so `¬A` means `A` was not detected; numerically, `P(¬A) = 1 − P(A)`. This of course yields the longer form:
`P(A|B) = (P(B|A) * P(A))/(P(B|A) * P(A) + P(B|¬A) * P(¬A))`
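The longer form translates directly into code. Here’s a minimal sketch in Python (the function and parameter names are mine, not standard):

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Compute P(A|B) via Bayes' Theorem, expanding P(B) with the
    law of total probability: P(B) = P(B|A)P(A) + P(B|¬A)P(¬A)."""
    p_not_a = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return (p_b_given_a * p_a) / p_b
```

A quick sanity check on the formula: if `B` is independent of `A` (that is, `P(B|A) = P(B|¬A)`), the posterior comes out equal to the prior, which is exactly what "the evidence tells you nothing" should mean.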
A Simple Example
One of the very cool things about Bayes is that it articulates something that “everybody knows”: essentially, this is how your brain works. (More on this in a later post.)
Let’s say you live in Pasadena, where the Rekka Labs are, and you have a dog. The dog barks. According to the City of Pasadena, there were 36 residential burglaries in June of 2017. According to Wikipedia, the Census reports 55,270 households in Pasadena, meaning your house had a `36/55270` probability of being burgled that month, which works out to a daily probability of about 0.00002172. (We’re going to gloss over Bernoulli distributions and a couple of complicating factors for the sake of simplicity.) Let’s suppose the dog barks in the middle of the night about once a week (a daily probability of `1/7`, or roughly 0.1429), and that if an unfamiliar person comes near your house, the dog barks 9 times out of 10. What are the odds that you should get out of bed to check?

She doesn't usually bark. pic.twitter.com/nVx8DwQM1h

— Rekka Labs (@RekkaLabs) August 23, 2017

What we’re looking at is `P(A|B)`, where `A` is the event that you are being burgled and `B` the event that the dog barks. The probability that the dog barks on a given night (`P(B)`) is about 0.1429, and the probability that the dog barks given that someone is approaching your house (`P(B|A)`) is 0.9. This makes the math pretty easy to work out:

`P(Burglary|Bark) = (P(Bark|Burglary) * P(Burglary)) / (P(Bark))`

This gives us the following:

`(0.9 * 0.00002172) / 0.1429 ≈ 0.0001368`

So you have about a 0.014% chance of being burgled when your dog barks. That solves the curious incident of the dog in the night-time. (Again, sorry, Bernoulli, you’re beyond the scope of the current blog post.)
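The same calculation in a few lines of Python, treating "barks once a week" as a daily bark probability of `1/7` (the variable names are mine):

```python
# Daily prior probability of a burglary: 36 burglaries over 55,270
# households in a 30-day month.
p_burglary = 36 / 55270 / 30          # ≈ 0.00002172
p_bark = 1 / 7                        # dog barks roughly one night in seven
p_bark_given_burglary = 0.9           # dog barks at 9 of 10 strangers

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_burglary_given_bark = p_bark_given_burglary * p_burglary / p_bark
print(f"{p_burglary_given_bark:.4%}")  # → 0.0137%
```

Note that here we don’t need the longer form: the bark rate already gives us `P(B)` directly, so there’s nothing to expand.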
You’d never get out of bed to check the window if that were the only calculation you were doing, but you also apply a cost-benefit analysis. The cost of getting out of bed is low, but the cost of being burgled is high, so unless you’re very tired or it’s a very cold night (which raises the cost of getting out of bed), you’ll probably check at least sometimes. If you play with the numbers a bit, the math also formalizes why a yappy dog gets ignored (a much higher `P(B)`) and why a dog that doesn’t bark at strangers as reliably is less useful for this purpose (a lower `P(B|A)`).
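That cost-benefit step can be sketched the same way. The dollar figures below are invented purely for illustration, as is the function name:

```python
def should_check(p_event, cost_of_checking, cost_of_event):
    """Check the window when the expected loss from ignoring the bark
    exceeds the (certain) cost of getting out of bed."""
    return p_event * cost_of_event > cost_of_checking

# Hypothetical numbers: a ~0.000137 burglary probability per bark,
# a $5000 loss if burgled, and a varying cost of getting up.
print(should_check(0.000137, 0.10, 5000))  # mild night → True
print(should_check(0.000137, 2.00, 5000))  # very cold night → False
```

The expected loss here is about $0.69 per bark, so the decision flips exactly where the post says it should: when getting out of bed gets expensive.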
(The canonical explanations usually involve cancer screenings. Concrete, quotidian events that don’t involve chronic illness seemed like a better example, and easier to grasp without the cognitive overhead of a memento mori.)
Bayes was born in London around 1701 and died in Kent in 1761, and in all that time he never actually published the theorem for which he is famous. He was a Presbyterian minister, a philosopher, and a mathematician. He wasn’t exactly a tragic genius, though: he was elected to the Royal Society in 1742. The actual situation is stranger: he wasn’t considered very noteworthy. Obviously you need some renown to become a Fellow of the Royal Society, but there aren’t any biographies, he wasn’t knighted, and we don’t even have any known contemporary portraits. We’re not even sure why he was elected! Working backwards, we assume it was for a defense of Newton’s calculus, but that work would probably not be very well-known today had it not been written by Bayes.
So it’s kind of lucky that we ended up with Bayes’ Theorem to begin with: one of his friends (Richard Price, who was a somewhat important figure in the history of the American Revolution) went through his unpublished essays after his death, and ended up publishing An Essay towards solving a Problem in the Doctrine of Chances in 1763, two years after Bayes died of…some kind of illness. Nobody bothered to write that down, either, apparently.
So, ironically, the odds that we’d have Bayes’ Theorem were not great. What if the papers had been discarded after his death? What if Price hadn’t recognized the importance of the work? (After all, Bayes himself hadn’t published it, and we don’t know if he was going to.) We’d lose an entire field of statistical analysis!
Worse, no one would be able to understand this XKCD.
Next Time: A Classifier, Real Data
We’ll follow up soon by constructing a Bayesian Classifier that is simple to use and understand, and test it out on some real data, and talk about some real-world applications, including A/B-testing, spam filters, and FeelsBot.
Incidentally, there appears to be a popular belief that “Bayesian vs. Frequentist” represents an ideological split in statistics; this isn’t actually the case. In practice it’s a difference in interpretation, and working statisticians routinely draw on both toolkits.
This post was a little math-heavy, so we used asciimathml, which was fun and convenient!
We have updated it after getting some excellent feedback from Rob.