The Nature and History of the Bayes Rule for Computing Inverse Probabilities
San José State University

applet-magic.com
Thayer Watkins
Silicon Valley
USA


In the eighteenth century mathematicians around Europe were working out the details of probability theory. This work always took the form: given a condition, what are the probabilities of various events occurring? Thomas Bayes was a Presbyterian minister in England at a time when Christian denominations such as the Presbyterians were being persecuted for not supporting the Church of England. Mathematicians and their mathematics from such sources were being denounced. Thomas Bayes decided to enter the dispute and published a pamphlet defending Isaac Newton.

In the course of his mathematical studies Bayes realized there was an interesting question in probability theory that was not being answered. That question was: given the occurrence of an event, what are the probabilities of its having come from various possible sources? For example, suppose the flipping of a coin gives ten heads in a row. What is the probability that the flipped coin is a double-headed coin rather than a regular coin? This came to be called an inverse probability problem.

Bayes did not fully answer this question, but the formula which evolved from his ideas is

#### P(E, Ci) = P(Ci, E)/ΣP(Cj, E)

where P(Ci, E) is the probability of event E given condition Ci, whereas P(E, Ci) is the probability that condition Ci is responsible for the occurrence of event E. Note that this formula implicitly treats the possible conditions as equally likely a priori. This is called the Bayesian Rule, although it was developed by Pierre Simon Laplace, the brilliant 18th century French mathematician.

For a regular coin the probability of getting ten heads in a row is (1/2)^10 = 1/1024. For a double-headed coin the probability is 1.00. Thus the probability that the coin is double-headed, according to the Bayesian Rule, is

#### 1/(1 + 1/1024) = 1024/1025 ≈ 0.9990244

The probability that it is a regular coin is

#### (1/1024)/(1 + 1/1024) = 1/1025 ≈ 0.00097561

If the coin is flipped an eleventh time and tails comes up, the probability that the coin is double-headed goes to zero and the probability that it is a regular coin goes to one.
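As a sketch, the coin computation above can be reproduced in a few lines of Python. The function name `inverse_probabilities` and the condition labels are illustrative, not part of the original text:

```python
def inverse_probabilities(likelihoods):
    """Apply the rule P(E, Ci) = P(Ci, E) / sum over j of P(Cj, E)."""
    total = sum(likelihoods.values())
    return {c: p / total for c, p in likelihoods.items()}

# Probability of observing ten heads in a row under each condition.
ten_heads = {"double-headed": 1.0, "regular": 0.5 ** 10}
post = inverse_probabilities(ten_heads)
print(post["double-headed"])  # 1024/1025 ≈ 0.9990244
print(post["regular"])        # 1/1025 ≈ 0.00097561

# An eleventh flip comes up tails: ten heads then a tail is
# impossible for a double-headed coin.
post11 = inverse_probabilities({"double-headed": 0.0, "regular": 0.5 ** 11})
print(post11["double-headed"])  # 0.0
print(post11["regular"])        # 1.0
```

Normalizing by the sum of the likelihoods is all the rule requires, so any common scaling of the inputs leaves the result unchanged.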

The problem is how the results of a Bayesian computation should be interpreted. The answer is that they are in no way probabilities; they are degrees of confidence. The computation of degrees of confidence may be identical to that for probabilities, but the two are not the same conceptually.

Let D(Ci, E) be the degree of confidence that condition Ci prevails given only the information that event E has occurred. Then the degree of confidence is defined as

#### D(Ci, E) = P(Ci, E)/ΣP(Cj, E)

Then the degrees of confidence for two separate events E1 and E2 may be derived. First, in order for those degrees of confidence to be consistent with the above definition, they must be given by

#### D(Ci, E1&E2) = P(Ci, E1&E2)/ΣP(Cj, E1&E2)

However, if E1 and E2 are independent given each condition, then

#### P(Ci, E1&E2) = P(Ci, E1)*P(Ci, E2)

For convenience let ΣP(Cj, E1) and ΣP(Cj, E2) be denoted by S1 and S2, respectively.

Then

#### D(Ci, E1&E2) = P(Ci, E1)*P(Ci, E2)/[ΣP(Cj, E1)*P(Cj, E2)]

which is equivalent to

#### D(Ci, E1&E2) = [P(Ci, E1)/S1]*[P(Ci, E2)/S2]/[Σ[P(Cj, E1)/S1]*[P(Cj, E2)/S2]]

which is the same as

#### D(Ci, E1&E2) = D(Ci, E1)*D(Ci, E2)/[ΣD(Cj, E1)*D(Cj, E2)]

The event E1 could stand for all of the prior events and E2 for an additional event. Replacing E1 and E2 with Ep and Ea for prior and additional, respectively, the rule for modifying prior degrees of confidence to take into account new information is then

#### Lemma 0: D(Ci, Ep&Ea) = D(Ci, Ep)*D(Ci, Ea)/[ΣD(Cj, Ep)*D(Cj, Ea)]
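A minimal sketch of this update rule in Python, assuming a helper function `update_confidence` (the name and the dictionary representation are illustrative). It is applied here to the coin example: the prior degrees of confidence are those after ten heads, and the additional event is an eleventh head, whose degrees of confidence are 2/3 for the double-headed coin and 1/3 for the regular coin (the likelihoods 1 and 1/2, normalized):

```python
def update_confidence(prior, additional):
    """Lemma 0: D(Ci, Ep&Ea) = D(Ci, Ep)*D(Ci, Ea) / sum over j of D(Cj, Ep)*D(Cj, Ea)."""
    products = {c: prior[c] * additional[c] for c in prior}
    total = sum(products.values())
    return {c: v / total for c, v in products.items()}

# Prior: degrees of confidence after ten heads in a row.
prior = {"double-headed": 1024 / 1025, "regular": 1 / 1025}
# Additional event: an eleventh head.
head_again = {"double-headed": 2 / 3, "regular": 1 / 3}
updated = update_confidence(prior, head_again)
print(updated["double-headed"])  # 2048/2049
print(updated["regular"])        # 1/2049
```

Because the rule normalizes by the sum of the products, passing the raw likelihoods (1 and 1/2) instead of the normalized degrees of confidence would give exactly the same result.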

This is what is usually called the Bayesian Rule.

## The Asymptotic Irrelevancy of the Prior Degrees of Confidence

Suppose the same event Ea occurs over and over again, n times. Let D(Ci, Ep) and D(Ci, Ea) be abbreviated as Dpi and Dai, respectively. Then

#### D(Ci, Ep&nEa) = Dpi(Dai)^n/[ΣDpj(Daj)^n]

Let DaM be the maximum of the degrees of confidence for the event Ea. It is assumed that this maximum occurs uniquely, at condition CM, among the possible conditions. The numerator and denominator of the RHS of the above equation may be divided by DaM^n to give

#### D(Ci, Ep&nEa) = Dpi(Dai/DaM)^n/[ΣDpj(Daj/DaM)^n]

Provided that DpM≠0, it then follows that

#### Theorem 1: lim(n→∞) D(Ci, Ep&nEa) = 0 for i≠M and lim(n→∞) D(CM, Ep&nEa) = 1

In other words, asymptotically the degree of confidence that the condition of the world is CM approaches certainty. Notably, the prior degrees of confidence are asymptotically irrelevant.

Proof:

For i≠M the numerator of the RHS includes the less-than-unity ratio (Dai/DaM) raised to the power n, which goes to zero as n increases without bound. The denominator, on the other hand, contains the term DpM(DaM/DaM)^n = DpM, which precludes the denominator going to zero as n increases without bound; instead the limit of the denominator is DpM. For i=M the limit of the numerator is also DpM and hence, provided that DpM is not zero, their ratio is unity regardless of the value of DpM or any of the other prior degrees of confidence.
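The convergence claimed in Theorem 1 can be checked numerically. The priors and per-event degrees of confidence below are invented for illustration; the point is that a tiny prior on the maximizing condition does not prevent convergence:

```python
def repeated_update(prior, per_event, n):
    """D(Ci, Ep&nEa) = Dpi*(Dai)^n / sum over j of Dpj*(Daj)^n."""
    weights = {c: prior[c] * per_event[c] ** n for c in prior}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

prior = {"C1": 0.90, "C2": 0.09, "C3": 0.01}   # strongly favors C1
per_event = {"C1": 0.2, "C2": 0.3, "C3": 0.5}  # C3 is the unique maximum
for n in (1, 10, 100):
    print(n, repeated_update(prior, per_event, n))
# As n grows, the confidence in C3 approaches 1 despite its tiny prior,
# while the confidences in C1 and C2 approach 0.
```

For large n the individual weights underflow toward zero in floating point; a production version would work with logarithms, but that is beside the point of the demonstration.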

This result can be extended.

(To be continued.)