Discrete Probability
Last updated
Last updated
In the analysis of randomized algorithms, the concepts of discrete probability help devise the mathematical models catering the needs to perform asymptotic analysis.
Note: despite that continuous probability is also pivotal in various scientific worlds, discrete probability is mostly used in the computing subjects and hence the major topic in this chapter.
The following definitions are essential to know for applications in randomized algorithm analysis.
To analyze the various randomized algorithms, the definition of Indicator Random Variable is adopted:
Figure 1. Indicator Random Variable
Note: where Α denotes the event within the sample space
Given the sample space and its subsets event Α, Β, etc. the Conditional Probability denotes the concept that "event Β's probability to occur given the event Α happens as well":
Р(Β | Α) = Р(Α ∩ Β) / Р(Α)
Figure 2. Venn Diagram of Intersection of Event Α and Β
if event Α and Β are independent (which means the events have no intersections graphically represented above.), Р(Α ∩ Β) = Р(Α) ⋅ Р(Β), Р(Β | Α) = Р(Β)
Formal Def: let 𝒳1, 𝒳2, ..., 𝒳n be random variables defined on sample space. Then:
To take advantage of the definitions above, there are several related probabilistic problems to cover.
How many people is needed at least in order that two of them having a 50-50 chance to born on the same day? The answer is an astonishing 23, which is far smaller than the total number of days in a year.
It is a paradox given that only 70 people will have 99.9% possibility to discover two of them sharing the same birthday, much more efficient than finding such a pair in more than 366 people.
Proof:
Assume that leap year is not considered and all 365 days are equally likely to be chosen as one's birthday. (the differences of birth year is neglected)
To compute the probability of two people in a group sharing the same birthday, which can be seen as event A, a complementary event A' that no two individuals having the same birthday is introduced.
If a guy a chooses day d(a), then a guy b must chooses day d(b) ≠ d(a) in order that event A' still holds. Then, a guy c must not choose either of d(a) or d(b) as well.
By conditional probability, P(A') = 1 ⋅ (1 - 1/365) ⋅ (1 - 2/365) ... (1 - (n-1)/365), where n is the total number of people.
According to Taylor Series of the exponential function,
еx = 1 + x + x2/2! + .. ≈ 1 + x
Then, replace x with numbers in the P(A') equation,
P(A') = е-1/365 ⋅ е-2/365 .. ⋅ е-(n-1)/365 = е-(n(n-1)/2)/365
P(A) = 1 - P(A') ≈ 1 - е-n2/2*365</sup>, and its curve shown below:
Obviously, when the total number of people is 23, there is 50% chance to have at least two people ended up with same birthday; when it reaches 70, the probability is almost 100%. Then, proof ends.
Figure 3. Linearity of Expectation
Figure 4. The computed probability of at least two sharing the same birthday against the total number of people