Discrete Probability

In the analysis of randomized algorithms, the concepts of discrete probability help devise the mathematical models catering the needs to perform asymptotic analysis.

Note: despite that continuous probability is also pivotal in various scientific worlds, discrete probability is mostly used in the computing subjects and hence the major topic in this chapter.

The following definitions are essential to know for applications in randomized algorithm analysis.

Indicator Random Variable

To analyze the various randomized algorithms, the definition of Indicator Random Variable is adopted:

Note: where Α denotes the event within the sample space

Conditional Probability

Given the sample space and its subsets event Α, Β, etc. the Conditional Probability denotes the concept that "event Β's probability to occur given the event Α happens as well":

Р(Β | Α) = Р(Α ∩ Β) / Р(Α)

if event Α and Β are independent (which means the events have no intersections graphically represented above.), Р(Α ∩ Β) = Р(Α) ⋅ Р(Β), Р(Β | Α) = Р(Β)

Linearity of Expectation

Formal Def: let 𝒳1, 𝒳2, ..., 𝒳n be random variables defined on sample space. Then:

Example Problems

To take advantage of the definitions above, there are several related probabilistic problems to cover.

Birthday Paradox

How many people is needed at least in order that two of them having a 50-50 chance to born on the same day? The answer is an astonishing 23, which is far smaller than the total number of days in a year.

It is a paradox given that only 70 people will have 99.9% possibility to discover two of them sharing the same birthday, much more efficient than finding such a pair in more than 366 people.

Proof:

Assume that leap year is not considered and all 365 days are equally likely to be chosen as one's birthday. (the differences of birth year is neglected)

To compute the probability of two people in a group sharing the same birthday, which can be seen as event A, a complementary event A' that no two individuals having the same birthday is introduced.

If a guy a chooses day d(a), then a guy b must chooses day d(b)d(a) in order that event A' still holds. Then, a guy c must not choose either of d(a) or d(b) as well.

By conditional probability, P(A') = 1 ⋅ (1 - 1/365) ⋅ (1 - 2/365) ... (1 - (n-1)/365), where n is the total number of people.

According to Taylor Series of the exponential function,

еx = 1 + x + x2/2! + .. ≈ 1 + x

Then, replace x with numbers in the P(A') equation,

P(A') = е-1/365 ⋅ е-2/365 .. ⋅ е-(n-1)/365 = е-(n(n-1)/2)/365

P(A) = 1 - P(A') ≈ 1 - е-n2/2*365</sup>, and its curve shown below:

Obviously, when the total number of people is 23, there is 50% chance to have at least two people ended up with same birthday; when it reaches 70, the probability is almost 100%. Then, proof ends.

Last updated